This is an archive of the discontinued LLVM Phabricator instance.


Differential D75138

[WIP][AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions
Closed · Public

Authored by scott.linder on Feb 25 2020, 1:34 PM.


Details

Reviewers

arsenm
rampitec
cdevadas
kzhuravl
b-sumner
RamNalamothu
mareko

Commits

rG60b1967c3933: [AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions

Summary

[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions

Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in
the entry function prologue. This frees up the preloaded scratch wave
offset register after the entry function prologue and removes the
scratch wave offset register from the calling convention ABI.
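To make the prologue change concrete, here is a minimal C++ sketch. It models the scratch buffer descriptor as four 32-bit dwords and adds the wave offset into the descriptor's base address with a carry, the way an `s_add_u32`/`s_addc_u32` pair would. The layout is an assumption based on the usual GCN buffer resource format: the base address sits in dword 0 and in bits [15:0] of dword 1, and the stride and swizzle fields occupy the rest of dword 1. `RsrcDwords`, `addScratchWaveOffset` and `baseAddress` are hypothetical names, not LLVM or ISA APIs.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of the 128-bit scratch buffer descriptor (SRsrc / V#)
// as four 32-bit dwords, as held in SGPR0-3 after the entry prologue.
using RsrcDwords = std::array<uint32_t, 4>;

// Emulates the assumed prologue sequence:
//   s_add_u32  rsrc[0], rsrc[0], wave_offset   ; carry-out goes to SCC
//   s_addc_u32 rsrc[1], rsrc[1], 0             ; adds SCC into dword 1
RsrcDwords addScratchWaveOffset(RsrcDwords Rsrc, uint32_t WaveOffset) {
  uint64_t Lo = uint64_t(Rsrc[0]) + WaveOffset;
  uint32_t Carry = static_cast<uint32_t>(Lo >> 32);
  Rsrc[0] = static_cast<uint32_t>(Lo);
  Rsrc[1] += Carry; // wraps modulo 2^32, like s_addc_u32
  return Rsrc;      // dwords 2 and 3 are left untouched
}

// Base address held in bits [47:0]: dword 0 plus the low 16 bits of dword 1.
uint64_t baseAddress(const RsrcDwords &Rsrc) {
  return (uint64_t(Rsrc[1] & 0xffffu) << 32) | Rsrc[0];
}
```

After this one-time add, every stack access in the entry function can address scratch through the descriptor alone.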

As part of this change, allow the use of an inline constant zero for the
SOffset of MUBUF instructions accessing the stack in entry functions
when a frame pointer is not requested/required. Entry functions with
calls still need to set up the calling convention ABI stack pointer
register, and reference it in order to address arguments of called
functions. The ABI stack pointer register remains unswizzled, but is now
wave-relative instead of queue-relative.
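A simplified address model shows why an inline constant zero is enough for SOffset. The descriptor base and SOffset are both added outside the per-lane swizzle, so moving the wave offset out of SOffset and into the base produces the same address. This sketch does not model swizzling: it treats the lane-dependent part as an opaque input. Every name in it is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model: descriptor base and SOffset are added outside the
// swizzle; SwizzledLaneOffset stands in for whatever the hardware derives
// from vaddr, the immediate offset and the lane id (not modelled here).
uint64_t scratchAddress(uint64_t RsrcBase, uint32_t SOffset,
                        uint64_t SwizzledLaneOffset) {
  return RsrcBase + SOffset + SwizzledLaneOffset;
}

// Old convention: SRsrc holds the queue-relative base and the scratch wave
// offset register is passed as SOffset on every access.
uint64_t oldConvention(uint64_t QueueBase, uint32_t WaveOffset,
                       uint64_t Lane) {
  return scratchAddress(QueueBase, WaveOffset, Lane);
}

// New convention: the prologue folds the wave offset into SRsrc once, so
// entry functions can use the inline constant 0 for SOffset.
uint64_t newConvention(uint64_t QueueBase, uint32_t WaveOffset,
                       uint64_t Lane) {
  uint64_t WaveBase = QueueBase + WaveOffset; // done once in the prologue
  return scratchAddress(WaveBase, /*SOffset=*/0, Lane);
}
```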

Non-entry functions also use an inline constant zero SOffset for
wave-relative scratch access, but continue to use the stack and frame
pointers as before. When the stack or frame pointer is converted to a
swizzled offset it is now scaled directly, as the scratch wave offset no
longer needs to be subtracted first.
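The frame-index change can be sketched as follows. Converting an unswizzled per-wave byte offset into a per-lane offset is a right shift by log2 of the wavefront size. With wave-relative stack and frame pointers, the subtraction of the scratch wave offset disappears. The helper names are hypothetical, and the sketch assumes wave sizes of 32 and 64.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: turn the unswizzled value of a stack/frame pointer
// SGPR into the per-lane (swizzled) offset used by a private pointer.
// WaveSizeLog2 is assumed to be 5 (wave32) or 6 (wave64).
uint32_t toSwizzledOld(uint32_t FramePtr, uint32_t ScratchWaveOffset,
                       unsigned WaveSizeLog2) {
  assert(WaveSizeLog2 == 5 || WaveSizeLog2 == 6);
  assert(FramePtr >= ScratchWaveOffset);
  // Old scheme: FP was queue-relative, so the wave offset is removed first.
  return (FramePtr - ScratchWaveOffset) >> WaveSizeLog2;
}

uint32_t toSwizzledNew(uint32_t FramePtr, unsigned WaveSizeLog2) {
  assert(WaveSizeLog2 == 5 || WaveSizeLog2 == 6);
  // New scheme: FP is already wave-relative, so it is scaled directly.
  return FramePtr >> WaveSizeLog2;
}
```

The same stack slot, 0x400 bytes into the wave's area, maps to the same per-lane offset under both schemes.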

Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling
convention.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	40 ms	LLVM.CodeGen/AMDGPU::Unknown Unit Message ("")

Event Timeline

scott.linder created this revision. · Feb 25 2020, 1:34 PM

Herald added a project: Restricted Project. · Feb 25 2020, 1:34 PM

Herald added subscribers: llvm-commits, kerbowa, hiraditya and 9 others.

Harbormaster completed remote builds in B47237: Diff 246557. · Feb 25 2020, 1:34 PM

scott.linder added reviewers: arsenm, rampitec, cdevadas, kzhuravl, b-sumner. · Feb 25 2020, 1:35 PM

I'm having trouble working out the best way forward on this patch, with the core issue relating to the fact that we no longer need anything equivalent to a frame pointer in the entry function when there is no stack usage. This is complicated by the fact that hasFP is broken in some of the places it is called, including reservePrivateMemoryRegs. I'm not sure I completely understand where the best place to handle this is, but without addressing it I can't avoid gratuitously initializing the SP and/or FP in many cases, including a trivial kernel with no body.

I'm also not sure if my ISA for initializing the SRSRC is optimal and wanted to get feedback. I do think that in at least some cases we will need to save a DWORD out of the SRSRC while updating it, and in those cases I'm not certain scavenging one is infallible (see the cc-update-scavenge-fail.ll test case). Is there a better approach here?

I assume this is missing a lot of test updates?

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
554–555	Do we actually need these bits? I'm fairly confident these are always 0 in the HSA resource descriptor (or at least are a known constant we can just reproduce later)
558	I think just 0xffff0000 would be clearer here
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
340	These should be switched to Register at some point
llvm/test/CodeGen/AMDGPU/cc-update-scavenge-fail.ll
5 ↗	(On Diff #246557)	I would move this to the first line, and check the error message to make sure it fails for the right reason

In D75138#1892192, @scott.linder wrote:

I'm having trouble working out the best way forward on this patch, with the core issue relating to the fact that we no longer need anything equivalent to a frame pointer in the entry function when there is no stack usage. This is complicated by the fact that hasFP is broken in some of the places it is called, including reservePrivateMemoryRegs. I'm not sure I completely understand where the best place to handle this is, but without addressing it I can't avoid gratuitously initializing the SP and/or FP in many cases, including a trivial kernel with no body.

Why is this a problem exactly? I only vaguely remember what kind of problems this would cause. hasFP has always been broken depending on when it's called, so in some places we do have to guess whether it's needed.

arsenm added inline comments. · Feb 25 2020, 2:13 PM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
554–555	According to this, it's hardcoded (https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/master/src/core/runtime/amd_aql_queue.cpp#L1015). We just need to worry about SWIZZLE_ENABLE being set to 1. This is the high bit, so all it can do is trigger a carry on the second add. So I think that means you can get away with just doing the add, and then using s_bitset1_b32 to ensure it wasn't carried away.

arsenm added inline comments. · Feb 25 2020, 2:17 PM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
554–555	Actually, I don't think any add that fits in the 48-bit address space should ever touch the high bits (although I usually manage to be wrong about known bits optimizations with adds)

arsenm added inline comments. · Feb 25 2020, 3:12 PM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
554–555	I think this means it's OK to just not worry about the high bits: https://rise4fun.com/Alive/i24

arsenm added inline comments. · Feb 25 2020, 3:19 PM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
554–555	As long as we know bit 48 is 0, this seems fine. As this is hardcoded in the driver, this is probably OK https://rise4fun.com/Alive/KmH

RamNalamothu added a subscriber: RamNalamothu. · Feb 25 2020, 7:02 PM

Yes, there are a lot of test updates and likely more new tests needed, but I just posted some tests that exercise the bits I'm currently stuck on for now.

I will try to articulate the issue with hasFP better tomorrow morning, but currently we are making the decision about whether to have a distinct FP (i.e. S34) before we actually know if we use the stack. If we have a call, but no stack use early, and then later we need to reference the stack we end up in a situation where at PEI time we are updating the same register both for the ABI SP and for the entry function FP, which obviously isn't right.

The right thing seems to be to not have any stack or frame pointer at all, but I am not sure how to implement that and wanted to ask for some help estimating how reasonable that would be.

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
554–555	That makes sense to me, and this would simplify things a lot. I don't quite understand whether we need to ensure [48:62] are 0, though. If the addc carries into bit 48, is that an issue (e.g. https://rise4fun.com/Alive/qsv)? At the very least, it seems like we can avoid the need to save anything and just mask in a constant, but if it is possible to avoid that too, it removes a couple of additional instructions from nearly every kernel prologue.
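The known-bits question in this thread can be checked numerically with a small sketch that treats dwords 0-1 of the descriptor as one 64-bit value. It assumes that bits [47:0] hold the base and bits [63:48] hold the stride and swizzle fields. Under that assumption, the combined 64-bit add leaves [63:48] untouched exactly when the new base still fits in 48 bits, and an overflow carries into bit 48. The function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Dwords 0-1 of the descriptor viewed as one 64-bit value (assumed layout:
// [47:0] base address, [63:48] stride/swizzle fields).
uint64_t add64(uint64_t Dwords01, uint32_t WaveOffset) {
  return Dwords01 + WaveOffset; // what s_add_u32 + s_addc_u32 compute together
}

uint16_t highBits(uint64_t Dwords01) {
  return static_cast<uint16_t>(Dwords01 >> 48);
}

// True when the add cannot carry into bit 48, i.e. the new base still fits
// in the 48-bit address space.
bool fitsIn48Bits(uint64_t Dwords01, uint32_t WaveOffset) {
  const uint64_t Mask48 = (uint64_t(1) << 48) - 1;
  return (Dwords01 & Mask48) + WaveOffset <= Mask48;
}
```

In the overflow case the result is 0x8001 instead of 0x8000 in the high bits, which is the carry into bit 48 that the question is about.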

scott.linder edited reviewers, added: RamNalamothu; removed: ramana-nvr. · Feb 25 2020, 7:25 PM

I'm going to go ahead with trying to eliminate the need for an FP completely in entry functions and then update the review with a more complete set of test updates. I'm sure the issue I was having with defining and using hasFP consistently between ISel and PEI could be worked around, but putting that effort into eliminating the FP entirely in entry functions seems more productive.

Update/add tests and eliminate use of FP in entry functions

Herald added subscribers: arphaman, qcolombet, MatzeB. · Mar 4 2020, 4:03 PM

scott.linder added a parent revision: D75092: [AMDGPU][NFC] Refactor emitEntryFunctionPrologue. · Mar 4 2020, 4:05 PM

scott.linder added a child revision: D75657: [WIP][AMDGPU] Move frame pointer from s34 to s33.

scott.linder retitled this revision from [WIP][AMDGPU] Eliminate the ScratchWaveOffset register from the calling convention to [WIP][AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions. · Mar 4 2020, 4:23 PM

scott.linder edited the summary of this revision.

arsenm added inline comments. · Mar 4 2020, 4:30 PM

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
631–632	Should demorgan this
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
290–292	This should not need to inspect the original IR. Why can't this just read it directly from MFI? They should already be accounted for there.
293	This will be inaccurate for any struct type, this should have been computed during lowering that knows the type split

There are still a reasonable amount of FIXME/TODO in this patch, and I left some additional comments on each to highlight them and ask for feedback on them. I am not entirely comfortable with the way I went about implementing the special-casing for having no FP in the entry function. I would prefer not having all of the isEntryFunction checks everywhere, but I'm not sure how else to represent it?

I also would rather break this patch up more, but I don't think doing so will make it easier to understand or reduce the size of the test diffs. The only pieces I could break off naturally were some NFC changes in https://reviews.llvm.org/D75092 and switching to s33 for FP in https://reviews.llvm.org/D75657

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
290	@arsenm @nhaehnle I don't think I understand how `inreg` currently works relative to "preloaded" SGPRs; is/should `inreg` be recorded somewhere in the machine function info so this isn't necessary?
309	Similar question here, should there be a change in `SITargetLowering` so the preloaded count is correct?
554–555	I went the route of just always doing the 64-bit add of the scratch wave offset into the SRsrc rather than saving anything or using known constants for some of the bits. From some other discussion this should always be correct.
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
340	I haven't gotten around to this yet, but I'll do this in another NFC patch.
llvm/test/CodeGen/AMDGPU/attr-amdgpu-num-sgpr.ll
8	Can anyone help me understand what we are trying to test here? It seems likely the amount of live SGPRs and the amount of available SGPRs needs to be adjusted to have this test continue to be meaningful, but in trying to correct it I realized I wasn't sure what it was testing in the first place.
llvm/test/CodeGen/AMDGPU/scratch-simple.ll
100	@arsenm @nhaehnle Similar question as above wrt. how `inreg` should work. Is the `%swo` argument in these expected to actually be allowed to coincide with the scratch wave offset?
llvm/test/CodeGen/AMDGPU/spill-offset-calculation.ll
52	Is it OK for us to fail here? This is a consequence of not having a frame pointer in entry functions and not being able to e.g. restart RA after we realize we really need it in this case.

Harbormaster failed remote builds in B48132: Diff 248352! · Mar 4 2020, 4:54 PM

arsenm added inline comments. · Mar 9 2020, 1:09 PM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
290	Not directly. There shouldn't be any repeating of the calling convention logic here. Either the number of SGPR arguments should be recorded, or it should be inferred from the machine code. It might be correct to just count the number of SGPRs in the function live-in list. I think live-in registers can be deleted from the list if they are proven to be unused, so this might be fragile. Finding the highest live-in SGPR number may also work.
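The two ways of recovering the number of preloaded SGPRs from machine code can be compared with a hypothetical sketch. It works on plain SGPR numbers rather than LLVM's live-in list and assumes that preloaded SGPRs are assigned contiguously from s0. Under that assumption, counting live-ins undercounts when an unused live-in has been pruned. Taking the highest SGPR number plus one does not undercount, unless the highest live-in itself was pruned.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical sketch: LiveInSGPRs holds SGPR numbers (0 for s0, 5 for s5)
// that are still recorded as function live-ins.
unsigned countLiveIns(const std::vector<unsigned> &LiveInSGPRs) {
  return static_cast<unsigned>(LiveInSGPRs.size());
}

unsigned highestLiveInPlusOne(const std::vector<unsigned> &LiveInSGPRs) {
  if (LiveInSGPRs.empty())
    return 0;
  return *std::max_element(LiveInSGPRs.begin(), LiveInSGPRs.end()) + 1;
}
```

With s4 pruned from {s0-s5}, counting gives 5 while the highest-number approach still gives 6.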

Support FP in entry functions by reverting most of the changes needed
before PEI in the previous patches. Now the entry function always
allocates S32 for the SP, and optionally allocates S34 as the FP.

There are still a couple tests to be updated, but they are just due
to RA noise.

scott.linder edited the summary of this revision. · Mar 10 2020, 4:53 PM

Harbormaster completed remote builds in B48759: Diff 249521. · Mar 10 2020, 5:33 PM

LGTM with nits

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
362–365	Braces
388	s/unsigned/Register
395	Ditto
541	Braces
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
99	s/NoRegister/Register()

This revision is now accepted and ready to land. · Mar 10 2020, 6:46 PM

I think the commit comment "The ABI stack pointer register remains unswizzled, but is now wave-relative instead of dispatch-relative." should change to "The ABI stack pointer register remains unswizzled, but is now wave-relative instead of queue-relative." since for the HSA ABI the scratch base is the queue base and not per dispatch. The PAL ABI may use per-dispatch scratch allocation.

t-tye added inline comments. · Mar 10 2020, 7:59 PM

llvm/docs/AMDGPUUsage.rst
8557–8559 ↗	(On Diff #249521)	"This can be done without having to perform register allocation again, which is necessary as register allocation may introduce spills." Suggest moving this to a separate bullet and rewording it to make clear why this approach is taken: "- Note: this approach of using a tentative scratch SRD and shifting the register numbers if used avoids having to perform register allocation a second time if the tentative SRD is eliminated. This is more efficient and avoids the problem that the second register allocation may perform spilling, which will fail as there is no longer a scratch SRD." For consistency, should SRD be changed to V# to match the usage in the next section?
8572 ↗	(On Diff #249521)	Should the manner in which the kernel prolog sets the scratch V# be specified? The compiler requests that the scratch V# and wave scratch offset be passed in using the kernel descriptor (reference the section). The wave scratch offset is added to the queue base address in the scratch V# and moved to SGPR0-3. Also specify how the kernel must set FLAT_SCRATCH. The compiler requests that the flat scratch and wave scratch offset be passed in using the kernel descriptor (reference the section). The wave scratch offset is added to the flat scratch base and moved to FLAT_SCRATCH. Should the setup of M0 also be defined here? For GFX6-??? it is set to the LDS size, otherwise it is set to ???. Is there any other setup that has to be done in the kernel prolog?
8614 ↗	(On Diff #249521)	"private address" -> "private address space address"
8653 ↗	(On Diff #249521)	Is this necessary to say, since the following bullet states all SGPRs except 4-31, which means SGPR0-3 are preserved?

scott.linder edited the summary of this revision. · Mar 11 2020, 11:04 AM

In D75138#1916158, @t-tye wrote:

I think the commit comment "The ABI stack pointer register remains unswizzled, but is now wave-relative instead of dispatch-relative." should change to "The ABI stack pointer register remains unswizzled, but is now wave-relative instead of queue-relative." since for the HSA ABI the scratch base is the queue base and not per dispatch. The PAL ABI may use per-dispatch scratch allocation.

I updated the commit message, but I didn't mention the possibility of the PAL ABI differing here. Is there a more generic way to describe the old behavior for every ABI the compiler supports? As far as the compiler is concerned, it only matters that the SRSRC base plus the scratch wave offset yields the base of the wave's scratch allocation.

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
290	In switching back to supporting an FP I no longer see the need for this manifest, but there may still be a need to update this in the future. I don't think my change is making this any more fragile so I'm leaving it as it was.

scott.linder added a parent revision: D76035: [AMDGPU][NFC] Refactor some uses of unsigned to Register. · Mar 11 2020, 4:40 PM

Address feedback

scott.linder added inline comments. · Mar 11 2020, 4:45 PM

llvm/docs/AMDGPUUsage.rst
8572 ↗	(On Diff #249521)	I didn't notice originally that we have a section "Code Conventions > AMDHSA > Kernel Prolog" which already describes some of this. It seemed odd to put some of that here and some of that there, so I ended up trying to just move all the relevant bits to the Kernel Prolog section and reference it here. It ends up being a bit circular in that the Kernel Prolog section defers to the Calling Convention section for the definition of the ABI stack pointer, and the Calling Convention section defers to the Kernel Prolog section for the description of the properties of M0/FlatScratch/V# and how they are initialized. I think it is OK, but maybe you have some suggestions?

Harbormaster failed remote builds in B48909: Diff 249800! · Mar 11 2020, 5:35 PM

scott.linder added a reviewer: mareko. · Mar 16 2020, 9:28 AM

Finish updating remaining tests. Remove Kill from last use of scratch wave
offset in prologue, as it is used in at least some Mesa shaders.

arsenm accepted this revision. · Mar 17 2020, 3:27 PM

arsenm added inline comments.

llvm/test/CodeGen/AMDGPU/scratch-simple.ll
95–96	Can you add a comment elaborating on what this tests

scott.linder marked an inline comment as done. · Mar 17 2020, 3:58 PM

scott.linder added inline comments.

llvm/test/CodeGen/AMDGPU/scratch-simple.ll
95–96	From discussion with @mareko my understanding is that Mesa GS and HS shaders have the preloaded scratch wave offset SGPR fixed at SGPR5, and the inreg implementation is used to reference it in the IR. So here, the shader snippet inserted after the SI_RETURN_TO_EPILOG wants to use the scratch wave offset, and the IR passes it along by padding out the inreg arguments until it gets to where the scratch wave offset is, and then using it in the return value. I'll add something to that effect in the test.

Harbormaster failed remote builds in B49513: Diff 250925! · Mar 17 2020, 4:14 PM

Closed by commit rG60b1967c3933: [AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions (authored by scott.linder). · Mar 19 2020, 1:10 PM

This revision was automatically updated to reflect the committed changes.

foad mentioned this in D79073: [AMDGPU] For PAL, make sure Scratch Buffer Descriptor do not clobber GIT pointer. · Apr 29 2020, 3:57 AM

critson mentioned this in D79776: [AMDGPU] Allow use of StackPtrOffsetReg when building spills. · May 12 2020, 4:49 AM

critson mentioned this in rGa065a01bf715: [AMDGPU] Allow use of StackPtrOffsetReg when building spills. · May 15 2020, 8:05 PM

Revision Contents

Path

Size

llvm/

lib/

Target/

AMDGPU/

AMDGPUCallLowering.cpp

2 lines

AMDGPUISelDAGToDAG.cpp

24 lines

AMDGPUInstructionSelector.cpp

41 lines

AMDGPUTargetMachine.cpp

6 lines

MCTargetDesc/

AMDGPUInstPrinter.cpp

1 line

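SIFoldOperands.cpp

26 lines

SIFrameLowering.h

18 lines

SIFrameLowering.cpp

269 lines

SIISelLowering.cpp

46 lines

SIInstrInfo.cpp

45 lines

SIInstructions.td

8 lines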

SIMachineFunctionInfo.h

23 lines

SIMachineFunctionInfo.cpp

4 lines

SIRegisterInfo.h

5 lines

SIRegisterInfo.cpp

186 lines

SIRegisterInfo.td

3 lines

test/

CodeGen/

AMDGPU/

GlobalISel/

divergent-control-flow.ll

2 lines

inst-select-load-local.mir

278 lines

inst-select-load-private.mir

111 lines

inst-select-store-local.mir

224 lines

inst-select-store-private.mir

383 lines

addrspacecast.ll

4 lines

amdgpu.private-memory.ll

20 lines

array-ptr-calc-i32.ll

4 lines

attr-amdgpu-num-sgpr.ll

5 lines

byval-frame-setup.ll

350 lines

call-argument-types.ll

74 lines

call-constant.ll

6 lines

call-preserved-registers.ll

31 lines

call-waitcnt.ll

37 lines

callee-special-input-sgprs.ll

42 lines

callee-special-input-vgprs.ll

21 lines

captured-frame-index.ll

36 lines

cc-update.ll

249 lines

cgp-addressing-modes.ll

12 lines

chain-hi-to-lo.ll

28 lines

collapse-endcf.ll

2 lines

control-flow-fastregalloc.ll

64 lines

cross-block-use-is-not-abi-copy.ll

14 lines

extload-private.ll

8 lines

fast-unaligned-load-store.private.ll

77 lines

fold-fi-mubuf.mir

195 lines

frame-index-elimination.ll

63 lines

frame-lowering-entry-all-sgpr-used.mir

1 line

frame-lowering-fp-adjusted.mir

3 lines

function-returns.ll

202 lines

hsa-metadata-kernel-code-props-v3.ll

8 lines

hsa-metadata-kernel-code-props.ll

6 lines

idot8s.ll

2275 lines

idot8u.ll

2572 lines

indirect-addressing-term.ll

102 lines

insert_vector_elt.ll

48 lines

ipra.ll

2 lines

large-alloca-compute.ll

4 lines

large-alloca-graphics.ll

42 lines

llvm.amdgcn.implicit.buffer.ptr.ll

4 lines

load-hi16.ll

20 lines

load-lo16.ll

36 lines

memory-legalizer-load.ll

8 lines

memory-legalizer-store.ll

8 lines

memory_clause.ll

93 lines

mesa3d.ll

2 lines

mir-print-dead-csr-fi.mir

1 line

misched-killflags.mir

1 line

mubuf-offset-private.ll

38 lines

optimize-exec-masking-pre-ra.mir

1 line

partial-sgpr-to-vgpr-spills.ll

47 lines

pei-reg-scavenger-position.mir

16 lines

pei-scavenge-sgpr-carry-out.mir

57 lines

pei-scavenge-sgpr-gfx9.mir

5 lines

pei-scavenge-sgpr.mir

3 lines

private-access-no-objects.ll

14 lines

private-element-size.ll

224 lines

rename-independent-subregs-mac-operands.mir

2 lines

sched-assert-dead-def-subreg-use-other-subreg.mir

1 line

sched-handleMoveUp-subreg-def-across-subreg-def.mir

1 line

scratch-buffer.ll

14 lines

scratch-simple.ll

72 lines

sgpr-spill-wrong-stack-id.mir

25 lines

shl_add_ptr.ll

12 lines

si-spill-sgpr-stack.ll

3 lines

sibling-call.ll

2 lines

sp-too-many-input-sgprs.ll

spill-agpr.ll

16 lines

spill-before-exec.mir

11 lines

spill-empty-live-interval.mir

2 lines

spill-m0.ll

4 lines

spill-offset-calculation.ll

45 lines

stack-pointer-offset-relative-frameindex.ll

15 lines

stack-realign-kernel.ll

36 lines

stack-realign.ll

42 lines

stack-slot-color-sgpr-vgpr-spills.mir

7 lines

store-hi16.ll

28 lines

subreg-split-live-in-error.mir

1 line

subvector-test.mir

1 line

vgpr-spill-emergency-stack-slot.ll

4 lines

virtregrewrite-undef-identity-copy.mir

1 line

wqm.ll

4 lines

wwm-reserved.ll

8 lines

MIR/

AMDGPU/

machine-function-info-no-ir.mir

16 lines

machine-function-info.ll

mfi-parse-error-scratch-wave-offset-reg.mir

mfi-scratch-wave-offset-reg-class.mir

parse-order-reserved-regs.mir

12 lines

DebugInfo/

AMDGPU/

variable-locations.ll

2 lines

Diff 248352

llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp

@@ ... @@ if (!IsEntryFunc) {
     TLI.allocateSpecialInputVGPRs(CCInfo, MF, TRI, Info);
   }

   // Start adding system SGPRs.
   if (IsEntryFunc) {
     TLI.allocateSystemSGPRs(CCInfo, MF, *Info, CC, IsShader);
   } else {
     CCInfo.AllocateReg(Info->getScratchRSrcReg());
-    CCInfo.AllocateReg(Info->getScratchWaveOffsetReg());
-    CCInfo.AllocateReg(Info->getFrameOffsetReg());
     TLI.allocateSpecialInputSGPRs(CCInfo, MF, TRI, Info);
   }

   // Move back to the end of the basic block.
   B.setMBB(MBB);

   return true;
 }

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp

Show First 20 Lines • Show All 1,472 Lines • ▼ Show 20 Lines
}		}

static bool isStackPtrRelative(const MachinePointerInfo &PtrInfo) {		static bool isStackPtrRelative(const MachinePointerInfo &PtrInfo) {
auto PSV = PtrInfo.V.dyn_cast<const PseudoSourceValue *>();		auto PSV = PtrInfo.V.dyn_cast<const PseudoSourceValue *>();
return PSV && PSV->isStack();		return PSV && PSV->isStack();
}		}

std::pair<SDValue, SDValue> AMDGPUDAGToDAGISel::foldFrameIndex(SDValue N) const {		std::pair<SDValue, SDValue> AMDGPUDAGToDAGISel::foldFrameIndex(SDValue N) const {
		SDLoc DL(N);
const MachineFunction &MF = CurDAG->getMachineFunction();		const MachineFunction &MF = CurDAG->getMachineFunction();
const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();

if (auto FI = dyn_cast<FrameIndexSDNode>(N)) {		if (auto FI = dyn_cast<FrameIndexSDNode>(N)) {
SDValue TFI = CurDAG->getTargetFrameIndex(FI->getIndex(),		SDValue TFI = CurDAG->getTargetFrameIndex(FI->getIndex(),
FI->getValueType(0));		FI->getValueType(0));

// If we can resolve this to a frame index access, this will be relative to		// If we can resolve this to a frame index access, this will be relative to
// either the stack or frame pointer SGPR.		// either the stack or frame pointer SGPR, or 0 in a kernel.
return std::make_pair(		return std::make_pair(
TFI, CurDAG->getRegister(Info->getStackPtrOffsetReg(), MVT::i32));		TFI, Info->isEntryFunction()
		? CurDAG->getTargetConstant(0, DL, MVT::i32)
		: CurDAG->getRegister(Info->getStackPtrOffsetReg(), MVT::i32));
}		}

// If we don't know this private access is a local stack object, it needs to		// If we don't know this private access is a local stack object, it needs to
// be relative to the entry point's scratch wave offset register.		// be relative to the entry point's scratch wave offset.
return std::make_pair(N, CurDAG->getRegister(Info->getScratchWaveOffsetReg(),		return std::make_pair(N, CurDAG->getTargetConstant(0, DL, MVT::i32));
MVT::i32));
}		}

bool AMDGPUDAGToDAGISel::SelectMUBUFScratchOffen(SDNode *Parent,		bool AMDGPUDAGToDAGISel::SelectMUBUFScratchOffen(SDNode *Parent,
SDValue Addr, SDValue &Rsrc,		SDValue Addr, SDValue &Rsrc,
SDValue &VAddr, SDValue &SOffset,		SDValue &VAddr, SDValue &SOffset,
SDValue &ImmOffset) const {		SDValue &ImmOffset) const {

SDLoc DL(Addr);		SDLoc DL(Addr);
MachineFunction &MF = CurDAG->getMachineFunction();		MachineFunction &MF = CurDAG->getMachineFunction();
const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();

Rsrc = CurDAG->getRegister(Info->getScratchRSrcReg(), MVT::v4i32);		Rsrc = CurDAG->getRegister(Info->getScratchRSrcReg(), MVT::v4i32);

if (ConstantSDNode *CAddr = dyn_cast<ConstantSDNode>(Addr)) {		if (ConstantSDNode *CAddr = dyn_cast<ConstantSDNode>(Addr)) {
unsigned Imm = CAddr->getZExtValue();		unsigned Imm = CAddr->getZExtValue();

SDValue HighBits = CurDAG->getTargetConstant(Imm & ~4095, DL, MVT::i32);		SDValue HighBits = CurDAG->getTargetConstant(Imm & ~4095, DL, MVT::i32);
MachineSDNode *MovHighBits = CurDAG->getMachineNode(AMDGPU::V_MOV_B32_e32,		MachineSDNode *MovHighBits = CurDAG->getMachineNode(AMDGPU::V_MOV_B32_e32,
DL, MVT::i32, HighBits);		DL, MVT::i32, HighBits);
VAddr = SDValue(MovHighBits, 0);		VAddr = SDValue(MovHighBits, 0);

// In a call sequence, stores to the argument stack area are relative to the		// In a call sequence, stores to the argument stack area are relative to the
// stack pointer.		// stack pointer.
const MachinePointerInfo &PtrInfo = cast<MemSDNode>(Parent)->getPointerInfo();		const MachinePointerInfo &PtrInfo = cast<MemSDNode>(Parent)->getPointerInfo();
unsigned SOffsetReg = isStackPtrRelative(PtrInfo) ?
Info->getStackPtrOffsetReg() : Info->getScratchWaveOffsetReg();

SOffset = CurDAG->getRegister(SOffsetReg, MVT::i32);		SOffset = isStackPtrRelative(PtrInfo)
		? CurDAG->getRegister(Info->getStackPtrOffsetReg(), MVT::i32)
		: CurDAG->getTargetConstant(0, DL, MVT::i32);
ImmOffset = CurDAG->getTargetConstant(Imm & 4095, DL, MVT::i16);		ImmOffset = CurDAG->getTargetConstant(Imm & 4095, DL, MVT::i16);
return true;		return true;
}		}

if (CurDAG->isBaseWithConstantOffset(Addr)) {		if (CurDAG->isBaseWithConstantOffset(Addr)) {
// (add n0, c1)		// (add n0, c1)

SDValue N0 = Addr.getOperand(0);		SDValue N0 = Addr.getOperand(0);
▲ Show 20 Lines • Show All 41 Lines • ▼ Show 20 Lines	bool AMDGPUDAGToDAGISel::SelectMUBUFScratchOffset(SDNode *Parent,

SDLoc DL(Addr);		SDLoc DL(Addr);
MachineFunction &MF = CurDAG->getMachineFunction();		MachineFunction &MF = CurDAG->getMachineFunction();
const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();

SRsrc = CurDAG->getRegister(Info->getScratchRSrcReg(), MVT::v4i32);		SRsrc = CurDAG->getRegister(Info->getScratchRSrcReg(), MVT::v4i32);

const MachinePointerInfo &PtrInfo = cast<MemSDNode>(Parent)->getPointerInfo();		const MachinePointerInfo &PtrInfo = cast<MemSDNode>(Parent)->getPointerInfo();
unsigned SOffsetReg = isStackPtrRelative(PtrInfo) ?
Info->getStackPtrOffsetReg() : Info->getScratchWaveOffsetReg();

// FIXME: Get from MachinePointerInfo? We should only be using the frame		// FIXME: Get from MachinePointerInfo? We should only be using the frame
// offset if we know this is in a call sequence.		// offset if we know this is in a call sequence.
SOffset = CurDAG->getRegister(SOffsetReg, MVT::i32);		SOffset = isStackPtrRelative(PtrInfo)
		? CurDAG->getRegister(Info->getStackPtrOffsetReg(), MVT::i32)
		: CurDAG->getTargetConstant(0, DL, MVT::i32);

Offset = CurDAG->getTargetConstant(CAddr->getZExtValue(), DL, MVT::i16);		Offset = CurDAG->getTargetConstant(CAddr->getZExtValue(), DL, MVT::i16);
return true;		return true;
}		}

bool AMDGPUDAGToDAGISel::SelectMUBUFOffset(SDValue Addr, SDValue &SRsrc,		bool AMDGPUDAGToDAGISel::SelectMUBUFOffset(SDValue Addr, SDValue &SRsrc,
SDValue &SOffset, SDValue &Offset,		SDValue &SOffset, SDValue &Offset,
SDValue &GLC, SDValue &SLC,		SDValue &GLC, SDValue &SLC,
[... 1,326 lines not shown ...]

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp

[... 2,688 lines not shown ...]	return {{[=](MachineInstrBuilder &MIB) { // rsrc
},		},
[=](MachineInstrBuilder &MIB) { // vaddr		[=](MachineInstrBuilder &MIB) { // vaddr
MIB.addReg(HighBits);		MIB.addReg(HighBits);
},		},
[=](MachineInstrBuilder &MIB) { // soffset		[=](MachineInstrBuilder &MIB) { // soffset
const MachineMemOperand *MMO = *MI->memoperands_begin();		const MachineMemOperand *MMO = *MI->memoperands_begin();
const MachinePointerInfo &PtrInfo = MMO->getPointerInfo();		const MachinePointerInfo &PtrInfo = MMO->getPointerInfo();

Register SOffsetReg = isStackPtrRelative(PtrInfo)		if (isStackPtrRelative(PtrInfo))
? Info->getStackPtrOffsetReg()		MIB.addReg(Info->getStackPtrOffsetReg());
: Info->getScratchWaveOffsetReg();		else
MIB.addReg(SOffsetReg);		MIB.addImm(0);
},		},
[=](MachineInstrBuilder &MIB) { // offset		[=](MachineInstrBuilder &MIB) { // offset
MIB.addImm(Offset & 4095);		MIB.addImm(Offset & 4095);
}}};		}}};
}		}

assert(Offset == 0);		assert(Offset == 0);

[... 20 lines not shown ...]	if (isBaseWithConstantOffset(Root, *MRI)) {
Offset = PossibleOffset;		Offset = PossibleOffset;
}		}
}		}
} else if (RootDef->getOpcode() == AMDGPU::G_FRAME_INDEX) {		} else if (RootDef->getOpcode() == AMDGPU::G_FRAME_INDEX) {
FI = RootDef->getOperand(1).getIndex();		FI = RootDef->getOperand(1).getIndex();
}		}
}		}

// If we don't know this private access is a local stack object, it needs to
// be relative to the entry point's scratch wave offset register.
// TODO: Should split large offsets that don't fit like above.
// TODO: Don't use scratch wave offset just because the offset didn't fit.
Register SOffset = FI.hasValue() ? Info->getStackPtrOffsetReg()
: Info->getScratchWaveOffsetReg();

return {{[=](MachineInstrBuilder &MIB) { // rsrc		return {{[=](MachineInstrBuilder &MIB) { // rsrc
MIB.addReg(Info->getScratchRSrcReg());		MIB.addReg(Info->getScratchRSrcReg());
},		},
[=](MachineInstrBuilder &MIB) { // vaddr		[=](MachineInstrBuilder &MIB) { // vaddr
if (FI.hasValue())		if (FI.hasValue())
MIB.addFrameIndex(FI.getValue());		MIB.addFrameIndex(FI.getValue());
else		else
MIB.addReg(VAddr);		MIB.addReg(VAddr);
},		},
[=](MachineInstrBuilder &MIB) { // soffset		[=](MachineInstrBuilder &MIB) { // soffset
MIB.addReg(SOffset);		// If we don't know this private access is a local stack object, it
		// needs to be relative to the entry point's scratch wave offset.
		// TODO: Should split large offsets that don't fit like above.
		// TODO: Don't use scratch wave offset just because the offset
		// didn't fit.
		if (!Info->isEntryFunction() && FI.hasValue())
		MIB.addReg(Info->getStackPtrOffsetReg());
		else
		MIB.addImm(0);
},		},
[=](MachineInstrBuilder &MIB) { // offset		[=](MachineInstrBuilder &MIB) { // offset
MIB.addImm(Offset);		MIB.addImm(Offset);
}}};		}}};
}		}

bool AMDGPUInstructionSelector::isDSOffsetLegal(Register Base,		bool AMDGPUInstructionSelector::isDSOffsetLegal(Register Base,
int64_t Offset,		int64_t Offset,
[... 21 lines not shown ...]	if (!mi_match(Root.getReg(), *MRI, m_ICst(Offset)) ||
!SIInstrInfo::isLegalMUBUFImmOffset(Offset))		!SIInstrInfo::isLegalMUBUFImmOffset(Offset))
return {};		return {};

const MachineFunction *MF = MBB->getParent();		const MachineFunction *MF = MBB->getParent();
const SIMachineFunctionInfo *Info = MF->getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *Info = MF->getInfo<SIMachineFunctionInfo>();
const MachineMemOperand *MMO = *MI->memoperands_begin();		const MachineMemOperand *MMO = *MI->memoperands_begin();
const MachinePointerInfo &PtrInfo = MMO->getPointerInfo();		const MachinePointerInfo &PtrInfo = MMO->getPointerInfo();

Register SOffsetReg = isStackPtrRelative(PtrInfo)
? Info->getStackPtrOffsetReg()
: Info->getScratchWaveOffsetReg();
return {{		return {{
[=](MachineInstrBuilder &MIB) {		[=](MachineInstrBuilder &MIB) { // rsrc
MIB.addReg(Info->getScratchRSrcReg());		MIB.addReg(Info->getScratchRSrcReg());
}, // rsrc		},
[=](MachineInstrBuilder &MIB) { MIB.addReg(SOffsetReg); }, // soffset		[=](MachineInstrBuilder &MIB) { // soffset
		if (isStackPtrRelative(PtrInfo))
		MIB.addReg(Info->getStackPtrOffsetReg());
		else
		MIB.addImm(0);
		},
[=](MachineInstrBuilder &MIB) { MIB.addImm(Offset); } // offset		[=](MachineInstrBuilder &MIB) { MIB.addImm(Offset); } // offset
}};		}};
}		}

std::pair<Register, unsigned>		std::pair<Register, unsigned>
AMDGPUInstructionSelector::selectDS1Addr1OffsetImpl(MachineOperand &Root) const {		AMDGPUInstructionSelector::selectDS1Addr1OffsetImpl(MachineOperand &Root) const {
const MachineInstr *RootDef = MRI->getVRegDef(Root.getReg());		const MachineInstr *RootDef = MRI->getVRegDef(Root.getReg());
if (!RootDef)		if (!RootDef)
return std::make_pair(Root.getReg(), 0);		return std::make_pair(Root.getReg(), 0);
[... 542 lines not shown ...]
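The selectors above stop passing the scratch wave offset register in soffset. The following is a minimal sketch of the resulting choice, not LLVM code: `RegOp`, `ImmOp`, `SOffsetOp`, `scratchOffsetSOffset` and `scratchOffenSOffset` are hypothetical names, and a plain `unsigned` stands in for the stack pointer register.

```cpp
#include <cassert>
#include <cstdint>
#include <variant>

// Hypothetical stand-ins for the two kinds of soffset operand the
// selectors emit after this patch (not LLVM types).
struct RegOp { unsigned Id; };
struct ImmOp { int64_t Value; };
using SOffsetOp = std::variant<RegOp, ImmOp>;

// Models selectMUBUFScratchOffset (SelectionDAG and GlobalISel): use the
// stack pointer only for accesses known to be stack-pointer relative,
// otherwise an immediate 0 instead of the old scratch wave offset register.
SOffsetOp scratchOffsetSOffset(bool IsStackPtrRelative, unsigned StackPtrReg) {
  if (IsStackPtrRelative)
    return RegOp{StackPtrReg};
  return ImmOp{0};
}

// Models selectMUBUFScratchOffen (GlobalISel): use the stack pointer only
// for a frame-index access in a non-entry function. Kernels always get 0,
// since their wave offset is already included in the scratch SRSRC.
SOffsetOp scratchOffenSOffset(bool IsEntryFunction, bool HasFrameIndex,
                              unsigned StackPtrReg) {
  if (!IsEntryFunction && HasFrameIndex)
    return RegOp{StackPtrReg};
  return ImmOp{0};
}
```

Keeping the two rules separate mirrors the diff: the offset-only form keys off `isStackPtrRelative(PtrInfo)`, while the offen form keys off whether the function is an entry function and whether a frame index was matched.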

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

[... 1,059 lines not shown ...]	Error = SMDiagnostic(*PFS.SM, SMLoc(), Buffer.getBufferIdentifier(), 1,
RegName.Value.size(), SourceMgr::DK_Error,		RegName.Value.size(), SourceMgr::DK_Error,
"incorrect register class for field", RegName.Value,		"incorrect register class for field", RegName.Value,
None, None);		None, None);
SourceRange = RegName.SourceRange;		SourceRange = RegName.SourceRange;
return true;		return true;
};		};

if (parseRegister(YamlMFI.ScratchRSrcReg, MFI->ScratchRSrcReg) ||		if (parseRegister(YamlMFI.ScratchRSrcReg, MFI->ScratchRSrcReg) ||
parseRegister(YamlMFI.ScratchWaveOffsetReg, MFI->ScratchWaveOffsetReg) ||
parseRegister(YamlMFI.FrameOffsetReg, MFI->FrameOffsetReg) ||		parseRegister(YamlMFI.FrameOffsetReg, MFI->FrameOffsetReg) ||
parseRegister(YamlMFI.StackPtrOffsetReg, MFI->StackPtrOffsetReg))		parseRegister(YamlMFI.StackPtrOffsetReg, MFI->StackPtrOffsetReg))
return true;		return true;

if (MFI->ScratchRSrcReg != AMDGPU::PRIVATE_RSRC_REG &&		if (MFI->ScratchRSrcReg != AMDGPU::PRIVATE_RSRC_REG &&
!AMDGPU::SGPR_128RegClass.contains(MFI->ScratchRSrcReg)) {		!AMDGPU::SGPR_128RegClass.contains(MFI->ScratchRSrcReg)) {
return diagnoseRegisterClass(YamlMFI.ScratchRSrcReg);		return diagnoseRegisterClass(YamlMFI.ScratchRSrcReg);
}		}

if (MFI->ScratchWaveOffsetReg != AMDGPU::SCRATCH_WAVE_OFFSET_REG &&
!AMDGPU::SGPR_32RegClass.contains(MFI->ScratchWaveOffsetReg)) {
return diagnoseRegisterClass(YamlMFI.ScratchWaveOffsetReg);
}

if (MFI->FrameOffsetReg != AMDGPU::FP_REG &&		if (MFI->FrameOffsetReg != AMDGPU::FP_REG &&
!AMDGPU::SGPR_32RegClass.contains(MFI->FrameOffsetReg)) {		!AMDGPU::SGPR_32RegClass.contains(MFI->FrameOffsetReg)) {
return diagnoseRegisterClass(YamlMFI.FrameOffsetReg);		return diagnoseRegisterClass(YamlMFI.FrameOffsetReg);
}		}

if (MFI->StackPtrOffsetReg != AMDGPU::SP_REG &&		if (MFI->StackPtrOffsetReg != AMDGPU::SP_REG &&
!AMDGPU::SGPR_32RegClass.contains(MFI->StackPtrOffsetReg)) {		!AMDGPU::SGPR_32RegClass.contains(MFI->StackPtrOffsetReg)) {
return diagnoseRegisterClass(YamlMFI.StackPtrOffsetReg);		return diagnoseRegisterClass(YamlMFI.StackPtrOffsetReg);
[... 92 lines not shown ...]

llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp

	[... 290 lines not shown ...]
	}			}

	void AMDGPUInstPrinter::printRegOperand(unsigned RegNo, raw_ostream &O,			void AMDGPUInstPrinter::printRegOperand(unsigned RegNo, raw_ostream &O,
	const MCRegisterInfo &MRI) {			const MCRegisterInfo &MRI) {
	#if !defined(NDEBUG)			#if !defined(NDEBUG)
	switch (RegNo) {			switch (RegNo) {
	case AMDGPU::FP_REG:			case AMDGPU::FP_REG:
	case AMDGPU::SP_REG:			case AMDGPU::SP_REG:
	case AMDGPU::SCRATCH_WAVE_OFFSET_REG:
	case AMDGPU::PRIVATE_RSRC_REG:			case AMDGPU::PRIVATE_RSRC_REG:
	llvm_unreachable("pseudo-register should not ever be emitted");			llvm_unreachable("pseudo-register should not ever be emitted");
	case AMDGPU::SCC:			case AMDGPU::SCC:
	llvm_unreachable("pseudo scc should not ever be emitted");			llvm_unreachable("pseudo scc should not ever be emitted");
	default:			default:
	break;			break;
	}			}
	#endif			#endif
	[... 1,251 lines not shown ...]

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp

[... 604 lines not shown ...]	if (UseMI->isRegSequence()) {

return;		return;
}		}

if (tryToFoldACImm(TII, OpToFold, UseMI, UseOpIdx, FoldList))		if (tryToFoldACImm(TII, OpToFold, UseMI, UseOpIdx, FoldList))
return;		return;

if (frameIndexMayFold(TII, *UseMI, UseOpIdx, OpToFold)) {		if (frameIndexMayFold(TII, *UseMI, UseOpIdx, OpToFold)) {
// Sanity check that this is a stack access.		// Sanity check that this is a stack access. For both kernels and
		// non-kernel functions this means the SRSRC is the stack SRSRC. For
		// kernels the SOffset is always 0 because the scratch wave offset is
		// already included in the scratch SRSRC, so there is no SP/FP. For
		// non-kernel functions SOffset is either the StackPtrOffsetReg or 0 (in
		// which case we must update it to the StackPtrOffsetReg when folding).
// FIXME: Should probably use stack pseudos before frame lowering.		// FIXME: Should probably use stack pseudos before frame lowering.
MachineOperand *SOff = TII->getNamedOperand(*UseMI, AMDGPU::OpName::soffset);
if (!SOff->isReg() || (SOff->getReg() != MFI->getScratchWaveOffsetReg() &&
SOff->getReg() != MFI->getStackPtrOffsetReg()))
return;

if (TII->getNamedOperand(*UseMI, AMDGPU::OpName::srsrc)->getReg() !=		if (TII->getNamedOperand(*UseMI, AMDGPU::OpName::srsrc)->getReg() !=
MFI->getScratchRSrcReg())		MFI->getScratchRSrcReg())
return;		return;

		MachineOperand &SOff =
		*TII->getNamedOperand(*UseMI, AMDGPU::OpName::soffset);
		if (MFI->isEntryFunction()) {
		if (!SOff.isImm() || SOff.getImm() != 0)
		return;
		} else {
		if (!((SOff.isReg() && SOff.getReg() == MFI->getStackPtrOffsetReg()) ||
		(SOff.isImm() && SOff.getImm() == 0)))
		arsenm (Done): Should demorgan this
		return;
		}

// A frame index will resolve to a positive constant, so it should always be		// A frame index will resolve to a positive constant, so it should always be
// safe to fold the addressing mode, even pre-GFX9.		// safe to fold the addressing mode, even pre-GFX9.
UseMI->getOperand(UseOpIdx).ChangeToFrameIndex(OpToFold.getIndex());		UseMI->getOperand(UseOpIdx).ChangeToFrameIndex(OpToFold.getIndex());
SOff->setReg(MFI->getStackPtrOffsetReg());
		if (!MFI->isEntryFunction() && SOff.isImm())
		SOff.ChangeToRegister(MFI->getStackPtrOffsetReg(), false);
return;		return;
}		}

bool FoldingImmLike =		bool FoldingImmLike =
OpToFold.isImm() || OpToFold.isFI() || OpToFold.isGlobal();		OpToFold.isImm() || OpToFold.isFI() || OpToFold.isGlobal();

if (FoldingImmLike && UseMI->isCopy()) {		if (FoldingImmLike && UseMI->isCopy()) {
Register DestReg = UseMI->getOperand(0).getReg();		Register DestReg = UseMI->getOperand(0).getReg();
[... 911 lines not shown ...]
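The reviewer's "demorgan this" note on SIFoldOperands.cpp asks for De Morgan's law to be applied to the non-kernel soffset check. A minimal sketch, assuming a hypothetical two-state `Operand` struct in place of `MachineOperand` (so "not a register" is treated as "immediate", which the real class does not guarantee), shows the rewritten predicate; `rejectNonKernelSOffset` is an illustrative name, not code from the patch.

```cpp
#include <cassert>

// Hypothetical, simplified operand: either a register or an immediate.
struct Operand {
  bool IsReg;
  unsigned Reg;  // meaningful only when IsReg
  long long Imm; // meaningful only when !IsReg
};

// Original form in the patch:
//   !((isReg && Reg == SP) || (isImm && Imm == 0))
// De Morgan turns !(A || B) into !A && !B, and each !(X && Y) into !X || !Y.
// Returns true when the fold must be rejected.
bool rejectNonKernelSOffset(const Operand &SOff, unsigned StackPtrReg) {
  return (!SOff.IsReg || SOff.Reg != StackPtrReg) &&
         (SOff.IsReg || SOff.Imm != 0);
}
```

The two forms are logically identical; the rewritten one simply reads as "neither the stack pointer register nor an immediate zero".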

llvm/lib/Target/AMDGPU/SIFrameLowering.h

	[... 55 lines not shown ...]

	private:			private:
	void emitEntryFunctionFlatScratchInit(MachineFunction &MF,			void emitEntryFunctionFlatScratchInit(MachineFunction &MF,
	MachineBasicBlock &MBB,			MachineBasicBlock &MBB,
	MachineBasicBlock::iterator I,			MachineBasicBlock::iterator I,
	const DebugLoc &DL,			const DebugLoc &DL,
	Register ScratchWaveOffsetReg) const;			Register ScratchWaveOffsetReg) const;

	Register getEntryFunctionReservedScratchRsrcReg(MachineFunction &MF) const;

	Register			Register
	getEntryFunctionReservedScratchWaveOffsetReg(MachineFunction &MF) const;			getEntryFunctionReservedScratchRsrcReg(MachineFunction &MF,
				Register ScratchWaveOffsetReg) const;

	void emitEntryFunctionScratchRsrcRegSetup(MachineFunction &MF,			void emitEntryFunctionScratchRsrcRegSetup(
	MachineBasicBlock &MBB,			MachineFunction &MF, MachineBasicBlock &MBB,
	MachineBasicBlock::iterator I,			MachineBasicBlock::iterator I, const DebugLoc &DL,
	const DebugLoc &DL,			Register PreloadedPrivateBufferReg, Register ScratchRsrcReg,
	Register PreloadedPrivateBufferReg,			Register ScratchWaveOffsetReg) const;
	Register ScratchRsrcReg) const;

	public:			public:
	bool hasFP(const MachineFunction &MF) const override;			bool hasFP(const MachineFunction &MF) const override;
	};			};

	} // end namespace llvm			} // end namespace llvm

	#endif // LLVM_LIB_TARGET_AMDGPU_SIFRAMELOWERING_H			#endif // LLVM_LIB_TARGET_AMDGPU_SIFRAMELOWERING_H
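The new `ScratchWaveOffsetReg` parameter on `emitEntryFunctionScratchRsrcRegSetup` exists because the wave offset is now added into the base address of the scratch SRSRC instead of being passed in soffset. The sketch below models that add in plain C++. It assumes the layout described by the comment in SIFrameLowering.cpp (a 48-bit base in dword0 and the low half of dword1, with 16 flag bits above it), and it assumes a 32-bit add on sub0 followed by a carry-in add on sub1, since the excerpt is cut off before the actual instructions are emitted. `Rsrc`, `baseOf` and `addWaveOffsetToBase` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of a 128-bit buffer resource as four 32-bit dwords.
// Assumed: dword0 = base[31:0], dword1[15:0] = base[47:32],
// dword1[31:16] = flags that must not change.
using Rsrc = std::array<uint32_t, 4>;

uint64_t baseOf(const Rsrc &R) {
  return static_cast<uint64_t>(R[0]) |
         (static_cast<uint64_t>(R[1] & 0xffffu) << 32);
}

// Add a per-wave byte offset to the 48-bit base: a 32-bit add on dword0,
// then the carry is added into dword1 (S_ADD_U32 / S_ADDC_U32 style).
Rsrc addWaveOffsetToBase(Rsrc R, uint32_t WaveOffset) {
  uint64_t Lo = static_cast<uint64_t>(R[0]) + WaveOffset;
  uint32_t Carry = static_cast<uint32_t>(Lo >> 32);
  // The patch relies on the sum fitting in 48 bits, so the carry can
  // never ripple into the flag bits.
  assert((R[1] & 0xffffu) + Carry <= 0xffffu && "base overflowed 48 bits");
  R[0] = static_cast<uint32_t>(Lo);
  R[1] += Carry;
  return R;
}
```

For example, a base of `0x1_FFFFFFF0` plus an offset of `0x20` carries out of dword0, giving a base of `0x2_00000010` while the flag half of dword1 is untouched.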

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp

[... 24 lines not shown ...]


static ArrayRef<MCPhysReg> getAllSGPR128(const GCNSubtarget &ST,		static ArrayRef<MCPhysReg> getAllSGPR128(const GCNSubtarget &ST,
const MachineFunction &MF) {		const MachineFunction &MF) {
return makeArrayRef(AMDGPU::SGPR_128RegClass.begin(),		return makeArrayRef(AMDGPU::SGPR_128RegClass.begin(),
ST.getMaxNumSGPRs(MF) / 4);		ST.getMaxNumSGPRs(MF) / 4);
}		}

static ArrayRef<MCPhysReg> getAllSGPRs(const GCNSubtarget &ST,
const MachineFunction &MF) {
return makeArrayRef(AMDGPU::SGPR_32RegClass.begin(),
ST.getMaxNumSGPRs(MF));
}

// Find a scratch register that we can use at the start of the prologue to		// Find a scratch register that we can use at the start of the prologue to
// re-align the stack pointer. We avoid using callee-save registers since they		// re-align the stack pointer. We avoid using callee-save registers since they
// may appear to be free when this is called from canUseAsPrologue (during		// may appear to be free when this is called from canUseAsPrologue (during
// shrink wrapping), but then no longer be free when this is called from		// shrink wrapping), but then no longer be free when this is called from
// emitPrologue.		// emitPrologue.
//		//
// FIXME: This is a bit conservative, since in the above case we could use one		// FIXME: This is a bit conservative, since in the above case we could use one
// of the callee-save registers as a scratch temp to re-align the stack pointer,		// of the callee-save registers as a scratch temp to re-align the stack pointer,
[... 211 lines not shown ...]	void SIFrameLowering::emitEntryFunctionFlatScratchInit(
// Convert offset to 256-byte units.		// Convert offset to 256-byte units.
BuildMI(MBB, I, DL, TII->get(AMDGPU::S_LSHR_B32), AMDGPU::FLAT_SCR_HI)		BuildMI(MBB, I, DL, TII->get(AMDGPU::S_LSHR_B32), AMDGPU::FLAT_SCR_HI)
.addReg(FlatScrInitLo, RegState::Kill)		.addReg(FlatScrInitLo, RegState::Kill)
.addImm(8);		.addImm(8);
}		}

// Shift down registers reserved for the scratch RSRC.		// Shift down registers reserved for the scratch RSRC.
Register SIFrameLowering::getEntryFunctionReservedScratchRsrcReg(		Register SIFrameLowering::getEntryFunctionReservedScratchRsrcReg(
MachineFunction &MF) const {		MachineFunction &MF, Register ScratchWaveOffsetReg) const {

const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();		const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
const SIInstrInfo *TII = ST.getInstrInfo();		const SIInstrInfo *TII = ST.getInstrInfo();
const SIRegisterInfo *TRI = &TII->getRegisterInfo();		const SIRegisterInfo *TRI = &TII->getRegisterInfo();
MachineRegisterInfo &MRI = MF.getRegInfo();		MachineRegisterInfo &MRI = MF.getRegInfo();
SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();		SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();

assert(MFI->isEntryFunction());		assert(MFI->isEntryFunction());
[... 12 lines not shown ...]	Register SIFrameLowering::getEntryFunctionReservedScratchRsrcReg(
// which were actually used.		// which were actually used.
//		//
// FIXME: It might be safer to use a pseudoregister before replacement.		// FIXME: It might be safer to use a pseudoregister before replacement.

// FIXME: We should be able to eliminate unused input registers. We only		// FIXME: We should be able to eliminate unused input registers. We only
// cannot do this for the resources required for scratch access. For now we		// cannot do this for the resources required for scratch access. For now we
// skip over user SGPRs and may leave unused holes.		// skip over user SGPRs and may leave unused holes.

// We find the resource first because it has an alignment requirement.		unsigned NumPreloadedSGPRs = MFI->getNumPreloadedSGPRs();
		// FIXME: This is just lifted from AMDGPUAsmPrinter, because I'm not
		scott.linder (Author, Done): @arsenm @nhaehnle I don't think I understand how `inreg` currently works relative to "preloaded" SGPRs; is/should `inreg` be recorded somewhere in the machine function info so this isn't necessary?
		arsenm (Not Done): Not directly. There shouldn't be any repeating of the calling convention logic here. Either the number of SGPR arguments should be recorded, or it should be inferred from the machine code. It might be correct to just count the number of SGPRs in the function live-in list. I think live-in registers can be deleted from the list if they are proven to be unused, so this might be fragile. Finding the highest live-in SGPR number may also work.
		scott.linder (Author, Done): In switching back to supporting an FP, I no longer see the need for this manifest, but there may still be a need to update this in the future. I don't think my change is making this any more fragile, so I'm leaving it as it was.
unsigned NumPreloaded = (MFI->getNumPreloadedSGPRs() + 3) / 4;		// sure where/if we track InReg SGPR arguments otherwise.
		for (auto &Arg : MF.getFunction().args()) {
		arsenm (Not Done): This should not need to inspect the original IR. Why can't this just read it directly from MFI? They should be accounted for there already?
		unsigned NumRegs = (Arg.getType()->getPrimitiveSizeInBits() + 31) / 32;
		arsenm (Not Done): This will be inaccurate for any struct type; this should have been computed during lowering, which knows the type split.
		if (Arg.hasAttribute(Attribute::InReg)) {
		NumPreloadedSGPRs += NumRegs;
		}
		}
ArrayRef<MCPhysReg> AllSGPR128s = getAllSGPR128(ST, MF);		ArrayRef<MCPhysReg> AllSGPR128s = getAllSGPR128(ST, MF);
AllSGPR128s = AllSGPR128s.slice(std::min(static_cast<unsigned>(AllSGPR128s.size()), NumPreloaded));		AllSGPR128s = AllSGPR128s.slice(std::min(
		static_cast<unsigned>(AllSGPR128s.size()), (NumPreloadedSGPRs + 3) / 4));

// Skip the last N reserved elements because they should have already been		// Skip the last N reserved elements because they should have already been
// reserved for VCC etc.		// reserved for VCC etc.
for (MCPhysReg Reg : AllSGPR128s) {		for (MCPhysReg Reg : AllSGPR128s) {
// Pick the first unallocated one. Make sure we don't clobber the other		// Pick the first unallocated one. Make sure we don't clobber the other
// reserved input we needed.		// reserved input we needed.
if (!MRI.isPhysRegUsed(Reg) && MRI.isAllocatable(Reg)) {		//
		// FIXME: The preloaded SGPR count doesn't seem to be completely accurate,
		// SITargetLowering::allocateSystemSGPRs just picks the next free SGPR for
		scott.linder (Author, Done): Similar question here: should there be a change in `SITargetLowering` so the preloaded count is correct?
		// the scratch wave offset. To work around this we ask the caller for the
		// scratch wave offset and explicitly avoid it.
		if (!MRI.isPhysRegUsed(Reg) && MRI.isAllocatable(Reg) &&
		!TRI->isSubRegisterEq(Reg, ScratchWaveOffsetReg)) {
MRI.replaceRegWith(ScratchRsrcReg, Reg);		MRI.replaceRegWith(ScratchRsrcReg, Reg);
MFI->setScratchRSrcReg(Reg);		MFI->setScratchRSrcReg(Reg);
return Reg;		return Reg;
}		}
}		}

return ScratchRsrcReg;		return ScratchRsrcReg;
}		}
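`getEntryFunctionReservedScratchRsrcReg` rounds the preloaded SGPR count up to whole SGPR128 tuples and, per the new FIXME, explicitly skips any tuple containing the scratch wave offset SGPR. A minimal sketch of that search, assuming tuple `T` covers SGPRs `4*T` to `4*T+3` and modelling "used or not allocatable" as one boolean per tuple; `firstCandidateTuple` and `pickScratchRsrcTuple` are hypothetical helpers, and the containment test plays the role of `TRI->isSubRegisterEq(Reg, ScratchWaveOffsetReg)`.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Round the preloaded SGPR count up to a whole tuple so the first
// candidate never overlaps a preloaded SGPR.
unsigned firstCandidateTuple(unsigned NumPreloadedSGPRs) {
  return (NumPreloadedSGPRs + 3) / 4;
}

// Pick the first free tuple that does not contain the SGPR holding the
// scratch wave offset. Starting past the end yields no candidate, which
// matches the std::min clamp on the slice in the patch.
std::optional<unsigned> pickScratchRsrcTuple(unsigned NumPreloadedSGPRs,
                                             const std::vector<bool> &TupleUsed,
                                             unsigned WaveOffsetSGPR) {
  for (unsigned T = firstCandidateTuple(NumPreloadedSGPRs);
       T < TupleUsed.size(); ++T) {
    bool ContainsWaveOffset = WaveOffsetSGPR / 4 == T;
    if (!TupleUsed[T] && !ContainsWaveOffset)
      return T;
  }
  return std::nullopt;
}
```

For example, with 5 preloaded SGPRs the search starts at tuple 2; if the wave offset sits in SGPR 9, tuple 2 is skipped and tuple 3 is chosen.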

// Shift down registers reserved for the scratch wave offset.
Register SIFrameLowering::getEntryFunctionReservedScratchWaveOffsetReg(
MachineFunction &MF) const {

const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
const SIInstrInfo *TII = ST.getInstrInfo();
const SIRegisterInfo *TRI = &TII->getRegisterInfo();
MachineRegisterInfo &MRI = MF.getRegInfo();
SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();

assert(MFI->isEntryFunction());

Register ScratchWaveOffsetReg = MFI->getScratchWaveOffsetReg();

if (ScratchWaveOffsetReg == AMDGPU::NoRegister ||
(!MRI.isPhysRegUsed(ScratchWaveOffsetReg) && !hasFP(MF) &&
!MFI->hasFlatScratchInit())) {
assert(!hasFP(MF) && !MFI->hasFlatScratchInit());
return AMDGPU::NoRegister;
}

if (ST.hasSGPRInitBug() ||
ScratchWaveOffsetReg != TRI->reservedPrivateSegmentWaveByteOffsetReg(MF))
return ScratchWaveOffsetReg;

unsigned NumPreloaded = MFI->getNumPreloadedSGPRs();

ArrayRef<MCPhysReg> AllSGPRs = getAllSGPRs(ST, MF);
if (NumPreloaded > AllSGPRs.size())
return ScratchWaveOffsetReg;

AllSGPRs = AllSGPRs.slice(NumPreloaded);

// We need to drop registers from the end of the list that we cannot use
// for the scratch wave offset.
// + 2 s102 and s103 do not exist on VI.
// + 2 for vcc
// + 2 for xnack_mask
// + 2 for flat_scratch
// + 4 for registers reserved for scratch resource register
// + 1 for register reserved for scratch wave offset. (By excluding this
// register from the list to consider, it means that when this
// register is being used for the scratch wave offset and there
// are no other free SGPRs, then the value will stay in this register.)
// + 1 if stack pointer is used.
// ----
// 13 (+1)
unsigned ReservedRegCount = 13;

if (AllSGPRs.size() < ReservedRegCount)
return ScratchWaveOffsetReg;

for (MCPhysReg Reg : AllSGPRs.drop_back(ReservedRegCount)) {
// Pick the first unallocated SGPR. Be careful not to pick an alias of the
// scratch descriptor, since we haven’t added its uses yet.
if (!MRI.isPhysRegUsed(Reg) && MRI.isAllocatable(Reg)) {
MRI.replaceRegWith(ScratchWaveOffsetReg, Reg);
if (MFI->getScratchWaveOffsetReg() == MFI->getStackPtrOffsetReg()) {
assert(!hasFP(MF));
MFI->setStackPtrOffsetReg(Reg);
}
MFI->setScratchWaveOffsetReg(Reg);
MFI->setFrameOffsetReg(Reg);
return Reg;
}
}

return ScratchWaveOffsetReg;
}

void SIFrameLowering::emitEntryFunctionPrologue(MachineFunction &MF,		void SIFrameLowering::emitEntryFunctionPrologue(MachineFunction &MF,
MachineBasicBlock &MBB) const {		MachineBasicBlock &MBB) const {
assert(&MF.front() == &MBB && "Shrink-wrapping not yet supported");		assert(&MF.front() == &MBB && "Shrink-wrapping not yet supported");

// FIXME: If we only have SGPR spills, we won't actually be using scratch		// FIXME: If we only have SGPR spills, we won't actually be using scratch
// memory since these spill to VGPRs. We should be cleaning up these unused		// memory since these spill to VGPRs. We should be cleaning up these unused
// SGPR spill frame indices somewhere.		// SGPR spill frame indices somewhere.

// FIXME: We still have implicit uses on SGPR spill instructions in case they		// FIXME: We still have implicit uses on SGPR spill instructions in case they
// need to spill to vector memory. It's likely that will not happen, but at		// need to spill to vector memory. It's likely that will not happen, but at
// this point it appears we need the setup. This part of the prolog should be		// this point it appears we need the setup. This part of the prolog should be
// emitted after frame indices are eliminated.		// emitted after frame indices are eliminated.

// FIXME: Remove all of the isPhysRegUsed checks		// FIXME: Remove all of the isPhysRegUsed checks

SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();		SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();		const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
const SIInstrInfo *TII = ST.getInstrInfo();		const SIInstrInfo *TII = ST.getInstrInfo();
const SIRegisterInfo *TRI = &TII->getRegisterInfo();
MachineRegisterInfo &MRI = MF.getRegInfo();		MachineRegisterInfo &MRI = MF.getRegInfo();
const Function &F = MF.getFunction();		const Function &F = MF.getFunction();

assert(MFI->isEntryFunction());		assert(MFI->isEntryFunction());

// We need to do the replacement of the private segment buffer and wave offset		Register ScratchWaveOffsetReg = MFI->getPreloadedReg(
// register even if there are no stack objects. There could be stores to undef
// or a constant without an associated object.
//
// These calls will return `AMDGPU::NoRegister` in cases where there are no
// actual uses of the respective registers.
Register ScratchRsrcReg = getEntryFunctionReservedScratchRsrcReg(MF);
Register ScratchWaveOffsetReg =
getEntryFunctionReservedScratchWaveOffsetReg(MF);

// Make the selected registers live throughout the function.
for (MachineBasicBlock &OtherBB : MF) {
if (&OtherBB == &MBB)
continue;

if (ScratchWaveOffsetReg != AMDGPU::NoRegister)
OtherBB.addLiveIn(ScratchWaveOffsetReg);

if (ScratchRsrcReg != AMDGPU::NoRegister)
OtherBB.addLiveIn(ScratchRsrcReg);
}

// Now that we have fixed the reserved registers we need to locate the
// (potentially) preloaded registers. We should always have a preloaded
// scratch wave offset register, but we only have a preloaded scratch rsrc
// register for HSA.
Register PreloadedScratchWaveOffsetReg = MFI->getPreloadedReg(
AMDGPUFunctionArgInfo::PRIVATE_SEGMENT_WAVE_BYTE_OFFSET);		AMDGPUFunctionArgInfo::PRIVATE_SEGMENT_WAVE_BYTE_OFFSET);
// FIXME: Hack to not crash in situations which emitted an error.		// FIXME: Hack to not crash in situations which emitted an error.
if (PreloadedScratchWaveOffsetReg == AMDGPU::NoRegister)		if (ScratchWaveOffsetReg == AMDGPU::NoRegister)
return;		return;

// We added live-ins during argument lowering, but since they were not used		// We need to do the replacement of the private segment buffer register even
// they were deleted. We're adding the uses now, so add them back.		// if there are no stack objects. There could be stores to undef or a
MRI.addLiveIn(PreloadedScratchWaveOffsetReg);		// constant without an associated object.
MBB.addLiveIn(PreloadedScratchWaveOffsetReg);		//
		// This will return `AMDGPU::NoRegister` in cases where there are no actual
		// uses of the SRSRC.
		Register ScratchRsrcReg =
		getEntryFunctionReservedScratchRsrcReg(MF, ScratchWaveOffsetReg);

		// Make the selected register live throughout the function.
		if (ScratchRsrcReg != AMDGPU::NoRegister)
		for (MachineBasicBlock &OtherBB : MF)
		if (&OtherBB != &MBB)
		OtherBB.addLiveIn(ScratchRsrcReg);
		arsenm (Done): Braces

		// Now that we have fixed the reserved SRSRC we need to locate the
		// (potentially) preloaded SRSRC.
Register PreloadedScratchRsrcReg = AMDGPU::NoRegister;		Register PreloadedScratchRsrcReg = AMDGPU::NoRegister;
if (ST.isAmdHsaOrMesa(F)) {		if (ST.isAmdHsaOrMesa(F)) {
PreloadedScratchRsrcReg =		PreloadedScratchRsrcReg =
MFI->getPreloadedReg(AMDGPUFunctionArgInfo::PRIVATE_SEGMENT_BUFFER);		MFI->getPreloadedReg(AMDGPUFunctionArgInfo::PRIVATE_SEGMENT_BUFFER);
if (ScratchRsrcReg != AMDGPU::NoRegister &&		if (ScratchRsrcReg != AMDGPU::NoRegister &&
PreloadedScratchRsrcReg != AMDGPU::NoRegister) {		PreloadedScratchRsrcReg != AMDGPU::NoRegister) {
		// We added live-ins during argument lowering, but since they were not
		// used they were deleted. We're adding the uses now, so add them back.
MRI.addLiveIn(PreloadedScratchRsrcReg);		MRI.addLiveIn(PreloadedScratchRsrcReg);
MBB.addLiveIn(PreloadedScratchRsrcReg);		MBB.addLiveIn(PreloadedScratchRsrcReg);
}		}
}		}

		// Debug location must be unknown since the first debug location is used to
		// determine the end of the prologue.
DebugLoc DL;		DebugLoc DL;
MachineBasicBlock::iterator I = MBB.begin();		MachineBasicBlock::iterator I = MBB.begin();

const bool HasFP = hasFP(MF);		if (MF.getFrameInfo().hasCalls()) {
		unsigned SPReg = MFI->getStackPtrOffsetReg();
		arsenm (Done): s/unsigned/Register
// If we are not HSA or we happened to reserved the original input registers,		assert(SPReg != AMDGPU::SP_REG);
// we don't need to copy to the reserved registers.		BuildMI(MBB, I, DL, TII->get(AMDGPU::S_MOV_B32), SPReg)
const bool CopyBuffer = ST.isAmdHsaOrMesa(F) &&		.addImm(MF.getFrameInfo().getStackSize() * ST.getWavefrontSize());
ScratchRsrcReg != AMDGPU::NoRegister &&
PreloadedScratchRsrcReg != AMDGPU::NoRegister &&
ScratchRsrcReg != PreloadedScratchRsrcReg;

// This needs to be careful of the copying order to avoid overwriting one of
// the input registers before it's been copied to its final
// destination. Usually the offset should be copied first.
const bool CopyBufferFirst =
TRI->isSubRegisterEq(PreloadedScratchRsrcReg, ScratchWaveOffsetReg);

if (CopyBuffer && CopyBufferFirst) {
BuildMI(MBB, I, DL, TII->get(AMDGPU::COPY), ScratchRsrcReg)
.addReg(PreloadedScratchRsrcReg, RegState::Kill);
}		}

if (ScratchWaveOffsetReg != AMDGPU::NoRegister) {		if (MFI->hasFlatScratchInit() || ScratchRsrcReg != AMDGPU::NoRegister) {
BuildMI(MBB, I, DL, TII->get(AMDGPU::COPY), ScratchWaveOffsetReg)		MRI.addLiveIn(ScratchWaveOffsetReg);
		arsenm (Done): Ditto
.addReg(PreloadedScratchWaveOffsetReg, HasFP ? RegState::Kill : 0);		MBB.addLiveIn(ScratchWaveOffsetReg);
}		}

if (CopyBuffer && !CopyBufferFirst) {		if (MFI->hasFlatScratchInit()) {
BuildMI(MBB, I, DL, TII->get(AMDGPU::COPY), ScratchRsrcReg)		emitEntryFunctionFlatScratchInit(MF, MBB, I, DL, ScratchWaveOffsetReg);
.addReg(PreloadedScratchRsrcReg, RegState::Kill);
}		}

// FIXME: This should also implement the setup path for HSA.
if (ScratchRsrcReg != AMDGPU::NoRegister) {		if (ScratchRsrcReg != AMDGPU::NoRegister) {
emitEntryFunctionScratchRsrcRegSetup(		emitEntryFunctionScratchRsrcRegSetup(MF, MBB, I, DL,
MF, MBB, I, DL, PreloadedScratchRsrcReg, ScratchRsrcReg);		PreloadedScratchRsrcReg,
}		ScratchRsrcReg, ScratchWaveOffsetReg);

if (HasFP) {
const MachineFrameInfo &FrameInfo = MF.getFrameInfo();
int64_t StackSize = FrameInfo.getStackSize();

Register SPReg = MFI->getStackPtrOffsetReg();
assert(SPReg != AMDGPU::SP_REG);

// On kernel entry, the private scratch wave offset is the SP value.
if (StackSize == 0) {
BuildMI(MBB, I, DL, TII->get(AMDGPU::COPY), SPReg)
.addReg(MFI->getScratchWaveOffsetReg());
} else {
BuildMI(MBB, I, DL, TII->get(AMDGPU::S_ADD_U32), SPReg)
.addReg(MFI->getScratchWaveOffsetReg())
.addImm(StackSize * ST.getWavefrontSize());
}		}
}		}

if (MFI->hasFlatScratchInit()) {		// Emit scratch RSRC setup code, assuming `ScratchRsrcReg != AMDGPU::NoReg`
emitEntryFunctionFlatScratchInit(MF, MBB, I, DL,
MFI->getScratchWaveOffsetReg());
}
}

// Emit scratch RSRC setup code, assuming `ScratchRsrcReg != AMDGPU::NoRegister`
void SIFrameLowering::emitEntryFunctionScratchRsrcRegSetup(		void SIFrameLowering::emitEntryFunctionScratchRsrcRegSetup(
MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator I,		MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator I,
const DebugLoc &DL, Register PreloadedScratchRsrcReg,		const DebugLoc &DL, Register PreloadedScratchRsrcReg,
Register ScratchRsrcReg) const {		Register ScratchRsrcReg, Register ScratchWaveOffsetReg) const {

const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();		const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
const SIInstrInfo *TII = ST.getInstrInfo();		const SIInstrInfo *TII = ST.getInstrInfo();
const SIRegisterInfo *TRI = &TII->getRegisterInfo();		const SIRegisterInfo *TRI = &TII->getRegisterInfo();
const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
const Function &Fn = MF.getFunction();		const Function &Fn = MF.getFunction();

if (ST.isAmdPalOS()) {		if (ST.isAmdPalOS()) {
[107 unchanged lines hidden]	if (ST.isAmdPalOS()) {

BuildMI(MBB, I, DL, SMovB32, Rsrc2)		BuildMI(MBB, I, DL, SMovB32, Rsrc2)
.addImm(Rsrc23 & 0xffffffff)		.addImm(Rsrc23 & 0xffffffff)
.addReg(ScratchRsrcReg, RegState::ImplicitDefine);		.addReg(ScratchRsrcReg, RegState::ImplicitDefine);

BuildMI(MBB, I, DL, SMovB32, Rsrc3)		BuildMI(MBB, I, DL, SMovB32, Rsrc3)
.addImm(Rsrc23 >> 32)		.addImm(Rsrc23 >> 32)
.addReg(ScratchRsrcReg, RegState::ImplicitDefine);		.addReg(ScratchRsrcReg, RegState::ImplicitDefine);
		} else if (ST.isAmdHsaOrMesa(Fn)) {
		assert(PreloadedScratchRsrcReg != AMDGPU::NoRegister);

		if (ScratchRsrcReg != PreloadedScratchRsrcReg)
		arsenm [Done]: Braces
		BuildMI(MBB, I, DL, TII->get(AMDGPU::COPY), ScratchRsrcReg)
		.addReg(PreloadedScratchRsrcReg, RegState::Kill);
}		}

		// Add the scratch wave offset into the scratch RSRC.
		//
		// We only want to update the first 48 bits, which is the base address
		// pointer, without touching the adjacent 16 bits of flags. We know this add
		// cannot carry-out from bit 47, otherwise the scratch allocation would be
		// impossible to fit in the 48-bit global address space.
		//
		// TODO: Evaluate if it is better to just construct an SRD using the flat
		// scratch init and some constants rather than update the one we are passed.
		Register ScratchRsrcSub0 = TRI->getSubReg(ScratchRsrcReg, AMDGPU::sub0);
		arsenm [Not Done]: Do we actually need these bits? I'm fairly confident these are always 0 in the HSA resource descriptor (or at least are a known constant we can just reproduce later).
		arsenm [Not Done]: According to this it's hardcoded: https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/master/src/core/runtime/amd_aql_queue.cpp#L1015 We just need to worry about SWIZZLE_ENABLE being set to 1. This is the high bit, so all it can do is trigger a carry on the second add. So I think that means you can get away with just doing the add, and then using s_bitset1_b32 to ensure it wasn't carried away.
		arsenm [Not Done]: Actually, I don't think any add that fits in the 48-bit address space should ever touch the high bits (although I usually manage to be wrong about known-bits optimizations with adds).
		arsenm [Not Done]: I think this means it's OK to just not worry about the high bits: https://rise4fun.com/Alive/i24
		arsenm [Not Done]: As long as we know bit 48 is 0, this seems fine. As this is hardcoded in the driver, this is probably OK: https://rise4fun.com/Alive/KmH
		scott.linder (author) [Done]: That makes sense to me, and this would simplify things a lot. I don't quite understand whether we need to ensure bits [48:62] are 0, though. If the addc carries into bit 48, is that an issue (i.e. https://rise4fun.com/Alive/qsv)? At the very least, it seems like we can avoid the need to save anything and just mask in a constant, but if it is possible to avoid that too, it removes a couple of additional instructions from nearly every kernel prologue.
		scott.linder (author) [Done]: I went the route of just always doing the 64-bit add of the scratch wave offset into the SRsrc rather than saving anything or using known constants for some of the bits. From some other discussion, this should always be correct.
		Register ScratchRsrcSub1 = TRI->getSubReg(ScratchRsrcReg, AMDGPU::sub1);

		BuildMI(MBB, I, DL, TII->get(AMDGPU::S_ADD_U32), ScratchRsrcSub0)
		arsenm [Not Done]: I think just 0xffff0000 would be clearer here.
		.addReg(ScratchRsrcSub0)
		.addReg(ScratchWaveOffsetReg, RegState::Kill)
		.addReg(ScratchRsrcReg, RegState::ImplicitDefine);
		BuildMI(MBB, I, DL, TII->get(AMDGPU::S_ADDC_U32), ScratchRsrcSub1)
		.addReg(ScratchRsrcSub1)
		.addImm(0)
		.addReg(ScratchRsrcReg, RegState::ImplicitDefine);
}		}
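The S_ADD_U32 / S_ADDC_U32 pair above performs a 64-bit add of the scratch wave offset into the first two dwords of the SRD. The sketch below models that add in plain C++ so the review thread's claim can be checked: the flag bits above bit 47 only change if the sum carries out of the 48-bit base. It is a minimal illustration, not LLVM code; it assumes the layout described in the source comment (base[47:0] in dword0 and the low half of dword1, flags in the high half of dword1), and `SRD`, `addWaveOffset`, `baseAddress`, `flagBits` and `carriesIntoFlags` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of a buffer resource descriptor (SRD): four 32-bit
// dwords. dword0 holds base[31:0]; dword1 holds base[47:32] in its low
// 16 bits and descriptor flag bits in its high 16 bits.
using SRD = std::array<uint32_t, 4>;

// Models "S_ADD_U32 sub0, sub0, WaveOffset" followed by
// "S_ADDC_U32 sub1, sub1, 0": the carry-out of the low add (SCC) is
// added into dword1.
SRD addWaveOffset(SRD Rsrc, uint32_t WaveOffset) {
  uint64_t Lo = uint64_t(Rsrc[0]) + WaveOffset;
  uint32_t Carry = static_cast<uint32_t>(Lo >> 32); // 0 or 1, like SCC
  Rsrc[0] = static_cast<uint32_t>(Lo);
  Rsrc[1] += Carry; // immediate 0 plus the incoming carry
  return Rsrc;
}

// The 48-bit base address encoded in dwords 0-1.
uint64_t baseAddress(const SRD &Rsrc) {
  return (uint64_t(Rsrc[1] & 0xffffu) << 32) | Rsrc[0];
}

// The 16 flag bits stored above the base address in dword1.
uint32_t flagBits(const SRD &Rsrc) { return Rsrc[1] >> 16; }

// True if adding WaveOffset would carry out of bit 47 into the flag bits.
bool carriesIntoFlags(const SRD &Rsrc, uint32_t WaveOffset) {
  return baseAddress(Rsrc) + WaveOffset >= (uint64_t(1) << 48);
}
```

For example, adding 0x20 to a base of 0x1fffffff0 whose flag bits are 0x8000 yields base 0x200000010 with the flags still 0x8000; only a base already at the top of the 48-bit range carries into bit 48, which is the case the source comment rules out for any scratch allocation that fits in the address space.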

bool SIFrameLowering::isSupportedStackID(TargetStackID::Value ID) const {		bool SIFrameLowering::isSupportedStackID(TargetStackID::Value ID) const {
switch (ID) {		switch (ID) {
case TargetStackID::Default:		case TargetStackID::Default:
case TargetStackID::NoAlloc:		case TargetStackID::NoAlloc:
case TargetStackID::SGPRSpill:		case TargetStackID::SGPRSpill:
return true;		return true;
[449 unchanged lines hidden]	if (!hasReservedCallFrame(MF)) {
llvm_unreachable("is this used?");		llvm_unreachable("is this used?");
}		}

return MBB.erase(I);		return MBB.erase(I);
}		}

bool SIFrameLowering::hasFP(const MachineFunction &MF) const {		bool SIFrameLowering::hasFP(const MachineFunction &MF) const {
const MachineFrameInfo &MFI = MF.getFrameInfo();		const MachineFrameInfo &MFI = MF.getFrameInfo();

		if (MF.getInfo<SIMachineFunctionInfo>()->isEntryFunction()) {
		// In an entry function we can always use immediate offsets.
		// FIXME: Do we need/want to respect DisableFramePointerElim here? It isn't
		// possible to unwind out of the entry function anyway, so the option
		// doesn't seem useful in kernels.
		return false;
		}

if (MFI.hasCalls()) {		if (MFI.hasCalls()) {
// All offsets are unsigned, so need to be addressed in the same direction		// All offsets are unsigned, so need to be addressed in the same direction
// as stack growth.		// as stack growth.

// FIXME: This function is pretty broken, since it can be called before the		// FIXME: This function is pretty broken, since it can be called before the
// frame layout is determined or CSR spills are inserted.		// frame layout is determined or CSR spills are inserted.
if (MFI.getStackSize() != 0)		return MFI.getStackSize() != 0;
return true;

// For the entry point, the input wave scratch offset must be copied to the
// API SP if there are calls.
if (MF.getInfo<SIMachineFunctionInfo>()->isEntryFunction())
return true;
}		}

return MFI.hasVarSizedObjects() \|\| MFI.isFrameAddressTaken() \|\|		return MFI.hasVarSizedObjects() \|\| MFI.isFrameAddressTaken() \|\|
MFI.hasStackMap() \|\| MFI.hasPatchPoint() \|\|		MFI.hasStackMap() \|\| MFI.hasPatchPoint() \|\|
MF.getSubtarget<GCNSubtarget>().getRegisterInfo()->needsStackRealignment(MF) \|\|		MF.getSubtarget<GCNSubtarget>().getRegisterInfo()->needsStackRealignment(MF) \|\|
MF.getTarget().Options.DisableFramePointerElim(MF);		MF.getTarget().Options.DisableFramePointerElim(MF);
}		}
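The revised `hasFP` (right column) boils down to a short decision order: entry functions never need a frame pointer, functions with calls need one only for a non-empty frame, and everything else falls through to the usual checks. The sketch below restates that order over a plain struct so it can be tested in isolation; `FrameQuery` and `needsFramePointer` are hypothetical, and the booleans stand in for the `MachineFrameInfo`, register-info and target-option queries.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flattened view of the inputs hasFP() consults; these field
// names are illustrative and are not LLVM API.
struct FrameQuery {
  bool IsEntryFunction = false;
  bool HasCalls = false;
  uint64_t StackSize = 0;
  bool HasVarSizedObjects = false;
  bool FrameAddressTaken = false;
  bool HasStackMapOrPatchPoint = false;
  bool NeedsStackRealignment = false;
  bool DisableFramePointerElim = false;
};

// Follows the decision order of the revised SIFrameLowering::hasFP.
bool needsFramePointer(const FrameQuery &Q) {
  // Entry functions can always use immediate offsets.
  if (Q.IsEntryFunction)
    return false;
  // With calls, a frame pointer is needed only for a non-empty frame.
  if (Q.HasCalls)
    return Q.StackSize != 0;
  return Q.HasVarSizedObjects || Q.FrameAddressTaken ||
         Q.HasStackMapOrPatchPoint || Q.NeedsStackRealignment ||
         Q.DisableFramePointerElim;
}
```

Because, per the FIXME, `hasFP` can be called before the frame layout is determined, the `StackSize` input may still be 0 at that point, so the "calls" branch can give a different answer early and late in the pipeline.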

llvm/lib/Target/AMDGPU/SIISelLowering.cpp


[1,892 unchanged lines hidden, from start of file]	if (RequiresStackAccess && ST.isAmdHsaOrMesa(MF.getFunction())) {
// argument to these reserved registers.		// argument to these reserved registers.

// Without HSA, relocations are used for the scratch pointer and the		// Without HSA, relocations are used for the scratch pointer and the
// buffer resource setup is always inserted in the prologue. Scratch wave		// buffer resource setup is always inserted in the prologue. Scratch wave
// offset is still in an input SGPR.		// offset is still in an input SGPR.
Info.setScratchRSrcReg(ReservedBufferReg);		Info.setScratchRSrcReg(ReservedBufferReg);
}		}

// hasFP should be accurate for kernels even before the frame is finalized.
if (ST.getFrameLowering()->hasFP(MF)) {
MachineRegisterInfo &MRI = MF.getRegInfo();		MachineRegisterInfo &MRI = MF.getRegInfo();

		if (MFI.hasCalls()) {
// Try to use s32 as the SP, but move it if it would interfere with input		// Try to use s32 as the SP, but move it if it would interfere with input
// arguments. This won't work with calls though.		// arguments. This won't work with calls though.
//		//
// FIXME: Move SP to avoid any possible inputs, or find a way to spill input		// FIXME: Move SP to avoid any possible inputs, or find a way to spill input
// registers.		// registers.
if (!MRI.isLiveIn(AMDGPU::SGPR32)) {		if (!MRI.isLiveIn(AMDGPU::SGPR32)) {
Info.setStackPtrOffsetReg(AMDGPU::SGPR32);		Info.setStackPtrOffsetReg(AMDGPU::SGPR32);
} else {		} else {
assert(AMDGPU::isShader(MF.getFunction().getCallingConv()));		assert(AMDGPU::isShader(MF.getFunction().getCallingConv()));

if (MFI.hasCalls())		if (MFI.hasCalls())
report_fatal_error("call in graphics shader with too many input SGPRs");		report_fatal_error("call in graphics shader with too many input SGPRs");

for (unsigned Reg : AMDGPU::SGPR_32RegClass) {		for (unsigned Reg : AMDGPU::SGPR_32RegClass) {
if (!MRI.isLiveIn(Reg)) {		if (!MRI.isLiveIn(Reg)) {
Info.setStackPtrOffsetReg(Reg);		Info.setStackPtrOffsetReg(Reg);
break;		break;
}		}
}		}

if (Info.getStackPtrOffsetReg() == AMDGPU::SP_REG)		if (Info.getStackPtrOffsetReg() == AMDGPU::SP_REG)
report_fatal_error("failed to find register for SP");		report_fatal_error("failed to find register for SP");
}		}

if (MFI.hasCalls()) {
Info.setScratchWaveOffsetReg(AMDGPU::SGPR33);
Info.setFrameOffsetReg(AMDGPU::SGPR33);
} else {
unsigned ReservedOffsetReg =
TRI.reservedPrivateSegmentWaveByteOffsetReg(MF);
Info.setScratchWaveOffsetReg(ReservedOffsetReg);
Info.setFrameOffsetReg(ReservedOffsetReg);
}
} else if (RequiresStackAccess) {
assert(!MFI.hasCalls());
// We know there are accesses and they will be done relative to SP, so just
// pin it to the input.
//
// FIXME: Should not do this if inline asm is reading/writing these
// registers.
Register PreloadedSP = Info.getPreloadedReg(
AMDGPUFunctionArgInfo::PRIVATE_SEGMENT_WAVE_BYTE_OFFSET);

Info.setStackPtrOffsetReg(PreloadedSP);
Info.setScratchWaveOffsetReg(PreloadedSP);
Info.setFrameOffsetReg(PreloadedSP);
} else {
assert(!MFI.hasCalls());

// There may not be stack access at all. There may still be spills, or
// access of a constant pointer (in which cases an extra copy will be
// emitted in the prolog).
unsigned ReservedOffsetReg
= TRI.reservedPrivateSegmentWaveByteOffsetReg(MF);
Info.setStackPtrOffsetReg(ReservedOffsetReg);
Info.setScratchWaveOffsetReg(ReservedOffsetReg);
Info.setFrameOffsetReg(ReservedOffsetReg);
}		}
}		}
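The stack-pointer selection above prefers s32 and, when s32 is already an input argument, scans the SGPR class for the first register that is not live-in, failing with "failed to find register for SP" if none is left. In the code shown, that fallback scan is only reachable for graphics shaders without calls. The sketch below models the search over plain register numbers; `pickStackPtrSGPR` is hypothetical, and it assumes for illustration that SGPRs are numbered from 0 and scanned in ascending order.

```cpp
#include <cassert>
#include <optional>
#include <set>

// Hypothetical model: SGPRs are numbered 0..NumSGPRs-1, and LiveIns holds
// the numbers of the SGPRs already occupied by input arguments.
std::optional<unsigned> pickStackPtrSGPR(const std::set<unsigned> &LiveIns,
                                         unsigned NumSGPRs,
                                         unsigned Preferred = 32) {
  // Prefer s32 when no input argument occupies it.
  if (Preferred < NumSGPRs && LiveIns.count(Preferred) == 0)
    return Preferred;
  // Otherwise take the lowest-numbered SGPR that is not a live-in.
  for (unsigned Reg = 0; Reg < NumSGPRs; ++Reg)
    if (LiveIns.count(Reg) == 0)
      return Reg;
  // Corresponds to the "failed to find register for SP" fatal error.
  return std::nullopt;
}
```

Returning `std::nullopt` instead of aborting keeps the sketch testable; the real code reports a fatal error at that point.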

bool SITargetLowering::supportSplitCSR(MachineFunction *MF) const {		bool SITargetLowering::supportSplitCSR(MachineFunction *MF) const {
const SIMachineFunctionInfo *Info = MF->getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *Info = MF->getInfo<SIMachineFunctionInfo>();
return !Info->isEntryFunction();		return !Info->isEntryFunction();
}		}

[238 unchanged lines hidden]	if (!IsEntryFunc) {
allocateSpecialInputVGPRs(CCInfo, MF, TRI, Info);		allocateSpecialInputVGPRs(CCInfo, MF, TRI, Info);
}		}

// Start adding system SGPRs.		// Start adding system SGPRs.
if (IsEntryFunc) {		if (IsEntryFunc) {
allocateSystemSGPRs(CCInfo, MF, *Info, CallConv, IsShader);		allocateSystemSGPRs(CCInfo, MF, *Info, CallConv, IsShader);
} else {		} else {
CCInfo.AllocateReg(Info->getScratchRSrcReg());		CCInfo.AllocateReg(Info->getScratchRSrcReg());
CCInfo.AllocateReg(Info->getScratchWaveOffsetReg());
CCInfo.AllocateReg(Info->getFrameOffsetReg());
allocateSpecialInputSGPRs(CCInfo, MF, TRI, Info);		allocateSpecialInputSGPRs(CCInfo, MF, TRI, Info);
}		}

auto &ArgUsageInfo =		auto &ArgUsageInfo =
DAG.getPass()->getAnalysis<AMDGPUArgumentUsageInfo>();		DAG.getPass()->getAnalysis<AMDGPUArgumentUsageInfo>();
ArgUsageInfo.setFuncArgInfo(Fn, Info->getArgInfo());		ArgUsageInfo.setFuncArgInfo(Fn, Info->getArgInfo());

unsigned StackArgSize = CCInfo.getNextStackOffset();		unsigned StackArgSize = CCInfo.getNextStackOffset();
[8,381 unchanged lines hidden]	void SITargetLowering::finalizeLowering(MachineFunction &MF) const {
// We need to worry about replacing the default register with itself in case		// We need to worry about replacing the default register with itself in case
// of MIR testcases missing the MFI.		// of MIR testcases missing the MFI.
if (Info->getScratchRSrcReg() != AMDGPU::PRIVATE_RSRC_REG)		if (Info->getScratchRSrcReg() != AMDGPU::PRIVATE_RSRC_REG)
MRI.replaceRegWith(AMDGPU::PRIVATE_RSRC_REG, Info->getScratchRSrcReg());		MRI.replaceRegWith(AMDGPU::PRIVATE_RSRC_REG, Info->getScratchRSrcReg());

if (Info->getFrameOffsetReg() != AMDGPU::FP_REG)		if (Info->getFrameOffsetReg() != AMDGPU::FP_REG)
MRI.replaceRegWith(AMDGPU::FP_REG, Info->getFrameOffsetReg());		MRI.replaceRegWith(AMDGPU::FP_REG, Info->getFrameOffsetReg());

if (Info->getScratchWaveOffsetReg() != AMDGPU::SCRATCH_WAVE_OFFSET_REG) {
MRI.replaceRegWith(AMDGPU::SCRATCH_WAVE_OFFSET_REG,
Info->getScratchWaveOffsetReg());
}

Info->limitOccupancy(MF);		Info->limitOccupancy(MF);

if (ST.isWave32() && !MF.empty()) {		if (ST.isWave32() && !MF.empty()) {
// Add VCC_HI def because many instructions marked as imp-use VCC where		// Add VCC_HI def because many instructions marked as imp-use VCC where
// we may only define VCC_LO. If nothing defines VCC_HI we may end up		// we may only define VCC_LO. If nothing defines VCC_HI we may end up
// having a use of undef.		// having a use of undef.

const SIInstrInfo *TII = ST.getInstrInfo();		const SIInstrInfo *TII = ST.getInstrInfo();
[347 unchanged lines hidden, to end of file]

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp

[1,150 unchanged lines hidden, from start of file]	if (RI.isSGPRClass(RC)) {

// The SGPR spill/restore instructions only work on number sgprs, so we need		// The SGPR spill/restore instructions only work on number sgprs, so we need
// to make sure we are using the correct register class.		// to make sure we are using the correct register class.
if (Register::isVirtualRegister(SrcReg) && SpillSize == 4) {		if (Register::isVirtualRegister(SrcReg) && SpillSize == 4) {
MachineRegisterInfo &MRI = MF->getRegInfo();		MachineRegisterInfo &MRI = MF->getRegInfo();
MRI.constrainRegClass(SrcReg, &AMDGPU::SReg_32_XM0RegClass);		MRI.constrainRegClass(SrcReg, &AMDGPU::SReg_32_XM0RegClass);
}		}

BuildMI(MBB, MI, DL, OpDesc)		auto MIB = BuildMI(MBB, MI, DL, OpDesc)
.addReg(SrcReg, getKillRegState(isKill)) // data		.addReg(SrcReg, getKillRegState(isKill)) // data
		Lint: Pre-merge checks: clang-format: please reformat the code. The suggested change only adjusts whitespace in the continuation lines .addReg(SrcReg, getKillRegState(isKill)) // data, .addFrameIndex(FrameIndex) // addr, .addMemOperand(MMO) and .addReg(MFI->getScratchRSrcReg(), RegState::Implicit);
.addFrameIndex(FrameIndex) // addr		.addFrameIndex(FrameIndex) // addr
.addMemOperand(MMO)		.addMemOperand(MMO)
.addReg(MFI->getScratchRSrcReg(), RegState::Implicit)		.addReg(MFI->getScratchRSrcReg(), RegState::Implicit);
.addReg(MFI->getStackPtrOffsetReg(), RegState::Implicit);
// Add the scratch resource registers as implicit uses because we may end up		// Add the scratch resource registers as implicit uses because we may end up
// needing them, and need to ensure that the reserved registers are		// needing them, and need to ensure that the reserved registers are
// correctly handled.		// correctly handled.

		// Also add the stack pointer if we have one, for the same reason.
		if (!MFI->isEntryFunction())
		MIB.addReg(MFI->getStackPtrOffsetReg(), RegState::Implicit);

if (RI.spillSGPRToVGPR())		if (RI.spillSGPRToVGPR())
FrameInfo.setStackID(FrameIndex, TargetStackID::SGPRSpill);		FrameInfo.setStackID(FrameIndex, TargetStackID::SGPRSpill);
return;		return;
}		}

unsigned Opcode = RI.hasAGPRs(RC) ? getAGPRSpillSaveOpcode(SpillSize)		unsigned Opcode = RI.hasAGPRs(RC) ? getAGPRSpillSaveOpcode(SpillSize)
: getVGPRSpillSaveOpcode(SpillSize);		: getVGPRSpillSaveOpcode(SpillSize);
MFI->setHasSpilledVGPRs();		MFI->setHasSpilledVGPRs();

auto MIB = BuildMI(MBB, MI, DL, get(Opcode));		auto MIB = BuildMI(MBB, MI, DL, get(Opcode));
if (RI.hasAGPRs(RC)) {		if (RI.hasAGPRs(RC)) {
MachineRegisterInfo &MRI = MF->getRegInfo();		MachineRegisterInfo &MRI = MF->getRegInfo();
Register Tmp = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);		Register Tmp = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
MIB.addReg(Tmp, RegState::Define);		MIB.addReg(Tmp, RegState::Define);
}		}
MIB.addReg(SrcReg, getKillRegState(isKill)) // data		MIB.addReg(SrcReg, getKillRegState(isKill)) // data
.addFrameIndex(FrameIndex) // addr		.addFrameIndex(FrameIndex) // addr
.addReg(MFI->getScratchRSrcReg()) // scratch_rsrc		.addReg(MFI->getScratchRSrcReg()); // scratch_rsrc
.addReg(MFI->getStackPtrOffsetReg()) // scratch_offset		if (MFI->isEntryFunction()) {
.addImm(0) // offset		MIB.addImm(0); // scratch_offset
		} else {
		MIB.addReg(MFI->getStackPtrOffsetReg()); // scratch_offset
		}
		MIB.addImm(0) // offset
.addMemOperand(MMO);		.addMemOperand(MMO);
}		}
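With this change, the VGPR/AGPR spill-save code above (and the matching restore path) chooses the pseudo's scratch_offset operand by function kind: an entry function gets the immediate 0, matching the new FrameOffsetReg comment that the frame offset is 0 for entry functions, while other functions pass the stack pointer register. The sketch below captures only that choice; `RegOperand`, `ImmOperand` and `chooseScratchOffset` are hypothetical stand-ins for MachineInstrBuilder operands.

```cpp
#include <cassert>
#include <cstdint>
#include <variant>

// Hypothetical stand-ins for a register operand and an immediate operand.
struct RegOperand { unsigned Reg; };
struct ImmOperand { int64_t Value; };
using ScratchOffsetOperand = std::variant<RegOperand, ImmOperand>;

// Entry functions address their frame at offset 0 from the scratch wave
// base, so they get an immediate 0; other functions use the SP register.
ScratchOffsetOperand chooseScratchOffset(bool IsEntryFunction,
                                         unsigned StackPtrReg) {
  if (IsEntryFunction)
    return ImmOperand{0};
  return RegOperand{StackPtrReg};
}
```

The SGPR spill path follows the same rule in a different form: it adds the stack pointer as an implicit use only when the function is not an entry function.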

static unsigned getSGPRSpillRestoreOpcode(unsigned Size) {		static unsigned getSGPRSpillRestoreOpcode(unsigned Size) {
switch (Size) {		switch (Size) {
case 4:		case 4:
return AMDGPU::SI_SPILL_S32_RESTORE;		return AMDGPU::SI_SPILL_S32_RESTORE;
case 8:		case 8:
return AMDGPU::SI_SPILL_S64_RESTORE;		return AMDGPU::SI_SPILL_S64_RESTORE;
[82 unchanged lines hidden]	if (RI.isSGPRClass(RC)) {
const MCInstrDesc &OpDesc = get(getSGPRSpillRestoreOpcode(SpillSize));		const MCInstrDesc &OpDesc = get(getSGPRSpillRestoreOpcode(SpillSize));
if (Register::isVirtualRegister(DestReg) && SpillSize == 4) {		if (Register::isVirtualRegister(DestReg) && SpillSize == 4) {
MachineRegisterInfo &MRI = MF->getRegInfo();		MachineRegisterInfo &MRI = MF->getRegInfo();
MRI.constrainRegClass(DestReg, &AMDGPU::SReg_32_XM0RegClass);		MRI.constrainRegClass(DestReg, &AMDGPU::SReg_32_XM0RegClass);
}		}

if (RI.spillSGPRToVGPR())		if (RI.spillSGPRToVGPR())
FrameInfo.setStackID(FrameIndex, TargetStackID::SGPRSpill);		FrameInfo.setStackID(FrameIndex, TargetStackID::SGPRSpill);
BuildMI(MBB, MI, DL, OpDesc, DestReg)		auto MIB = BuildMI(MBB, MI, DL, OpDesc, DestReg)
.addFrameIndex(FrameIndex) // addr		.addFrameIndex(FrameIndex) // addr
		Lint: Pre-merge checks: clang-format: please reformat the code. The suggested change only adjusts whitespace in the continuation lines .addFrameIndex(FrameIndex) // addr, .addMemOperand(MMO) and .addReg(MFI->getScratchRSrcReg(), RegState::Implicit);
.addMemOperand(MMO)		.addMemOperand(MMO)
.addReg(MFI->getScratchRSrcReg(), RegState::Implicit)		.addReg(MFI->getScratchRSrcReg(), RegState::Implicit);
.addReg(MFI->getStackPtrOffsetReg(), RegState::Implicit);		if (!MFI->isEntryFunction())
		MIB.addReg(MFI->getStackPtrOffsetReg(), RegState::Implicit);
return;		return;
}		}

unsigned Opcode = RI.hasAGPRs(RC) ? getAGPRSpillRestoreOpcode(SpillSize)		unsigned Opcode = RI.hasAGPRs(RC) ? getAGPRSpillRestoreOpcode(SpillSize)
: getVGPRSpillRestoreOpcode(SpillSize);		: getVGPRSpillRestoreOpcode(SpillSize);
auto MIB = BuildMI(MBB, MI, DL, get(Opcode), DestReg);		auto MIB = BuildMI(MBB, MI, DL, get(Opcode), DestReg);
if (RI.hasAGPRs(RC)) {		if (RI.hasAGPRs(RC)) {
MachineRegisterInfo &MRI = MF->getRegInfo();		MachineRegisterInfo &MRI = MF->getRegInfo();
Register Tmp = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);		Register Tmp = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
MIB.addReg(Tmp, RegState::Define);		MIB.addReg(Tmp, RegState::Define);
}		}
MIB.addFrameIndex(FrameIndex) // vaddr		MIB.addFrameIndex(FrameIndex) // vaddr
.addReg(MFI->getScratchRSrcReg()) // scratch_rsrc		.addReg(MFI->getScratchRSrcReg()); // scratch_rsrc
.addReg(MFI->getStackPtrOffsetReg()) // scratch_offset		if (MFI->isEntryFunction()) {
.addImm(0) // offset		MIB.addImm(0); // scratch_offset
		} else {
		MIB.addReg(MFI->getStackPtrOffsetReg()); // scratch_offset
		}
		MIB.addImm(0) // offset
.addMemOperand(MMO);		.addMemOperand(MMO);
}		}

/// \param @Offset Offset in bytes of the FrameIndex being spilled		/// \param @Offset Offset in bytes of the FrameIndex being spilled
unsigned SIInstrInfo::calculateLDSSpillAddress(		unsigned SIInstrInfo::calculateLDSSpillAddress(
MachineBasicBlock &MBB, MachineInstr &MI, RegScavenger *RS, unsigned TmpReg,		MachineBasicBlock &MBB, MachineInstr &MI, RegScavenger *RS, unsigned TmpReg,
unsigned FrameOffset, unsigned Size) const {		unsigned FrameOffset, unsigned Size) const {
MachineFunction *MF = MBB.getParent();		MachineFunction *MF = MBB.getParent();
SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();		SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();
[5,432 unchanged lines hidden, to end of file]

llvm/lib/Target/AMDGPU/SIInstructions.td

	[630 unchanged lines hidden, from start of file]
	defm SI_SPILL_S1024 : SI_SPILL_SGPR <SReg_1024>;			defm SI_SPILL_S1024 : SI_SPILL_SGPR <SReg_1024>;

	multiclass SI_SPILL_VGPR <RegisterClass vgpr_class> {			multiclass SI_SPILL_VGPR <RegisterClass vgpr_class> {
	let UseNamedOperandTable = 1, VGPRSpill = 1,			let UseNamedOperandTable = 1, VGPRSpill = 1,
	SchedRW = [WriteVMEM] in {			SchedRW = [WriteVMEM] in {
	def _SAVE : VPseudoInstSI <			def _SAVE : VPseudoInstSI <
	(outs),			(outs),
	(ins vgpr_class:$vdata, i32imm:$vaddr, SReg_128:$srsrc,			(ins vgpr_class:$vdata, i32imm:$vaddr, SReg_128:$srsrc,
	SReg_32:$soffset, i32imm:$offset)> {			type2:$soffset, i32imm:$offset)> {
	let mayStore = 1;			let mayStore = 1;
	let mayLoad = 0;			let mayLoad = 0;
	// (2 * 4) + (8 * num_subregs) bytes maximum			// (2 * 4) + (8 * num_subregs) bytes maximum
	int MaxSize = !add(!shl(!srl(vgpr_class.Size, 5), 3), 8);			int MaxSize = !add(!shl(!srl(vgpr_class.Size, 5), 3), 8);
	// Size field is unsigned char and cannot fit more.			// Size field is unsigned char and cannot fit more.
	let Size = !if(!le(MaxSize, 256), MaxSize, 252);			let Size = !if(!le(MaxSize, 256), MaxSize, 252);
	}			}

	def _RESTORE : VPseudoInstSI <			def _RESTORE : VPseudoInstSI <
	(outs vgpr_class:$vdata),			(outs vgpr_class:$vdata),
	(ins i32imm:$vaddr, SReg_128:$srsrc, SReg_32:$soffset,			(ins i32imm:$vaddr, SReg_128:$srsrc, type2:$soffset,
	i32imm:$offset)> {			i32imm:$offset)> {
	let mayStore = 0;			let mayStore = 0;
	let mayLoad = 1;			let mayLoad = 1;

	// (2 * 4) + (8 * num_subregs) bytes maximum			// (2 * 4) + (8 * num_subregs) bytes maximum
	int MaxSize = !add(!shl(!srl(vgpr_class.Size, 5), 3), 8);			int MaxSize = !add(!shl(!srl(vgpr_class.Size, 5), 3), 8);
	// Size field is unsigned char and cannot fit more.			// Size field is unsigned char and cannot fit more.
	let Size = !if(!le(MaxSize, 256), MaxSize, 252);			let Size = !if(!le(MaxSize, 256), MaxSize, 252);
	[12 unchanged lines hidden]

	multiclass SI_SPILL_AGPR <RegisterClass vgpr_class> {			multiclass SI_SPILL_AGPR <RegisterClass vgpr_class> {
	let UseNamedOperandTable = 1, VGPRSpill = 1,			let UseNamedOperandTable = 1, VGPRSpill = 1,
	Constraints = "@earlyclobber $tmp",			Constraints = "@earlyclobber $tmp",
	SchedRW = [WriteVMEM] in {			SchedRW = [WriteVMEM] in {
	def _SAVE : VPseudoInstSI <			def _SAVE : VPseudoInstSI <
	(outs VGPR_32:$tmp),			(outs VGPR_32:$tmp),
	(ins vgpr_class:$vdata, i32imm:$vaddr, SReg_128:$srsrc,			(ins vgpr_class:$vdata, i32imm:$vaddr, SReg_128:$srsrc,
	SReg_32:$soffset, i32imm:$offset)> {			type2:$soffset, i32imm:$offset)> {
	let mayStore = 1;			let mayStore = 1;
	let mayLoad = 0;			let mayLoad = 0;
	// (2 * 4) + (16 * num_subregs) bytes maximum			// (2 * 4) + (16 * num_subregs) bytes maximum
	int MaxSize = !add(!shl(!srl(vgpr_class.Size, 5), 4), 8);			int MaxSize = !add(!shl(!srl(vgpr_class.Size, 5), 4), 8);
	// Size field is unsigned char and cannot fit more.			// Size field is unsigned char and cannot fit more.
	let Size = !if(!le(MaxSize, 256), MaxSize, 252);			let Size = !if(!le(MaxSize, 256), MaxSize, 252);
	}			}

	def _RESTORE : VPseudoInstSI <			def _RESTORE : VPseudoInstSI <
	(outs vgpr_class:$vdata, VGPR_32:$tmp),			(outs vgpr_class:$vdata, VGPR_32:$tmp),
	(ins i32imm:$vaddr, SReg_128:$srsrc, SReg_32:$soffset,			(ins i32imm:$vaddr, SReg_128:$srsrc, type2:$soffset,
	i32imm:$offset)> {			i32imm:$offset)> {
	let mayStore = 0;			let mayStore = 0;
	let mayLoad = 1;			let mayLoad = 1;

	// (2 * 4) + (16 * num_subregs) bytes maximum			// (2 * 4) + (16 * num_subregs) bytes maximum
	int MaxSize = !add(!shl(!srl(vgpr_class.Size, 5), 4), 8);			int MaxSize = !add(!shl(!srl(vgpr_class.Size, 5), 4), 8);
	// Size field is unsigned char and cannot fit more.			// Size field is unsigned char and cannot fit more.
	let Size = !if(!le(MaxSize, 256), MaxSize, 252);			let Size = !if(!le(MaxSize, 256), MaxSize, 252);
	[1,648 unchanged lines hidden, to end of file]
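The `MaxSize` and `Size` expressions in SI_SPILL_VGPR and SI_SPILL_AGPR are TableGen bang-operator arithmetic: `vgpr_class.Size` is in bits, so `!srl(..., 5)` counts 32-bit subregisters, the shift by 3 or 4 multiplies by 8 or 16 bytes per subregister, and 8 bytes are added on top. The sketch restates that arithmetic as constexpr C++ so concrete sizes can be checked; the function names are hypothetical, and `Shift` is 3 for VGPR spills and 4 for AGPR spills, as in the two multiclasses.

```cpp
#include <cassert>

// Mirrors `int MaxSize = !add(!shl(!srl(vgpr_class.Size, 5), Shift), 8)`.
// RegClassBits is vgpr_class.Size in bits; RegClassBits >> 5 is the number
// of 32-bit subregisters.
constexpr unsigned spillMaxSize(unsigned RegClassBits, unsigned Shift) {
  return ((RegClassBits >> 5) << Shift) + 8;
}

// Mirrors `let Size = !if(!le(MaxSize, 256), MaxSize, 252)`: the value is
// capped at 252 once MaxSize exceeds 256.
constexpr unsigned spillPseudoSize(unsigned RegClassBits, unsigned Shift) {
  return spillMaxSize(RegClassBits, Shift) <= 256
             ? spillMaxSize(RegClassBits, Shift)
             : 252;
}

static_assert(spillPseudoSize(32, 3) == 16, "32-bit VGPR spill: 8 + 8 bytes");
```

For example, a 1024-bit VGPR class gives (32 << 3) + 8 = 264 bytes, which exceeds 256 and is therefore clamped to 252, while a 128-bit AGPR class gives (4 << 4) + 8 = 72.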

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h

[278 unchanged lines hidden, from start of file]	struct SIMachineFunctionInfo final : public yaml::MachineFunctionInfo {
unsigned LDSSize = 0;		unsigned LDSSize = 0;
bool IsEntryFunction = false;		bool IsEntryFunction = false;
bool NoSignedZerosFPMath = false;		bool NoSignedZerosFPMath = false;
bool MemoryBound = false;		bool MemoryBound = false;
bool WaveLimiter = false;		bool WaveLimiter = false;
uint32_t HighBitsOf32BitAddress = 0;		uint32_t HighBitsOf32BitAddress = 0;

StringValue ScratchRSrcReg = "$private_rsrc_reg";		StringValue ScratchRSrcReg = "$private_rsrc_reg";
StringValue ScratchWaveOffsetReg = "$scratch_wave_offset_reg";
StringValue FrameOffsetReg = "$fp_reg";		StringValue FrameOffsetReg = "$fp_reg";
StringValue StackPtrOffsetReg = "$sp_reg";		StringValue StackPtrOffsetReg = "$sp_reg";

Optional<SIArgumentInfo> ArgInfo;		Optional<SIArgumentInfo> ArgInfo;
SIMode Mode;		SIMode Mode;

SIMachineFunctionInfo() = default;		SIMachineFunctionInfo() = default;
SIMachineFunctionInfo(const llvm::SIMachineFunctionInfo &,		SIMachineFunctionInfo(const llvm::SIMachineFunctionInfo &,
[10 unchanged lines hidden]	static void mapping(IO &YamlIO, SIMachineFunctionInfo &MFI) {
YamlIO.mapOptional("maxKernArgAlign", MFI.MaxKernArgAlign, 0u);		YamlIO.mapOptional("maxKernArgAlign", MFI.MaxKernArgAlign, 0u);
YamlIO.mapOptional("ldsSize", MFI.LDSSize, 0u);		YamlIO.mapOptional("ldsSize", MFI.LDSSize, 0u);
YamlIO.mapOptional("isEntryFunction", MFI.IsEntryFunction, false);		YamlIO.mapOptional("isEntryFunction", MFI.IsEntryFunction, false);
YamlIO.mapOptional("noSignedZerosFPMath", MFI.NoSignedZerosFPMath, false);		YamlIO.mapOptional("noSignedZerosFPMath", MFI.NoSignedZerosFPMath, false);
YamlIO.mapOptional("memoryBound", MFI.MemoryBound, false);		YamlIO.mapOptional("memoryBound", MFI.MemoryBound, false);
YamlIO.mapOptional("waveLimiter", MFI.WaveLimiter, false);		YamlIO.mapOptional("waveLimiter", MFI.WaveLimiter, false);
YamlIO.mapOptional("scratchRSrcReg", MFI.ScratchRSrcReg,		YamlIO.mapOptional("scratchRSrcReg", MFI.ScratchRSrcReg,
StringValue("$private_rsrc_reg"));		StringValue("$private_rsrc_reg"));
YamlIO.mapOptional("scratchWaveOffsetReg", MFI.ScratchWaveOffsetReg,
StringValue("$scratch_wave_offset_reg"));
YamlIO.mapOptional("frameOffsetReg", MFI.FrameOffsetReg,		YamlIO.mapOptional("frameOffsetReg", MFI.FrameOffsetReg,
StringValue("$fp_reg"));		StringValue("$fp_reg"));
YamlIO.mapOptional("stackPtrOffsetReg", MFI.StackPtrOffsetReg,		YamlIO.mapOptional("stackPtrOffsetReg", MFI.StackPtrOffsetReg,
StringValue("$sp_reg"));		StringValue("$sp_reg"));
YamlIO.mapOptional("argumentInfo", MFI.ArgInfo);		YamlIO.mapOptional("argumentInfo", MFI.ArgInfo);
YamlIO.mapOptional("mode", MFI.Mode, SIMode());		YamlIO.mapOptional("mode", MFI.Mode, SIMode());
YamlIO.mapOptional("highBitsOf32BitAddress",		YamlIO.mapOptional("highBitsOf32BitAddress",
MFI.HighBitsOf32BitAddress, 0u);		MFI.HighBitsOf32BitAddress, 0u);
}		}
};		};

} // end namespace yaml		} // end namespace yaml

/// This class keeps track of the SPI_SP_INPUT_ADDR config register, which		/// This class keeps track of the SPI_SP_INPUT_ADDR config register, which
/// tells the hardware which interpolation parameters to load.		/// tells the hardware which interpolation parameters to load.
class SIMachineFunctionInfo final : public AMDGPUMachineFunction {		class SIMachineFunctionInfo final : public AMDGPUMachineFunction {
friend class GCNTargetMachine;		friend class GCNTargetMachine;

unsigned TIDReg = AMDGPU::NoRegister;		unsigned TIDReg = AMDGPU::NoRegister;

// Registers that may be reserved for spilling purposes. These may be the same		// Registers that may be reserved for spilling purposes. These may be the same
// as the input registers.		// as the input registers.
unsigned ScratchRSrcReg = AMDGPU::PRIVATE_RSRC_REG;		unsigned ScratchRSrcReg = AMDGPU::PRIVATE_RSRC_REG;
unsigned ScratchWaveOffsetReg = AMDGPU::SCRATCH_WAVE_OFFSET_REG;

// This is the current function's incremented size from the kernel's scratch		// This is the unswizzled offset from the current dispatch's scratch wave
// wave offset register. For an entry function, this is exactly the same as		// base to the beginning of the current function's frame. For an entry
// the ScratchWaveOffsetReg.		// function, this is 0.
unsigned FrameOffsetReg = AMDGPU::FP_REG;		unsigned FrameOffsetReg = AMDGPU::FP_REG;
		arsenm [Not Done]: These should be switched to Register at some point.
		scott.linder (author) [Done]: I haven't gotten around to this yet, but I'll do this in another NFC patch.

// Top of the stack SGPR offset derived from the ScratchWaveOffsetReg.		// This is an ABI register used in the non-entry calling convention to
		// communicate the unswizzled offset from the current dispatch's scratch wave
		// base to the beginning of the new function's frame.
unsigned StackPtrOffsetReg = AMDGPU::SP_REG;		unsigned StackPtrOffsetReg = AMDGPU::SP_REG;

AMDGPUFunctionArgInfo ArgInfo;		AMDGPUFunctionArgInfo ArgInfo;

// Graphics info.		// Graphics info.
unsigned PSInputAddr = 0;		unsigned PSInputAddr = 0;
unsigned PSInputEnable = 0;		unsigned PSInputEnable = 0;

[353 unchanged lines hidden]	unsigned getScratchRSrcReg() const {
return ScratchRSrcReg;		return ScratchRSrcReg;
}		}

void setScratchRSrcReg(unsigned Reg) {		void setScratchRSrcReg(unsigned Reg) {
assert(Reg != 0 && "Should never be unset");		assert(Reg != 0 && "Should never be unset");
ScratchRSrcReg = Reg;		ScratchRSrcReg = Reg;
}		}

unsigned getScratchWaveOffsetReg() const {
return ScratchWaveOffsetReg;
}

unsigned getFrameOffsetReg() const {		unsigned getFrameOffsetReg() const {
return FrameOffsetReg;		return FrameOffsetReg;
}		}

void setFrameOffsetReg(unsigned Reg) {		void setFrameOffsetReg(unsigned Reg) {
assert(Reg != 0 && "Should never be unset");		assert(Reg != 0 && "Should never be unset");
FrameOffsetReg = Reg;		FrameOffsetReg = Reg;
}		}

void setStackPtrOffsetReg(unsigned Reg) {		void setStackPtrOffsetReg(unsigned Reg) {
assert(Reg != 0 && "Should never be unset");		assert(Reg != 0 && "Should never be unset");
StackPtrOffsetReg = Reg;		StackPtrOffsetReg = Reg;
}		}

// Note the unset value for this is AMDGPU::SP_REG rather than		// Note the unset value for this is AMDGPU::SP_REG rather than
// NoRegister. This is mostly a workaround for MIR tests where state that		// NoRegister. This is mostly a workaround for MIR tests where state that
// can't be directly computed from the function is not preserved in serialized		// can't be directly computed from the function is not preserved in serialized
// MIR.		// MIR.
unsigned getStackPtrOffsetReg() const {		unsigned getStackPtrOffsetReg() const {
return StackPtrOffsetReg;		return StackPtrOffsetReg;
}		}

void setScratchWaveOffsetReg(unsigned Reg) {
assert(Reg != 0 && "Should never be unset");
ScratchWaveOffsetReg = Reg;
}

unsigned getQueuePtrUserSGPR() const {		unsigned getQueuePtrUserSGPR() const {
return ArgInfo.QueuePtr.getRegister();		return ArgInfo.QueuePtr.getRegister();
}		}

unsigned getImplicitBufferPtrUserSGPR() const {		unsigned getImplicitBufferPtrUserSGPR() const {
return ArgInfo.ImplicitBufferPtr.getRegister();		return ArgInfo.ImplicitBufferPtr.getRegister();
}		}

… 182 lines not shown …

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp

… 62 lines not shown …	SIMachineFunctionInfo::SIMachineFunctionInfo(const MachineFunction &MF)
} else if (CC == CallingConv::AMDGPU_PS) {		} else if (CC == CallingConv::AMDGPU_PS) {
PSInputAddr = AMDGPU::getInitialPSInputAddr(F);		PSInputAddr = AMDGPU::getInitialPSInputAddr(F);
}		}

if (!isEntryFunction()) {		if (!isEntryFunction()) {
// Non-entry functions have no special inputs for now, other than the registers		// Non-entry functions have no special inputs for now, other than the registers
// required for scratch access.		// required for scratch access.
ScratchRSrcReg = AMDGPU::SGPR0_SGPR1_SGPR2_SGPR3;		ScratchRSrcReg = AMDGPU::SGPR0_SGPR1_SGPR2_SGPR3;
ScratchWaveOffsetReg = AMDGPU::SGPR33;

// TODO: Pick a high register, and shift down, similar to a kernel.		// TODO: Pick a high register, and shift down, similar to a kernel.
FrameOffsetReg = AMDGPU::SGPR34;		FrameOffsetReg = AMDGPU::SGPR34;
StackPtrOffsetReg = AMDGPU::SGPR32;		StackPtrOffsetReg = AMDGPU::SGPR32;

ArgInfo.PrivateSegmentBuffer =		ArgInfo.PrivateSegmentBuffer =
ArgDescriptor::createRegister(ScratchRSrcReg);		ArgDescriptor::createRegister(ScratchRSrcReg);
ArgInfo.PrivateSegmentWaveByteOffset =
ArgDescriptor::createRegister(ScratchWaveOffsetReg);

if (F.hasFnAttribute("amdgpu-implicitarg-ptr"))		if (F.hasFnAttribute("amdgpu-implicitarg-ptr"))
ImplicitArgPtr = true;		ImplicitArgPtr = true;
} else {		} else {
if (F.hasFnAttribute("amdgpu-implicitarg-ptr")) {		if (F.hasFnAttribute("amdgpu-implicitarg-ptr")) {
KernargSegmentPtr = true;		KernargSegmentPtr = true;
MaxKernArgAlign = std::max(ST.getAlignmentForImplicitArgPtr(),		MaxKernArgAlign = std::max(ST.getAlignmentForImplicitArgPtr(),
MaxKernArgAlign);		MaxKernArgAlign);
… 393 lines not shown …	: ExplicitKernArgSize(MFI.getExplicitKernArgSize()),
MaxKernArgAlign(MFI.getMaxKernArgAlign()),		MaxKernArgAlign(MFI.getMaxKernArgAlign()),
LDSSize(MFI.getLDSSize()),		LDSSize(MFI.getLDSSize()),
IsEntryFunction(MFI.isEntryFunction()),		IsEntryFunction(MFI.isEntryFunction()),
NoSignedZerosFPMath(MFI.hasNoSignedZerosFPMath()),		NoSignedZerosFPMath(MFI.hasNoSignedZerosFPMath()),
MemoryBound(MFI.isMemoryBound()),		MemoryBound(MFI.isMemoryBound()),
WaveLimiter(MFI.needsWaveLimiter()),		WaveLimiter(MFI.needsWaveLimiter()),
HighBitsOf32BitAddress(MFI.get32BitAddressHighBits()),		HighBitsOf32BitAddress(MFI.get32BitAddressHighBits()),
ScratchRSrcReg(regToString(MFI.getScratchRSrcReg(), TRI)),		ScratchRSrcReg(regToString(MFI.getScratchRSrcReg(), TRI)),
ScratchWaveOffsetReg(regToString(MFI.getScratchWaveOffsetReg(), TRI)),
FrameOffsetReg(regToString(MFI.getFrameOffsetReg(), TRI)),		FrameOffsetReg(regToString(MFI.getFrameOffsetReg(), TRI)),
StackPtrOffsetReg(regToString(MFI.getStackPtrOffsetReg(), TRI)),		StackPtrOffsetReg(regToString(MFI.getStackPtrOffsetReg(), TRI)),
ArgInfo(convertArgumentInfo(MFI.getArgInfo(), TRI)),		ArgInfo(convertArgumentInfo(MFI.getArgInfo(), TRI)),
Mode(MFI.getMode()) {}		Mode(MFI.getMode()) {}

void yaml::SIMachineFunctionInfo::mappingImpl(yaml::IO &YamlIO) {		void yaml::SIMachineFunctionInfo::mappingImpl(yaml::IO &YamlIO) {
MappingTraits<SIMachineFunctionInfo>::mapping(YamlIO, *this);		MappingTraits<SIMachineFunctionInfo>::mapping(YamlIO, *this);
}		}
… 13 lines not shown …

llvm/lib/Target/AMDGPU/SIRegisterInfo.h

… 44 lines not shown …	public:
bool spillSGPRToVGPR() const {		bool spillSGPRToVGPR() const {
return SpillSGPRToVGPR;		return SpillSGPRToVGPR;
}		}

/// Return the end register initially reserved for the scratch buffer in case		/// Return the end register initially reserved for the scratch buffer in case
/// spilling is needed.		/// spilling is needed.
unsigned reservedPrivateSegmentBufferReg(const MachineFunction &MF) const;		unsigned reservedPrivateSegmentBufferReg(const MachineFunction &MF) const;

/// Return the end register initially reserved for the scratch wave offset in
/// case spilling is needed.
unsigned reservedPrivateSegmentWaveByteOffsetReg(
const MachineFunction &MF) const;

BitVector getReservedRegs(const MachineFunction &MF) const override;		BitVector getReservedRegs(const MachineFunction &MF) const override;

const MCPhysReg *getCalleeSavedRegs(const MachineFunction *MF) const override;		const MCPhysReg *getCalleeSavedRegs(const MachineFunction *MF) const override;
const MCPhysReg *getCalleeSavedRegsViaCopy(const MachineFunction *MF) const;		const MCPhysReg *getCalleeSavedRegsViaCopy(const MachineFunction *MF) const;
const uint32_t *getCallPreservedMask(const MachineFunction &MF,		const uint32_t *getCallPreservedMask(const MachineFunction &MF,
CallingConv::ID) const override;		CallingConv::ID) const override;

// Stack access is very expensive. CSRs are also the high registers, and we		// Stack access is very expensive. CSRs are also the high registers, and we
… 240 lines not shown …

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp

… 85 lines not shown …	default:
return nullptr;		return nullptr;
}		}
}		}

Register SIRegisterInfo::getFrameRegister(const MachineFunction &MF) const {		Register SIRegisterInfo::getFrameRegister(const MachineFunction &MF) const {
const SIFrameLowering *TFI =		const SIFrameLowering *TFI =
MF.getSubtarget<GCNSubtarget>().getFrameLowering();		MF.getSubtarget<GCNSubtarget>().getFrameLowering();
const SIMachineFunctionInfo *FuncInfo = MF.getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *FuncInfo = MF.getInfo<SIMachineFunctionInfo>();
		if (FuncInfo->isEntryFunction())
		return AMDGPU::NoRegister;
return TFI->hasFP(MF) ? FuncInfo->getFrameOffsetReg()		return TFI->hasFP(MF) ? FuncInfo->getFrameOffsetReg()
: FuncInfo->getStackPtrOffsetReg();		: FuncInfo->getStackPtrOffsetReg();
}		}

		arsenm: s/NoRegister/Register()
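The following sketches the decision the new `getFrameRegister` makes. It uses a hypothetical `FrameReg` enum in place of real register numbers. Returning no frame register for entry functions matches the spill code further down, which uses an immediate soffset of 0 there.

```cpp
#include <cassert>

// Hypothetical stand-in for register numbers; not LLVM's Register type.
enum class FrameReg { None, FP, SP };

// Same control flow as the new-side getFrameRegister: entry functions have
// no frame register; other functions use the FP when one is needed, else the SP.
FrameReg pickFrameRegister(bool IsEntryFunction, bool HasFP) {
  if (IsEntryFunction)
    return FrameReg::None;
  return HasFP ? FrameReg::FP : FrameReg::SP;
}
```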
const uint32_t *SIRegisterInfo::getAllVGPRRegMask() const {		const uint32_t *SIRegisterInfo::getAllVGPRRegMask() const {
return CSR_AMDGPU_AllVGPRs_RegMask;		return CSR_AMDGPU_AllVGPRs_RegMask;
}		}

const uint32_t *SIRegisterInfo::getAllAllocatableSRegMask() const {		const uint32_t *SIRegisterInfo::getAllAllocatableSRegMask() const {
return CSR_AMDGPU_AllAllocatableSRegs_RegMask;		return CSR_AMDGPU_AllAllocatableSRegs_RegMask;
}		}

… 66 lines not shown …

unsigned SIRegisterInfo::reservedPrivateSegmentBufferReg(		unsigned SIRegisterInfo::reservedPrivateSegmentBufferReg(
const MachineFunction &MF) const {		const MachineFunction &MF) const {
unsigned BaseIdx = alignDown(ST.getMaxNumSGPRs(MF), 4) - 4;		unsigned BaseIdx = alignDown(ST.getMaxNumSGPRs(MF), 4) - 4;
unsigned BaseReg(AMDGPU::SGPR_32RegClass.getRegister(BaseIdx));		unsigned BaseReg(AMDGPU::SGPR_32RegClass.getRegister(BaseIdx));
return getMatchingSuperReg(BaseReg, AMDGPU::sub0, &AMDGPU::SGPR_128RegClass);		return getMatchingSuperReg(BaseReg, AMDGPU::sub0, &AMDGPU::SGPR_128RegClass);
}		}

static unsigned findPrivateSegmentWaveByteOffsetRegIndex(unsigned RegCount) {
unsigned Reg;

// Try to place it in a hole after PrivateSegmentBufferReg.
if (RegCount & 3) {
// We cannot put the segment buffer in (Idx - 4) ... (Idx - 1) due to
// alignment constraints, so we have a hole where we can put the wave offset.
Reg = RegCount - 1;
} else {
// We can put the segment buffer in (Idx - 4) ... (Idx - 1) and put the
// wave offset before it.
Reg = RegCount - 5;
}

return Reg;
}

unsigned SIRegisterInfo::reservedPrivateSegmentWaveByteOffsetReg(
const MachineFunction &MF) const {
unsigned Reg = findPrivateSegmentWaveByteOffsetRegIndex(ST.getMaxNumSGPRs(MF));
return AMDGPU::SGPR_32RegClass.getRegister(Reg);
}
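On the old side, the reserved scratch wave offset SGPR is placed relative to the 4-aligned resource-descriptor quad just below the SGPR limit. The following sketch covers only the index arithmetic, using plain integers instead of `SGPR_32RegClass` registers. `segmentBufferBaseIdx` and `waveOffsetIdx` are hypothetical names. The sketch checks that the two computed reservations never overlap, which is what the old `assert(!isSubRegister(ScratchRSrcReg, ScratchWaveOffsetReg))` in `getReservedRegs` relies on.

```cpp
#include <cassert>

// reservedPrivateSegmentBufferReg: alignDown(MaxNumSGPRs, 4) - 4, the first
// index of the highest 4-aligned quad that fits under the limit.
unsigned segmentBufferBaseIdx(unsigned MaxNumSGPRs) {
  return (MaxNumSGPRs / 4) * 4 - 4;
}

// findPrivateSegmentWaveByteOffsetRegIndex (old side): use the unaligned
// tail above the quad if there is one, otherwise the SGPR just below it.
unsigned waveOffsetIdx(unsigned RegCount) {
  return (RegCount & 3) ? RegCount - 1 : RegCount - 5;
}
```

With 102 SGPRs the quad is s[96:99] and the wave offset lands in s101. With 104 SGPRs the quad is s[100:103] and the wave offset is s99.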

BitVector SIRegisterInfo::getReservedRegs(const MachineFunction &MF) const {		BitVector SIRegisterInfo::getReservedRegs(const MachineFunction &MF) const {
BitVector Reserved(getNumRegs());		BitVector Reserved(getNumRegs());

// EXEC_LO and EXEC_HI could be allocated and used as regular registers, but		// EXEC_LO and EXEC_HI could be allocated and used as regular registers, but
// this seems likely to result in bugs, so I'm marking them as reserved.		// this seems likely to result in bugs, so I'm marking them as reserved.
reserveRegisterTuples(Reserved, AMDGPU::EXEC);		reserveRegisterTuples(Reserved, AMDGPU::EXEC);
reserveRegisterTuples(Reserved, AMDGPU::FLAT_SCR);		reserveRegisterTuples(Reserved, AMDGPU::FLAT_SCR);

… 63 lines not shown …	if (!ST.hasMAIInsts()) {
for (unsigned i = 0; i < MaxNumVGPRs; ++i) {		for (unsigned i = 0; i < MaxNumVGPRs; ++i) {
unsigned Reg = AMDGPU::AGPR_32RegClass.getRegister(i);		unsigned Reg = AMDGPU::AGPR_32RegClass.getRegister(i);
reserveRegisterTuples(Reserved, Reg);		reserveRegisterTuples(Reserved, Reg);
}		}
}		}

const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();

unsigned ScratchWaveOffsetReg = MFI->getScratchWaveOffsetReg();
if (ScratchWaveOffsetReg != AMDGPU::NoRegister) {
// Reserve 1 SGPR for scratch wave offset in case we need to spill.
reserveRegisterTuples(Reserved, ScratchWaveOffsetReg);
}

unsigned ScratchRSrcReg = MFI->getScratchRSrcReg();		unsigned ScratchRSrcReg = MFI->getScratchRSrcReg();
if (ScratchRSrcReg != AMDGPU::NoRegister) {		if (ScratchRSrcReg != AMDGPU::NoRegister) {
// Reserve 4 SGPRs for the scratch buffer resource descriptor in case we need		// Reserve 4 SGPRs for the scratch buffer resource descriptor in case we need
// to spill.		// to spill.
// TODO: May need to reserve a VGPR if doing LDS spilling.		// TODO: May need to reserve a VGPR if doing LDS spilling.
reserveRegisterTuples(Reserved, ScratchRSrcReg);		reserveRegisterTuples(Reserved, ScratchRSrcReg);
assert(!isSubRegister(ScratchRSrcReg, ScratchWaveOffsetReg));
}		}

// We have to assume the SP is needed in case there are calls in the function,		// We have to assume the SP is needed in case there are calls in the function,
// which is detected after the function is lowered. If we aren't really going		// which is detected after the function is lowered. If we aren't really going
// to need SP, don't bother reserving it.		// to need SP, don't bother reserving it.
unsigned StackPtrReg = MFI->getStackPtrOffsetReg();		unsigned StackPtrReg = MFI->getStackPtrOffsetReg();

if (StackPtrReg != AMDGPU::NoRegister) {		if (StackPtrReg != AMDGPU::NoRegister) {
… 146 lines not shown …	for (const MachineOperand &MO: MI.operands()) {
}		}
}		}
#endif		#endif

MachineOperand *FIOp = TII->getNamedOperand(MI, AMDGPU::OpName::vaddr);		MachineOperand *FIOp = TII->getNamedOperand(MI, AMDGPU::OpName::vaddr);
#ifndef NDEBUG		#ifndef NDEBUG
MachineBasicBlock *MBB = MI.getParent();		MachineBasicBlock *MBB = MI.getParent();
MachineFunction *MF = MBB->getParent();		MachineFunction *MF = MBB->getParent();
		auto &SOffset = *TII->getNamedOperand(MI, AMDGPU::OpName::soffset);
#endif		#endif
assert(FIOp && FIOp->isFI() && "frame index must be address operand");		assert(FIOp && FIOp->isFI() && "frame index must be address operand");
assert(TII->isMUBUF(MI));		assert(TII->isMUBUF(MI));
assert(TII->getNamedOperand(MI, AMDGPU::OpName::soffset)->getReg() ==		assert((SOffset.isReg() &&
MF->getInfo<SIMachineFunctionInfo>()->getStackPtrOffsetReg() &&		(SOffset.getReg() ==
"should only be seeing stack pointer offset relative FrameIndex");		MF->getInfo<SIMachineFunctionInfo>()->getStackPtrOffsetReg())) ||
		(SOffset.isImm() && SOffset.getImm() == 0) &&
		"should only be seeing stack pointer or 0 offset relative "
		"FrameIndex");

MachineOperand *OffsetOp = TII->getNamedOperand(MI, AMDGPU::OpName::offset);		MachineOperand *OffsetOp = TII->getNamedOperand(MI, AMDGPU::OpName::offset);
int64_t NewOffset = OffsetOp->getImm() + Offset;		int64_t NewOffset = OffsetOp->getImm() + Offset;
assert(isUInt<12>(NewOffset) && "offset should be legal");		assert(isUInt<12>(NewOffset) && "offset should be legal");

FIOp->ChangeToRegister(BaseReg, false);		FIOp->ChangeToRegister(BaseReg, false);
OffsetOp->setImm(NewOffset);		OffsetOp->setImm(NewOffset);
}		}
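The new assertion in `resolveFrameIndex` accepts either the stack pointer register or an immediate 0 as soffset, and the folded offset must pass `isUInt<12>`. The following sketches both checks on a hypothetical `SOffsetOperand` model. The real code asserts, but the hypothetical `foldOffset` instead reports an illegal result by returning `std::nullopt`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model of a MUBUF soffset operand: a register or an immediate.
struct SOffsetOperand {
  bool IsReg;
  unsigned Reg; // meaningful when IsReg
  int64_t Imm;  // meaningful when !IsReg
};

// What the new-side assertion accepts: the SP register, or immediate 0.
bool isExpectedSOffset(const SOffsetOperand &Op, unsigned StackPtrReg) {
  return Op.IsReg ? Op.Reg == StackPtrReg : Op.Imm == 0;
}

// The folded offset must fit the unsigned 12-bit immediate (0..4095).
std::optional<int64_t> foldOffset(int64_t OldImm, int64_t FrameOffset) {
  int64_t New = OldImm + FrameOffset;
  if (New < 0 || New > 4095)
    return std::nullopt;
  return New;
}
```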
… 246 lines not shown …	if (!isUInt<12>(Offset + Size - EltSize)) {
Offset *= ST.getWavefrontSize();		Offset *= ST.getWavefrontSize();

// We don't have access to the register scavenger if this function is called		// We don't have access to the register scavenger if this function is called
// during PEI::scavengeFrameVirtualRegs().		// during PEI::scavengeFrameVirtualRegs().
if (RS)		if (RS)
SOffset = RS->scavengeRegister(&AMDGPU::SGPR_32RegClass, MI, 0, false);		SOffset = RS->scavengeRegister(&AMDGPU::SGPR_32RegClass, MI, 0, false);

if (SOffset == AMDGPU::NoRegister) {		if (SOffset == AMDGPU::NoRegister) {
		if (ScratchOffsetReg == AMDGPU::NoRegister) {
		report_fatal_error("could not scavenge SGPR to spill in entry function");
		Lint: Pre-merge checks: clang-format: please reformat the code - report_fatal_error("could not scavenge SGPR to spill in entry function"); + report_fatal_error( + "could not scavenge SGPR to spill in entry function");
		}
// There are no free SGPRs, and we are in the process of spilling		// There are no free SGPRs, and we are in the process of spilling
// VGPRs too. Since we need a VGPR in order to spill SGPRs (this is true		// VGPRs too. Since we need a VGPR in order to spill SGPRs (this is true
// on SI/CI and on VI it is true until we implement spilling using scalar		// on SI/CI and on VI it is true until we implement spilling using scalar
// stores), we have no way to free up an SGPR. Our solution here is to		// stores), we have no way to free up an SGPR. Our solution here is to
// add the offset directly to the ScratchOffset register, and then		// add the offset directly to the ScratchOffset register, and then
// subtract the offset after the spill to return ScratchOffset to its		// subtract the offset after the spill to return ScratchOffset to its
// original value.		// original value.
SOffset = ScratchOffsetReg;		SOffset = ScratchOffsetReg;
ScratchOffsetRegDelta = Offset;		ScratchOffsetRegDelta = Offset;
} else {		} else {
Scavenged = true;		Scavenged = true;
}		}

		if (ScratchOffsetReg == AMDGPU::NoRegister) {
		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_MOV_B32), SOffset)
		.addImm(Offset);
		} else {
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_ADD_U32), SOffset)		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_ADD_U32), SOffset)
.addReg(ScratchOffsetReg)		.addReg(ScratchOffsetReg)
.addImm(Offset);		.addImm(Offset);
		}

Offset = 0;		Offset = 0;
}		}
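When the last element's offset does not fit the 12-bit immediate, the offset is scaled by the wavefront size and moved into an SGPR. The hypothetical `planLargeOffset` below summarizes the choices the code above makes. It does not build any MachineInstrs, and `std::nullopt` stands in for `report_fatal_error`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical summary of the SGPR write; not an LLVM type.
struct SOffsetPlan {
  std::string Opcode;        // instruction that writes SOffset
  bool AdjustsScratchOffset; // SOffset is ScratchOffsetReg itself; undo after the spill
  int64_t Imm;               // immediate operand, already scaled by the wave size
};

std::optional<SOffsetPlan> planLargeOffset(int64_t PerLaneOffset,
                                           unsigned WavefrontSize,
                                           bool HasScratchOffsetReg,
                                           bool ScavengedSGPR) {
  int64_t Scaled = PerLaneOffset * WavefrontSize; // Offset *= getWavefrontSize()
  if (!ScavengedSGPR && !HasScratchOffsetReg)
    return std::nullopt; // entry function with no free SGPR: fatal error
  // Without a scratch offset register the SGPR is loaded with the offset;
  // otherwise the offset is added to the scratch offset register.
  std::string Opcode = HasScratchOffsetReg ? "S_ADD_U32" : "S_MOV_B32";
  return SOffsetPlan{Opcode, !ScavengedSGPR, Scaled};
}
```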

for (unsigned i = 0, e = NumSubRegs; i != e; ++i, Offset += EltSize) {		for (unsigned i = 0, e = NumSubRegs; i != e; ++i, Offset += EltSize) {
Register SubReg = NumSubRegs == 1		Register SubReg = NumSubRegs == 1
? Register(ValueReg)		? Register(ValueReg)
: getSubReg(ValueReg, getSubRegFromChannel(i));		: getSubReg(ValueReg, getSubRegFromChannel(i));
… 18 lines not shown …	if (!MIB.getInstr()) {
}		}

MachinePointerInfo PInfo = BasePtrInfo.getWithOffset(EltSize * i);		MachinePointerInfo PInfo = BasePtrInfo.getWithOffset(EltSize * i);
MachineMemOperand *NewMMO		MachineMemOperand *NewMMO
= MF->getMachineMemOperand(PInfo, MMO->getFlags(),		= MF->getMachineMemOperand(PInfo, MMO->getFlags(),
EltSize, MinAlign(Align, EltSize * i));		EltSize, MinAlign(Align, EltSize * i));

MIB = BuildMI(*MBB, MI, DL, Desc)		MIB = BuildMI(*MBB, MI, DL, Desc)
.addReg(SubReg, getDefRegState(!IsStore) | getKillRegState(IsKill))		.addReg(SubReg,
.addReg(ScratchRsrcReg)		getDefRegState(!IsStore) | getKillRegState(IsKill))
.addReg(SOffset, SOffsetRegState)		.addReg(ScratchRsrcReg);
.addImm(Offset)		if (SOffset == AMDGPU::NoRegister) {
		MIB.addImm(0);
		} else {
		MIB.addReg(SOffset, SOffsetRegState);
		}
		MIB.addImm(Offset)
.addImm(0) // glc		.addImm(0) // glc
.addImm(0) // slc		.addImm(0) // slc
.addImm(0) // tfe		.addImm(0) // tfe
.addImm(0) // dlc		.addImm(0) // dlc
.addImm(0) // swz		.addImm(0) // swz
.addMemOperand(NewMMO);		.addMemOperand(NewMMO);

if (!IsStore && TmpReg != AMDGPU::NoRegister)		if (!IsStore && TmpReg != AMDGPU::NoRegister)
MIB = BuildMI(*MBB, MI, DL, TII->get(AMDGPU::V_ACCVGPR_WRITE_B32),		MIB = BuildMI(*MBB, MI, DL, TII->get(AMDGPU::V_ACCVGPR_WRITE_B32),
FinalReg)		FinalReg)
.addReg(TmpReg, RegState::Kill);		.addReg(TmpReg, RegState::Kill);
}		}

if (NumSubRegs > 1)		if (NumSubRegs > 1)
… 27 lines not shown …	bool SIRegisterInfo::spillSGPR(MachineBasicBlock::iterator MI,

Register SuperReg = MI->getOperand(0).getReg();		Register SuperReg = MI->getOperand(0).getReg();
bool IsKill = MI->getOperand(0).isKill();		bool IsKill = MI->getOperand(0).isKill();
const DebugLoc &DL = MI->getDebugLoc();		const DebugLoc &DL = MI->getDebugLoc();

MachineFrameInfo &FrameInfo = MF->getFrameInfo();		MachineFrameInfo &FrameInfo = MF->getFrameInfo();

assert(SpillToVGPR || (SuperReg != MFI->getStackPtrOffsetReg() &&		assert(SpillToVGPR || (SuperReg != MFI->getStackPtrOffsetReg() &&
SuperReg != MFI->getFrameOffsetReg() &&		SuperReg != MFI->getFrameOffsetReg()));
SuperReg != MFI->getScratchWaveOffsetReg()));

assert(SuperReg != AMDGPU::M0 && "m0 should never spill");		assert(SuperReg != AMDGPU::M0 && "m0 should never spill");

unsigned EltSize = 4;		unsigned EltSize = 4;
const TargetRegisterClass *RC = getPhysRegClass(SuperReg);		const TargetRegisterClass *RC = getPhysRegClass(SuperReg);

ArrayRef<int16_t> SplitParts = getRegSplitParts(RC, EltSize);		ArrayRef<int16_t> SplitParts = getRegSplitParts(RC, EltSize);
unsigned NumSubRegs = SplitParts.empty() ? 1 : SplitParts.size();		unsigned NumSubRegs = SplitParts.empty() ? 1 : SplitParts.size();
… 54 lines not shown …	if (SpillToVGPR) {
}		}

unsigned Align = FrameInfo.getObjectAlignment(Index);		unsigned Align = FrameInfo.getObjectAlignment(Index);
MachinePointerInfo PtrInfo		MachinePointerInfo PtrInfo
= MachinePointerInfo::getFixedStack(*MF, Index, EltSize * i);		= MachinePointerInfo::getFixedStack(*MF, Index, EltSize * i);
MachineMemOperand *MMO		MachineMemOperand *MMO
= MF->getMachineMemOperand(PtrInfo, MachineMemOperand::MOStore,		= MF->getMachineMemOperand(PtrInfo, MachineMemOperand::MOStore,
EltSize, MinAlign(Align, EltSize * i));		EltSize, MinAlign(Align, EltSize * i));
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::SI_SPILL_V32_SAVE))		auto MIB = BuildMI(*MBB, MI, DL, TII->get(AMDGPU::SI_SPILL_V32_SAVE))
.addReg(TmpVGPR, RegState::Kill) // src		.addReg(TmpVGPR, RegState::Kill) // src
		Lint: Pre-merge checks: clang-format: please reformat the code - .addReg(TmpVGPR, RegState::Kill) // src - .addFrameIndex(Index) // vaddr - .addReg(MFI->getScratchRSrcReg()); // srrsrc + .addReg(TmpVGPR, RegState::Kill) // src + .addFrameIndex(Index) // vaddr + .addReg(MFI->getScratchRSrcReg()); // srrsrc
.addFrameIndex(Index) // vaddr		.addFrameIndex(Index) // vaddr
.addReg(MFI->getScratchRSrcReg()) // srrsrc		.addReg(MFI->getScratchRSrcReg()); // srrsrc
.addReg(MFI->getStackPtrOffsetReg()) // soffset		if (MFI->isEntryFunction()) {
.addImm(i * 4) // offset		MIB.addImm(0); // soffset
		} else {
		MIB.addReg(MFI->getStackPtrOffsetReg()); // soffset
		}
		MIB.addImm(i * 4) // offset
.addMemOperand(MMO);		.addMemOperand(MMO);
}		}
}		}

MI->eraseFromParent();		MI->eraseFromParent();
MFI->addToSpilledSGPRs(NumSubRegs);		MFI->addToSpilledSGPRs(NumSubRegs);
return true;		return true;
}		}
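Each 4-byte piece of a spilled SGPR tuple goes to offset `i * 4`, and its memory operand gets `MinAlign(Align, EltSize * i)`. In an entry function, soffset is now the immediate 0 instead of the stack pointer, and the restore path below mirrors this. The following sketches those per-element operands. `minAlign` reimplements the formula of `llvm::MinAlign`, and `LaneSpill` and `describeSpill` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Same formula as llvm::MinAlign: the largest power of two dividing A and B.
constexpr uint64_t minAlign(uint64_t A, uint64_t B) {
  return (A | B) & (1 + ~(A | B));
}

// Hypothetical description of the i-th SI_SPILL_V32_SAVE emitted by spillSGPR.
struct LaneSpill {
  bool SOffsetIsImmZero; // entry functions: soffset operand is the immediate 0
  unsigned Offset;       // i * 4
  uint64_t Align;        // MinAlign(slot alignment, EltSize * i)
};

LaneSpill describeSpill(unsigned I, uint64_t SlotAlign, bool IsEntryFunction) {
  const unsigned EltSize = 4;
  return {IsEntryFunction, I * EltSize, minAlign(SlotAlign, EltSize * I)};
}
```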

… 54 lines not shown …	if (SpillToVGPR) {

MachinePointerInfo PtrInfo		MachinePointerInfo PtrInfo
= MachinePointerInfo::getFixedStack(*MF, Index, EltSize * i);		= MachinePointerInfo::getFixedStack(*MF, Index, EltSize * i);

MachineMemOperand *MMO = MF->getMachineMemOperand(PtrInfo,		MachineMemOperand *MMO = MF->getMachineMemOperand(PtrInfo,
MachineMemOperand::MOLoad, EltSize,		MachineMemOperand::MOLoad, EltSize,
MinAlign(Align, EltSize * i));		MinAlign(Align, EltSize * i));

		auto MIB =
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::SI_SPILL_V32_RESTORE), TmpVGPR)		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::SI_SPILL_V32_RESTORE), TmpVGPR)
.addFrameIndex(Index) // vaddr		.addFrameIndex(Index) // vaddr
.addReg(MFI->getScratchRSrcReg()) // srsrc		.addReg(MFI->getScratchRSrcReg()); // srsrc
.addReg(MFI->getStackPtrOffsetReg()) // soffset		if (MFI->isEntryFunction()) {
.addImm(i * 4) // offset		MIB.addImm(0); // soffset
		} else {
		MIB.addReg(MFI->getStackPtrOffsetReg()); // soffset
		}
		MIB.addImm(i * 4) // offset
.addMemOperand(MMO);		.addMemOperand(MMO);

auto MIB =		MIB = BuildMI(*MBB, MI, DL, TII->get(AMDGPU::V_READFIRSTLANE_B32), SubReg)
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::V_READFIRSTLANE_B32), SubReg)
.addReg(TmpVGPR, RegState::Kill);		.addReg(TmpVGPR, RegState::Kill);

if (NumSubRegs > 1)		if (NumSubRegs > 1)
MIB.addReg(MI->getOperand(0).getReg(), RegState::ImplicitDefine);		MIB.addReg(MI->getOperand(0).getReg(), RegState::ImplicitDefine);
}		}
}		}

MI->eraseFromParent();		MI->eraseFromParent();
return true;		return true;
… 85 lines not shown …	switch (MI->getOpcode()) {
case AMDGPU::SI_SPILL_V32_SAVE:		case AMDGPU::SI_SPILL_V32_SAVE:
case AMDGPU::SI_SPILL_A1024_SAVE:		case AMDGPU::SI_SPILL_A1024_SAVE:
case AMDGPU::SI_SPILL_A512_SAVE:		case AMDGPU::SI_SPILL_A512_SAVE:
case AMDGPU::SI_SPILL_A128_SAVE:		case AMDGPU::SI_SPILL_A128_SAVE:
case AMDGPU::SI_SPILL_A64_SAVE:		case AMDGPU::SI_SPILL_A64_SAVE:
case AMDGPU::SI_SPILL_A32_SAVE: {		case AMDGPU::SI_SPILL_A32_SAVE: {
const MachineOperand *VData = TII->getNamedOperand(*MI,		const MachineOperand *VData = TII->getNamedOperand(*MI,
AMDGPU::OpName::vdata);		AMDGPU::OpName::vdata);
assert(TII->getNamedOperand(*MI, AMDGPU::OpName::soffset)->getReg() ==		assert(MFI->isEntryFunction() ||
		TII->getNamedOperand(*MI, AMDGPU::OpName::soffset)->getReg() ==
MFI->getStackPtrOffsetReg());		MFI->getStackPtrOffsetReg());
		assert(!MFI->isEntryFunction() ||
		TII->getNamedOperand(*MI, AMDGPU::OpName::soffset)->getImm() == 0);

buildSpillLoadStore(MI, AMDGPU::BUFFER_STORE_DWORD_OFFSET,		buildSpillLoadStore(MI, AMDGPU::BUFFER_STORE_DWORD_OFFSET,
Index,		Index,
VData->getReg(), VData->isKill(),		VData->getReg(), VData->isKill(),
TII->getNamedOperand(*MI, AMDGPU::OpName::srsrc)->getReg(),		TII->getNamedOperand(*MI, AMDGPU::OpName::srsrc)->getReg(),
FrameReg,		FrameReg,
TII->getNamedOperand(*MI, AMDGPU::OpName::offset)->getImm(),		TII->getNamedOperand(*MI, AMDGPU::OpName::offset)->getImm(),
*MI->memoperands_begin(),		*MI->memoperands_begin(),
… 12 lines not shown …	switch (MI->getOpcode()) {
case AMDGPU::SI_SPILL_V1024_RESTORE:		case AMDGPU::SI_SPILL_V1024_RESTORE:
case AMDGPU::SI_SPILL_A32_RESTORE:		case AMDGPU::SI_SPILL_A32_RESTORE:
case AMDGPU::SI_SPILL_A64_RESTORE:		case AMDGPU::SI_SPILL_A64_RESTORE:
case AMDGPU::SI_SPILL_A128_RESTORE:		case AMDGPU::SI_SPILL_A128_RESTORE:
case AMDGPU::SI_SPILL_A512_RESTORE:		case AMDGPU::SI_SPILL_A512_RESTORE:
case AMDGPU::SI_SPILL_A1024_RESTORE: {		case AMDGPU::SI_SPILL_A1024_RESTORE: {
const MachineOperand *VData = TII->getNamedOperand(*MI,		const MachineOperand *VData = TII->getNamedOperand(*MI,
AMDGPU::OpName::vdata);		AMDGPU::OpName::vdata);
assert(TII->getNamedOperand(*MI, AMDGPU::OpName::soffset)->getReg() ==		assert(MFI->isEntryFunction() ||
		TII->getNamedOperand(*MI, AMDGPU::OpName::soffset)->getReg() ==
MFI->getStackPtrOffsetReg());		MFI->getStackPtrOffsetReg());
		assert(!MFI->isEntryFunction() ||
		TII->getNamedOperand(*MI, AMDGPU::OpName::soffset)->getImm() == 0);

buildSpillLoadStore(MI, AMDGPU::BUFFER_LOAD_DWORD_OFFSET,		buildSpillLoadStore(MI, AMDGPU::BUFFER_LOAD_DWORD_OFFSET,
Index,		Index,
VData->getReg(), VData->isKill(),		VData->getReg(), VData->isKill(),
TII->getNamedOperand(*MI, AMDGPU::OpName::srsrc)->getReg(),		TII->getNamedOperand(*MI, AMDGPU::OpName::srsrc)->getReg(),
FrameReg,		FrameReg,
TII->getNamedOperand(*MI, AMDGPU::OpName::offset)->getImm(),		TII->getNamedOperand(*MI, AMDGPU::OpName::offset)->getImm(),
*MI->memoperands_begin(),		*MI->memoperands_begin(),
RS);		RS);
MI->eraseFromParent();		MI->eraseFromParent();
break;		break;
}		}

default: {		default: {
const DebugLoc &DL = MI->getDebugLoc();		const DebugLoc &DL = MI->getDebugLoc();
bool IsMUBUF = TII->isMUBUF(*MI);		bool IsMUBUF = TII->isMUBUF(*MI);

if (!IsMUBUF && !MFI->isEntryFunction()) {		if (!IsMUBUF && !MFI->isEntryFunction()) {
// Convert to an absolute stack address by finding the offset from the		// Convert to a swizzled stack address by scaling by the wave size.
// scratch wave base and scaling by the wave size.
//		//
// In an entry function/kernel the offset is already the absolute		// In an entry function/kernel the offset is already swizzled.
// address relative to the frame register.

Register TmpDiffReg =
RS->scavengeRegister(&AMDGPU::SReg_32_XM0RegClass, MI, 0, false);

// If there's no free SGPR, in-place modify the FP
Register DiffReg = TmpDiffReg.isValid() ? TmpDiffReg : FrameReg;

bool IsCopy = MI->getOpcode() == AMDGPU::V_MOV_B32_e32;		bool IsCopy = MI->getOpcode() == AMDGPU::V_MOV_B32_e32;
Register ResultReg = IsCopy ?		Register ResultReg =
MI->getOperand(0).getReg() :		IsCopy ? MI->getOperand(0).getReg()
RS->scavengeRegister(&AMDGPU::VGPR_32RegClass, MI, 0);		: RS->scavengeRegister(&AMDGPU::VGPR_32RegClass, MI, 0);

BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_SUB_U32), DiffReg)
.addReg(FrameReg)
.addReg(MFI->getScratchWaveOffsetReg());

int64_t Offset = FrameInfo.getObjectOffset(Index);		int64_t Offset = FrameInfo.getObjectOffset(Index);
if (Offset == 0) {		if (Offset == 0) {
// XXX - This never happens because of emergency scavenging slot at 0?		// XXX - This never happens because of emergency scavenging slot at 0?
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::V_LSHRREV_B32_e64), ResultReg)		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::V_LSHRREV_B32_e64), ResultReg)
.addImm(ST.getWavefrontSizeLog2())		.addImm(ST.getWavefrontSizeLog2())
		Lint: Pre-merge checks: clang-format: please reformat the code - .addImm(ST.getWavefrontSizeLog2()) - .addReg(FrameReg); + .addImm(ST.getWavefrontSizeLog2()) + .addReg(FrameReg);
.addReg(DiffReg);		.addReg(FrameReg);
} else {		} else {
if (auto MIB = TII->getAddNoCarry(MBB, MI, DL, ResultReg, RS)) {		if (auto MIB = TII->getAddNoCarry(MBB, MI, DL, ResultReg, RS)) {
Register ScaledReg =		Register ScaledReg =
RS->scavengeRegister(&AMDGPU::VGPR_32RegClass, MIB, 0);		RS->scavengeRegister(&AMDGPU::VGPR_32RegClass, MIB, 0);

BuildMI(MBB, MIB, DL, TII->get(AMDGPU::V_LSHRREV_B32_e64),		BuildMI(MBB, MIB, DL, TII->get(AMDGPU::V_LSHRREV_B32_e64),
ScaledReg)		ScaledReg)
.addImm(ST.getWavefrontSizeLog2())		.addImm(ST.getWavefrontSizeLog2())
		Lint: Pre-merge checks: clang-format: please reformat the code - .addImm(ST.getWavefrontSizeLog2()) - .addReg(FrameReg); + .addImm(ST.getWavefrontSizeLog2()) + .addReg(FrameReg);
.addReg(DiffReg, RegState::Kill);		.addReg(FrameReg);

const bool IsVOP2 = MIB->getOpcode() == AMDGPU::V_ADD_U32_e32;		const bool IsVOP2 = MIB->getOpcode() == AMDGPU::V_ADD_U32_e32;

// TODO: Fold if use instruction is another add of a constant.		// TODO: Fold if use instruction is another add of a constant.
if (IsVOP2 || AMDGPU::isInlinableLiteral32(Offset, ST.hasInv2PiInlineImm())) {		if (IsVOP2 || AMDGPU::isInlinableLiteral32(Offset, ST.hasInv2PiInlineImm())) {
// FIXME: This can fail		// FIXME: This can fail
MIB.addImm(Offset);		MIB.addImm(Offset);
MIB.addReg(ScaledReg, RegState::Kill);		MIB.addReg(ScaledReg, RegState::Kill);
… 20 lines not shown …	default: {
// We have to produce a carry out, and there isn't a free SGPR pair		// We have to produce a carry out, and there isn't a free SGPR pair
// for it. We can keep the whole computation on the SALU to avoid		// for it. We can keep the whole computation on the SALU to avoid
// clobbering an additional register at the cost of an extra mov.		// clobbering an additional register at the cost of an extra mov.

// We may have 1 free scratch SGPR even though a carry out is		// We may have 1 free scratch SGPR even though a carry out is
// unavailable. Only one additional mov is needed.		// unavailable. Only one additional mov is needed.
Register TmpScaledReg =		Register TmpScaledReg =
RS->scavengeRegister(&AMDGPU::SReg_32_XM0RegClass, MI, 0, false);		RS->scavengeRegister(&AMDGPU::SReg_32_XM0RegClass, MI, 0, false);
Register ScaledReg = TmpScaledReg.isValid() ? TmpScaledReg : DiffReg;		Register ScaledReg = TmpScaledReg.isValid() ? TmpScaledReg : FrameReg;
		Lint: Pre-merge checks: clang-format: please reformat the code - Register ScaledReg = TmpScaledReg.isValid() ? TmpScaledReg : FrameReg; + Register ScaledReg = + TmpScaledReg.isValid() ? TmpScaledReg : FrameReg;

BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_LSHR_B32), ScaledReg)		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_LSHR_B32), ScaledReg)
.addReg(DiffReg, RegState::Kill)		.addReg(FrameReg)
		Lint: Pre-merge checks: clang-format: please reformat the code - .addReg(FrameReg) - .addImm(ST.getWavefrontSizeLog2()); + .addReg(FrameReg) + .addImm(ST.getWavefrontSizeLog2());
.addImm(ST.getWavefrontSizeLog2());		.addImm(ST.getWavefrontSizeLog2());
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_ADD_U32), ScaledReg)		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_ADD_U32), ScaledReg)
.addReg(ScaledReg, RegState::Kill)		.addReg(ScaledReg, RegState::Kill)
.addImm(Offset);		.addImm(Offset);
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::COPY), ResultReg)		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::COPY), ResultReg)
.addReg(ScaledReg, RegState::Kill);		.addReg(ScaledReg, RegState::Kill);

// If there were truly no free SGPRs, we need to undo everything.		// If there were truly no free SGPRs, we need to undo everything.
if (!TmpScaledReg.isValid()) {		if (!TmpScaledReg.isValid()) {
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_SUB_U32), ScaledReg)		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_SUB_U32), ScaledReg)
.addReg(ScaledReg, RegState::Kill)		.addReg(ScaledReg, RegState::Kill)
.addImm(Offset);		.addImm(Offset);
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_LSHL_B32), ScaledReg)		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_LSHL_B32), ScaledReg)
.addReg(DiffReg, RegState::Kill)		.addReg(FrameReg)
		Lint: Pre-merge checks: clang-format: please reformat the code (whitespace-only change)
		- .addReg(FrameReg)
		- .addImm(ST.getWavefrontSizeLog2());
		+ .addReg(FrameReg)
		+ .addImm(ST.getWavefrontSizeLog2());
.addImm(ST.getWavefrontSizeLog2());		.addImm(ST.getWavefrontSizeLog2());
}		}
}		}
}		}

if (!TmpDiffReg.isValid()) {
// Restore the FP.
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_ADD_U32), FrameReg)
.addReg(FrameReg)
.addReg(MFI->getScratchWaveOffsetReg());
}

// Don't introduce an extra copy if we're just materializing in a mov.		// Don't introduce an extra copy if we're just materializing in a mov.
if (IsCopy)		if (IsCopy)
MI->eraseFromParent();		MI->eraseFromParent();
else		else
FIOp.ChangeToRegister(ResultReg, false, false, true);		FIOp.ChangeToRegister(ResultReg, false, false, true);
return;		return;
}		}

if (IsMUBUF) {		if (IsMUBUF) {
// Disable offen so we don't need a 0 vgpr base.		// Disable offen so we don't need a 0 vgpr base.
assert(static_cast<int>(FIOperandNum) ==		assert(static_cast<int>(FIOperandNum) ==
AMDGPU::getNamedOperandIdx(MI->getOpcode(),		AMDGPU::getNamedOperandIdx(MI->getOpcode(),
AMDGPU::OpName::vaddr));		AMDGPU::OpName::vaddr));

assert(TII->getNamedOperand(*MI, AMDGPU::OpName::soffset)->getReg() ==		auto &SOffset = *TII->getNamedOperand(*MI, AMDGPU::OpName::soffset);
MFI->getStackPtrOffsetReg());		if (SOffset.isReg()) {
		assert(SOffset.getReg() == MFI->getStackPtrOffsetReg());
TII->getNamedOperand(*MI, AMDGPU::OpName::soffset)->setReg(FrameReg);		SOffset.setReg(FrameReg);
		}

int64_t Offset = FrameInfo.getObjectOffset(Index);		int64_t Offset = FrameInfo.getObjectOffset(Index);
int64_t OldImm		int64_t OldImm
= TII->getNamedOperand(*MI, AMDGPU::OpName::offset)->getImm();		= TII->getNamedOperand(*MI, AMDGPU::OpName::offset)->getImm();
int64_t NewOffset = OldImm + Offset;		int64_t NewOffset = OldImm + Offset;

if (isUInt<12>(NewOffset) &&		if (isUInt<12>(NewOffset) &&
buildMUBUFOffsetLoadStore(ST, FrameInfo, MI, Index, NewOffset)) {		buildMUBUFOffsetLoadStore(ST, FrameInfo, MI, Index, NewOffset)) {
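
The scalar materialization above converts the wave-scaled frame register into a per-lane byte offset (`S_LSHR_B32` by `ST.getWavefrontSizeLog2()`), adds the object offset (`S_ADD_U32`), and copies the result. In the updated code, when `scavengeRegister` finds no free SGPR, `FrameReg` itself serves as `ScaledReg`, and both steps are undone with `S_SUB_U32` and `S_LSHL_B32`. Below is a minimal C++ model of that arithmetic. It is not LLVM code. It assumes 32-bit wraparound and that `FrameReg` holds a per-lane offset multiplied by the wavefront size, so its low `WaveSizeLog2` bits are zero. `materialize` and `Materialized` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the scalar frame-index materialization; not LLVM code.
// Assumption: FrameReg holds a wave-scaled byte offset (per-lane offset times
// the wavefront size), so its low WaveSizeLog2 bits are zero.
struct Materialized {
  uint32_t Value;    // what the final COPY writes into ResultReg
  uint32_t FrameReg; // contents of FrameReg after the whole sequence
};

Materialized materialize(uint32_t FrameReg, uint32_t Offset,
                         unsigned WaveSizeLog2, bool HaveScratchSGPR) {
  assert(WaveSizeLog2 < 32 && "shift amount must be below the bit width");
  uint32_t Scaled = FrameReg >> WaveSizeLog2; // S_LSHR_B32: wave-scaled -> per-lane
  Scaled += Offset;                           // S_ADD_U32: add the object offset
  uint32_t Value = Scaled;                    // COPY into ResultReg
  if (HaveScratchSGPR)
    return {Value, FrameReg}; // ScaledReg was a scavenged SGPR; FrameReg untouched.
  // No free SGPR: ScaledReg aliased FrameReg, so undo both steps in place.
  Scaled -= Offset;        // S_SUB_U32
  Scaled <<= WaveSizeLog2; // S_LSHL_B32
  return {Value, Scaled};
}
```

The undo restores `FrameReg` exactly only under the alignment assumption. For example, `materialize(65, 0, 6, false)` leaves 64 rather than 65, because the right shift discards the low bits.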

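On the MUBUF path, the frame object's offset is added to the instruction's existing `offset` immediate. The fold is attempted only if `isUInt<12>(NewOffset)` holds, meaning the sum fits the unsigned 12-bit immediate field (0 to 4095). The other case is handled by code not shown here. The sketch below models only that range check. `fitsMUBUFImm` is a hypothetical stand-in for `llvm::isUInt<12>`, so the snippet builds without LLVM. `foldFrameOffset` is also hypothetical, and `buildMUBUFOffsetLoadStore` is not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Stand-in for llvm::isUInt<12>(NewOffset): the MUBUF immediate offset is an
// unsigned 12-bit field, so only 0..4095 fits. Negative sums are rejected.
constexpr bool fitsMUBUFImm(int64_t X) {
  return X >= 0 && X < (int64_t(1) << 12);
}

// Hypothetical helper: folds the frame object's offset into the existing
// immediate, or returns std::nullopt when the sum does not fit and another
// lowering is needed (that path is not modeled here).
std::optional<int64_t> foldFrameOffset(int64_t OldImm, int64_t ObjectOffset) {
  int64_t NewOffset = OldImm + ObjectOffset;
  if (fitsMUBUFImm(NewOffset))
    return NewOffset;
  return std::nullopt;
}
```
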
llvm/lib/Target/AMDGPU/SIRegisterInfo.td

	def VCC_LO : SIReg<"vcc_lo", 106>;			def VCC_LO : SIReg<"vcc_lo", 106>;
	def VCC_HI : SIReg<"vcc_hi", 107>;			def VCC_HI : SIReg<"vcc_hi", 107>;

	// Pseudo-registers: Used as placeholders during isel and immediately			// Pseudo-registers: Used as placeholders during isel and immediately
	// replaced, never seeing the verifier.			// replaced, never seeing the verifier.
	def PRIVATE_RSRC_REG : SIReg<"private_rsrc", 0>;			def PRIVATE_RSRC_REG : SIReg<"private_rsrc", 0>;
	def FP_REG : SIReg<"fp", 0>;			def FP_REG : SIReg<"fp", 0>;
	def SP_REG : SIReg<"sp", 0>;			def SP_REG : SIReg<"sp", 0>;
	def SCRATCH_WAVE_OFFSET_REG : SIReg<"scratch_wave_offset", 0>;

	// Pseudo-register to represent the program-counter DWARF register.			// Pseudo-register to represent the program-counter DWARF register.
	def PC_REG : SIReg<"pc", 0>, DwarfRegNum<[16]> {			def PC_REG : SIReg<"pc", 0>, DwarfRegNum<[16]> {
	// There is no physical register corresponding to a "program counter", but			// There is no physical register corresponding to a "program counter", but
	// we need to encode the concept in debug information in order to represent			// we need to encode the concept in debug information in order to represent
	// things like the return value in unwind information.			// things like the return value in unwind information.
	let isArtificial = 1;			let isArtificial = 1;
	}			}
	// AGPR 1024-bit registers			// AGPR 1024-bit registers
	def AGPR_1024 : SIRegisterTuples<getSubRegs<32>.ret, AGPR_32, 255, 1, 32, "a">;			def AGPR_1024 : SIRegisterTuples<getSubRegs<32>.ret, AGPR_32, 255, 1, 32, "a">;

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Register classes used as source and destination			// Register classes used as source and destination
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	def Pseudo_SReg_32 : RegisterClass<"AMDGPU", [i32, f32, i16, f16, v2i16, v2f16], 32,			def Pseudo_SReg_32 : RegisterClass<"AMDGPU", [i32, f32, i16, f16, v2i16, v2f16], 32,
	(add FP_REG, SP_REG, SCRATCH_WAVE_OFFSET_REG)> {			(add FP_REG, SP_REG)> {
	let isAllocatable = 0;			let isAllocatable = 0;
	let CopyCost = -1;			let CopyCost = -1;
	}			}

	def Pseudo_SReg_128 : RegisterClass<"AMDGPU", [v4i32, v2i64, v2f64], 32,			def Pseudo_SReg_128 : RegisterClass<"AMDGPU", [v4i32, v2i64, v2f64], 32,
	(add PRIVATE_RSRC_REG)> {			(add PRIVATE_RSRC_REG)> {
	let isAllocatable = 0;			let isAllocatable = 0;
	let CopyCost = -1;			let CopyCost = -1;

llvm/test/CodeGen/AMDGPU/GlobalISel/divergent-control-flow.ll

	; CHECK-NEXT: s_mov_b32 s4, 0			; CHECK-NEXT: s_mov_b32 s4, 0
	; CHECK-NEXT: ; %bb.3: ; %bb8			; CHECK-NEXT: ; %bb.3: ; %bb8
	; CHECK-NEXT: s_or_b64 exec, exec, s[6:7]			; CHECK-NEXT: s_or_b64 exec, exec, s[6:7]
	; CHECK-NEXT: v_cmp_eq_u32_e64 s[6:7], s4, 0			; CHECK-NEXT: v_cmp_eq_u32_e64 s[6:7], s4, 0
	; CHECK-NEXT: s_and_saveexec_b64 s[4:5], s[6:7]			; CHECK-NEXT: s_and_saveexec_b64 s[4:5], s[6:7]
	; CHECK-NEXT: s_cbranch_execz BB4_5			; CHECK-NEXT: s_cbranch_execz BB4_5
	; CHECK-NEXT: ; %bb.4: ; %bb11			; CHECK-NEXT: ; %bb.4: ; %bb11
	; CHECK-NEXT: v_mov_b32_e32 v0, 4.0			; CHECK-NEXT: v_mov_b32_e32 v0, 4.0
	; CHECK-NEXT: buffer_store_dword v0, v0, s[0:3], s33 offen			; CHECK-NEXT: buffer_store_dword v0, v0, s[0:3], 0 offen
	; CHECK-NEXT: BB4_5: ; %Flow			; CHECK-NEXT: BB4_5: ; %Flow
	; CHECK-NEXT: s_or_b64 exec, exec, s[4:5]			; CHECK-NEXT: s_or_b64 exec, exec, s[4:5]
	; CHECK-NEXT: BB4_6: ; %bb12			; CHECK-NEXT: BB4_6: ; %bb12
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	bb:			bb:
	%tmp = load i32, i32 addrspace(4)* @external_constant			%tmp = load i32, i32 addrspace(4)* @external_constant
	%ptr = load float, float addrspace(4)* @const.ptr			%ptr = load float, float addrspace(4)* @const.ptr

llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-load-local.mir

	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_s32_from_4
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[DS_READ_B32_:%[0-9]+]]:vgpr_32 = DS_READ_B32 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 4, addrspace 3)
	; GFX6: $vgpr0 = COPY [[DS_READ_B32_]]
	; GFX7-LABEL: name: load_local_s32_from_4			; GFX7-LABEL: name: load_local_s32_from_4
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ_B32_:%[0-9]+]]:vgpr_32 = DS_READ_B32 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 4, addrspace 3)			; GFX7: [[DS_READ_B32_:%[0-9]+]]:vgpr_32 = DS_READ_B32 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 4, addrspace 3)
	; GFX7: $vgpr0 = COPY [[DS_READ_B32_]]			; GFX7: $vgpr0 = COPY [[DS_READ_B32_]]
	; GFX9-LABEL: name: load_local_s32_from_4			; GFX9-LABEL: name: load_local_s32_from_4
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[DS_READ_B32_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_B32_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 4, addrspace 3)			; GFX9: [[DS_READ_B32_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_B32_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 4, addrspace 3)
	; GFX9: $vgpr0 = COPY [[DS_READ_B32_gfx9_]]			; GFX9: $vgpr0 = COPY [[DS_READ_B32_gfx9_]]
				; GFX6-LABEL: name: load_local_s32_from_4
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[DS_READ_B32_:%[0-9]+]]:vgpr_32 = DS_READ_B32 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 4, addrspace 3)
				; GFX6: $vgpr0 = COPY [[DS_READ_B32_]]
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(s32) = G_LOAD %0 :: (load 4, align 4, addrspace 3)			%1:vgpr(s32) = G_LOAD %0 :: (load 4, align 4, addrspace 3)
	$vgpr0 = COPY %1			$vgpr0 = COPY %1

	...			...

	---			---

	name: load_local_s32_from_2			name: load_local_s32_from_2
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_s32_from_2
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[DS_READ_U16_:%[0-9]+]]:vgpr_32 = DS_READ_U16 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 2, addrspace 3)
	; GFX6: $vgpr0 = COPY [[DS_READ_U16_]]
	; GFX7-LABEL: name: load_local_s32_from_2			; GFX7-LABEL: name: load_local_s32_from_2
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ_U16_:%[0-9]+]]:vgpr_32 = DS_READ_U16 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 2, addrspace 3)			; GFX7: [[DS_READ_U16_:%[0-9]+]]:vgpr_32 = DS_READ_U16 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 2, addrspace 3)
	; GFX7: $vgpr0 = COPY [[DS_READ_U16_]]			; GFX7: $vgpr0 = COPY [[DS_READ_U16_]]
	; GFX9-LABEL: name: load_local_s32_from_2			; GFX9-LABEL: name: load_local_s32_from_2
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[DS_READ_U16_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_U16_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 2, addrspace 3)			; GFX9: [[DS_READ_U16_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_U16_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 2, addrspace 3)
	; GFX9: $vgpr0 = COPY [[DS_READ_U16_gfx9_]]			; GFX9: $vgpr0 = COPY [[DS_READ_U16_gfx9_]]
				; GFX6-LABEL: name: load_local_s32_from_2
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[DS_READ_U16_:%[0-9]+]]:vgpr_32 = DS_READ_U16 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 2, addrspace 3)
				; GFX6: $vgpr0 = COPY [[DS_READ_U16_]]
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(s32) = G_LOAD %0 :: (load 2, align 2, addrspace 3)			%1:vgpr(s32) = G_LOAD %0 :: (load 2, align 2, addrspace 3)
	$vgpr0 = COPY %1			$vgpr0 = COPY %1

	...			...

	---			---

	name: load_local_s32_from_1			name: load_local_s32_from_1
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_s32_from_1
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)
	; GFX6: $vgpr0 = COPY [[DS_READ_U8_]]
	; GFX7-LABEL: name: load_local_s32_from_1			; GFX7-LABEL: name: load_local_s32_from_1
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)			; GFX7: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)
	; GFX7: $vgpr0 = COPY [[DS_READ_U8_]]			; GFX7: $vgpr0 = COPY [[DS_READ_U8_]]
	; GFX9-LABEL: name: load_local_s32_from_1			; GFX9-LABEL: name: load_local_s32_from_1
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[DS_READ_U8_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_U8_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 1, addrspace 3)			; GFX9: [[DS_READ_U8_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_U8_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 1, addrspace 3)
	; GFX9: $vgpr0 = COPY [[DS_READ_U8_gfx9_]]			; GFX9: $vgpr0 = COPY [[DS_READ_U8_gfx9_]]
				; GFX6-LABEL: name: load_local_s32_from_1
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)
				; GFX6: $vgpr0 = COPY [[DS_READ_U8_]]
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(s32) = G_LOAD %0 :: (load 1, align 1, addrspace 3)			%1:vgpr(s32) = G_LOAD %0 :: (load 1, align 1, addrspace 3)
	$vgpr0 = COPY %1			$vgpr0 = COPY %1

	...			...

	---			---

	name: load_local_v2s32			name: load_local_v2s32
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_v2s32
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[DS_READ_B64_:%[0-9]+]]:vreg_64 = DS_READ_B64 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 8, addrspace 3)
	; GFX6: $vgpr0_vgpr1 = COPY [[DS_READ_B64_]]
	; GFX7-LABEL: name: load_local_v2s32			; GFX7-LABEL: name: load_local_v2s32
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ_B64_:%[0-9]+]]:vreg_64 = DS_READ_B64 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 8, addrspace 3)			; GFX7: [[DS_READ_B64_:%[0-9]+]]:vreg_64 = DS_READ_B64 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 8, addrspace 3)
	; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ_B64_]]			; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ_B64_]]
	; GFX9-LABEL: name: load_local_v2s32			; GFX9-LABEL: name: load_local_v2s32
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[DS_READ_B64_gfx9_:%[0-9]+]]:vreg_64 = DS_READ_B64_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 8, addrspace 3)			; GFX9: [[DS_READ_B64_gfx9_:%[0-9]+]]:vreg_64 = DS_READ_B64_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 8, addrspace 3)
	; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ_B64_gfx9_]]			; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ_B64_gfx9_]]
				; GFX6-LABEL: name: load_local_v2s32
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[DS_READ_B64_:%[0-9]+]]:vreg_64 = DS_READ_B64 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 8, addrspace 3)
				; GFX6: $vgpr0_vgpr1 = COPY [[DS_READ_B64_]]
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(<2 x s32>) = G_LOAD %0 :: (load 8, align 8, addrspace 3)			%1:vgpr(<2 x s32>) = G_LOAD %0 :: (load 8, align 8, addrspace 3)
	$vgpr0_vgpr1 = COPY %1			$vgpr0_vgpr1 = COPY %1

	...			...

	---			---

	name: load_local_v2s32_align4			name: load_local_v2s32_align4
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_v2s32_align4
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[LOAD:%[0-9]+]]:vreg_64(<2 x s32>) = G_LOAD [[COPY]](p3) :: (load 8, align 4, addrspace 3)
	; GFX6: $vgpr0_vgpr1 = COPY [[LOAD]](<2 x s32>)
	; GFX7-LABEL: name: load_local_v2s32_align4			; GFX7-LABEL: name: load_local_v2s32_align4
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ2_B32_:%[0-9]+]]:vreg_64 = DS_READ2_B32 [[COPY]], 0, 1, 0, implicit $m0, implicit $exec :: (load 8, align 4, addrspace 3)			; GFX7: [[DS_READ2_B32_:%[0-9]+]]:vreg_64 = DS_READ2_B32 [[COPY]], 0, 1, 0, implicit $m0, implicit $exec :: (load 8, align 4, addrspace 3)
	; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_]]			; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_]]
	; GFX9-LABEL: name: load_local_v2s32_align4			; GFX9-LABEL: name: load_local_v2s32_align4
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[DS_READ2_B32_gfx9_:%[0-9]+]]:vreg_64 = DS_READ2_B32_gfx9 [[COPY]], 0, 1, 0, implicit $exec :: (load 8, align 4, addrspace 3)			; GFX9: [[DS_READ2_B32_gfx9_:%[0-9]+]]:vreg_64 = DS_READ2_B32_gfx9 [[COPY]], 0, 1, 0, implicit $exec :: (load 8, align 4, addrspace 3)
	; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_gfx9_]]			; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_gfx9_]]
				; GFX6-LABEL: name: load_local_v2s32_align4
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[LOAD:%[0-9]+]]:vreg_64(<2 x s32>) = G_LOAD [[COPY]](p3) :: (load 8, align 4, addrspace 3)
				; GFX6: $vgpr0_vgpr1 = COPY [[LOAD]](<2 x s32>)
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(<2 x s32>) = G_LOAD %0 :: (load 8, align 4, addrspace 3)			%1:vgpr(<2 x s32>) = G_LOAD %0 :: (load 8, align 4, addrspace 3)
	$vgpr0_vgpr1 = COPY %1			$vgpr0_vgpr1 = COPY %1

	...			...
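
The CHECK lines in this file show the expected selection for each subtarget:

- Naturally aligned 1-, 2-, 4- and 8-byte loads become `DS_READ_U8`, `DS_READ_U16`, `DS_READ_B32` and `DS_READ_B64`.
- An 8-byte load with only 4-byte alignment becomes `DS_READ2_B32` with offsets 0 and 1 on GFX7 and GFX9, but stays a `G_LOAD` on GFX6.
- GFX6 and GFX7 first set `$m0` to -1, while GFX9 uses the `_gfx9` variants with no `implicit $m0`.

The C++ table below summarizes only those observed outcomes and is not the selector's actual logic. `Gen`, `Selection` and `selectLocalLoad` are hypothetical. `DS_READ2_B32` offsets are in dword (4-byte) units, so offset1 = 1 addresses the second dword.

```cpp
#include <cassert>
#include <string>

enum class Gen { GFX6, GFX7, GFX9 };

// Hypothetical summary of the CHECK lines in this file; this is not the logic
// of AMDGPUInstructionSelector, only a table of the outcomes the tests expect.
struct Selection {
  std::string Opcode; // empty when the G_LOAD is left unselected
  bool InitM0;        // whether "$m0 = S_MOV_B32 -1" precedes the load
  unsigned Offset1;   // DS_READ2_B32 offset1, in dwords (0 when unused)
};

Selection selectLocalLoad(Gen G, unsigned SizeBytes, unsigned AlignBytes) {
  assert((SizeBytes == 1 || SizeBytes == 2 || SizeBytes == 4 ||
          SizeBytes == 8) && "only the sizes exercised by these tests");
  bool NeedsM0 = G != Gen::GFX9; // GFX9 uses the *_gfx9 forms without m0
  std::string Suffix = G == Gen::GFX9 ? "_gfx9" : "";
  if (SizeBytes == 8 && AlignBytes == 4) {
    if (G == Gen::GFX6)
      return {"", true, 0}; // GFX6 checks keep the G_LOAD
    return {"DS_READ2_B32" + Suffix, NeedsM0, 1}; // dwords at offsets 0 and 1
  }
  const char *Base = SizeBytes == 8   ? "DS_READ_B64"
                     : SizeBytes == 4 ? "DS_READ_B32"
                     : SizeBytes == 2 ? "DS_READ_U16"
                                      : "DS_READ_U8";
  return {Base + Suffix, NeedsM0, 0};
}
```
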

	---			---

	name: load_local_s64			name: load_local_s64
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_s64
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[DS_READ_B64_:%[0-9]+]]:vreg_64 = DS_READ_B64 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 8, addrspace 3)
	; GFX6: $vgpr0_vgpr1 = COPY [[DS_READ_B64_]]
	; GFX7-LABEL: name: load_local_s64			; GFX7-LABEL: name: load_local_s64
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ_B64_:%[0-9]+]]:vreg_64 = DS_READ_B64 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 8, addrspace 3)			; GFX7: [[DS_READ_B64_:%[0-9]+]]:vreg_64 = DS_READ_B64 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 8, addrspace 3)
	; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ_B64_]]			; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ_B64_]]
	; GFX9-LABEL: name: load_local_s64			; GFX9-LABEL: name: load_local_s64
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[DS_READ_B64_gfx9_:%[0-9]+]]:vreg_64 = DS_READ_B64_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 8, addrspace 3)			; GFX9: [[DS_READ_B64_gfx9_:%[0-9]+]]:vreg_64 = DS_READ_B64_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 8, addrspace 3)
	; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ_B64_gfx9_]]			; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ_B64_gfx9_]]
				; GFX6-LABEL: name: load_local_s64
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[DS_READ_B64_:%[0-9]+]]:vreg_64 = DS_READ_B64 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 8, addrspace 3)
				; GFX6: $vgpr0_vgpr1 = COPY [[DS_READ_B64_]]
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(s64) = G_LOAD %0 :: (load 8, align 8, addrspace 3)			%1:vgpr(s64) = G_LOAD %0 :: (load 8, align 8, addrspace 3)
	$vgpr0_vgpr1 = COPY %1			$vgpr0_vgpr1 = COPY %1

	...			...

	---			---

	name: load_local_s64_align4			name: load_local_s64_align4
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_s64_align4
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[LOAD:%[0-9]+]]:vreg_64(s64) = G_LOAD [[COPY]](p3) :: (load 8, align 4, addrspace 3)
	; GFX6: $vgpr0_vgpr1 = COPY [[LOAD]](s64)
	; GFX7-LABEL: name: load_local_s64_align4			; GFX7-LABEL: name: load_local_s64_align4
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ2_B32_:%[0-9]+]]:vreg_64 = DS_READ2_B32 [[COPY]], 0, 1, 0, implicit $m0, implicit $exec :: (load 8, align 4, addrspace 3)			; GFX7: [[DS_READ2_B32_:%[0-9]+]]:vreg_64 = DS_READ2_B32 [[COPY]], 0, 1, 0, implicit $m0, implicit $exec :: (load 8, align 4, addrspace 3)
	; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_]]			; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_]]
	; GFX9-LABEL: name: load_local_s64_align4			; GFX9-LABEL: name: load_local_s64_align4
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[DS_READ2_B32_gfx9_:%[0-9]+]]:vreg_64 = DS_READ2_B32_gfx9 [[COPY]], 0, 1, 0, implicit $exec :: (load 8, align 4, addrspace 3)			; GFX9: [[DS_READ2_B32_gfx9_:%[0-9]+]]:vreg_64 = DS_READ2_B32_gfx9 [[COPY]], 0, 1, 0, implicit $exec :: (load 8, align 4, addrspace 3)
	; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_gfx9_]]			; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_gfx9_]]
				; GFX6-LABEL: name: load_local_s64_align4
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[LOAD:%[0-9]+]]:vreg_64(s64) = G_LOAD [[COPY]](p3) :: (load 8, align 4, addrspace 3)
				; GFX6: $vgpr0_vgpr1 = COPY [[LOAD]](s64)
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(s64) = G_LOAD %0 :: (load 8, align 4, addrspace 3)			%1:vgpr(s64) = G_LOAD %0 :: (load 8, align 4, addrspace 3)
	$vgpr0_vgpr1 = COPY %1			$vgpr0_vgpr1 = COPY %1

	...			...

	---			---

	name: load_local_p3_from_4			name: load_local_p3_from_4
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_p3_from_4
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[DS_READ_B32_:%[0-9]+]]:vgpr_32 = DS_READ_B32 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 4, addrspace 3)
	; GFX6: $vgpr0 = COPY [[DS_READ_B32_]]
	; GFX7-LABEL: name: load_local_p3_from_4			; GFX7-LABEL: name: load_local_p3_from_4
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ_B32_:%[0-9]+]]:vgpr_32 = DS_READ_B32 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 4, addrspace 3)			; GFX7: [[DS_READ_B32_:%[0-9]+]]:vgpr_32 = DS_READ_B32 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 4, addrspace 3)
	; GFX7: $vgpr0 = COPY [[DS_READ_B32_]]			; GFX7: $vgpr0 = COPY [[DS_READ_B32_]]
	; GFX9-LABEL: name: load_local_p3_from_4			; GFX9-LABEL: name: load_local_p3_from_4
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[DS_READ_B32_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_B32_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 4, addrspace 3)			; GFX9: [[DS_READ_B32_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_B32_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 4, addrspace 3)
	; GFX9: $vgpr0 = COPY [[DS_READ_B32_gfx9_]]			; GFX9: $vgpr0 = COPY [[DS_READ_B32_gfx9_]]
				; GFX6-LABEL: name: load_local_p3_from_4
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[DS_READ_B32_:%[0-9]+]]:vgpr_32 = DS_READ_B32 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 4, addrspace 3)
				; GFX6: $vgpr0 = COPY [[DS_READ_B32_]]
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(p3) = G_LOAD %0 :: (load 4, align 4, addrspace 3)			%1:vgpr(p3) = G_LOAD %0 :: (load 4, align 4, addrspace 3)
	$vgpr0 = COPY %1			$vgpr0 = COPY %1

	...			...

	---			---

	name: load_local_p5_from_4			name: load_local_p5_from_4
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_p5_from_4
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[DS_READ_B32_:%[0-9]+]]:vgpr_32 = DS_READ_B32 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 4, addrspace 3)
	; GFX6: $vgpr0 = COPY [[DS_READ_B32_]]
	; GFX7-LABEL: name: load_local_p5_from_4			; GFX7-LABEL: name: load_local_p5_from_4
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ_B32_:%[0-9]+]]:vgpr_32 = DS_READ_B32 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 4, addrspace 3)			; GFX7: [[DS_READ_B32_:%[0-9]+]]:vgpr_32 = DS_READ_B32 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 4, addrspace 3)
	; GFX7: $vgpr0 = COPY [[DS_READ_B32_]]			; GFX7: $vgpr0 = COPY [[DS_READ_B32_]]
	; GFX9-LABEL: name: load_local_p5_from_4			; GFX9-LABEL: name: load_local_p5_from_4
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[DS_READ_B32_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_B32_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 4, addrspace 3)			; GFX9: [[DS_READ_B32_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_B32_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 4, addrspace 3)
	; GFX9: $vgpr0 = COPY [[DS_READ_B32_gfx9_]]			; GFX9: $vgpr0 = COPY [[DS_READ_B32_gfx9_]]
				; GFX6-LABEL: name: load_local_p5_from_4
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[DS_READ_B32_:%[0-9]+]]:vgpr_32 = DS_READ_B32 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 4, addrspace 3)
				; GFX6: $vgpr0 = COPY [[DS_READ_B32_]]
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(p3) = G_LOAD %0 :: (load 4, align 4, addrspace 3)			%1:vgpr(p3) = G_LOAD %0 :: (load 4, align 4, addrspace 3)
	$vgpr0 = COPY %1			$vgpr0 = COPY %1

	...			...

	---			---

	name: load_local_p1_align8			name: load_local_p1_align8
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_p1_align8
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[DS_READ_B64_:%[0-9]+]]:vreg_64 = DS_READ_B64 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 8, addrspace 3)
	; GFX6: $vgpr0_vgpr1 = COPY [[DS_READ_B64_]]
	; GFX7-LABEL: name: load_local_p1_align8			; GFX7-LABEL: name: load_local_p1_align8
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ_B64_:%[0-9]+]]:vreg_64 = DS_READ_B64 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 8, addrspace 3)			; GFX7: [[DS_READ_B64_:%[0-9]+]]:vreg_64 = DS_READ_B64 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 8, addrspace 3)
	; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ_B64_]]			; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ_B64_]]
	; GFX9-LABEL: name: load_local_p1_align8			; GFX9-LABEL: name: load_local_p1_align8
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[DS_READ_B64_gfx9_:%[0-9]+]]:vreg_64 = DS_READ_B64_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 8, addrspace 3)			; GFX9: [[DS_READ_B64_gfx9_:%[0-9]+]]:vreg_64 = DS_READ_B64_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 8, addrspace 3)
	; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ_B64_gfx9_]]			; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ_B64_gfx9_]]
				; GFX6-LABEL: name: load_local_p1_align8
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[DS_READ_B64_:%[0-9]+]]:vreg_64 = DS_READ_B64 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 8, addrspace 3)
				; GFX6: $vgpr0_vgpr1 = COPY [[DS_READ_B64_]]
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(p1) = G_LOAD %0 :: (load 8, align 8, addrspace 3)			%1:vgpr(p1) = G_LOAD %0 :: (load 8, align 8, addrspace 3)
	$vgpr0_vgpr1 = COPY %1			$vgpr0_vgpr1 = COPY %1

	...			...

	---			---

	name: load_local_p1_align4			name: load_local_p1_align4
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_p1_align4
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[LOAD:%[0-9]+]]:vreg_64(p1) = G_LOAD [[COPY]](p3) :: (load 8, align 4, addrspace 3)
	; GFX6: $vgpr0_vgpr1 = COPY [[LOAD]](p1)
	; GFX7-LABEL: name: load_local_p1_align4			; GFX7-LABEL: name: load_local_p1_align4
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ2_B32_:%[0-9]+]]:vreg_64 = DS_READ2_B32 [[COPY]], 0, 1, 0, implicit $m0, implicit $exec :: (load 8, align 4, addrspace 3)			; GFX7: [[DS_READ2_B32_:%[0-9]+]]:vreg_64 = DS_READ2_B32 [[COPY]], 0, 1, 0, implicit $m0, implicit $exec :: (load 8, align 4, addrspace 3)
	; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_]]			; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_]]
	; GFX9-LABEL: name: load_local_p1_align4			; GFX9-LABEL: name: load_local_p1_align4
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[DS_READ2_B32_gfx9_:%[0-9]+]]:vreg_64 = DS_READ2_B32_gfx9 [[COPY]], 0, 1, 0, implicit $exec :: (load 8, align 4, addrspace 3)			; GFX9: [[DS_READ2_B32_gfx9_:%[0-9]+]]:vreg_64 = DS_READ2_B32_gfx9 [[COPY]], 0, 1, 0, implicit $exec :: (load 8, align 4, addrspace 3)
	; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_gfx9_]]			; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_gfx9_]]
				; GFX6-LABEL: name: load_local_p1_align4
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[LOAD:%[0-9]+]]:vreg_64(p1) = G_LOAD [[COPY]](p3) :: (load 8, align 4, addrspace 3)
				; GFX6: $vgpr0_vgpr1 = COPY [[LOAD]](p1)
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(p1) = G_LOAD %0 :: (load 8, align 4, addrspace 3)			%1:vgpr(p1) = G_LOAD %0 :: (load 8, align 4, addrspace 3)
	$vgpr0_vgpr1 = COPY %1			$vgpr0_vgpr1 = COPY %1

	...			...

	---			---

	name: load_local_p999_from_8			name: load_local_p999_from_8
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_p999_from_8
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[LOAD:%[0-9]+]]:vreg_64(p999) = G_LOAD [[COPY]](p3) :: (load 8, addrspace 3)
	; GFX6: $vgpr0_vgpr1 = COPY [[LOAD]](p999)
	; GFX7-LABEL: name: load_local_p999_from_8			; GFX7-LABEL: name: load_local_p999_from_8
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[LOAD:%[0-9]+]]:vreg_64(p999) = G_LOAD [[COPY]](p3) :: (load 8, addrspace 3)			; GFX7: [[LOAD:%[0-9]+]]:vreg_64(p999) = G_LOAD [[COPY]](p3) :: (load 8, addrspace 3)
	; GFX7: $vgpr0_vgpr1 = COPY [[LOAD]](p999)			; GFX7: $vgpr0_vgpr1 = COPY [[LOAD]](p999)
	; GFX9-LABEL: name: load_local_p999_from_8			; GFX9-LABEL: name: load_local_p999_from_8
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
	; GFX9: [[LOAD:%[0-9]+]]:vreg_64(p999) = G_LOAD [[COPY]](p3) :: (load 8, addrspace 3)			; GFX9: [[LOAD:%[0-9]+]]:vreg_64(p999) = G_LOAD [[COPY]](p3) :: (load 8, addrspace 3)
	; GFX9: $vgpr0_vgpr1 = COPY [[LOAD]](p999)			; GFX9: $vgpr0_vgpr1 = COPY [[LOAD]](p999)
				; GFX6-LABEL: name: load_local_p999_from_8
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[LOAD:%[0-9]+]]:vreg_64(p999) = G_LOAD [[COPY]](p3) :: (load 8, addrspace 3)
				; GFX6: $vgpr0_vgpr1 = COPY [[LOAD]](p999)
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(p999) = G_LOAD %0 :: (load 8, align 8, addrspace 3)			%1:vgpr(p999) = G_LOAD %0 :: (load 8, align 8, addrspace 3)
	$vgpr0_vgpr1 = COPY %1			$vgpr0_vgpr1 = COPY %1

	...			...

	---			---

	name: load_local_v2p3			name: load_local_v2p3
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_v2p3
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[LOAD:%[0-9]+]]:vreg_64(<2 x p3>) = G_LOAD [[COPY]](p3) :: (load 8, addrspace 3)
	; GFX6: $vgpr0_vgpr1 = COPY [[LOAD]](<2 x p3>)
	; GFX7-LABEL: name: load_local_v2p3			; GFX7-LABEL: name: load_local_v2p3
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[LOAD:%[0-9]+]]:vreg_64(<2 x p3>) = G_LOAD [[COPY]](p3) :: (load 8, addrspace 3)			; GFX7: [[LOAD:%[0-9]+]]:vreg_64(<2 x p3>) = G_LOAD [[COPY]](p3) :: (load 8, addrspace 3)
	; GFX7: $vgpr0_vgpr1 = COPY [[LOAD]](<2 x p3>)			; GFX7: $vgpr0_vgpr1 = COPY [[LOAD]](<2 x p3>)
	; GFX9-LABEL: name: load_local_v2p3			; GFX9-LABEL: name: load_local_v2p3
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
	; GFX9: [[LOAD:%[0-9]+]]:vreg_64(<2 x p3>) = G_LOAD [[COPY]](p3) :: (load 8, addrspace 3)			; GFX9: [[LOAD:%[0-9]+]]:vreg_64(<2 x p3>) = G_LOAD [[COPY]](p3) :: (load 8, addrspace 3)
	; GFX9: $vgpr0_vgpr1 = COPY [[LOAD]](<2 x p3>)			; GFX9: $vgpr0_vgpr1 = COPY [[LOAD]](<2 x p3>)
				; GFX6-LABEL: name: load_local_v2p3
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[LOAD:%[0-9]+]]:vreg_64(<2 x p3>) = G_LOAD [[COPY]](p3) :: (load 8, addrspace 3)
				; GFX6: $vgpr0_vgpr1 = COPY [[LOAD]](<2 x p3>)
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(<2 x p3>) = G_LOAD %0 :: (load 8, align 8, addrspace 3)			%1:vgpr(<2 x p3>) = G_LOAD %0 :: (load 8, align 8, addrspace 3)
	$vgpr0_vgpr1 = COPY %1			$vgpr0_vgpr1 = COPY %1

	...			...

	---			---

	name: load_local_v2s16			name: load_local_v2s16
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_v2s16
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[DS_READ_B32_:%[0-9]+]]:vgpr_32 = DS_READ_B32 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 4, addrspace 3)
	; GFX6: $vgpr0 = COPY [[DS_READ_B32_]]
	; GFX7-LABEL: name: load_local_v2s16			; GFX7-LABEL: name: load_local_v2s16
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ_B32_:%[0-9]+]]:vgpr_32 = DS_READ_B32 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 4, addrspace 3)			; GFX7: [[DS_READ_B32_:%[0-9]+]]:vgpr_32 = DS_READ_B32 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 4, addrspace 3)
	; GFX7: $vgpr0 = COPY [[DS_READ_B32_]]			; GFX7: $vgpr0 = COPY [[DS_READ_B32_]]
	; GFX9-LABEL: name: load_local_v2s16			; GFX9-LABEL: name: load_local_v2s16
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[DS_READ_B32_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_B32_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 4, addrspace 3)			; GFX9: [[DS_READ_B32_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_B32_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 4, addrspace 3)
	; GFX9: $vgpr0 = COPY [[DS_READ_B32_gfx9_]]			; GFX9: $vgpr0 = COPY [[DS_READ_B32_gfx9_]]
				; GFX6-LABEL: name: load_local_v2s16
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[DS_READ_B32_:%[0-9]+]]:vgpr_32 = DS_READ_B32 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 4, addrspace 3)
				; GFX6: $vgpr0 = COPY [[DS_READ_B32_]]
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(<2 x s16>) = G_LOAD %0 :: (load 4, align 4, addrspace 3)			%1:vgpr(<2 x s16>) = G_LOAD %0 :: (load 4, align 4, addrspace 3)
	$vgpr0 = COPY %1			$vgpr0 = COPY %1

	...			...

	---			---

	name: load_local_v4s16			name: load_local_v4s16
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_v4s16
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[DS_READ_B64_:%[0-9]+]]:vreg_64 = DS_READ_B64 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 8, addrspace 3)
	; GFX6: $vgpr0_vgpr1 = COPY [[DS_READ_B64_]]
	; GFX7-LABEL: name: load_local_v4s16			; GFX7-LABEL: name: load_local_v4s16
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ_B64_:%[0-9]+]]:vreg_64 = DS_READ_B64 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 8, addrspace 3)			; GFX7: [[DS_READ_B64_:%[0-9]+]]:vreg_64 = DS_READ_B64 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 8, addrspace 3)
	; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ_B64_]]			; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ_B64_]]
	; GFX9-LABEL: name: load_local_v4s16			; GFX9-LABEL: name: load_local_v4s16
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[DS_READ_B64_gfx9_:%[0-9]+]]:vreg_64 = DS_READ_B64_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 8, addrspace 3)			; GFX9: [[DS_READ_B64_gfx9_:%[0-9]+]]:vreg_64 = DS_READ_B64_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 8, addrspace 3)
	; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ_B64_gfx9_]]			; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ_B64_gfx9_]]
				; GFX6-LABEL: name: load_local_v4s16
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[DS_READ_B64_:%[0-9]+]]:vreg_64 = DS_READ_B64 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 8, addrspace 3)
				; GFX6: $vgpr0_vgpr1 = COPY [[DS_READ_B64_]]
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(<4 x s16>) = G_LOAD %0 :: (load 8, align 8, addrspace 3)			%1:vgpr(<4 x s16>) = G_LOAD %0 :: (load 8, align 8, addrspace 3)
	$vgpr0_vgpr1 = COPY %1			$vgpr0_vgpr1 = COPY %1

	...			...

	# ---			# ---

	# name: load_local_v6s16			# name: load_local_v6s16
	# legalized: true			# legalized: true
	# regBankSelected: true			# regBankSelected: true
	# tracksRegLiveness: true			# tracksRegLiveness: true
	# machineFunctionInfo:			# machineFunctionInfo:
	# scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			# scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	# scratchWaveOffsetReg: $sgpr4
	# stackPtrOffsetReg: $sgpr32			# stackPtrOffsetReg: $sgpr32

	# body: |			# body: |
	# bb.0:			# bb.0:
	# liveins: $vgpr0			# liveins: $vgpr0

	# %0:vgpr(p3) = COPY $vgpr0			# %0:vgpr(p3) = COPY $vgpr0
	# %1:vgpr(<6 x s16>) = G_LOAD %0 :: (load 12, align 4, addrspace 3)			# %1:vgpr(<6 x s16>) = G_LOAD %0 :: (load 12, align 4, addrspace 3)
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_s32_from_1_gep_65535
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 65535, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 %2, 0, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)
	; GFX6: $vgpr0 = COPY [[DS_READ_U8_]]
	; GFX7-LABEL: name: load_local_s32_from_1_gep_65535			; GFX7-LABEL: name: load_local_s32_from_1_gep_65535
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 [[COPY]], 65535, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)			; GFX7: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 [[COPY]], 65535, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)
	; GFX7: $vgpr0 = COPY [[DS_READ_U8_]]			; GFX7: $vgpr0 = COPY [[DS_READ_U8_]]
	; GFX9-LABEL: name: load_local_s32_from_1_gep_65535			; GFX9-LABEL: name: load_local_s32_from_1_gep_65535
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[DS_READ_U8_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_U8_gfx9 [[COPY]], 65535, 0, implicit $exec :: (load 1, addrspace 3)			; GFX9: [[DS_READ_U8_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_U8_gfx9 [[COPY]], 65535, 0, implicit $exec :: (load 1, addrspace 3)
	; GFX9: $vgpr0 = COPY [[DS_READ_U8_gfx9_]]			; GFX9: $vgpr0 = COPY [[DS_READ_U8_gfx9_]]
				; GFX6-LABEL: name: load_local_s32_from_1_gep_65535
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 65535, implicit $exec
				; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 %2, 0, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)
				; GFX6: $vgpr0 = COPY [[DS_READ_U8_]]
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 65535			%1:vgpr(s32) = G_CONSTANT i32 65535
	%2:vgpr(p3) = G_PTR_ADD %0, %1			%2:vgpr(p3) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 3)			%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 3)
	$vgpr0 = COPY %3			$vgpr0 = COPY %3

	...			...

	---			---

	name: load_local_s32_from_1_gep_65535_known_bits_base_address			name: load_local_s32_from_1_gep_65535_known_bits_base_address
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_s32_from_1_gep_65535_known_bits_base_address
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2147483647, implicit $exec
	; GFX6: [[V_AND_B32_e64_:%[0-9]+]]:vgpr_32 = V_AND_B32_e64 [[COPY]], [[V_MOV_B32_e32_]], implicit $exec
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 [[V_AND_B32_e64_]], 65535, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)
	; GFX6: $vgpr0 = COPY [[DS_READ_U8_]]
	; GFX7-LABEL: name: load_local_s32_from_1_gep_65535_known_bits_base_address			; GFX7-LABEL: name: load_local_s32_from_1_gep_65535_known_bits_base_address
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2147483647, implicit $exec			; GFX7: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2147483647, implicit $exec
	; GFX7: [[V_AND_B32_e64_:%[0-9]+]]:vgpr_32 = V_AND_B32_e64 [[COPY]], [[V_MOV_B32_e32_]], implicit $exec			; GFX7: [[V_AND_B32_e64_:%[0-9]+]]:vgpr_32 = V_AND_B32_e64 [[COPY]], [[V_MOV_B32_e32_]], implicit $exec
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 [[V_AND_B32_e64_]], 65535, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)			; GFX7: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 [[V_AND_B32_e64_]], 65535, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)
	; GFX7: $vgpr0 = COPY [[DS_READ_U8_]]			; GFX7: $vgpr0 = COPY [[DS_READ_U8_]]
	; GFX9-LABEL: name: load_local_s32_from_1_gep_65535_known_bits_base_address			; GFX9-LABEL: name: load_local_s32_from_1_gep_65535_known_bits_base_address
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2147483647, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2147483647, implicit $exec
	; GFX9: [[V_AND_B32_e64_:%[0-9]+]]:vgpr_32 = V_AND_B32_e64 [[COPY]], [[V_MOV_B32_e32_]], implicit $exec			; GFX9: [[V_AND_B32_e64_:%[0-9]+]]:vgpr_32 = V_AND_B32_e64 [[COPY]], [[V_MOV_B32_e32_]], implicit $exec
	; GFX9: [[DS_READ_U8_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_U8_gfx9 [[V_AND_B32_e64_]], 65535, 0, implicit $exec :: (load 1, addrspace 3)			; GFX9: [[DS_READ_U8_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_U8_gfx9 [[V_AND_B32_e64_]], 65535, 0, implicit $exec :: (load 1, addrspace 3)
	; GFX9: $vgpr0 = COPY [[DS_READ_U8_gfx9_]]			; GFX9: $vgpr0 = COPY [[DS_READ_U8_gfx9_]]
				; GFX6-LABEL: name: load_local_s32_from_1_gep_65535_known_bits_base_address
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2147483647, implicit $exec
				; GFX6: [[V_AND_B32_e64_:%[0-9]+]]:vgpr_32 = V_AND_B32_e64 [[COPY]], [[V_MOV_B32_e32_]], implicit $exec
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 [[V_AND_B32_e64_]], 65535, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)
				; GFX6: $vgpr0 = COPY [[DS_READ_U8_]]
	%0:vgpr(s32) = COPY $vgpr0			%0:vgpr(s32) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 2147483647			%1:vgpr(s32) = G_CONSTANT i32 2147483647
	%2:vgpr(s32) = G_AND %0, %1			%2:vgpr(s32) = G_AND %0, %1
	%3:vgpr(p3) = G_INTTOPTR %2			%3:vgpr(p3) = G_INTTOPTR %2
	%4:vgpr(s32) = G_CONSTANT i32 65535			%4:vgpr(s32) = G_CONSTANT i32 65535
	%5:vgpr(p3) = G_PTR_ADD %3, %4			%5:vgpr(p3) = G_PTR_ADD %3, %4
	%6:vgpr(s32) = G_LOAD %5 :: (load 1, align 1, addrspace 3)			%6:vgpr(s32) = G_LOAD %5 :: (load 1, align 1, addrspace 3)
	$vgpr0 = COPY %6			$vgpr0 = COPY %6

	...			...

	---			---

	name: load_local_s32_from_1_gep_65536			name: load_local_s32_from_1_gep_65536
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_s32_from_1_gep_65536
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 65536, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 %2, 0, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)
	; GFX6: $vgpr0 = COPY [[DS_READ_U8_]]
	; GFX7-LABEL: name: load_local_s32_from_1_gep_65536			; GFX7-LABEL: name: load_local_s32_from_1_gep_65536
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 65536, implicit $exec			; GFX7: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 65536, implicit $exec
	; GFX7: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX7: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 %2, 0, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)			; GFX7: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 %2, 0, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)
	; GFX7: $vgpr0 = COPY [[DS_READ_U8_]]			; GFX7: $vgpr0 = COPY [[DS_READ_U8_]]
	; GFX9-LABEL: name: load_local_s32_from_1_gep_65536			; GFX9-LABEL: name: load_local_s32_from_1_gep_65536
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 65536, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 65536, implicit $exec
	; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX9: [[DS_READ_U8_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_U8_gfx9 [[V_ADD_U32_e64_]], 0, 0, implicit $exec :: (load 1, addrspace 3)			; GFX9: [[DS_READ_U8_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_U8_gfx9 [[V_ADD_U32_e64_]], 0, 0, implicit $exec :: (load 1, addrspace 3)
	; GFX9: $vgpr0 = COPY [[DS_READ_U8_gfx9_]]			; GFX9: $vgpr0 = COPY [[DS_READ_U8_gfx9_]]
				; GFX6-LABEL: name: load_local_s32_from_1_gep_65536
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 65536, implicit $exec
				; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 %2, 0, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)
				; GFX6: $vgpr0 = COPY [[DS_READ_U8_]]
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 65536			%1:vgpr(s32) = G_CONSTANT i32 65536
	%2:vgpr(p3) = G_PTR_ADD %0, %1			%2:vgpr(p3) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 3)			%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 3)
	$vgpr0 = COPY %3			$vgpr0 = COPY %3

	...			...

	---			---

	name: load_local_s32_from_1_gep_m1			name: load_local_s32_from_1_gep_m1
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_s32_from_1_gep_m1
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294967295, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 %2, 0, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)
	; GFX6: $vgpr0 = COPY [[DS_READ_U8_]]
	; GFX7-LABEL: name: load_local_s32_from_1_gep_m1			; GFX7-LABEL: name: load_local_s32_from_1_gep_m1
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294967295, implicit $exec			; GFX7: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294967295, implicit $exec
	; GFX7: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX7: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 %2, 0, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)			; GFX7: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 %2, 0, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)
	; GFX7: $vgpr0 = COPY [[DS_READ_U8_]]			; GFX7: $vgpr0 = COPY [[DS_READ_U8_]]
	; GFX9-LABEL: name: load_local_s32_from_1_gep_m1			; GFX9-LABEL: name: load_local_s32_from_1_gep_m1
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294967295, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294967295, implicit $exec
	; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX9: [[DS_READ_U8_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_U8_gfx9 [[V_ADD_U32_e64_]], 0, 0, implicit $exec :: (load 1, addrspace 3)			; GFX9: [[DS_READ_U8_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_U8_gfx9 [[V_ADD_U32_e64_]], 0, 0, implicit $exec :: (load 1, addrspace 3)
	; GFX9: $vgpr0 = COPY [[DS_READ_U8_gfx9_]]			; GFX9: $vgpr0 = COPY [[DS_READ_U8_gfx9_]]
				; GFX6-LABEL: name: load_local_s32_from_1_gep_m1
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294967295, implicit $exec
				; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 %2, 0, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)
				; GFX6: $vgpr0 = COPY [[DS_READ_U8_]]
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 -1			%1:vgpr(s32) = G_CONSTANT i32 -1
	%2:vgpr(p3) = G_PTR_ADD %0, %1			%2:vgpr(p3) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 3)			%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 3)
	$vgpr0 = COPY %3			$vgpr0 = COPY %3

	...			...
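The `load_local_s32_from_1_gep_*` cases above exercise folding a constant `G_PTR_ADD` offset into the offset field of a single-address DS instruction. The Python sketch below models the pattern the check lines show; it is not LLVM code, and the function and parameter names are hypothetical. It assumes the DS offset is an unsigned 16-bit immediate, that GFX7 and GFX9 can always use it, and that GFX6 (tahiti) folds only when the base address's sign bit is known to be zero, as in the `known_bits_base_address` case where `G_AND` with 2147483647 clears it.

```python
def can_fold_ds_offset(offset: int, has_usable_ds_offset: bool,
                       base_sign_bit_known_zero: bool) -> bool:
    """Hypothetical model: can a constant pointer offset be folded into the
    16-bit unsigned offset field of a single-address DS load?"""
    # Assumed field width: unsigned 16 bits, so only 0..65535 is encodable.
    # A negative offset such as -1 (0xFFFFFFFF as a 32-bit value) never fits.
    if not 0 <= offset <= 0xFFFF:
        return False
    # Assumption: GFX7 and later can always use the offset field.
    if has_usable_ds_offset:
        return True
    # Assumption: on GFX6 the fold is only done for a base known to be non-negative.
    return base_sign_bit_known_zero
```

Under these assumptions, 65535 folds on GFX7/GFX9 but on GFX6 only once the base is masked, while 65536 and -1 are out of range on every target, so the selector materializes the offset with `V_MOV_B32_e32` and adds it to the base before the `DS_READ_U8`.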

	---			---

	name: load_local_s64_align4_from_1_gep_1016			name: load_local_s64_align4_from_1_gep_1016
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0_vgpr1			liveins: $vgpr0_vgpr1

	; GFX6-LABEL: name: load_local_s64_align4_from_1_gep_1016
	; GFX6: liveins: $vgpr0_vgpr1
	; GFX6: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
	; GFX6: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 1016
	; GFX6: [[PTR_ADD:%[0-9]+]]:vgpr(p3) = G_PTR_ADD [[COPY]], [[C]](s32)
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[LOAD:%[0-9]+]]:vreg_64(s64) = G_LOAD [[PTR_ADD]](p3) :: (load 8, align 4, addrspace 3)
	; GFX6: $vgpr0_vgpr1 = COPY [[LOAD]](s64)
	; GFX7-LABEL: name: load_local_s64_align4_from_1_gep_1016			; GFX7-LABEL: name: load_local_s64_align4_from_1_gep_1016
	; GFX7: liveins: $vgpr0_vgpr1			; GFX7: liveins: $vgpr0_vgpr1
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ2_B32_:%[0-9]+]]:vreg_64 = DS_READ2_B32 [[COPY]], 254, 255, 0, implicit $m0, implicit $exec :: (load 8, align 4, addrspace 3)			; GFX7: [[DS_READ2_B32_:%[0-9]+]]:vreg_64 = DS_READ2_B32 [[COPY]], 254, 255, 0, implicit $m0, implicit $exec :: (load 8, align 4, addrspace 3)
	; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_]]			; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_]]
	; GFX9-LABEL: name: load_local_s64_align4_from_1_gep_1016			; GFX9-LABEL: name: load_local_s64_align4_from_1_gep_1016
	; GFX9: liveins: $vgpr0_vgpr1			; GFX9: liveins: $vgpr0_vgpr1
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[DS_READ2_B32_gfx9_:%[0-9]+]]:vreg_64 = DS_READ2_B32_gfx9 [[COPY]], 254, 255, 0, implicit $exec :: (load 8, align 4, addrspace 3)			; GFX9: [[DS_READ2_B32_gfx9_:%[0-9]+]]:vreg_64 = DS_READ2_B32_gfx9 [[COPY]], 254, 255, 0, implicit $exec :: (load 8, align 4, addrspace 3)
	; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_gfx9_]]			; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_gfx9_]]
				; GFX6-LABEL: name: load_local_s64_align4_from_1_gep_1016
				; GFX6: liveins: $vgpr0_vgpr1
				; GFX6: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
				; GFX6: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 1016
				; GFX6: [[PTR_ADD:%[0-9]+]]:vgpr(p3) = G_PTR_ADD [[COPY]], [[C]](s32)
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[LOAD:%[0-9]+]]:vreg_64(s64) = G_LOAD [[PTR_ADD]](p3) :: (load 8, align 4, addrspace 3)
				; GFX6: $vgpr0_vgpr1 = COPY [[LOAD]](s64)
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 1016			%1:vgpr(s32) = G_CONSTANT i32 1016
	%2:vgpr(p3) = G_PTR_ADD %0, %1			%2:vgpr(p3) = G_PTR_ADD %0, %1
	%3:vgpr(s64) = G_LOAD %2 :: (load 8, align 4, addrspace 3)			%3:vgpr(s64) = G_LOAD %2 :: (load 8, align 4, addrspace 3)
	$vgpr0_vgpr1 = COPY %3			$vgpr0_vgpr1 = COPY %3

	...			...

	---			---

	name: load_local_s64_align4_from_1_gep_1020			name: load_local_s64_align4_from_1_gep_1020
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0_vgpr1			liveins: $vgpr0_vgpr1

	; GFX6-LABEL: name: load_local_s64_align4_from_1_gep_1020
	; GFX6: liveins: $vgpr0_vgpr1
	; GFX6: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
	; GFX6: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 1020
	; GFX6: [[PTR_ADD:%[0-9]+]]:vgpr(p3) = G_PTR_ADD [[COPY]], [[C]](s32)
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[LOAD:%[0-9]+]]:vreg_64(s64) = G_LOAD [[PTR_ADD]](p3) :: (load 8, align 4, addrspace 3)
	; GFX6: $vgpr0_vgpr1 = COPY [[LOAD]](s64)
	; GFX7-LABEL: name: load_local_s64_align4_from_1_gep_1020			; GFX7-LABEL: name: load_local_s64_align4_from_1_gep_1020
	; GFX7: liveins: $vgpr0_vgpr1			; GFX7: liveins: $vgpr0_vgpr1
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1020, implicit $exec			; GFX7: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1020, implicit $exec
	; GFX7: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX7: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ2_B32_:%[0-9]+]]:vreg_64 = DS_READ2_B32 %2, 0, 1, 0, implicit $m0, implicit $exec :: (load 8, align 4, addrspace 3)			; GFX7: [[DS_READ2_B32_:%[0-9]+]]:vreg_64 = DS_READ2_B32 %2, 0, 1, 0, implicit $m0, implicit $exec :: (load 8, align 4, addrspace 3)
	; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_]]			; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_]]
	; GFX9-LABEL: name: load_local_s64_align4_from_1_gep_1020			; GFX9-LABEL: name: load_local_s64_align4_from_1_gep_1020
	; GFX9: liveins: $vgpr0_vgpr1			; GFX9: liveins: $vgpr0_vgpr1
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1020, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1020, implicit $exec
	; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX9: [[DS_READ2_B32_gfx9_:%[0-9]+]]:vreg_64 = DS_READ2_B32_gfx9 [[V_ADD_U32_e64_]], 0, 1, 0, implicit $exec :: (load 8, align 4, addrspace 3)			; GFX9: [[DS_READ2_B32_gfx9_:%[0-9]+]]:vreg_64 = DS_READ2_B32_gfx9 [[V_ADD_U32_e64_]], 0, 1, 0, implicit $exec :: (load 8, align 4, addrspace 3)
	; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_gfx9_]]			; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_gfx9_]]
				; GFX6-LABEL: name: load_local_s64_align4_from_1_gep_1020
				; GFX6: liveins: $vgpr0_vgpr1
				; GFX6: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
				; GFX6: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 1020
				; GFX6: [[PTR_ADD:%[0-9]+]]:vgpr(p3) = G_PTR_ADD [[COPY]], [[C]](s32)
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[LOAD:%[0-9]+]]:vreg_64(s64) = G_LOAD [[PTR_ADD]](p3) :: (load 8, align 4, addrspace 3)
				; GFX6: $vgpr0_vgpr1 = COPY [[LOAD]](s64)
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 1020			%1:vgpr(s32) = G_CONSTANT i32 1020
	%2:vgpr(p3) = G_PTR_ADD %0, %1			%2:vgpr(p3) = G_PTR_ADD %0, %1
	%3:vgpr(s64) = G_LOAD %2 :: (load 8, align 4, addrspace 3)			%3:vgpr(s64) = G_LOAD %2 :: (load 8, align 4, addrspace 3)
	$vgpr0_vgpr1 = COPY %3			$vgpr0_vgpr1 = COPY %3

	...			...
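The two `load_local_s64_align4_from_1_gep_*` cases show an 8-byte, 4-byte-aligned load split into `DS_READ2_B32`, whose two offset operands count dwords rather than bytes. A minimal Python sketch of that arithmetic, assuming each offset field is an unsigned 8-bit immediate in 4-byte units (the function name is hypothetical; this is not LLVM's implementation):

```python
from typing import Optional, Tuple


def ds_read2_b32_offsets(byte_offset: int) -> Optional[Tuple[int, int]]:
    """Hypothetical model: return (offset0, offset1) for a DS_READ2_B32 that
    reads two consecutive dwords at base + byte_offset, or None if the
    constant offset cannot be encoded and must be added to the base instead."""
    # Both fields are assumed to count 4-byte elements, so the byte offset
    # must be non-negative and dword aligned.
    if byte_offset < 0 or byte_offset % 4 != 0:
        return None
    offset0 = byte_offset // 4
    offset1 = offset0 + 1  # the second dword immediately follows the first
    # Assumed field width: unsigned 8 bits, so each offset must be <= 255.
    if offset1 > 0xFF:
        return None
    return offset0, offset1
```

A byte offset of 1016 gives the offsets 254 and 255 seen in the GFX7/GFX9 checks; 1020 would need 255 and 256, so the selector instead adds 1020 to the base (`V_ADD_I32_e64` on GFX7, `V_ADD_U32_e64` on GFX9) and uses offsets 0 and 1. The GFX6 checks still show the generic `G_LOAD` for both cases.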

llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-load-private.mir

	# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	# RUN: llc -march=amdgcn -mcpu=tahiti -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX6 %s			# RUN: llc -march=amdgcn -mcpu=tahiti -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX6 %s
	# RUN: llc -march=amdgcn -mcpu=gfx900 -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX9 %s			# RUN: llc -march=amdgcn -mcpu=gfx900 -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX9 %s

	---			---

	name: load_private_s32_from_4			name: load_private_s32_from_4
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_s32_from_4			; GFX6-LABEL: name: load_private_s32_from_4
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)			; GFX6: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_4			; GFX9-LABEL: name: load_private_s32_from_4
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)			; GFX9: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFEN]]
	%0:vgpr(p5) = COPY $vgpr0			%0:vgpr(p5) = COPY $vgpr0
	%1:vgpr(s32) = G_LOAD %0 :: (load 4, align 4, addrspace 5)			%1:vgpr(s32) = G_LOAD %0 :: (load 4, align 4, addrspace 5)
	$vgpr0 = COPY %1			$vgpr0 = COPY %1

	...			...

	---			---

	name: load_private_s32_from_2			name: load_private_s32_from_2
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_s32_from_2			; GFX6-LABEL: name: load_private_s32_from_2
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[BUFFER_LOAD_USHORT_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_USHORT_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 2, addrspace 5)			; GFX6: [[BUFFER_LOAD_USHORT_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_USHORT_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 2, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_USHORT_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_USHORT_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_2			; GFX9-LABEL: name: load_private_s32_from_2
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[BUFFER_LOAD_USHORT_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_USHORT_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 2, addrspace 5)			; GFX9: [[BUFFER_LOAD_USHORT_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_USHORT_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 2, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_USHORT_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_USHORT_OFFEN]]
	%0:vgpr(p5) = COPY $vgpr0			%0:vgpr(p5) = COPY $vgpr0
	%1:vgpr(s32) = G_LOAD %0 :: (load 2, align 2, addrspace 5)			%1:vgpr(s32) = G_LOAD %0 :: (load 2, align 2, addrspace 5)
	$vgpr0 = COPY %1			$vgpr0 = COPY %1

	...			...

	---			---

	name: load_private_s32_from_1			name: load_private_s32_from_1
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_s32_from_1			; GFX6-LABEL: name: load_private_s32_from_1
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_1			; GFX9-LABEL: name: load_private_s32_from_1
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	%0:vgpr(p5) = COPY $vgpr0			%0:vgpr(p5) = COPY $vgpr0
	%1:vgpr(s32) = G_LOAD %0 :: (load 1, align 1, addrspace 5)			%1:vgpr(s32) = G_LOAD %0 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %1			$vgpr0 = COPY %1

	...			...

	---			---
	---			---

	name: load_private_p5_from_4			name: load_private_p5_from_4
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_p5_from_4			; GFX6-LABEL: name: load_private_p5_from_4
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	---			---

	name: load_private_v2s16			name: load_private_v2s16
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_v2s16			; GFX6-LABEL: name: load_private_v2s16
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	---			---

	name: load_private_s32_from_1_gep_2047			name: load_private_s32_from_1_gep_2047
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_s32_from_1_gep_2047			; GFX6-LABEL: name: load_private_s32_from_1_gep_2047
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2047, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2047, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_1_gep_2047			; GFX9-LABEL: name: load_private_s32_from_1_gep_2047
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 2047, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 2047, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	%0:vgpr(p5) = COPY $vgpr0			%0:vgpr(p5) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 2047			%1:vgpr(s32) = G_CONSTANT i32 2047
	%2:vgpr(p5) = G_PTR_ADD %0, %1			%2:vgpr(p5) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)			%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %3			$vgpr0 = COPY %3

	...			...

	---			---

	name: load_private_s32_from_1_gep_2047_known_bits			name: load_private_s32_from_1_gep_2047_known_bits
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_s32_from_1_gep_2047_known_bits			; GFX6-LABEL: name: load_private_s32_from_1_gep_2047_known_bits
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2147483647, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2147483647, implicit $exec
	; GFX6: [[V_AND_B32_e64_:%[0-9]+]]:vgpr_32 = V_AND_B32_e64 [[COPY]], [[V_MOV_B32_e32_]], implicit $exec			; GFX6: [[V_AND_B32_e64_:%[0-9]+]]:vgpr_32 = V_AND_B32_e64 [[COPY]], [[V_MOV_B32_e32_]], implicit $exec
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_AND_B32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 2047, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_AND_B32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 2047, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_1_gep_2047_known_bits			; GFX9-LABEL: name: load_private_s32_from_1_gep_2047_known_bits
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2147483647, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2147483647, implicit $exec
	; GFX9: [[V_AND_B32_e64_:%[0-9]+]]:vgpr_32 = V_AND_B32_e64 [[COPY]], [[V_MOV_B32_e32_]], implicit $exec			; GFX9: [[V_AND_B32_e64_:%[0-9]+]]:vgpr_32 = V_AND_B32_e64 [[COPY]], [[V_MOV_B32_e32_]], implicit $exec
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_AND_B32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 2047, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_AND_B32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 2047, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	%0:vgpr(s32) = COPY $vgpr0			%0:vgpr(s32) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 2147483647			%1:vgpr(s32) = G_CONSTANT i32 2147483647
	%2:vgpr(s32) = G_AND %0, %1			%2:vgpr(s32) = G_AND %0, %1
	%3:vgpr(p5) = G_INTTOPTR %2			%3:vgpr(p5) = G_INTTOPTR %2
	%4:vgpr(s32) = G_CONSTANT i32 2047			%4:vgpr(s32) = G_CONSTANT i32 2047
	%5:vgpr(p5) = G_PTR_ADD %3, %4			%5:vgpr(p5) = G_PTR_ADD %3, %4
	%6:vgpr(s32) = G_LOAD %5 :: (load 1, align 1, addrspace 5)			%6:vgpr(s32) = G_LOAD %5 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %6			$vgpr0 = COPY %6

	...			...
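The gep_2047 and known_bits cases here, together with the gep_2048, gep_4095, gep_4096 and negative-offset cases that follow, pin down when a constant G_PTR_ADD offset ends up in the immediate offset field of BUFFER_LOAD_*_OFFEN. The C++ sketch below restates that rule as these checks appear to encode it. `fitsMUBUFImmOffset` and `canFoldScratchOffset` are hypothetical names, not LLVM APIs, and the GFX6 sign-bit condition is inferred from the test expectations rather than copied from AMDGPUInstructionSelector.cpp.

```cpp
#include <cstdint>

// Hypothetical helpers (not LLVM APIs) restating what the tahiti (GFX6) and
// gfx900 (GFX9) checks in this file expect for private BUFFER_LOAD_*_OFFEN.

// MUBUF immediate offsets are 12-bit unsigned: 0..4095.
constexpr bool fitsMUBUFImmOffset(std::int64_t Offset) {
  return Offset >= 0 && Offset <= 4095;
}

// Assumption inferred from the checks: GFX9 folds any in-range constant,
// while GFX6 folds it only when the base's sign bit is known to be zero
// (as after the G_AND with 2147483647 in the known_bits case).
constexpr bool canFoldScratchOffset(std::int64_t Offset, bool IsGFX9,
                                    bool BaseSignBitKnownZero) {
  return fitsMUBUFImmOffset(Offset) && (IsGFX9 || BaseSignBitKnownZero);
}

static_assert(canFoldScratchOffset(2047, /*IsGFX9=*/true, false), "gep_2047, GFX9");
static_assert(!canFoldScratchOffset(2047, /*IsGFX9=*/false, false), "gep_2047, GFX6");
static_assert(canFoldScratchOffset(2047, /*IsGFX9=*/false, true), "known_bits, GFX6");
static_assert(!canFoldScratchOffset(4096, /*IsGFX9=*/true, false), "gep_4096, GFX9");
```

Where this sketch returns false, the checks instead expect the offset to be materialized with V_MOV_B32_e32 and added to the base with V_ADD_I32_e64 (GFX6) or V_ADD_U32_e64 (GFX9), leaving the immediate offset field at 0.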

	---			---

	name: load_private_s32_from_1_gep_2048			name: load_private_s32_from_1_gep_2048
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_s32_from_1_gep_2048			; GFX6-LABEL: name: load_private_s32_from_1_gep_2048
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2048, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2048, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_1_gep_2048			; GFX9-LABEL: name: load_private_s32_from_1_gep_2048
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 2048, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 2048, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	%0:vgpr(p5) = COPY $vgpr0			%0:vgpr(p5) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 2048			%1:vgpr(s32) = G_CONSTANT i32 2048
	%2:vgpr(p5) = G_PTR_ADD %0, %1			%2:vgpr(p5) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)			%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %3			$vgpr0 = COPY %3

	...			...

	---			---

	name: load_private_s32_from_1_gep_m2047			name: load_private_s32_from_1_gep_m2047
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_s32_from_1_gep_m2047			; GFX6-LABEL: name: load_private_s32_from_1_gep_m2047
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294965249, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294965249, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_1_gep_m2047			; GFX9-LABEL: name: load_private_s32_from_1_gep_m2047
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294965249, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294965249, implicit $exec
	; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	%0:vgpr(p5) = COPY $vgpr0			%0:vgpr(p5) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 -2047			%1:vgpr(s32) = G_CONSTANT i32 -2047
	%2:vgpr(p5) = G_PTR_ADD %0, %1			%2:vgpr(p5) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)			%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %3			$vgpr0 = COPY %3

	...			...
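The negative-offset cases check V_MOV_B32_e32 immediates such as 4294965249. That value is -2047 reinterpreted as an unsigned 32-bit integer (2^32 - 2047), because these checks print the 32-bit immediate zero-extended rather than sign-extended. A minimal C++ sketch of the conversion, using a hypothetical helper name `asPrintedImm`:

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): returns the unsigned 32-bit bit
// pattern of a signed offset, which is how the negative G_CONSTANT values
// appear as V_MOV_B32_e32 immediates in these checks.
constexpr std::uint32_t asPrintedImm(std::int32_t Offset) {
  // Signed-to-unsigned conversion is defined modulo 2^32 (two's complement).
  return static_cast<std::uint32_t>(Offset);
}

// Matches the immediates in the gep_m2047 and gep_m4096 checks.
static_assert(asPrintedImm(-2047) == 4294965249u, "gep_m2047");
static_assert(asPrintedImm(-4096) == 4294963200u, "gep_m4096");
```

The same reading explains the 2147483647 mask in the known_bits case: it is 0x7fffffff, the largest value whose sign bit is zero.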

	---			---

	name: load_private_s32_from_1_gep_m2048			name: load_private_s32_from_1_gep_m2048
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_s32_from_1_gep_m2048			; GFX6-LABEL: name: load_private_s32_from_1_gep_m2048
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294965248, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294965248, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_1_gep_m2048			; GFX9-LABEL: name: load_private_s32_from_1_gep_m2048
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294965248, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294965248, implicit $exec
	; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	%0:vgpr(p5) = COPY $vgpr0			%0:vgpr(p5) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 -2048			%1:vgpr(s32) = G_CONSTANT i32 -2048
	%2:vgpr(p5) = G_PTR_ADD %0, %1			%2:vgpr(p5) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)			%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %3			$vgpr0 = COPY %3

	...			...

	---			---

	name: load_private_s32_from_1_gep_4095			name: load_private_s32_from_1_gep_4095
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_s32_from_1_gep_4095			; GFX6-LABEL: name: load_private_s32_from_1_gep_4095
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4095, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4095, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_1_gep_4095			; GFX9-LABEL: name: load_private_s32_from_1_gep_4095
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 4095, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 4095, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	%0:vgpr(p5) = COPY $vgpr0			%0:vgpr(p5) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 4095			%1:vgpr(s32) = G_CONSTANT i32 4095
	%2:vgpr(p5) = G_PTR_ADD %0, %1			%2:vgpr(p5) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)			%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %3			$vgpr0 = COPY %3

	...			...

	---			---

	name: load_private_s32_from_1_gep_4096			name: load_private_s32_from_1_gep_4096
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_s32_from_1_gep_4096			; GFX6-LABEL: name: load_private_s32_from_1_gep_4096
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_1_gep_4096			; GFX9-LABEL: name: load_private_s32_from_1_gep_4096
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec
	; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	%0:vgpr(p5) = COPY $vgpr0			%0:vgpr(p5) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 4096			%1:vgpr(s32) = G_CONSTANT i32 4096
	%2:vgpr(p5) = G_PTR_ADD %0, %1			%2:vgpr(p5) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)			%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %3			$vgpr0 = COPY %3

	...			...

	---			---

	name: load_private_s32_from_1_gep_m4095			name: load_private_s32_from_1_gep_m4095
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_s32_from_1_gep_m4095			; GFX6-LABEL: name: load_private_s32_from_1_gep_m4095
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294963201, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294963201, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_1_gep_m4095			; GFX9-LABEL: name: load_private_s32_from_1_gep_m4095
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294963201, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294963201, implicit $exec
	; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	%0:vgpr(p5) = COPY $vgpr0			%0:vgpr(p5) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 -4095			%1:vgpr(s32) = G_CONSTANT i32 -4095
	%2:vgpr(p5) = G_PTR_ADD %0, %1			%2:vgpr(p5) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)			%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %3			$vgpr0 = COPY %3

	...			...

	---			---

	name: load_private_s32_from_1_gep_m4096			name: load_private_s32_from_1_gep_m4096
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_s32_from_1_gep_m4096			; GFX6-LABEL: name: load_private_s32_from_1_gep_m4096
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294963200, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294963200, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_1_gep_m4096			; GFX9-LABEL: name: load_private_s32_from_1_gep_m4096
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294963200, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294963200, implicit $exec
	; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	%0:vgpr(p5) = COPY $vgpr0			%0:vgpr(p5) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 -4096			%1:vgpr(s32) = G_CONSTANT i32 -4096
	%2:vgpr(p5) = G_PTR_ADD %0, %1			%2:vgpr(p5) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)			%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %3			$vgpr0 = COPY %3

	...			...

	---			---

	name: load_private_s32_from_1_gep_8191			name: load_private_s32_from_1_gep_8191
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_s32_from_1_gep_8191			; GFX6-LABEL: name: load_private_s32_from_1_gep_8191
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 8191, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 8191, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_1_gep_8191			; GFX9-LABEL: name: load_private_s32_from_1_gep_8191
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 8191, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 8191, implicit $exec
	; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	%0:vgpr(p5) = COPY $vgpr0			%0:vgpr(p5) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 8191			%1:vgpr(s32) = G_CONSTANT i32 8191
	%2:vgpr(p5) = G_PTR_ADD %0, %1			%2:vgpr(p5) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)			%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %3			$vgpr0 = COPY %3

	...			...

	---			---

	name: load_private_s32_from_1_gep_8192			name: load_private_s32_from_1_gep_8192
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_s32_from_1_gep_8192			; GFX6-LABEL: name: load_private_s32_from_1_gep_8192
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 8192, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 8192, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_1_gep_8192			; GFX9-LABEL: name: load_private_s32_from_1_gep_8192
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 8192, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 8192, implicit $exec
	; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	%0:vgpr(p5) = COPY $vgpr0			%0:vgpr(p5) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 8192			%1:vgpr(s32) = G_CONSTANT i32 8192
	%2:vgpr(p5) = G_PTR_ADD %0, %1			%2:vgpr(p5) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)			%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %3			$vgpr0 = COPY %3

	...			...

	---			---

	name: load_private_s32_from_1_gep_m8191			name: load_private_s32_from_1_gep_m8191
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_s32_from_1_gep_m8191			; GFX6-LABEL: name: load_private_s32_from_1_gep_m8191
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294959105, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294959105, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_1_gep_m8191			; GFX9-LABEL: name: load_private_s32_from_1_gep_m8191
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294959105, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294959105, implicit $exec
	; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	%0:vgpr(p5) = COPY $vgpr0			%0:vgpr(p5) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 -8191			%1:vgpr(s32) = G_CONSTANT i32 -8191
	%2:vgpr(p5) = G_PTR_ADD %0, %1			%2:vgpr(p5) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)			%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %3			$vgpr0 = COPY %3

	...			...

	---			---

	name: load_private_s32_from_1_gep_m8192			name: load_private_s32_from_1_gep_m8192
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_s32_from_1_gep_m8192			; GFX6-LABEL: name: load_private_s32_from_1_gep_m8192
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294959104, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294959104, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_1_gep_m8192			; GFX9-LABEL: name: load_private_s32_from_1_gep_m8192
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294959104, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294959104, implicit $exec
	; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	%0:vgpr(p5) = COPY $vgpr0			%0:vgpr(p5) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 -8192			%1:vgpr(s32) = G_CONSTANT i32 -8192
	%2:vgpr(p5) = G_PTR_ADD %0, %1			%2:vgpr(p5) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)			%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %3			$vgpr0 = COPY %3

	...			...

	---			---

	name: load_private_s32_from_4_constant_0			name: load_private_s32_from_4_constant_0
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:

	; GFX6-LABEL: name: load_private_s32_from_4_constant_0			; GFX6-LABEL: name: load_private_s32_from_4_constant_0
	; GFX6: [[BUFFER_LOAD_DWORD_OFFSET:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)			; GFX6: [[BUFFER_LOAD_DWORD_OFFSET:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFSET]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFSET]]
	; GFX9-LABEL: name: load_private_s32_from_4_constant_0			; GFX9-LABEL: name: load_private_s32_from_4_constant_0
	; GFX9: [[BUFFER_LOAD_DWORD_OFFSET:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)			; GFX9: [[BUFFER_LOAD_DWORD_OFFSET:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFSET]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFSET]]
	%0:vgpr(p5) = G_CONSTANT i32 0			%0:vgpr(p5) = G_CONSTANT i32 0
	%1:vgpr(s32) = G_LOAD %0 :: (load 4, align 4, addrspace 5)			%1:vgpr(s32) = G_LOAD %0 :: (load 4, align 4, addrspace 5)
	$vgpr0 = COPY %1			$vgpr0 = COPY %1

	...			...

	---			---

	name: load_private_s32_from_4_constant_sgpr_16			name: load_private_s32_from_4_constant_sgpr_16
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:

	; GFX6-LABEL: name: load_private_s32_from_4_constant_sgpr_16			; GFX6-LABEL: name: load_private_s32_from_4_constant_sgpr_16
	; GFX6: [[BUFFER_LOAD_DWORD_OFFSET:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 16, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)			; GFX6: [[BUFFER_LOAD_DWORD_OFFSET:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 16, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFSET]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFSET]]
	; GFX9-LABEL: name: load_private_s32_from_4_constant_sgpr_16			; GFX9-LABEL: name: load_private_s32_from_4_constant_sgpr_16
	; GFX9: [[BUFFER_LOAD_DWORD_OFFSET:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 16, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)			; GFX9: [[BUFFER_LOAD_DWORD_OFFSET:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 16, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFSET]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFSET]]
	%0:sgpr(p5) = G_CONSTANT i32 16			%0:sgpr(p5) = G_CONSTANT i32 16
	%1:vgpr(s32) = G_LOAD %0 :: (load 4, align 4, addrspace 5)			%1:vgpr(s32) = G_LOAD %0 :: (load 4, align 4, addrspace 5)
	$vgpr0 = COPY %1			$vgpr0 = COPY %1

	...			...

	---			---

	name: load_private_s32_from_1_constant_4095			name: load_private_s32_from_1_constant_4095
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:

	; GFX6-LABEL: name: load_private_s32_from_1_constant_4095			; GFX6-LABEL: name: load_private_s32_from_1_constant_4095
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFSET:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 4095, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFSET:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 4095, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFSET]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFSET]]
	; GFX9-LABEL: name: load_private_s32_from_1_constant_4095			; GFX9-LABEL: name: load_private_s32_from_1_constant_4095
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFSET:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 4095, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFSET:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 4095, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFSET]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFSET]]
	%0:vgpr(p5) = G_CONSTANT i32 4095			%0:vgpr(p5) = G_CONSTANT i32 4095
	%1:vgpr(s32) = G_LOAD %0 :: (load 1, align 1, addrspace 5)			%1:vgpr(s32) = G_LOAD %0 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %1			$vgpr0 = COPY %1

	...			...

	---			---

	name: load_private_s32_from_1_constant_4096			name: load_private_s32_from_1_constant_4096
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:

	; GFX6-LABEL: name: load_private_s32_from_1_constant_4096			; GFX6-LABEL: name: load_private_s32_from_1_constant_4096
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_MOV_B32_e32_]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_MOV_B32_e32_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_1_constant_4096			; GFX9-LABEL: name: load_private_s32_from_1_constant_4096
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_MOV_B32_e32_]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_MOV_B32_e32_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	%0:vgpr(p5) = G_CONSTANT i32 4096			%0:vgpr(p5) = G_CONSTANT i32 4096
	%1:vgpr(s32) = G_LOAD %0 :: (load 1, align 1, addrspace 5)			%1:vgpr(s32) = G_LOAD %0 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %1			$vgpr0 = COPY %1

	...			...

	---			---

	name: load_private_s32_from_fi			name: load_private_s32_from_fi
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32
	stack:			stack:
	- { id: 0, size: 4, alignment: 4 }			- { id: 0, size: 4, alignment: 4 }

	body: |			body: |
	bb.0:			bb.0:

	; GFX6-LABEL: name: load_private_s32_from_fi			; GFX6-LABEL: name: load_private_s32_from_fi
	---			---

	name: load_private_s32_from_1_fi_offset_4095			name: load_private_s32_from_1_fi_offset_4095
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32
	stack:			stack:
	- { id: 0, size: 4096, alignment: 4 }			- { id: 0, size: 4096, alignment: 4 }

	body: |			body: |
	bb.0:			bb.0:

	; GFX6-LABEL: name: load_private_s32_from_1_fi_offset_4095			; GFX6-LABEL: name: load_private_s32_from_1_fi_offset_4095
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
	; GFX6: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4095, implicit $exec			; GFX6: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4095, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, implicit $exec			; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, implicit $exec
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_1_fi_offset_4095			; GFX9-LABEL: name: load_private_s32_from_1_fi_offset_4095
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4095, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4095, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	%0:vgpr(p5) = G_FRAME_INDEX %stack.0			%0:vgpr(p5) = G_FRAME_INDEX %stack.0
	%1:vgpr(s32) = G_CONSTANT i32 4095			%1:vgpr(s32) = G_CONSTANT i32 4095
	%2:vgpr(p5) = G_PTR_ADD %0, %1			%2:vgpr(p5) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)			%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %3			$vgpr0 = COPY %3

	...			...

	---			---

	name: load_private_s32_from_1_fi_offset_4096			name: load_private_s32_from_1_fi_offset_4096
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32
	stack:			stack:
	- { id: 0, size: 8192, alignment: 4 }			- { id: 0, size: 8192, alignment: 4 }

	body: |			body: |
	bb.0:			bb.0:

	; GFX6-LABEL: name: load_private_s32_from_1_fi_offset_4096			; GFX6-LABEL: name: load_private_s32_from_1_fi_offset_4096
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
	; GFX6: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec			; GFX6: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, implicit $exec			; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, implicit $exec
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_1_fi_offset_4096			; GFX9-LABEL: name: load_private_s32_from_1_fi_offset_4096
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
	; GFX9: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec			; GFX9: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec
	; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, implicit $exec			; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, implicit $exec
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	%0:vgpr(p5) = G_FRAME_INDEX %stack.0			%0:vgpr(p5) = G_FRAME_INDEX %stack.0
	%1:vgpr(s32) = G_CONSTANT i32 4096			%1:vgpr(s32) = G_CONSTANT i32 4096
	%2:vgpr(p5) = G_PTR_ADD %0, %1			%2:vgpr(p5) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)			%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %3			$vgpr0 = COPY %3

	...			...

llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-store-local.mir

	# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	# RUN: llc -march=amdgcn -mcpu=tahiti -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX6 %s			# RUN: llc -march=amdgcn -mcpu=tahiti -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX6 %s
	# RUN: llc -march=amdgcn -mcpu=hawaii -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX7 %s			# RUN: llc -march=amdgcn -mcpu=hawaii -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX7 %s
	# RUN: llc -march=amdgcn -mcpu=fiji -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX7 %s			# RUN: llc -march=amdgcn -mcpu=fiji -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX7 %s
	# RUN: llc -march=amdgcn -mcpu=gfx900 -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX9 %s			# RUN: llc -march=amdgcn -mcpu=gfx900 -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX9 %s
	# RUN: llc -march=amdgcn -mcpu=gfx1010 -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX9 %s			# RUN: llc -march=amdgcn -mcpu=gfx1010 -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX9 %s

	---			---

	name: store_local_s32_to_4			name: store_local_s32_to_4
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0, $vgpr1			liveins: $vgpr0, $vgpr1

	; GFX6-LABEL: name: store_local_s32_to_4
	; GFX6: liveins: $vgpr0, $vgpr1
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: DS_WRITE_B32 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 4, addrspace 3)
	; GFX7-LABEL: name: store_local_s32_to_4			; GFX7-LABEL: name: store_local_s32_to_4
	; GFX7: liveins: $vgpr0, $vgpr1			; GFX7: liveins: $vgpr0, $vgpr1
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: DS_WRITE_B32 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 4, addrspace 3)			; GFX7: DS_WRITE_B32 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 4, addrspace 3)
	; GFX9-LABEL: name: store_local_s32_to_4			; GFX9-LABEL: name: store_local_s32_to_4
	; GFX9: liveins: $vgpr0, $vgpr1			; GFX9: liveins: $vgpr0, $vgpr1
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX9: DS_WRITE_B32_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 4, addrspace 3)			; GFX9: DS_WRITE_B32_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 4, addrspace 3)
				; GFX6-LABEL: name: store_local_s32_to_4
				; GFX6: liveins: $vgpr0, $vgpr1
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: DS_WRITE_B32 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 4, addrspace 3)
	%0:vgpr(s32) = COPY $vgpr0			%0:vgpr(s32) = COPY $vgpr0
	%1:vgpr(p3) = COPY $vgpr1			%1:vgpr(p3) = COPY $vgpr1
	G_STORE %0, %1 :: (store 4, align 4, addrspace 3)			G_STORE %0, %1 :: (store 4, align 4, addrspace 3)

	...			...

	---			---

	name: store_local_s32_to_2			name: store_local_s32_to_2
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0, $vgpr1			liveins: $vgpr0, $vgpr1

	; GFX6-LABEL: name: store_local_s32_to_2
	; GFX6: liveins: $vgpr0, $vgpr1
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: DS_WRITE_B16 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 2, addrspace 3)
	; GFX7-LABEL: name: store_local_s32_to_2			; GFX7-LABEL: name: store_local_s32_to_2
	; GFX7: liveins: $vgpr0, $vgpr1			; GFX7: liveins: $vgpr0, $vgpr1
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: DS_WRITE_B16 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 2, addrspace 3)			; GFX7: DS_WRITE_B16 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 2, addrspace 3)
	; GFX9-LABEL: name: store_local_s32_to_2			; GFX9-LABEL: name: store_local_s32_to_2
	; GFX9: liveins: $vgpr0, $vgpr1			; GFX9: liveins: $vgpr0, $vgpr1
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX9: DS_WRITE_B16_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 2, addrspace 3)			; GFX9: DS_WRITE_B16_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 2, addrspace 3)
				; GFX6-LABEL: name: store_local_s32_to_2
				; GFX6: liveins: $vgpr0, $vgpr1
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: DS_WRITE_B16 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 2, addrspace 3)
	%0:vgpr(s32) = COPY $vgpr0			%0:vgpr(s32) = COPY $vgpr0
	%1:vgpr(p3) = COPY $vgpr1			%1:vgpr(p3) = COPY $vgpr1
	G_STORE %0, %1 :: (store 2, align 2, addrspace 3)			G_STORE %0, %1 :: (store 2, align 2, addrspace 3)

	...			...

	---			---

	name: store_local_s32_to_1			name: store_local_s32_to_1
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0, $vgpr1			liveins: $vgpr0, $vgpr1

	; GFX6-LABEL: name: store_local_s32_to_1
	; GFX6: liveins: $vgpr0, $vgpr1
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: DS_WRITE_B8 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 1, addrspace 3)
	; GFX7-LABEL: name: store_local_s32_to_1			; GFX7-LABEL: name: store_local_s32_to_1
	; GFX7: liveins: $vgpr0, $vgpr1			; GFX7: liveins: $vgpr0, $vgpr1
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: DS_WRITE_B8 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 1, addrspace 3)			; GFX7: DS_WRITE_B8 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 1, addrspace 3)
	; GFX9-LABEL: name: store_local_s32_to_1			; GFX9-LABEL: name: store_local_s32_to_1
	; GFX9: liveins: $vgpr0, $vgpr1			; GFX9: liveins: $vgpr0, $vgpr1
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX9: DS_WRITE_B8_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 1, addrspace 3)			; GFX9: DS_WRITE_B8_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 1, addrspace 3)
				; GFX6-LABEL: name: store_local_s32_to_1
				; GFX6: liveins: $vgpr0, $vgpr1
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: DS_WRITE_B8 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 1, addrspace 3)
	%0:vgpr(s32) = COPY $vgpr0			%0:vgpr(s32) = COPY $vgpr0
	%1:vgpr(p3) = COPY $vgpr1			%1:vgpr(p3) = COPY $vgpr1
	G_STORE %0, %1 :: (store 1, align 1, addrspace 3)			G_STORE %0, %1 :: (store 1, align 1, addrspace 3)

	...			...

	---			---

	name: store_local_v2s16			name: store_local_v2s16
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0, $vgpr1			liveins: $vgpr0, $vgpr1

	; GFX6-LABEL: name: store_local_v2s16
	; GFX6: liveins: $vgpr0, $vgpr1
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: DS_WRITE_B32 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 4, addrspace 3)
	; GFX7-LABEL: name: store_local_v2s16			; GFX7-LABEL: name: store_local_v2s16
	; GFX7: liveins: $vgpr0, $vgpr1			; GFX7: liveins: $vgpr0, $vgpr1
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: DS_WRITE_B32 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 4, addrspace 3)			; GFX7: DS_WRITE_B32 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 4, addrspace 3)
	; GFX9-LABEL: name: store_local_v2s16			; GFX9-LABEL: name: store_local_v2s16
	; GFX9: liveins: $vgpr0, $vgpr1			; GFX9: liveins: $vgpr0, $vgpr1
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX9: DS_WRITE_B32_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 4, addrspace 3)			; GFX9: DS_WRITE_B32_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 4, addrspace 3)
				; GFX6-LABEL: name: store_local_v2s16
				; GFX6: liveins: $vgpr0, $vgpr1
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: DS_WRITE_B32 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 4, addrspace 3)
	%0:vgpr(<2 x s16>) = COPY $vgpr0			%0:vgpr(<2 x s16>) = COPY $vgpr0
	%1:vgpr(p3) = COPY $vgpr1			%1:vgpr(p3) = COPY $vgpr1
	G_STORE %0, %1 :: (store 4, align 4, addrspace 3)			G_STORE %0, %1 :: (store 4, align 4, addrspace 3)

	...			...

	---			---

	name: store_local_p3			name: store_local_p3
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0, $vgpr1			liveins: $vgpr0, $vgpr1

	; GFX6-LABEL: name: store_local_p3
	; GFX6: liveins: $vgpr0, $vgpr1
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: DS_WRITE_B32 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 4, addrspace 3)
	; GFX7-LABEL: name: store_local_p3			; GFX7-LABEL: name: store_local_p3
	; GFX7: liveins: $vgpr0, $vgpr1			; GFX7: liveins: $vgpr0, $vgpr1
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: DS_WRITE_B32 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 4, addrspace 3)			; GFX7: DS_WRITE_B32 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 4, addrspace 3)
	; GFX9-LABEL: name: store_local_p3			; GFX9-LABEL: name: store_local_p3
	; GFX9: liveins: $vgpr0, $vgpr1			; GFX9: liveins: $vgpr0, $vgpr1
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX9: DS_WRITE_B32_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 4, addrspace 3)			; GFX9: DS_WRITE_B32_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 4, addrspace 3)
				; GFX6-LABEL: name: store_local_p3
				; GFX6: liveins: $vgpr0, $vgpr1
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: DS_WRITE_B32 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 4, addrspace 3)
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(p3) = COPY $vgpr1			%1:vgpr(p3) = COPY $vgpr1
	G_STORE %0, %1 :: (store 4, align 4, addrspace 3)			G_STORE %0, %1 :: (store 4, align 4, addrspace 3)

	...			...

	---			---

	name: store_local_s32_to_1_constant_4095			name: store_local_s32_to_1_constant_4095
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:

	; GFX6-LABEL: name: store_local_s32_to_1_constant_4095
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4095, implicit $exec
	; GFX6: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: DS_WRITE_B8 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, 0, implicit $m0, implicit $exec :: (store 1, addrspace 3)
	; GFX7-LABEL: name: store_local_s32_to_1_constant_4095			; GFX7-LABEL: name: store_local_s32_to_1_constant_4095
	; GFX7: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4095, implicit $exec			; GFX7: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4095, implicit $exec
	; GFX7: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec			; GFX7: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: DS_WRITE_B8 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, 0, implicit $m0, implicit $exec :: (store 1, addrspace 3)			; GFX7: DS_WRITE_B8 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, 0, implicit $m0, implicit $exec :: (store 1, addrspace 3)
	; GFX9-LABEL: name: store_local_s32_to_1_constant_4095			; GFX9-LABEL: name: store_local_s32_to_1_constant_4095
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4095, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4095, implicit $exec
	; GFX9: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec			; GFX9: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
	; GFX9: DS_WRITE_B8_gfx9 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, 0, implicit $exec :: (store 1, addrspace 3)			; GFX9: DS_WRITE_B8_gfx9 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, 0, implicit $exec :: (store 1, addrspace 3)
				; GFX6-LABEL: name: store_local_s32_to_1_constant_4095
				; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4095, implicit $exec
				; GFX6: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: DS_WRITE_B8 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, 0, implicit $m0, implicit $exec :: (store 1, addrspace 3)
	%0:vgpr(p3) = G_CONSTANT i32 4095			%0:vgpr(p3) = G_CONSTANT i32 4095
	%1:vgpr(s32) = G_CONSTANT i32 0			%1:vgpr(s32) = G_CONSTANT i32 0
	G_STORE %1, %0 :: (store 1, align 1, addrspace 3)			G_STORE %1, %0 :: (store 1, align 1, addrspace 3)

	...			...

	---			---

	name: store_local_s32_to_1_constant_4096			name: store_local_s32_to_1_constant_4096
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32
	stack:			stack:
	- { id: 0, size: 4096, alignment: 4 }			- { id: 0, size: 4096, alignment: 4 }

	body: |			body: |
	bb.0:			bb.0:

	; GFX6-LABEL: name: store_local_s32_to_1_constant_4096
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec
	; GFX6: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: DS_WRITE_B8 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, 0, implicit $m0, implicit $exec :: (store 1, addrspace 3)
	; GFX7-LABEL: name: store_local_s32_to_1_constant_4096			; GFX7-LABEL: name: store_local_s32_to_1_constant_4096
	; GFX7: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec			; GFX7: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec
	; GFX7: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec			; GFX7: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: DS_WRITE_B8 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, 0, implicit $m0, implicit $exec :: (store 1, addrspace 3)			; GFX7: DS_WRITE_B8 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, 0, implicit $m0, implicit $exec :: (store 1, addrspace 3)
	; GFX9-LABEL: name: store_local_s32_to_1_constant_4096			; GFX9-LABEL: name: store_local_s32_to_1_constant_4096
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec
	; GFX9: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec			; GFX9: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
	; GFX9: DS_WRITE_B8_gfx9 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, 0, implicit $exec :: (store 1, addrspace 3)			; GFX9: DS_WRITE_B8_gfx9 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, 0, implicit $exec :: (store 1, addrspace 3)
				; GFX6-LABEL: name: store_local_s32_to_1_constant_4096
				; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec
				; GFX6: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: DS_WRITE_B8 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, 0, implicit $m0, implicit $exec :: (store 1, addrspace 3)
	%0:vgpr(p3) = G_CONSTANT i32 4096			%0:vgpr(p3) = G_CONSTANT i32 4096
	%1:vgpr(s32) = G_CONSTANT i32 0			%1:vgpr(s32) = G_CONSTANT i32 0
	G_STORE %1, %0 :: (store 1, align 1, addrspace 3)			G_STORE %1, %0 :: (store 1, align 1, addrspace 3)

	...			...

	---			---

	name: store_local_s64_align4			name: store_local_s64_align4
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0_vgpr1, $vgpr2			liveins: $vgpr0_vgpr1, $vgpr2

	; GFX6-LABEL: name: store_local_s64_align4
	; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX6: [[COPY:%[0-9]+]]:vgpr(s64) = COPY $vgpr0_vgpr1
	; GFX6: [[COPY1:%[0-9]+]]:vgpr(p3) = COPY $vgpr2
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: G_STORE [[COPY]](s64), [[COPY1]](p3) :: (store 8, align 4, addrspace 3)
	; GFX7-LABEL: name: store_local_s64_align4			; GFX7-LABEL: name: store_local_s64_align4
	; GFX7: liveins: $vgpr0_vgpr1, $vgpr2			; GFX7: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1			; GFX7: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1
	; GFX7: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0			; GFX7: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0
	; GFX7: DS_WRITE2_B32 [[COPY1]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $m0, implicit $exec :: (store 8, align 4, addrspace 3)			; GFX7: DS_WRITE2_B32 [[COPY1]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $m0, implicit $exec :: (store 8, align 4, addrspace 3)
	; GFX9-LABEL: name: store_local_s64_align4			; GFX9-LABEL: name: store_local_s64_align4
	; GFX9: liveins: $vgpr0_vgpr1, $vgpr2			; GFX9: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX9: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1			; GFX9: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1
	; GFX9: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0			; GFX9: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0
	; GFX9: DS_WRITE2_B32_gfx9 [[COPY1]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $exec :: (store 8, align 4, addrspace 3)			; GFX9: DS_WRITE2_B32_gfx9 [[COPY1]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $exec :: (store 8, align 4, addrspace 3)
				; GFX6-LABEL: name: store_local_s64_align4
				; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
				; GFX6: [[COPY:%[0-9]+]]:vgpr(s64) = COPY $vgpr0_vgpr1
				; GFX6: [[COPY1:%[0-9]+]]:vgpr(p3) = COPY $vgpr2
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: G_STORE [[COPY]](s64), [[COPY1]](p3) :: (store 8, align 4, addrspace 3)
	%0:vgpr(s64) = COPY $vgpr0_vgpr1			%0:vgpr(s64) = COPY $vgpr0_vgpr1
	%1:vgpr(p3) = COPY $vgpr2			%1:vgpr(p3) = COPY $vgpr2
	G_STORE %0, %1 :: (store 8, align 4, addrspace 3)			G_STORE %0, %1 :: (store 8, align 4, addrspace 3)

	...			...

	---			---

	name: store_local_p1_align4			name: store_local_p1_align4
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0_vgpr1, $vgpr2			liveins: $vgpr0_vgpr1, $vgpr2

	; GFX6-LABEL: name: store_local_p1_align4
	; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX6: [[COPY:%[0-9]+]]:vgpr(p1) = COPY $vgpr0_vgpr1
	; GFX6: [[COPY1:%[0-9]+]]:vgpr(p3) = COPY $vgpr2
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: G_STORE [[COPY]](p1), [[COPY1]](p3) :: (store 8, align 4, addrspace 3)
	; GFX7-LABEL: name: store_local_p1_align4			; GFX7-LABEL: name: store_local_p1_align4
	; GFX7: liveins: $vgpr0_vgpr1, $vgpr2			; GFX7: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1			; GFX7: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1
	; GFX7: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0			; GFX7: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0
	; GFX7: DS_WRITE2_B32 [[COPY1]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $m0, implicit $exec :: (store 8, align 4, addrspace 3)			; GFX7: DS_WRITE2_B32 [[COPY1]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $m0, implicit $exec :: (store 8, align 4, addrspace 3)
	; GFX9-LABEL: name: store_local_p1_align4			; GFX9-LABEL: name: store_local_p1_align4
	; GFX9: liveins: $vgpr0_vgpr1, $vgpr2			; GFX9: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX9: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1			; GFX9: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1
	; GFX9: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0			; GFX9: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0
	; GFX9: DS_WRITE2_B32_gfx9 [[COPY1]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $exec :: (store 8, align 4, addrspace 3)			; GFX9: DS_WRITE2_B32_gfx9 [[COPY1]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $exec :: (store 8, align 4, addrspace 3)
				; GFX6-LABEL: name: store_local_p1_align4
				; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
				; GFX6: [[COPY:%[0-9]+]]:vgpr(p1) = COPY $vgpr0_vgpr1
				; GFX6: [[COPY1:%[0-9]+]]:vgpr(p3) = COPY $vgpr2
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: G_STORE [[COPY]](p1), [[COPY1]](p3) :: (store 8, align 4, addrspace 3)
	%0:vgpr(p1) = COPY $vgpr0_vgpr1			%0:vgpr(p1) = COPY $vgpr0_vgpr1
	%1:vgpr(p3) = COPY $vgpr2			%1:vgpr(p3) = COPY $vgpr2
	G_STORE %0, %1 :: (store 8, align 4, addrspace 3)			G_STORE %0, %1 :: (store 8, align 4, addrspace 3)

	...			...

	---			---

	name: store_local_v2s32_align4			name: store_local_v2s32_align4
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0_vgpr1, $vgpr2			liveins: $vgpr0_vgpr1, $vgpr2

	; GFX6-LABEL: name: store_local_v2s32_align4
	; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX6: [[COPY:%[0-9]+]]:vgpr(<2 x s32>) = COPY $vgpr0_vgpr1
	; GFX6: [[COPY1:%[0-9]+]]:vgpr(p3) = COPY $vgpr2
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: G_STORE [[COPY]](<2 x s32>), [[COPY1]](p3) :: (store 8, align 4, addrspace 3)
	; GFX7-LABEL: name: store_local_v2s32_align4			; GFX7-LABEL: name: store_local_v2s32_align4
	; GFX7: liveins: $vgpr0_vgpr1, $vgpr2			; GFX7: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1			; GFX7: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1
	; GFX7: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0			; GFX7: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0
	; GFX7: DS_WRITE2_B32 [[COPY1]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $m0, implicit $exec :: (store 8, align 4, addrspace 3)			; GFX7: DS_WRITE2_B32 [[COPY1]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $m0, implicit $exec :: (store 8, align 4, addrspace 3)
	; GFX9-LABEL: name: store_local_v2s32_align4			; GFX9-LABEL: name: store_local_v2s32_align4
	; GFX9: liveins: $vgpr0_vgpr1, $vgpr2			; GFX9: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX9: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1			; GFX9: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1
	; GFX9: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0			; GFX9: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0
	; GFX9: DS_WRITE2_B32_gfx9 [[COPY1]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $exec :: (store 8, align 4, addrspace 3)			; GFX9: DS_WRITE2_B32_gfx9 [[COPY1]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $exec :: (store 8, align 4, addrspace 3)
				; GFX6-LABEL: name: store_local_v2s32_align4
				; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
				; GFX6: [[COPY:%[0-9]+]]:vgpr(<2 x s32>) = COPY $vgpr0_vgpr1
				; GFX6: [[COPY1:%[0-9]+]]:vgpr(p3) = COPY $vgpr2
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: G_STORE [[COPY]](<2 x s32>), [[COPY1]](p3) :: (store 8, align 4, addrspace 3)
	%0:vgpr(<2 x s32>) = COPY $vgpr0_vgpr1			%0:vgpr(<2 x s32>) = COPY $vgpr0_vgpr1
	%1:vgpr(p3) = COPY $vgpr2			%1:vgpr(p3) = COPY $vgpr2
	G_STORE %0, %1 :: (store 8, align 4, addrspace 3)			G_STORE %0, %1 :: (store 8, align 4, addrspace 3)

	...			...

	---			---

	name: store_local_v4s16_align4			name: store_local_v4s16_align4
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0_vgpr1, $vgpr2			liveins: $vgpr0_vgpr1, $vgpr2

	; GFX6-LABEL: name: store_local_v4s16_align4
	; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX6: [[COPY:%[0-9]+]]:vgpr(<4 x s16>) = COPY $vgpr0_vgpr1
	; GFX6: [[COPY1:%[0-9]+]]:vgpr(p3) = COPY $vgpr2
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: G_STORE [[COPY]](<4 x s16>), [[COPY1]](p3) :: (store 8, align 4, addrspace 3)
	; GFX7-LABEL: name: store_local_v4s16_align4			; GFX7-LABEL: name: store_local_v4s16_align4
	; GFX7: liveins: $vgpr0_vgpr1, $vgpr2			; GFX7: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1			; GFX7: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1
	; GFX7: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0			; GFX7: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0
	; GFX7: DS_WRITE2_B32 [[COPY1]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $m0, implicit $exec :: (store 8, align 4, addrspace 3)			; GFX7: DS_WRITE2_B32 [[COPY1]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $m0, implicit $exec :: (store 8, align 4, addrspace 3)
	; GFX9-LABEL: name: store_local_v4s16_align4			; GFX9-LABEL: name: store_local_v4s16_align4
	; GFX9: liveins: $vgpr0_vgpr1, $vgpr2			; GFX9: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX9: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1			; GFX9: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1
	; GFX9: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0			; GFX9: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0
	; GFX9: DS_WRITE2_B32_gfx9 [[COPY1]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $exec :: (store 8, align 4, addrspace 3)			; GFX9: DS_WRITE2_B32_gfx9 [[COPY1]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $exec :: (store 8, align 4, addrspace 3)
				; GFX6-LABEL: name: store_local_v4s16_align4
				; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
				; GFX6: [[COPY:%[0-9]+]]:vgpr(<4 x s16>) = COPY $vgpr0_vgpr1
				; GFX6: [[COPY1:%[0-9]+]]:vgpr(p3) = COPY $vgpr2
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: G_STORE [[COPY]](<4 x s16>), [[COPY1]](p3) :: (store 8, align 4, addrspace 3)
	%0:vgpr(<4 x s16>) = COPY $vgpr0_vgpr1			%0:vgpr(<4 x s16>) = COPY $vgpr0_vgpr1
	%1:vgpr(p3) = COPY $vgpr2			%1:vgpr(p3) = COPY $vgpr2
	G_STORE %0, %1 :: (store 8, align 4, addrspace 3)			G_STORE %0, %1 :: (store 8, align 4, addrspace 3)

	...			...

	---			---

	name: store_local_s64_align8			name: store_local_s64_align8
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0_vgpr1, $vgpr2			liveins: $vgpr0_vgpr1, $vgpr2

	; GFX6-LABEL: name: store_local_s64_align8
	; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX6: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: DS_WRITE_B64 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 8, addrspace 3)
	; GFX7-LABEL: name: store_local_s64_align8			; GFX7-LABEL: name: store_local_s64_align8
	; GFX7: liveins: $vgpr0_vgpr1, $vgpr2			; GFX7: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: DS_WRITE_B64 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 8, addrspace 3)			; GFX7: DS_WRITE_B64 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 8, addrspace 3)
	; GFX9-LABEL: name: store_local_s64_align8			; GFX9-LABEL: name: store_local_s64_align8
	; GFX9: liveins: $vgpr0_vgpr1, $vgpr2			; GFX9: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX9: DS_WRITE_B64_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 8, addrspace 3)			; GFX9: DS_WRITE_B64_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 8, addrspace 3)
				; GFX6-LABEL: name: store_local_s64_align8
				; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
				; GFX6: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
				; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: DS_WRITE_B64 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 8, addrspace 3)
	%0:vgpr(s64) = COPY $vgpr0_vgpr1			%0:vgpr(s64) = COPY $vgpr0_vgpr1
	%1:vgpr(p3) = COPY $vgpr2			%1:vgpr(p3) = COPY $vgpr2
	G_STORE %0, %1 :: (store 8, align 8, addrspace 3)			G_STORE %0, %1 :: (store 8, align 8, addrspace 3)

	...			...

	---			---

	name: store_local_p1_align8			name: store_local_p1_align8
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0_vgpr1, $vgpr2			liveins: $vgpr0_vgpr1, $vgpr2

	; GFX6-LABEL: name: store_local_p1_align8
	; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX6: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: DS_WRITE_B64 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 8, addrspace 3)
	; GFX7-LABEL: name: store_local_p1_align8			; GFX7-LABEL: name: store_local_p1_align8
	; GFX7: liveins: $vgpr0_vgpr1, $vgpr2			; GFX7: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: DS_WRITE_B64 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 8, addrspace 3)			; GFX7: DS_WRITE_B64 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 8, addrspace 3)
	; GFX9-LABEL: name: store_local_p1_align8			; GFX9-LABEL: name: store_local_p1_align8
	; GFX9: liveins: $vgpr0_vgpr1, $vgpr2			; GFX9: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX9: DS_WRITE_B64_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 8, addrspace 3)			; GFX9: DS_WRITE_B64_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 8, addrspace 3)
				; GFX6-LABEL: name: store_local_p1_align8
				; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
				; GFX6: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
				; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: DS_WRITE_B64 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 8, addrspace 3)
	%0:vgpr(p1) = COPY $vgpr0_vgpr1			%0:vgpr(p1) = COPY $vgpr0_vgpr1
	%1:vgpr(p3) = COPY $vgpr2			%1:vgpr(p3) = COPY $vgpr2
	G_STORE %0, %1 :: (store 8, align 8, addrspace 3)			G_STORE %0, %1 :: (store 8, align 8, addrspace 3)

	...			...

	---			---

	name: store_local_v2s32_align8			name: store_local_v2s32_align8
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0_vgpr1, $vgpr2			liveins: $vgpr0_vgpr1, $vgpr2

	; GFX6-LABEL: name: store_local_v2s32_align8
	; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX6: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: DS_WRITE_B64 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 8, addrspace 3)
	; GFX7-LABEL: name: store_local_v2s32_align8			; GFX7-LABEL: name: store_local_v2s32_align8
	; GFX7: liveins: $vgpr0_vgpr1, $vgpr2			; GFX7: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: DS_WRITE_B64 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 8, addrspace 3)			; GFX7: DS_WRITE_B64 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 8, addrspace 3)
	; GFX9-LABEL: name: store_local_v2s32_align8			; GFX9-LABEL: name: store_local_v2s32_align8
	; GFX9: liveins: $vgpr0_vgpr1, $vgpr2			; GFX9: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX9: DS_WRITE_B64_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 8, addrspace 3)			; GFX9: DS_WRITE_B64_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 8, addrspace 3)
				; GFX6-LABEL: name: store_local_v2s32_align8
				; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
				; GFX6: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
				; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: DS_WRITE_B64 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 8, addrspace 3)
	%0:vgpr(<2 x s32>) = COPY $vgpr0_vgpr1			%0:vgpr(<2 x s32>) = COPY $vgpr0_vgpr1
	%1:vgpr(p3) = COPY $vgpr2			%1:vgpr(p3) = COPY $vgpr2
	G_STORE %0, %1 :: (store 8, align 8, addrspace 3)			G_STORE %0, %1 :: (store 8, align 8, addrspace 3)

	...			...

	---			---

	name: store_local_v4s16_align8			name: store_local_v4s16_align8
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0_vgpr1, $vgpr2			liveins: $vgpr0_vgpr1, $vgpr2

	; GFX6-LABEL: name: store_local_v4s16_align8
	; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX6: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: DS_WRITE_B64 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 8, addrspace 3)
	; GFX7-LABEL: name: store_local_v4s16_align8			; GFX7-LABEL: name: store_local_v4s16_align8
	; GFX7: liveins: $vgpr0_vgpr1, $vgpr2			; GFX7: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: DS_WRITE_B64 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 8, addrspace 3)			; GFX7: DS_WRITE_B64 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 8, addrspace 3)
	; GFX9-LABEL: name: store_local_v4s16_align8			; GFX9-LABEL: name: store_local_v4s16_align8
	; GFX9: liveins: $vgpr0_vgpr1, $vgpr2			; GFX9: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX9: DS_WRITE_B64_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 8, addrspace 3)			; GFX9: DS_WRITE_B64_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 8, addrspace 3)
				; GFX6-LABEL: name: store_local_v4s16_align8
				; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
				; GFX6: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
				; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: DS_WRITE_B64 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 8, addrspace 3)
	%0:vgpr(<4 x s16>) = COPY $vgpr0_vgpr1			%0:vgpr(<4 x s16>) = COPY $vgpr0_vgpr1
	%1:vgpr(p3) = COPY $vgpr2			%1:vgpr(p3) = COPY $vgpr2
	G_STORE %0, %1 :: (store 8, align 8, addrspace 3)			G_STORE %0, %1 :: (store 8, align 8, addrspace 3)

	...			...

	---			---

	name: store_local_s64_align4_from_1_gep_1016			name: store_local_s64_align4_from_1_gep_1016
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0_vgpr1, $vgpr2			liveins: $vgpr0_vgpr1, $vgpr2

	; GFX6-LABEL: name: store_local_s64_align4_from_1_gep_1016
	; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX6: [[COPY:%[0-9]+]]:vgpr(s64) = COPY $vgpr0_vgpr1
	; GFX6: [[COPY1:%[0-9]+]]:vgpr(p3) = COPY $vgpr2
	; GFX6: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 1016
	; GFX6: [[PTR_ADD:%[0-9]+]]:vgpr(p3) = G_PTR_ADD [[COPY1]], [[C]](s32)
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: G_STORE [[COPY]](s64), [[PTR_ADD]](p3) :: (store 8, align 4, addrspace 3)
	; GFX7-LABEL: name: store_local_s64_align4_from_1_gep_1016			; GFX7-LABEL: name: store_local_s64_align4_from_1_gep_1016
	; GFX7: liveins: $vgpr0_vgpr1, $vgpr2			; GFX7: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1			; GFX7: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1
	; GFX7: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0			; GFX7: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0
	; GFX7: DS_WRITE2_B32 [[COPY1]], [[COPY3]], [[COPY2]], 254, 255, 0, implicit $m0, implicit $exec :: (store 8, align 4, addrspace 3)			; GFX7: DS_WRITE2_B32 [[COPY1]], [[COPY3]], [[COPY2]], 254, 255, 0, implicit $m0, implicit $exec :: (store 8, align 4, addrspace 3)
	; GFX9-LABEL: name: store_local_s64_align4_from_1_gep_1016			; GFX9-LABEL: name: store_local_s64_align4_from_1_gep_1016
	; GFX9: liveins: $vgpr0_vgpr1, $vgpr2			; GFX9: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX9: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1			; GFX9: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1
	; GFX9: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0			; GFX9: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0
	; GFX9: DS_WRITE2_B32_gfx9 [[COPY1]], [[COPY3]], [[COPY2]], 254, 255, 0, implicit $exec :: (store 8, align 4, addrspace 3)			; GFX9: DS_WRITE2_B32_gfx9 [[COPY1]], [[COPY3]], [[COPY2]], 254, 255, 0, implicit $exec :: (store 8, align 4, addrspace 3)
				; GFX6-LABEL: name: store_local_s64_align4_from_1_gep_1016
				; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
				; GFX6: [[COPY:%[0-9]+]]:vgpr(s64) = COPY $vgpr0_vgpr1
				; GFX6: [[COPY1:%[0-9]+]]:vgpr(p3) = COPY $vgpr2
				; GFX6: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 1016
				; GFX6: [[PTR_ADD:%[0-9]+]]:vgpr(p3) = G_PTR_ADD [[COPY1]], [[C]](s32)
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: G_STORE [[COPY]](s64), [[PTR_ADD]](p3) :: (store 8, align 4, addrspace 3)
	%0:vgpr(s64) = COPY $vgpr0_vgpr1			%0:vgpr(s64) = COPY $vgpr0_vgpr1
	%1:vgpr(p3) = COPY $vgpr2			%1:vgpr(p3) = COPY $vgpr2
	%2:vgpr(s32) = G_CONSTANT i32 1016			%2:vgpr(s32) = G_CONSTANT i32 1016
	%3:vgpr(p3) = G_PTR_ADD %1, %2			%3:vgpr(p3) = G_PTR_ADD %1, %2
	G_STORE %0, %3 :: (store 8, align 4, addrspace 3)			G_STORE %0, %3 :: (store 8, align 4, addrspace 3)

	...			...

	---			---

	name: store_local_s64_align4_from_1_gep_1020			name: store_local_s64_align4_from_1_gep_1020
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0_vgpr1, $vgpr2			liveins: $vgpr0_vgpr1, $vgpr2

	; GFX6-LABEL: name: store_local_s64_align4_from_1_gep_1020
	; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX6: [[COPY:%[0-9]+]]:vgpr(s64) = COPY $vgpr0_vgpr1
	; GFX6: [[COPY1:%[0-9]+]]:vgpr(p3) = COPY $vgpr2
	; GFX6: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 1020
	; GFX6: [[PTR_ADD:%[0-9]+]]:vgpr(p3) = G_PTR_ADD [[COPY1]], [[C]](s32)
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: G_STORE [[COPY]](s64), [[PTR_ADD]](p3) :: (store 8, align 4, addrspace 3)
	; GFX7-LABEL: name: store_local_s64_align4_from_1_gep_1020			; GFX7-LABEL: name: store_local_s64_align4_from_1_gep_1020
	; GFX7: liveins: $vgpr0_vgpr1, $vgpr2			; GFX7: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX7: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1020, implicit $exec			; GFX7: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1020, implicit $exec
	; GFX7: %3:vgpr_32, dead %6:sreg_64_xexec = V_ADD_I32_e64 [[COPY1]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX7: %3:vgpr_32, dead %6:sreg_64_xexec = V_ADD_I32_e64 [[COPY1]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1			; GFX7: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1
	; GFX7: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0			; GFX7: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0
	; GFX7: DS_WRITE2_B32 %3, [[COPY3]], [[COPY2]], 0, 1, 0, implicit $m0, implicit $exec :: (store 8, align 4, addrspace 3)			; GFX7: DS_WRITE2_B32 %3, [[COPY3]], [[COPY2]], 0, 1, 0, implicit $m0, implicit $exec :: (store 8, align 4, addrspace 3)
	; GFX9-LABEL: name: store_local_s64_align4_from_1_gep_1020			; GFX9-LABEL: name: store_local_s64_align4_from_1_gep_1020
	; GFX9: liveins: $vgpr0_vgpr1, $vgpr2			; GFX9: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1020, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1020, implicit $exec
	; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY1]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY1]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX9: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1			; GFX9: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1
	; GFX9: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0			; GFX9: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0
	; GFX9: DS_WRITE2_B32_gfx9 [[V_ADD_U32_e64_]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $exec :: (store 8, align 4, addrspace 3)			; GFX9: DS_WRITE2_B32_gfx9 [[V_ADD_U32_e64_]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $exec :: (store 8, align 4, addrspace 3)
				; GFX6-LABEL: name: store_local_s64_align4_from_1_gep_1020
				; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
				; GFX6: [[COPY:%[0-9]+]]:vgpr(s64) = COPY $vgpr0_vgpr1
				; GFX6: [[COPY1:%[0-9]+]]:vgpr(p3) = COPY $vgpr2
				; GFX6: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 1020
				; GFX6: [[PTR_ADD:%[0-9]+]]:vgpr(p3) = G_PTR_ADD [[COPY1]], [[C]](s32)
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: G_STORE [[COPY]](s64), [[PTR_ADD]](p3) :: (store 8, align 4, addrspace 3)
	%0:vgpr(s64) = COPY $vgpr0_vgpr1			%0:vgpr(s64) = COPY $vgpr0_vgpr1
	%1:vgpr(p3) = COPY $vgpr2			%1:vgpr(p3) = COPY $vgpr2
	%2:vgpr(s32) = G_CONSTANT i32 1020			%2:vgpr(s32) = G_CONSTANT i32 1020
	%3:vgpr(p3) = G_PTR_ADD %1, %2			%3:vgpr(p3) = G_PTR_ADD %1, %2
	G_STORE %0, %3 :: (store 8, align 4, addrspace 3)			G_STORE %0, %3 :: (store 8, align 4, addrspace 3)

	...			...

llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-store-private.mir

	# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	# RUN: llc -march=amdgcn -mcpu=tahiti -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX6 %s			# RUN: llc -march=amdgcn -mcpu=tahiti -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX6 %s
	# RUN: llc -march=amdgcn -mcpu=gfx900 -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX9 %s			# RUN: llc -march=amdgcn -mcpu=gfx900 -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX9 %s

	---			---

	name: store_private_s32_to_4			name: function_store_private_s32_to_4
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
				isEntryFunction: false
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0, $vgpr1			liveins: $vgpr0, $vgpr1

	; GFX6-LABEL: name: store_private_s32_to_4			; GFX6-LABEL: name: function_store_private_s32_to_4
	; GFX6: liveins: $vgpr0, $vgpr1			; GFX6: liveins: $vgpr0, $vgpr1
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX6: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)			; GFX6: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)
	; GFX9-LABEL: name: store_private_s32_to_4			; GFX9-LABEL: name: function_store_private_s32_to_4
	; GFX9: liveins: $vgpr0, $vgpr1			; GFX9: liveins: $vgpr0, $vgpr1
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX9: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)			; GFX9: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)
	%0:vgpr(s32) = COPY $vgpr0			%0:vgpr(s32) = COPY $vgpr0
	%1:vgpr(p5) = COPY $vgpr1			%1:vgpr(p5) = COPY $vgpr1
	G_STORE %0, %1 :: (store 4, align 4, addrspace 5)			G_STORE %0, %1 :: (store 4, align 4, addrspace 5)

	...			...

	---			---

	name: store_private_s32_to_2			name: function_store_private_s32_to_2
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
				isEntryFunction: false
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0, $vgpr1			liveins: $vgpr0, $vgpr1

	; GFX6-LABEL: name: store_private_s32_to_2			; GFX6-LABEL: name: function_store_private_s32_to_2
	; GFX6: liveins: $vgpr0, $vgpr1			; GFX6: liveins: $vgpr0, $vgpr1
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX6: BUFFER_STORE_SHORT_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 2, addrspace 5)			; GFX6: BUFFER_STORE_SHORT_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 2, addrspace 5)
	; GFX9-LABEL: name: store_private_s32_to_2			; GFX9-LABEL: name: function_store_private_s32_to_2
	; GFX9: liveins: $vgpr0, $vgpr1			; GFX9: liveins: $vgpr0, $vgpr1
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX9: BUFFER_STORE_SHORT_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 2, addrspace 5)			; GFX9: BUFFER_STORE_SHORT_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 2, addrspace 5)
	%0:vgpr(s32) = COPY $vgpr0			%0:vgpr(s32) = COPY $vgpr0
	%1:vgpr(p5) = COPY $vgpr1			%1:vgpr(p5) = COPY $vgpr1
	G_STORE %0, %1 :: (store 2, align 2, addrspace 5)			G_STORE %0, %1 :: (store 2, align 2, addrspace 5)

	...			...

	---			---

	name: store_private_s32_to_1			name: function_store_private_s32_to_1
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
				isEntryFunction: false
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0, $vgpr1			liveins: $vgpr0, $vgpr1

	; GFX6-LABEL: name: store_private_s32_to_1			; GFX6-LABEL: name: function_store_private_s32_to_1
	; GFX6: liveins: $vgpr0, $vgpr1			; GFX6: liveins: $vgpr0, $vgpr1
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX6: BUFFER_STORE_BYTE_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)			; GFX6: BUFFER_STORE_BYTE_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
	; GFX9-LABEL: name: store_private_s32_to_1			; GFX9-LABEL: name: function_store_private_s32_to_1
	; GFX9: liveins: $vgpr0, $vgpr1			; GFX9: liveins: $vgpr0, $vgpr1
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX9: BUFFER_STORE_BYTE_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)			; GFX9: BUFFER_STORE_BYTE_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
	%0:vgpr(s32) = COPY $vgpr0			%0:vgpr(s32) = COPY $vgpr0
	%1:vgpr(p5) = COPY $vgpr1			%1:vgpr(p5) = COPY $vgpr1
	G_STORE %0, %1 :: (store 1, align 1, addrspace 5)			G_STORE %0, %1 :: (store 1, align 1, addrspace 5)

	...			...

	---			---

	name: store_private_v2s16			name: function_store_private_v2s16
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
				isEntryFunction: false
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0, $vgpr1			liveins: $vgpr0, $vgpr1

	; GFX6-LABEL: name: store_private_v2s16			; GFX6-LABEL: name: function_store_private_v2s16
	; GFX6: liveins: $vgpr0, $vgpr1			; GFX6: liveins: $vgpr0, $vgpr1
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX6: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)			; GFX6: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)
	; GFX9-LABEL: name: store_private_v2s16			; GFX9-LABEL: name: function_store_private_v2s16
	; GFX9: liveins: $vgpr0, $vgpr1			; GFX9: liveins: $vgpr0, $vgpr1
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX9: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)			; GFX9: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)
	%0:vgpr(<2 x s16>) = COPY $vgpr0			%0:vgpr(<2 x s16>) = COPY $vgpr0
	%1:vgpr(p5) = COPY $vgpr1			%1:vgpr(p5) = COPY $vgpr1
	G_STORE %0, %1 :: (store 4, align 4, addrspace 5)			G_STORE %0, %1 :: (store 4, align 4, addrspace 5)

	...			...

	---			---

	name: store_private_p3			name: function_store_private_p3
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
				isEntryFunction: false
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0, $vgpr1			liveins: $vgpr0, $vgpr1

	; GFX6-LABEL: name: store_private_p3			; GFX6-LABEL: name: function_store_private_p3
	; GFX6: liveins: $vgpr0, $vgpr1			; GFX6: liveins: $vgpr0, $vgpr1
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX6: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)			; GFX6: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)
	; GFX9-LABEL: name: store_private_p3			; GFX9-LABEL: name: function_store_private_p3
	; GFX9: liveins: $vgpr0, $vgpr1			; GFX9: liveins: $vgpr0, $vgpr1
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX9: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)			; GFX9: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(p5) = COPY $vgpr1			%1:vgpr(p5) = COPY $vgpr1
	G_STORE %0, %1 :: (store 4, align 4, addrspace 5)			G_STORE %0, %1 :: (store 4, align 4, addrspace 5)

	...			...

	---			---

	name: store_private_p5			name: function_store_private_p5
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
				isEntryFunction: false
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0, $vgpr1			liveins: $vgpr0, $vgpr1

	; GFX6-LABEL: name: store_private_p5			; GFX6-LABEL: name: function_store_private_p5
	; GFX6: liveins: $vgpr0, $vgpr1			; GFX6: liveins: $vgpr0, $vgpr1
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX6: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)			; GFX6: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)
	; GFX9-LABEL: name: store_private_p5			; GFX9-LABEL: name: function_store_private_p5
	; GFX9: liveins: $vgpr0, $vgpr1			; GFX9: liveins: $vgpr0, $vgpr1
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX9: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)			; GFX9: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)
	%0:vgpr(p5) = COPY $vgpr0			%0:vgpr(p5) = COPY $vgpr0
	%1:vgpr(p5) = COPY $vgpr1			%1:vgpr(p5) = COPY $vgpr1
	G_STORE %0, %1 :: (store 4, align 4, addrspace 5)			G_STORE %0, %1 :: (store 4, align 4, addrspace 5)

	...			...

	---			---

	name: store_private_s32_to_1_fi_offset_4095			name: function_store_private_s32_to_1_fi_offset_4095
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
				isEntryFunction: false
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32
	stack:			stack:
	- { id: 0, size: 4096, alignment: 4 }			- { id: 0, size: 4096, alignment: 4 }

	body: |			body: |
	bb.0:			bb.0:

	; GFX6-LABEL: name: store_private_s32_to_1_fi_offset_4095			; GFX6-LABEL: name: function_store_private_s32_to_1_fi_offset_4095
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
	; GFX6: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4095, implicit $exec			; GFX6: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4095, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, implicit $exec			; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, implicit $exec
	; GFX6: [[V_MOV_B32_e32_2:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec			; GFX6: [[V_MOV_B32_e32_2:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
	; GFX6: BUFFER_STORE_BYTE_OFFEN [[V_MOV_B32_e32_2]], %2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)			; GFX6: BUFFER_STORE_BYTE_OFFEN [[V_MOV_B32_e32_2]], %2, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
	; GFX9-LABEL: name: store_private_s32_to_1_fi_offset_4095			; GFX9-LABEL: name: function_store_private_s32_to_1_fi_offset_4095
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
	; GFX9: BUFFER_STORE_BYTE_OFFEN [[V_MOV_B32_e32_]], %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4095, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)			; GFX9: BUFFER_STORE_BYTE_OFFEN [[V_MOV_B32_e32_]], %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4095, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
	%0:vgpr(p5) = G_FRAME_INDEX %stack.0			%0:vgpr(p5) = G_FRAME_INDEX %stack.0
	%1:vgpr(s32) = G_CONSTANT i32 4095			%1:vgpr(s32) = G_CONSTANT i32 4095
	%2:vgpr(p5) = G_PTR_ADD %0, %1			%2:vgpr(p5) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_CONSTANT i32 0			%3:vgpr(s32) = G_CONSTANT i32 0
	G_STORE %3, %2 :: (store 1, align 1, addrspace 5)			G_STORE %3, %2 :: (store 1, align 1, addrspace 5)

	...			...

	---			---

	name: store_private_s32_to_1_constant_4095			name: function_store_private_s32_to_1_constant_4095
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
				isEntryFunction: false
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32
	stack:			stack:
	- { id: 0, size: 4096, alignment: 4 }			- { id: 0, size: 4096, alignment: 4 }

	body: |			body: |
	bb.0:			bb.0:

	; GFX6-LABEL: name: store_private_s32_to_1_constant_4095			; GFX6-LABEL: name: function_store_private_s32_to_1_constant_4095
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
	; GFX6: BUFFER_STORE_BYTE_OFFSET [[V_MOV_B32_e32_]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 4095, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)			; GFX6: BUFFER_STORE_BYTE_OFFSET [[V_MOV_B32_e32_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 4095, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
	; GFX9-LABEL: name: store_private_s32_to_1_constant_4095			; GFX9-LABEL: name: function_store_private_s32_to_1_constant_4095
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
	; GFX9: BUFFER_STORE_BYTE_OFFSET [[V_MOV_B32_e32_]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 4095, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)			; GFX9: BUFFER_STORE_BYTE_OFFSET [[V_MOV_B32_e32_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 4095, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
	%0:vgpr(p5) = G_CONSTANT i32 4095			%0:vgpr(p5) = G_CONSTANT i32 4095
	%1:vgpr(s32) = G_CONSTANT i32 0			%1:vgpr(s32) = G_CONSTANT i32 0
	G_STORE %1, %0 :: (store 1, align 1, addrspace 5)			G_STORE %1, %0 :: (store 1, align 1, addrspace 5)

	...			...

	---			---

	name: store_private_s32_to_1_constant_4096			name: function_store_private_s32_to_1_constant_4096
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
				isEntryFunction: false
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32
	stack:			stack:
	- { id: 0, size: 4096, alignment: 4 }			- { id: 0, size: 4096, alignment: 4 }

	body: |			body: |
	bb.0:			bb.0:

	; GFX6-LABEL: name: store_private_s32_to_1_constant_4096			; GFX6-LABEL: name: function_store_private_s32_to_1_constant_4096
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
	; GFX6: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec			; GFX6: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec
	; GFX6: BUFFER_STORE_BYTE_OFFEN [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)			; GFX6: BUFFER_STORE_BYTE_OFFEN [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
	; GFX9-LABEL: name: store_private_s32_to_1_constant_4096			; GFX9-LABEL: name: function_store_private_s32_to_1_constant_4096
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
	; GFX9: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec			; GFX9: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec
	; GFX9: BUFFER_STORE_BYTE_OFFEN [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)			; GFX9: BUFFER_STORE_BYTE_OFFEN [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
				%0:vgpr(p5) = G_CONSTANT i32 4096
				%1:vgpr(s32) = G_CONSTANT i32 0
				G_STORE %1, %0 :: (store 1, align 1, addrspace 5)

				...

				---

				name: kernel_store_private_s32_to_4
				legalized: true
				regBankSelected: true
				tracksRegLiveness: true
				machineFunctionInfo:
				isEntryFunction: true
				scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3

				body: |
				bb.0:
				liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3

				; GFX6-LABEL: name: kernel_store_private_s32_to_4
				; GFX6: liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; GFX6: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)
				; GFX9-LABEL: name: kernel_store_private_s32_to_4
				; GFX9: liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; GFX9: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)
				%0:vgpr(s32) = COPY $vgpr0
				%1:vgpr(p5) = COPY $vgpr1
				G_STORE %0, %1 :: (store 4, align 4, addrspace 5)

				...

				---

				name: kernel_store_private_s32_to_2
				legalized: true
				regBankSelected: true
				tracksRegLiveness: true
				machineFunctionInfo:
				isEntryFunction: true
				scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3

				body: |
				bb.0:
				liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3

				; GFX6-LABEL: name: kernel_store_private_s32_to_2
				; GFX6: liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; GFX6: BUFFER_STORE_SHORT_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 2, addrspace 5)
				; GFX9-LABEL: name: kernel_store_private_s32_to_2
				; GFX9: liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; GFX9: BUFFER_STORE_SHORT_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 2, addrspace 5)
				%0:vgpr(s32) = COPY $vgpr0
				%1:vgpr(p5) = COPY $vgpr1
				G_STORE %0, %1 :: (store 2, align 2, addrspace 5)

				...

				---

				name: kernel_store_private_s32_to_1
				legalized: true
				regBankSelected: true
				tracksRegLiveness: true
				machineFunctionInfo:
				isEntryFunction: true
				scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3

				body: |
				bb.0:
				liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3

				; GFX6-LABEL: name: kernel_store_private_s32_to_1
				; GFX6: liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; GFX6: BUFFER_STORE_BYTE_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
				; GFX9-LABEL: name: kernel_store_private_s32_to_1
				; GFX9: liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; GFX9: BUFFER_STORE_BYTE_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
				%0:vgpr(s32) = COPY $vgpr0
				%1:vgpr(p5) = COPY $vgpr1
				G_STORE %0, %1 :: (store 1, align 1, addrspace 5)

				...

				---

				name: kernel_store_private_v2s16
				legalized: true
				regBankSelected: true
				tracksRegLiveness: true
				machineFunctionInfo:
				isEntryFunction: true
				scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3

				body: |
				bb.0:
				liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3

				; GFX6-LABEL: name: kernel_store_private_v2s16
				; GFX6: liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; GFX6: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)
				; GFX9-LABEL: name: kernel_store_private_v2s16
				; GFX9: liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; GFX9: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)
				%0:vgpr(<2 x s16>) = COPY $vgpr0
				%1:vgpr(p5) = COPY $vgpr1
				G_STORE %0, %1 :: (store 4, align 4, addrspace 5)

				...

				---

				name: kernel_store_private_p3
				legalized: true
				regBankSelected: true
				tracksRegLiveness: true
				machineFunctionInfo:
				isEntryFunction: true
				scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3

				body: |
				bb.0:
				liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3

				; GFX6-LABEL: name: kernel_store_private_p3
				; GFX6: liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; GFX6: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)
				; GFX9-LABEL: name: kernel_store_private_p3
				; GFX9: liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; GFX9: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)
				%0:vgpr(p3) = COPY $vgpr0
				%1:vgpr(p5) = COPY $vgpr1
				G_STORE %0, %1 :: (store 4, align 4, addrspace 5)

				...

				---

				name: kernel_store_private_p5
				legalized: true
				regBankSelected: true
				tracksRegLiveness: true
				machineFunctionInfo:
				isEntryFunction: true
				scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3

				body: |
				bb.0:
				liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3

				; GFX6-LABEL: name: kernel_store_private_p5
				; GFX6: liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; GFX6: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)
				; GFX9-LABEL: name: kernel_store_private_p5
				; GFX9: liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; GFX9: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)
				%0:vgpr(p5) = COPY $vgpr0
				%1:vgpr(p5) = COPY $vgpr1
				G_STORE %0, %1 :: (store 4, align 4, addrspace 5)

				...

				---

				name: kernel_store_private_s32_to_1_fi_offset_4095
				legalized: true
				regBankSelected: true
				tracksRegLiveness: true
				machineFunctionInfo:
				isEntryFunction: true
				scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
				stack:
				- { id: 0, size: 4096, alignment: 4 }

				body: |
				bb.0:
				liveins: $sgpr0_sgpr1_sgpr2_sgpr3

				; GFX6-LABEL: name: kernel_store_private_s32_to_1_fi_offset_4095
				; GFX6: liveins: $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
				; GFX6: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4095, implicit $exec
				; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, implicit $exec
				; GFX6: [[V_MOV_B32_e32_2:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
				; GFX6: BUFFER_STORE_BYTE_OFFEN [[V_MOV_B32_e32_2]], %2, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
				; GFX9-LABEL: name: kernel_store_private_s32_to_1_fi_offset_4095
				; GFX9: liveins: $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
				; GFX9: BUFFER_STORE_BYTE_OFFEN [[V_MOV_B32_e32_]], %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 4095, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
				%0:vgpr(p5) = G_FRAME_INDEX %stack.0
				%1:vgpr(s32) = G_CONSTANT i32 4095
				%2:vgpr(p5) = G_PTR_ADD %0, %1
				%3:vgpr(s32) = G_CONSTANT i32 0
				G_STORE %3, %2 :: (store 1, align 1, addrspace 5)

				...

				---

				name: kernel_store_private_s32_to_1_constant_4095
				legalized: true
				regBankSelected: true
				tracksRegLiveness: true
				machineFunctionInfo:
				isEntryFunction: true
				scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
				stack:
				- { id: 0, size: 4096, alignment: 4 }

				body: |
				bb.0:
				liveins: $sgpr0_sgpr1_sgpr2_sgpr3

				; GFX6-LABEL: name: kernel_store_private_s32_to_1_constant_4095
				; GFX6: liveins: $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
				; GFX6: BUFFER_STORE_BYTE_OFFSET [[V_MOV_B32_e32_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 4095, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
				; GFX9-LABEL: name: kernel_store_private_s32_to_1_constant_4095
				; GFX9: liveins: $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
				; GFX9: BUFFER_STORE_BYTE_OFFSET [[V_MOV_B32_e32_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 4095, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
				%0:vgpr(p5) = G_CONSTANT i32 4095
				%1:vgpr(s32) = G_CONSTANT i32 0
				G_STORE %1, %0 :: (store 1, align 1, addrspace 5)

				...

				---

				name: kernel_store_private_s32_to_1_constant_4096
				legalized: true
				regBankSelected: true
				tracksRegLiveness: true
				machineFunctionInfo:
				isEntryFunction: true
				scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
				stack:
				- { id: 0, size: 4096, alignment: 4 }

				body: |
				bb.0:
				liveins: $sgpr0_sgpr1_sgpr2_sgpr3

				; GFX6-LABEL: name: kernel_store_private_s32_to_1_constant_4096
				; GFX6: liveins: $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
				; GFX6: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec
				; GFX6: BUFFER_STORE_BYTE_OFFEN [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
				; GFX9-LABEL: name: kernel_store_private_s32_to_1_constant_4096
				; GFX9: liveins: $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
				; GFX9: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec
				; GFX9: BUFFER_STORE_BYTE_OFFEN [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
	%0:vgpr(p5) = G_CONSTANT i32 4096			%0:vgpr(p5) = G_CONSTANT i32 4096
	%1:vgpr(s32) = G_CONSTANT i32 0			%1:vgpr(s32) = G_CONSTANT i32 0
	G_STORE %1, %0 :: (store 1, align 1, addrspace 5)			G_STORE %1, %0 :: (store 1, align 1, addrspace 5)

	...			...
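The *_constant_4095 and *_constant_4096 cases above differ only in whether the constant address can be folded into the instruction's immediate offset: 4095 is emitted as BUFFER_STORE_BYTE_OFFSET with an immediate of 4095, while 4096 is first moved into a VGPR with V_MOV_B32 and stored with BUFFER_STORE_BYTE_OFFEN. A minimal C++ sketch of that boundary, assuming a 12-bit unsigned MUBUF immediate offset field (which is what these checks suggest); `MubufForm` and `formForConstantAddress` are hypothetical names, not LLVM's API:

```cpp
#include <cstdint>

// Addressing forms seen in the BUFFER_STORE_* checks above.
enum class MubufForm { Offset, Offen };

// Assumption: the MUBUF immediate offset is a 12-bit unsigned field, matching
// the 4095/4096 boundary exercised by the *_constant_4095/4096 tests.
constexpr uint32_t kMaxMubufImmOffset = (1u << 12) - 1;  // 4095

constexpr bool fitsMubufImmOffset(uint32_t offset) {
  return offset <= kMaxMubufImmOffset;
}

// Hypothetical helper: a constant private address that fits is folded into
// the immediate (OFFSET form); otherwise it must be materialized in a VGPR
// and passed as vaddr (OFFEN form).
constexpr MubufForm formForConstantAddress(uint32_t address) {
  return fitsMubufImmOffset(address) ? MubufForm::Offset : MubufForm::Offen;
}
```

The sketch only reproduces the boundary the checks pin down; in LLVM the decision is made during instruction selection.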

llvm/test/CodeGen/AMDGPU/addrspacecast.ll

	; HSA: enable_sgpr_dispatch_ptr = 0			; HSA: enable_sgpr_dispatch_ptr = 0
	; HSA: enable_sgpr_queue_ptr = 0			; HSA: enable_sgpr_queue_ptr = 0

	; HSA: s_load_dwordx2 s{{\[}}[[PTR_LO:[0-9]+]]:[[PTR_HI:[0-9]+]]{{\]}}			; HSA: s_load_dwordx2 s{{\[}}[[PTR_LO:[0-9]+]]:[[PTR_HI:[0-9]+]]{{\]}}
	; HSA-DAG: v_cmp_ne_u64_e64 vcc, s{{\[}}[[PTR_LO]]:[[PTR_HI]]{{\]}}, 0{{$}}			; HSA-DAG: v_cmp_ne_u64_e64 vcc, s{{\[}}[[PTR_LO]]:[[PTR_HI]]{{\]}}, 0{{$}}
	; HSA-DAG: v_mov_b32_e32 v[[VPTR_LO:[0-9]+]], s[[PTR_LO]]			; HSA-DAG: v_mov_b32_e32 v[[VPTR_LO:[0-9]+]], s[[PTR_LO]]
	; HSA-DAG: v_cndmask_b32_e32 [[CASTPTR:v[0-9]+]], 0, v[[VPTR_LO]]			; HSA-DAG: v_cndmask_b32_e32 [[CASTPTR:v[0-9]+]], 0, v[[VPTR_LO]]
	; HSA-DAG: v_mov_b32_e32 v[[K:[0-9]+]], 0{{$}}			; HSA-DAG: v_mov_b32_e32 v[[K:[0-9]+]], 0{{$}}
	; HSA: buffer_store_dword v[[K]], [[CASTPTR]], s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offen{{$}}			; HSA: buffer_store_dword v[[K]], [[CASTPTR]], s{{\[[0-9]+:[0-9]+\]}}, 0 offen{{$}}
	define amdgpu_kernel void @use_flat_to_private_addrspacecast(i32* %ptr) #0 {			define amdgpu_kernel void @use_flat_to_private_addrspacecast(i32* %ptr) #0 {
	%ftos = addrspacecast i32* %ptr to i32 addrspace(5)*			%ftos = addrspacecast i32* %ptr to i32 addrspace(5)*
	store volatile i32 0, i32 addrspace(5)* %ftos			store volatile i32 0, i32 addrspace(5)* %ftos
	ret void			ret void
	}			}

	; HSA-LABEL: {{^}}use_flat_to_global_addrspacecast:			; HSA-LABEL: {{^}}use_flat_to_global_addrspacecast:
	; HSA: enable_sgpr_queue_ptr = 0			; HSA: enable_sgpr_queue_ptr = 0
	define amdgpu_kernel void @cast_0_private_to_flat_addrspacecast() #0 {			define amdgpu_kernel void @cast_0_private_to_flat_addrspacecast() #0 {
	%cast = addrspacecast i32 addrspace(5)* null to i32*			%cast = addrspacecast i32 addrspace(5)* null to i32*
	store volatile i32 7, i32* %cast			store volatile i32 7, i32* %cast
	ret void			ret void
	}			}

	; HSA-LABEL: {{^}}cast_0_flat_to_private_addrspacecast:			; HSA-LABEL: {{^}}cast_0_flat_to_private_addrspacecast:
	; HSA: v_mov_b32_e32 [[K:v[0-9]+]], 7{{$}}			; HSA: v_mov_b32_e32 [[K:v[0-9]+]], 7{{$}}
	; HSA: buffer_store_dword [[K]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+$}}			; HSA: buffer_store_dword [[K]], off, s{{\[[0-9]+:[0-9]+\]}}, 0
	define amdgpu_kernel void @cast_0_flat_to_private_addrspacecast() #0 {			define amdgpu_kernel void @cast_0_flat_to_private_addrspacecast() #0 {
	%cast = addrspacecast i32* null to i32 addrspace(5)*			%cast = addrspacecast i32* null to i32 addrspace(5)*
	store volatile i32 7, i32 addrspace(5)* %cast			store volatile i32 7, i32 addrspace(5)* %cast
	ret void			ret void
	}			}

	; Disable optimizations in case there are optimizations added that			; Disable optimizations in case there are optimizations added that
	; specialize away generic pointer accesses.			; specialize away generic pointer accesses.

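In the updated checks, the soffset operand is the inline constant 0 wherever the old checks expected the scratch wave offset SGPR ($sgpr4 in the MIR tests, s{{[0-9]+}} in the .ll tests). The one exception visible in this diff is a frame index used directly as vaddr in a non-entry function, which keeps the stack pointer $sgpr32 (the GFX9 function_store_private_s32_to_1_fi_offset_4095 check). The sketch below only models what the updated checks expect; the enum and function are hypothetical and do not mirror the selector's actual code.

```cpp
// Hypothetical model of the soffset operand expected by the updated checks.
enum class SOffset { InlineZero, StackPointer };

constexpr SOffset expectedSOffset(bool isEntryFunction, bool vaddrIsFrameIndex) {
  // In these checks, only a frame index used as vaddr in a non-entry function
  // pairs with $sgpr32; every other case, kernels included, expects 0.
  return (!isEntryFunction && vaddrIsFrameIndex) ? SOffset::StackPointer
                                                 : SOffset::InlineZero;
}
```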
llvm/test/CodeGen/AMDGPU/amdgpu.private-memory.ll

; by 4 bytes.		; by 4 bytes.
; HSA-ALLOCA: workitem_private_segment_byte_size = 24		; HSA-ALLOCA: workitem_private_segment_byte_size = 24
; HSA-ALLOCA: .end_amd_kernel_code_t		; HSA-ALLOCA: .end_amd_kernel_code_t

; HSA-ALLOCA: s_mov_b32 flat_scratch_lo, s7		; HSA-ALLOCA: s_mov_b32 flat_scratch_lo, s7
; HSA-ALLOCA: s_add_u32 s6, s6, s9		; HSA-ALLOCA: s_add_u32 s6, s6, s9
; HSA-ALLOCA: s_lshr_b32 flat_scratch_hi, s6, 8		; HSA-ALLOCA: s_lshr_b32 flat_scratch_hi, s6, 8

; SI-ALLOCA: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offen ; encoding: [0x00,0x10,0x70,0xe0		; SI-ALLOCA: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+:[0-9]+}}], 0 offen ; encoding: [0x00,0x10,0x70,0xe0
; SI-ALLOCA: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offen ; encoding: [0x00,0x10,0x70,0xe0		; SI-ALLOCA: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+:[0-9]+}}], 0 offen ; encoding: [0x00,0x10,0x70,0xe0


; HSAOPT: [[DISPATCH_PTR:%[0-9]+]] = call noalias nonnull dereferenceable(64) i8 addrspace(4)* @llvm.amdgcn.dispatch.ptr()		; HSAOPT: [[DISPATCH_PTR:%[0-9]+]] = call noalias nonnull dereferenceable(64) i8 addrspace(4)* @llvm.amdgcn.dispatch.ptr()
; HSAOPT: [[CAST_DISPATCH_PTR:%[0-9]+]] = bitcast i8 addrspace(4)* [[DISPATCH_PTR]] to i32 addrspace(4)*		; HSAOPT: [[CAST_DISPATCH_PTR:%[0-9]+]] = bitcast i8 addrspace(4)* [[DISPATCH_PTR]] to i32 addrspace(4)*
; HSAOPT: [[GEP0:%[0-9]+]] = getelementptr inbounds i32, i32 addrspace(4)* [[CAST_DISPATCH_PTR]], i64 1		; HSAOPT: [[GEP0:%[0-9]+]] = getelementptr inbounds i32, i32 addrspace(4)* [[CAST_DISPATCH_PTR]], i64 1
; HSAOPT: [[LDXY:%[0-9]+]] = load i32, i32 addrspace(4)* [[GEP0]], align 4, !invariant.load !0		; HSAOPT: [[LDXY:%[0-9]+]] = load i32, i32 addrspace(4)* [[GEP0]], align 4, !invariant.load !0
; HSAOPT: [[GEP1:%[0-9]+]] = getelementptr inbounds i32, i32 addrspace(4)* [[CAST_DISPATCH_PTR]], i64 2		; HSAOPT: [[GEP1:%[0-9]+]] = getelementptr inbounds i32, i32 addrspace(4)* [[CAST_DISPATCH_PTR]], i64 2
; HSAOPT: [[LDZU:%[0-9]+]] = load i32, i32 addrspace(4)* [[GEP1]], align 4, !range !1, !invariant.load !0		; HSAOPT: [[LDZU:%[0-9]+]] = load i32, i32 addrspace(4)* [[GEP1]], align 4, !range !1, !invariant.load !0
store i32 %value, i32 addrspace(1)* %out		store i32 %value, i32 addrspace(1)* %out
ret void		ret void
}		}

; FUNC-LABEL: {{^}}short_array:		; FUNC-LABEL: {{^}}short_array:

; R600-VECT: MOVA_INT		; R600-VECT: MOVA_INT

; SI-ALLOCA-DAG: buffer_store_short v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:6 ; encoding: [0x06,0x00,0x68,0xe0		; SI-ALLOCA-DAG: buffer_store_short v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], 0 offset:6 ; encoding: [0x06,0x00,0x68,0xe0
; SI-ALLOCA-DAG: buffer_store_short v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:4 ; encoding: [0x04,0x00,0x68,0xe0		; SI-ALLOCA-DAG: buffer_store_short v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], 0 offset:4 ; encoding: [0x04,0x00,0x68,0xe0
; Loaded value is 0 or 1, so sext will become zext, so we get buffer_load_ushort instead of buffer_load_sshort.		; Loaded value is 0 or 1, so sext will become zext, so we get buffer_load_ushort instead of buffer_load_sshort.
; SI-ALLOCA: buffer_load_sshort v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}}		; SI-ALLOCA: buffer_load_sshort v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+:[0-9]+}}], 0

; SI-PROMOTE-VECT: s_load_dword [[IDX:s[0-9]+]]		; SI-PROMOTE-VECT: s_load_dword [[IDX:s[0-9]+]]
; SI-PROMOTE-VECT: s_mov_b32 [[SREG:s[0-9]+]], 0x10000		; SI-PROMOTE-VECT: s_mov_b32 [[SREG:s[0-9]+]], 0x10000
; SI-PROMOTE-VECT: s_lshl_b32 [[SCALED_IDX:s[0-9]+]], [[IDX]], 4		; SI-PROMOTE-VECT: s_lshl_b32 [[SCALED_IDX:s[0-9]+]], [[IDX]], 4
; SI-PROMOTE-VECT: v_mov_b32_e32 [[VREG:v[0-9]+]], [[SCALED_IDX]]		; SI-PROMOTE-VECT: v_mov_b32_e32 [[VREG:v[0-9]+]], [[SCALED_IDX]]
; SI-PROMOTE-VECT: v_bfe_u32 v{{[0-9]+}}, [[SREG]], [[VREG]], 16		; SI-PROMOTE-VECT: v_bfe_u32 v{{[0-9]+}}, [[SREG]], [[VREG]], 16
define amdgpu_kernel void @short_array(i32 addrspace(1)* %out, i32 %index) #0 {		define amdgpu_kernel void @short_array(i32 addrspace(1)* %out, i32 %index) #0 {
entry:		entry:

; FUNC-LABEL: {{^}}char_array:		; FUNC-LABEL: {{^}}char_array:

; R600-VECT: MOVA_INT		; R600-VECT: MOVA_INT

; SI-PROMOTE-VECT-DAG: s_lshl_b32		; SI-PROMOTE-VECT-DAG: s_lshl_b32
; SI-PROMOTE-VECT-DAG: v_lshrrev		; SI-PROMOTE-VECT-DAG: v_lshrrev

; SI-ALLOCA-DAG: buffer_store_byte v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:4 ; encoding: [0x04,0x00,0x60,0xe0		; SI-ALLOCA-DAG: buffer_store_byte v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], 0 offset:4 ; encoding: [0x04,0x00,0x60,0xe0
; SI-ALLOCA-DAG: buffer_store_byte v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:5 ; encoding: [0x05,0x00,0x60,0xe0		; SI-ALLOCA-DAG: buffer_store_byte v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], 0 offset:5 ; encoding: [0x05,0x00,0x60,0xe0
define amdgpu_kernel void @char_array(i32 addrspace(1)* %out, i32 %index) #0 {		define amdgpu_kernel void @char_array(i32 addrspace(1)* %out, i32 %index) #0 {
entry:		entry:
%0 = alloca [2 x i8], addrspace(5)		%0 = alloca [2 x i8], addrspace(5)
%1 = getelementptr inbounds [2 x i8], [2 x i8] addrspace(5)* %0, i32 0, i32 0		%1 = getelementptr inbounds [2 x i8], [2 x i8] addrspace(5)* %0, i32 0, i32 0
%2 = getelementptr inbounds [2 x i8], [2 x i8] addrspace(5)* %0, i32 0, i32 1		%2 = getelementptr inbounds [2 x i8], [2 x i8] addrspace(5)* %0, i32 0, i32 1
store i8 0, i8 addrspace(5)* %1		store i8 0, i8 addrspace(5)* %1
store i8 1, i8 addrspace(5)* %2		store i8 1, i8 addrspace(5)* %2
%3 = getelementptr inbounds [2 x i8], [2 x i8] addrspace(5)* %0, i32 0, i32 %index		%3 = getelementptr inbounds [2 x i8], [2 x i8] addrspace(5)* %0, i32 0, i32 %index
%4 = load i8, i8 addrspace(5)* %3		%4 = load i8, i8 addrspace(5)* %3
%5 = sext i8 %4 to i32		%5 = sext i8 %4 to i32
store i32 %5, i32 addrspace(1)* %out		store i32 %5, i32 addrspace(1)* %out
ret void		ret void
}		}

; Test that two stack objects are not stored in the same register		; Test that two stack objects are not stored in the same register
; The second stack object should be in T3.X		; The second stack object should be in T3.X
; FUNC-LABEL: {{^}}no_overlap:		; FUNC-LABEL: {{^}}no_overlap:
; R600-CHECK: MOV		; R600-CHECK: MOV
; R600-CHECK: [[CHAN:[XYZW]]]+		; R600-CHECK: [[CHAN:[XYZW]]]+
; R600-NOT: [[CHAN]]+		; R600-NOT: [[CHAN]]+
;		;
; A total of 5 bytes should be allocated and used.		; A total of 5 bytes should be allocated and used.
; SI: buffer_store_byte v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:4 ;		; SI: buffer_store_byte v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], 0 offset:4 ;
define amdgpu_kernel void @no_overlap(i32 addrspace(1)* %out, i32 %in) #0 {		define amdgpu_kernel void @no_overlap(i32 addrspace(1)* %out, i32 %in) #0 {
entry:		entry:
%0 = alloca [3 x i8], align 1, addrspace(5)		%0 = alloca [3 x i8], align 1, addrspace(5)
%1 = alloca [2 x i8], align 1, addrspace(5)		%1 = alloca [2 x i8], align 1, addrspace(5)
%2 = getelementptr [3 x i8], [3 x i8] addrspace(5)* %0, i32 0, i32 0		%2 = getelementptr [3 x i8], [3 x i8] addrspace(5)* %0, i32 0, i32 0
%3 = getelementptr [3 x i8], [3 x i8] addrspace(5)* %0, i32 0, i32 1		%3 = getelementptr [3 x i8], [3 x i8] addrspace(5)* %0, i32 0, i32 1
%4 = getelementptr [3 x i8], [3 x i8] addrspace(5)* %0, i32 0, i32 2		%4 = getelementptr [3 x i8], [3 x i8] addrspace(5)* %0, i32 0, i32 2
%5 = getelementptr [2 x i8], [2 x i8] addrspace(5)* %1, i32 0, i32 0		%5 = getelementptr [2 x i8], [2 x i8] addrspace(5)* %1, i32 0, i32 0
ret void		ret void
}		}

; AMDGPUPromoteAlloca does not know how to handle ptrtoint. When it		; AMDGPUPromoteAlloca does not know how to handle ptrtoint. When it
; finds one, it should stop trying to promote.		; finds one, it should stop trying to promote.

; FUNC-LABEL: ptrtoint:		; FUNC-LABEL: ptrtoint:
; SI-NOT: ds_write		; SI-NOT: ds_write
; SI: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offen		; SI: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+:[0-9]+}}], 0 offen
; SI: v_add_{{[iu]}}32_e32 [[ADD_OFFSET:v[0-9]+]], vcc, 5,		; SI: v_add_{{[iu]}}32_e32 [[ADD_OFFSET:v[0-9]+]], vcc, 5,
; SI: buffer_load_dword v{{[0-9]+}}, [[ADD_OFFSET:v[0-9]+]], s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offen ;		; SI: buffer_load_dword v{{[0-9]+}}, [[ADD_OFFSET:v[0-9]+]], s[{{[0-9]+:[0-9]+}}], 0 offen ;
define amdgpu_kernel void @ptrtoint(i32 addrspace(1)* %out, i32 %a, i32 %b) #0 {		define amdgpu_kernel void @ptrtoint(i32 addrspace(1)* %out, i32 %a, i32 %b) #0 {
%alloca = alloca [16 x i32], addrspace(5)		%alloca = alloca [16 x i32], addrspace(5)
%tmp0 = getelementptr [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 0, i32 %a		%tmp0 = getelementptr [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 0, i32 %a
store i32 5, i32 addrspace(5)* %tmp0		store i32 5, i32 addrspace(5)* %tmp0
%tmp1 = ptrtoint [16 x i32] addrspace(5)* %alloca to i32		%tmp1 = ptrtoint [16 x i32] addrspace(5)* %alloca to i32
%tmp2 = add i32 %tmp1, 5		%tmp2 = add i32 %tmp1, 5
%tmp3 = inttoptr i32 %tmp2 to i32 addrspace(5)*		%tmp3 = inttoptr i32 %tmp2 to i32 addrspace(5)*
%tmp4 = getelementptr i32, i32 addrspace(5)* %tmp3, i32 %b		%tmp4 = getelementptr i32, i32 addrspace(5)* %tmp3, i32 %b

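The SI-ALLOCA checks in amdgpu.private-memory.ll also match the first four bytes of each instruction's encoding. As a rough aid for reading them, the sketch below decodes that first dword under an assumed Southern Islands MUBUF layout (OFFSET in bits 11:0, OFFEN in bit 12, OP in bits 24:18, a fixed 0x38 tag in bits 31:26). The layout is our reading of the ISA documentation, and the struct and function names are hypothetical. On that reading, soffset lives in the second dword, which these checks do not match, so changing the soffset pattern to 0 leaves the encoding bytes untouched.

```cpp
#include <cstdint>

// Fields of the first MUBUF dword under the assumed SI layout.
struct MubufDword0 {
  uint32_t offset;    // bits 11:0, immediate byte offset
  bool offen;         // bit 12, vaddr supplies an offset
  uint32_t op;        // bits 24:18, opcode
  uint32_t encoding;  // bits 31:26, expected to be 0x38 for MUBUF
};

// The "; encoding: [b0,b1,b2,b3" comments list bytes in little-endian order.
constexpr uint32_t fromBytes(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3) {
  return uint32_t(b0) | (uint32_t(b1) << 8) | (uint32_t(b2) << 16) |
         (uint32_t(b3) << 24);
}

constexpr MubufDword0 decodeDword0(uint32_t w) {
  return MubufDword0{w & 0xfffu, ((w >> 12) & 1u) != 0, (w >> 18) & 0x7fu,
                     w >> 26};
}
```

Applied to the bytes in the checks, [0x06,0x00,0x68,0xe0] decodes to offset 6 with OFFEN clear and [0x00,0x10,0x70,0xe0] to offset 0 with OFFEN set. The op fields come out as 24, 26 and 28 for the byte, short and dword stores, which we believe are the SI opcodes for those instructions.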
llvm/test/CodeGen/AMDGPU/array-ptr-calc-i32.ll

	; RUN: llc -verify-machineinstrs -march=amdgcn -mcpu=tahiti -mattr=-promote-alloca < %s | FileCheck -check-prefix=SI-ALLOCA -check-prefix=SI %s			; RUN: llc -verify-machineinstrs -march=amdgcn -mcpu=tahiti -mattr=-promote-alloca < %s | FileCheck -check-prefix=SI-ALLOCA -check-prefix=SI %s
	; RUN: llc -verify-machineinstrs -march=amdgcn -mcpu=tahiti -mattr=+promote-alloca < %s | FileCheck -check-prefix=SI-PROMOTE -check-prefix=SI %s			; RUN: llc -verify-machineinstrs -march=amdgcn -mcpu=tahiti -mattr=+promote-alloca < %s | FileCheck -check-prefix=SI-PROMOTE -check-prefix=SI %s

	declare i32 @llvm.amdgcn.mbcnt.lo(i32, i32) #1			declare i32 @llvm.amdgcn.mbcnt.lo(i32, i32) #1
	declare i32 @llvm.amdgcn.mbcnt.hi(i32, i32) #1			declare i32 @llvm.amdgcn.mbcnt.hi(i32, i32) #1
	declare void @llvm.amdgcn.s.barrier() #2			declare void @llvm.amdgcn.s.barrier() #2

	; The required pointer calculations for the alloca'd actually requires			; The required pointer calculations for the alloca'd actually requires
	; an add and won't be folded into the addressing, which fails with a			; an add and won't be folded into the addressing, which fails with a
	; 64-bit pointer add. This should work since private pointers should			; 64-bit pointer add. This should work since private pointers should
	; be 32-bits.			; be 32-bits.

	; SI-LABEL: {{^}}test_private_array_ptr_calc:			; SI-LABEL: {{^}}test_private_array_ptr_calc:

	; SI-ALLOCA: v_add_i32_e32 [[PTRREG:v[0-9]+]], vcc, 16, v{{[0-9]+}}			; SI-ALLOCA: v_add_i32_e32 [[PTRREG:v[0-9]+]], vcc, 16, v{{[0-9]+}}
	; SI-ALLOCA: buffer_store_dword {{v[0-9]+}}, [[PTRREG]], s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offen offset:64			; SI-ALLOCA: buffer_store_dword {{v[0-9]+}}, [[PTRREG]], s[{{[0-9]+:[0-9]+}}], 0 offen offset:64
	; SI-ALLOCA: s_barrier			; SI-ALLOCA: s_barrier
	; SI-ALLOCA: buffer_load_dword {{v[0-9]+}}, [[PTRREG]], s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offen offset:64			; SI-ALLOCA: buffer_load_dword {{v[0-9]+}}, [[PTRREG]], s[{{[0-9]+:[0-9]+}}], 0 offen offset:64
	;			;
	; FIXME: The AMDGPUPromoteAlloca pass should be able to convert this			; FIXME: The AMDGPUPromoteAlloca pass should be able to convert this
	; alloca to a vector. It currently fails because it does not know how			; alloca to a vector. It currently fails because it does not know how
	; to interpret:			; to interpret:
	; getelementptr inbounds [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 1, i32 %b			; getelementptr inbounds [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 1, i32 %b

	; SI-PROMOTE: v_add_i32_e32 [[PTRREG:v[0-9]+]], vcc, 64			; SI-PROMOTE: v_add_i32_e32 [[PTRREG:v[0-9]+]], vcc, 64
	; SI-PROMOTE: ds_write_b32 [[PTRREG]]			; SI-PROMOTE: ds_write_b32 [[PTRREG]]

llvm/test/CodeGen/AMDGPU/attr-amdgpu-num-sgpr.ll

	; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=fiji -verify-machineinstrs < %s \| FileCheck -check-prefix=TOSGPR -check-prefix=ALL %s			; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=fiji -verify-machineinstrs < %s \| FileCheck -check-prefix=TOSGPR -check-prefix=ALL %s

	; FIXME: Vectorization can increase required SGPR count beyond limit.			; FIXME: Vectorization can increase required SGPR count beyond limit.

				; FIXME: I'm not sure I understand what this is testing, but after the
				; CC change the spare SGPR available once the scratch wave offset dies
				; seems to help GreedyRA avoid a spill, but somehow ends up with another
				; SGPR used.
				scott.linder (Author): Can anyone help me understand what we are trying to test here? It seems likely that the number of live SGPRs and the number of available SGPRs need to be adjusted for this test to remain meaningful, but in trying to correct it I realized I wasn't sure what it was testing in the first place.

	; ALL-LABEL: {{^}}max_9_sgprs:			; ALL-LABEL: {{^}}max_9_sgprs:

	; ALL: SGPRBlocks: 1			; ALL: SGPRBlocks: 1
	; ALL: NumSGPRsForWavesPerEU: 9			; ALL: NumSGPRsForWavesPerEU: 9
	define amdgpu_kernel void @max_9_sgprs() #0 {			define amdgpu_kernel void @max_9_sgprs() #0 {
	%one = load volatile i32, i32 addrspace(4)* undef			%one = load volatile i32, i32 addrspace(4)* undef
	%two = load volatile i32, i32 addrspace(4)* undef			%two = load volatile i32, i32 addrspace(4)* undef
	%three = load volatile i32, i32 addrspace(4)* undef			%three = load volatile i32, i32 addrspace(4)* undef
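The `SGPRBlocks: 1` and `NumSGPRsForWavesPerEU: 9` checks in `max_9_sgprs` are tied together by a block encoding. The sketch below assumes that SGPRs are allocated in granules of 8 on these targets and that the field stores the number of granules minus one. This is modelled on what we understand the AMDGPU backend's `IsaInfo::getNumSGPRBlocks` to compute; it is not taken from that code. The helper name `num_sgpr_blocks` is hypothetical. Under these assumptions, any count from 9 to 16 SGPRs encodes as 1.

```python
# Assumption: SGPRs are allocated in blocks of 8 on these GCN targets.
SGPR_ENCODING_GRANULE = 8


def num_sgpr_blocks(num_sgprs: int, granule: int = SGPR_ENCODING_GRANULE) -> int:
    """Return the SGPRBlocks field value for a given SGPR count (hypothetical helper).

    The count is rounded up to a whole number of blocks (at least one block),
    and the encoded field holds the block count minus one.
    """
    num_sgprs = max(1, num_sgprs)
    blocks = -(-num_sgprs // granule)  # ceiling division
    return blocks - 1


print(num_sgpr_blocks(9))  # 1: nine SGPRs need two 8-SGPR blocks
```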

llvm/test/CodeGen/AMDGPU/byval-frame-setup.ll

; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=fiji -enable-ipra=0 -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefixes=GCN,VI %s		; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=fiji -enable-ipra=0 -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefixes=GCN,VI %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -enable-ipra=0 -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefixes=GCN,CI %s		; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -enable-ipra=0 -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefixes=GCN,CI %s

%struct.ByValStruct = type { [4 x i32] }		%struct.ByValStruct = type { [4 x i32] }

; GCN-LABEL: {{^}}void_func_byval_struct:
; GCN: buffer_load_dword [[LOAD0:v[0-9]+]], off, s[0:3], s32{{$}}
; GCN-NOT: s32
; GCN: buffer_store_dword [[LOAD0]], off, s[0:3], s32{{$}}
; GCN-NOT: s32

; GCN: buffer_load_dword [[LOAD1:v[0-9]+]], off, s[0:3], s32 offset:16{{$}}
; GCN-NOT: s32
; GCN: buffer_store_dword [[LOAD1]], off, s[0:3], s32 offset:16{{$}}
; GCN-NOT: s32
define hidden void @void_func_byval_struct(%struct.ByValStruct addrspace(5)* byval noalias nocapture align 4 %arg0, %struct.ByValStruct addrspace(5)* byval noalias nocapture align 4 %arg1) #1 {
entry:
%arrayidx = getelementptr inbounds %struct.ByValStruct, %struct.ByValStruct addrspace(5)* %arg0, i32 0, i32 0, i32 0
%tmp = load volatile i32, i32 addrspace(5)* %arrayidx, align 4
%add = add nsw i32 %tmp, 1
store volatile i32 %add, i32 addrspace(5)* %arrayidx, align 4
%arrayidx2 = getelementptr inbounds %struct.ByValStruct, %struct.ByValStruct addrspace(5)* %arg1, i32 0, i32 0, i32 0
%tmp1 = load volatile i32, i32 addrspace(5)* %arrayidx2, align 4
%add3 = add nsw i32 %tmp1, 2
store volatile i32 %add3, i32 addrspace(5)* %arrayidx2, align 4
store volatile i32 9, i32 addrspace(1)* null, align 4
ret void
}

; Make sure the offset is folded and function's frame register is used		; Make sure the offset is folded and function's frame register is used
; rather than the global scratch wave offset.		; rather than the global scratch wave offset.
; GCN-LABEL: {{^}}void_func_byval_struct_use_outside_entry_block:		; GCN-LABEL: {{^}}void_func_byval_struct_use_outside_entry_block:
; GCN-NOT: v_lshrrev_b32		; GCN-NOT: v_lshrrev_b32
; GCN-NOT: s_sub_u32		; GCN-NOT: s_sub_u32

; GCN: s_and_saveexec_b64		; GCN: s_and_saveexec_b64
; GCN: s_cbranch_execz [[BB1:BB[0-9]+_[0-9]+]]		; GCN: s_cbranch_execz [[BB1:BB[0-9]+_[0-9]+]]
bb0:
%add3 = add nsw i32 %tmp1, 2		%add3 = add nsw i32 %tmp1, 2
store volatile i32 %add3, i32 addrspace(5)* %arrayidx2, align 4		store volatile i32 %add3, i32 addrspace(5)* %arrayidx2, align 4
store volatile i32 9, i32 addrspace(1)* null, align 4		store volatile i32 9, i32 addrspace(1)* null, align 4
br label %bb1		br label %bb1

bb1:		bb1:
ret void		ret void
}		}

; GCN-LABEL: {{^}}void_func_byval_struct_non_leaf:
; GCN: buffer_store_dword v33, off, s[0:3], s32 offset:36
; GCN-DAG: v_writelane_b32 v33, s34,
; GCN: s_mov_b32 s34, s32
; GCN-DAG: buffer_load_dword [[LOAD0:v[0-9]+]], off, s[0:3], s34{{$}}
; GCN-DAG: s_add_u32 s32, s32, 0xc00{{$}}
; GCN-DAG: buffer_store_dword v32, off, s[0:3], s34 offset:32
; GCN-NOT: v_writelane_b32 v{{[0-9]+}}, s32

; GCN-DAG: v_add_{{[iu]}}32_e32 [[ADD0:v[0-9]+]], vcc, 1, [[LOAD0]]
; GCN: buffer_store_dword [[ADD0]], off, s[0:3], s34{{$}}

; GCN-DAG: buffer_load_dword [[LOAD1:v[0-9]+]], off, s[0:3], s34 offset:16{{$}}
; GCN-DAG: v_add_{{[iu]}}32_e32 [[ADD1:v[0-9]+]], vcc, 2, [[LOAD1]]

; GCN: s_swappc_b64

; GCN: buffer_store_dword [[ADD1]], off, s[0:3], s34 offset:16{{$}}

; GCN: v_readlane_b32
; GCN-NOT: v_readlane_b32 s32
; GCN-DAG: buffer_load_dword v32, off, s[0:3], s34 offset:32
; GCN: s_sub_u32 s32, s32, 0xc00{{$}}
; GCN: v_readlane_b32 s34, v33,
; GCN-DAG: buffer_load_dword v33, off, s[0:3], s32 offset:36 ; 4-byte Folded Reload
; GCN: s_setpc_b64
define void @void_func_byval_struct_non_leaf(%struct.ByValStruct addrspace(5)* byval noalias nocapture align 4 %arg0, %struct.ByValStruct addrspace(5)* byval noalias nocapture align 4 %arg1) #1 {
entry:
%arrayidx = getelementptr inbounds %struct.ByValStruct, %struct.ByValStruct addrspace(5)* %arg0, i32 0, i32 0, i32 0
%tmp = load volatile i32, i32 addrspace(5)* %arrayidx, align 4
%add = add nsw i32 %tmp, 1
store volatile i32 %add, i32 addrspace(5)* %arrayidx, align 4
%arrayidx2 = getelementptr inbounds %struct.ByValStruct, %struct.ByValStruct addrspace(5)* %arg1, i32 0, i32 0, i32 0
%tmp1 = load volatile i32, i32 addrspace(5)* %arrayidx2, align 4
%add3 = add nsw i32 %tmp1, 2
call void @external_void_func_void()
store volatile i32 %add3, i32 addrspace(5)* %arrayidx2, align 4
store volatile i32 9, i32 addrspace(1)* null, align 4
ret void
}

; GCN-LABEL: {{^}}call_void_func_byval_struct_func:
; GCN: s_mov_b32 s34, s32
; GCN-DAG: s_add_u32 s32, s32, 0xc00{{$}}
; GCN-DAG: v_writelane_b32

; GCN-DAG: v_mov_b32_e32 [[NINE:v[0-9]+]], 9
; GCN-DAG: v_mov_b32_e32 [[THIRTEEN:v[0-9]+]], 13

; GCN-DAG: buffer_store_dword [[NINE]], off, s[0:3], s34{{$}}
; GCN-DAG: buffer_store_dword [[THIRTEEN]], off, s[0:3], s34 offset:16

; GCN-DAG: buffer_load_dword [[LOAD0:v[0-9]+]], off, s[0:3], s34{{$}}
; GCN-DAG: buffer_load_dword [[LOAD1:v[0-9]+]], off, s[0:3], s34 offset:4
; GCN-DAG: buffer_load_dword [[LOAD2:v[0-9]+]], off, s[0:3], s34 offset:8
; GCN-DAG: buffer_load_dword [[LOAD3:v[0-9]+]], off, s[0:3], s34 offset:12

; GCN-NOT: s_add_u32 s32, s32, 0x800


; GCN-DAG: buffer_store_dword [[LOAD0]], off, s[0:3], s32{{$}}
; GCN-DAG: buffer_store_dword [[LOAD1]], off, s[0:3], s32 offset:4
; GCN-DAG: buffer_store_dword [[LOAD2]], off, s[0:3], s32 offset:8
; GCN-DAG: buffer_store_dword [[LOAD3]], off, s[0:3], s32 offset:12

; GCN: buffer_load_dword [[LOAD4:v[0-9]+]], off, s[0:3], s34 offset:16
; GCN: buffer_load_dword [[LOAD5:v[0-9]+]], off, s[0:3], s34 offset:20
; GCN: buffer_load_dword [[LOAD6:v[0-9]+]], off, s[0:3], s34 offset:24
; GCN: buffer_load_dword [[LOAD7:v[0-9]+]], off, s[0:3], s34 offset:28

; GCN-DAG: buffer_store_dword [[LOAD4]], off, s[0:3], s32 offset:16
; GCN-DAG: buffer_store_dword [[LOAD5]], off, s[0:3], s32 offset:20
; GCN-DAG: buffer_store_dword [[LOAD6]], off, s[0:3], s32 offset:24
; GCN-DAG: buffer_store_dword [[LOAD7]], off, s[0:3], s32 offset:28

; GCN: s_swappc_b64
; GCN-NOT: v_readlane_b32 s32
; GCN: v_readlane_b32
; GCN-NOT: v_readlane_b32 s32

; GCN-NOT: s_sub_u32 s32, s32, 0x800

; GCN: s_sub_u32 s32, s32, 0xc00{{$}}
; GCN: v_readlane_b32 s34, v
; GCN: s_waitcnt
; GCN: s_setpc_b64
define void @call_void_func_byval_struct_func() #1 {
entry:
%arg0 = alloca %struct.ByValStruct, align 4, addrspace(5)
%arg1 = alloca %struct.ByValStruct, align 4, addrspace(5)
%tmp = bitcast %struct.ByValStruct addrspace(5)* %arg0 to i8 addrspace(5)*
call void @llvm.lifetime.start.p5i8(i64 32, i8 addrspace(5)* %tmp)
%tmp1 = bitcast %struct.ByValStruct addrspace(5)* %arg1 to i8 addrspace(5)*
call void @llvm.lifetime.start.p5i8(i64 32, i8 addrspace(5)* %tmp1)
%arrayidx = getelementptr inbounds %struct.ByValStruct, %struct.ByValStruct addrspace(5)* %arg0, i32 0, i32 0, i32 0
store volatile i32 9, i32 addrspace(5)* %arrayidx, align 4
%arrayidx2 = getelementptr inbounds %struct.ByValStruct, %struct.ByValStruct addrspace(5)* %arg1, i32 0, i32 0, i32 0
store volatile i32 13, i32 addrspace(5)* %arrayidx2, align 4
call void @void_func_byval_struct(%struct.ByValStruct addrspace(5)* byval nonnull align 4 %arg0, %struct.ByValStruct addrspace(5)* byval nonnull align 4 %arg1)
call void @llvm.lifetime.end.p5i8(i64 32, i8 addrspace(5)* %tmp1)
call void @llvm.lifetime.end.p5i8(i64 32, i8 addrspace(5)* %tmp)
ret void
}

; GCN-LABEL: {{^}}call_void_func_byval_struct_kernel:
; GCN: s_mov_b32 s33, s7
; GCN-NOT: s_add_u32 s32, s32, 0x800

; GCN: v_mov_b32_e32 [[NINE:v[0-9]+]], 9
; GCN: buffer_store_dword [[NINE]], off, s[0:3], s33 offset:8
; GCN: v_mov_b32_e32 [[THIRTEEN:v[0-9]+]], 13
; GCN: buffer_store_dword [[THIRTEEN]], off, s[0:3], s33 offset:24

; GCN-NOT: s_add_u32 s32, s32, 0x800
; GCN-DAG: buffer_load_dword [[LOAD0:v[0-9]+]], off, s[0:3], s33 offset:8
; GCN-DAG: buffer_load_dword [[LOAD1:v[0-9]+]], off, s[0:3], s33 offset:12
; GCN-DAG: s_add_u32 s32, s33, 0xc00{{$}}
; GCN-DAG: buffer_load_dword [[LOAD2:v[0-9]+]], off, s[0:3], s33 offset:16
; GCN-DAG: buffer_load_dword [[LOAD3:v[0-9]+]], off, s[0:3], s33 offset:20

; GCN: s_getpc_b64

; GCN-DAG: buffer_store_dword [[LOAD0]], off, s[0:3], s32{{$}}
; GCN-DAG: buffer_store_dword [[LOAD1]], off, s[0:3], s32 offset:4
; GCN-DAG: buffer_store_dword [[LOAD2]], off, s[0:3], s32 offset:8
; GCN-DAG: buffer_store_dword [[LOAD3]], off, s[0:3], s32 offset:12

; GCN-DAG: buffer_load_dword [[LOAD4:v[0-9]+]], off, s[0:3], s33 offset:24
; GCN-DAG: buffer_load_dword [[LOAD5:v[0-9]+]], off, s[0:3], s33 offset:28
; GCN-DAG: buffer_load_dword [[LOAD6:v[0-9]+]], off, s[0:3], s33 offset:32
; GCN-DAG: buffer_load_dword [[LOAD7:v[0-9]+]], off, s[0:3], s33 offset:36

; GCN-DAG: buffer_store_dword [[LOAD4]], off, s[0:3], s32 offset:16
; GCN-DAG: buffer_store_dword [[LOAD5]], off, s[0:3], s32 offset:20
; GCN-DAG: buffer_store_dword [[LOAD6]], off, s[0:3], s32 offset:24
; GCN-DAG: buffer_store_dword [[LOAD7]], off, s[0:3], s32 offset:28


; GCN: s_swappc_b64
; GCN-NOT: s_sub_u32 s32
; GCN: s_endpgm
define amdgpu_kernel void @call_void_func_byval_struct_kernel() #1 {
entry:
%arg0 = alloca %struct.ByValStruct, align 4, addrspace(5)
%arg1 = alloca %struct.ByValStruct, align 4, addrspace(5)
%tmp = bitcast %struct.ByValStruct addrspace(5)* %arg0 to i8 addrspace(5)*
call void @llvm.lifetime.start.p5i8(i64 32, i8 addrspace(5)* %tmp)
%tmp1 = bitcast %struct.ByValStruct addrspace(5)* %arg1 to i8 addrspace(5)*
call void @llvm.lifetime.start.p5i8(i64 32, i8 addrspace(5)* %tmp1)
%arrayidx = getelementptr inbounds %struct.ByValStruct, %struct.ByValStruct addrspace(5)* %arg0, i32 0, i32 0, i32 0
store volatile i32 9, i32 addrspace(5)* %arrayidx, align 4
%arrayidx2 = getelementptr inbounds %struct.ByValStruct, %struct.ByValStruct addrspace(5)* %arg1, i32 0, i32 0, i32 0
store volatile i32 13, i32 addrspace(5)* %arrayidx2, align 4
call void @void_func_byval_struct(%struct.ByValStruct addrspace(5)* byval nonnull align 4 %arg0, %struct.ByValStruct addrspace(5)* byval nonnull align 4 %arg1)
call void @llvm.lifetime.end.p5i8(i64 32, i8 addrspace(5)* %tmp1)
call void @llvm.lifetime.end.p5i8(i64 32, i8 addrspace(5)* %tmp)
ret void
}

; GCN-LABEL: {{^}}void_func_byval_struct_align8:
; GCN: buffer_load_dword [[LOAD0:v[0-9]+]], off, s[0:3], s32{{$}}
; GCN-NOT: s32
; GCN: buffer_store_dword [[LOAD0]], off, s[0:3], s32{{$}}
; GCN-NOT: s32

; GCN: buffer_load_dword [[LOAD1:v[0-9]+]], off, s[0:3], s32 offset:16{{$}}
; GCN-NOT: s32
; GCN: buffer_store_dword [[LOAD1]], off, s[0:3], s32 offset:16{{$}}
; GCN-NOT: s32
define hidden void @void_func_byval_struct_align8(%struct.ByValStruct addrspace(5)* byval noalias nocapture align 8 %arg0, %struct.ByValStruct addrspace(5)* byval noalias nocapture align 8 %arg1) #1 {
entry:
%arrayidx = getelementptr inbounds %struct.ByValStruct, %struct.ByValStruct addrspace(5)* %arg0, i32 0, i32 0, i32 0
%tmp = load volatile i32, i32 addrspace(5)* %arrayidx, align 8
%add = add nsw i32 %tmp, 1
store volatile i32 %add, i32 addrspace(5)* %arrayidx, align 8
%arrayidx2 = getelementptr inbounds %struct.ByValStruct, %struct.ByValStruct addrspace(5)* %arg1, i32 0, i32 0, i32 0
%tmp1 = load volatile i32, i32 addrspace(5)* %arrayidx2, align 8
%add3 = add nsw i32 %tmp1, 2
store volatile i32 %add3, i32 addrspace(5)* %arrayidx2, align 8
store volatile i32 9, i32 addrspace(1)* null, align 4
ret void
}

; Make sure the byval alignment is respected in the call frame setup
; GCN-LABEL: {{^}}call_void_func_byval_struct_align8_kernel:
; GCN: s_mov_b32 s33, s7
; GCN-NOT: s_add_u32 s32, s32, 0x800

; GCN: v_mov_b32_e32 [[NINE:v[0-9]+]], 9
; GCN: buffer_store_dword [[NINE]], off, s[0:3], s33 offset:8
; GCN: v_mov_b32_e32 [[THIRTEEN:v[0-9]+]], 13
; GCN: buffer_store_dword [[THIRTEEN]], off, s[0:3], s33 offset:24


; GCN-NOT: s_add_u32 s32, s32, 0x800

; GCN: buffer_load_dword [[LOAD0:v[0-9]+]], off, s[0:3], s33 offset:8
; GCN: buffer_load_dword [[LOAD1:v[0-9]+]], off, s[0:3], s33 offset:12
; GCN: buffer_load_dword [[LOAD2:v[0-9]+]], off, s[0:3], s33 offset:16
; GCN: buffer_load_dword [[LOAD3:v[0-9]+]], off, s[0:3], s33 offset:20

; GCN-NOT: s_add_u32 s32, s32, 0x800
; GCN-DAG: s_add_u32 s32, s33, 0xc00{{$}}

; GCN: buffer_store_dword [[LOAD3]], off, s[0:3], s32 offset:12
; GCN: buffer_store_dword [[LOAD2]], off, s[0:3], s32 offset:8
; GCN: buffer_store_dword [[LOAD1]], off, s[0:3], s32 offset:4
; GCN: buffer_store_dword [[LOAD0]], off, s[0:3], s32{{$}}


; GCN-DAG: buffer_load_dword [[LOAD4:v[0-9]+]], off, s[0:3], s33 offset:24
; GCN-DAG: buffer_load_dword [[LOAD5:v[0-9]+]], off, s[0:3], s33 offset:28
; GCN-DAG: buffer_load_dword [[LOAD6:v[0-9]+]], off, s[0:3], s33 offset:32
; GCN-DAG: buffer_load_dword [[LOAD7:v[0-9]+]], off, s[0:3], s33 offset:36

; GCN-DAG: buffer_store_dword [[LOAD4]], off, s[0:3], s32 offset:16
; GCN-DAG: buffer_store_dword [[LOAD5]], off, s[0:3], s32 offset:20
; GCN-DAG: buffer_store_dword [[LOAD6]], off, s[0:3], s32 offset:24
; GCN-DAG: buffer_store_dword [[LOAD7]], off, s[0:3], s32 offset:28


; GCN: s_swappc_b64
; GCN-NOT: s_sub_u32 s32
; GCN: s_endpgm
define amdgpu_kernel void @call_void_func_byval_struct_align8_kernel() #1 {
entry:
%arg0 = alloca %struct.ByValStruct, align 8, addrspace(5)
%arg1 = alloca %struct.ByValStruct, align 8, addrspace(5)
%tmp = bitcast %struct.ByValStruct addrspace(5)* %arg0 to i8 addrspace(5)*
call void @llvm.lifetime.start.p5i8(i64 32, i8 addrspace(5)* %tmp)
%tmp1 = bitcast %struct.ByValStruct addrspace(5)* %arg1 to i8 addrspace(5)*
call void @llvm.lifetime.start.p5i8(i64 32, i8 addrspace(5)* %tmp1)
%arrayidx = getelementptr inbounds %struct.ByValStruct, %struct.ByValStruct addrspace(5)* %arg0, i32 0, i32 0, i32 0
store volatile i32 9, i32 addrspace(5)* %arrayidx, align 8
%arrayidx2 = getelementptr inbounds %struct.ByValStruct, %struct.ByValStruct addrspace(5)* %arg1, i32 0, i32 0, i32 0
store volatile i32 13, i32 addrspace(5)* %arrayidx2, align 8
call void @void_func_byval_struct_align8(%struct.ByValStruct addrspace(5)* byval nonnull align 8 %arg0, %struct.ByValStruct addrspace(5)* byval nonnull align 8 %arg1)
call void @llvm.lifetime.end.p5i8(i64 32, i8 addrspace(5)* %tmp1)
call void @llvm.lifetime.end.p5i8(i64 32, i8 addrspace(5)* %tmp)
ret void
}

; GCN-LABEL: {{^}}call_void_func_byval_struct_align8_func:
; GCN: s_mov_b32 s34, s32
; GCN-DAG: s_add_u32 s32, s32, 0xc00{{$}}
; GCN-DAG: v_writelane_b32

; GCN-DAG: v_mov_b32_e32 [[NINE:v[0-9]+]], 9
; GCN-DAG: v_mov_b32_e32 [[THIRTEEN:v[0-9]+]], 13

; GCN-DAG: buffer_store_dword [[NINE]], off, s[0:3], s34{{$}}
; GCN-DAG: buffer_store_dword [[THIRTEEN]], off, s[0:3], s34 offset:16

; GCN-DAG: buffer_load_dword [[LOAD0:v[0-9]+]], off, s[0:3], s34{{$}}
; GCN-DAG: buffer_load_dword [[LOAD1:v[0-9]+]], off, s[0:3], s34 offset:4
; GCN-DAG: buffer_load_dword [[LOAD2:v[0-9]+]], off, s[0:3], s34 offset:8
; GCN-DAG: buffer_load_dword [[LOAD3:v[0-9]+]], off, s[0:3], s34 offset:12

; GCN-NOT: s_add_u32 s32, s32, 0x800

; GCN-DAG: buffer_store_dword [[LOAD0]], off, s[0:3], s32{{$}}
; GCN-DAG: buffer_store_dword [[LOAD1]], off, s[0:3], s32 offset:4
; GCN-DAG: buffer_store_dword [[LOAD2]], off, s[0:3], s32 offset:8
; GCN-DAG: buffer_store_dword [[LOAD3]], off, s[0:3], s32 offset:12

; GCN: buffer_load_dword [[LOAD4:v[0-9]+]], off, s[0:3], s34 offset:16
; GCN: buffer_load_dword [[LOAD5:v[0-9]+]], off, s[0:3], s34 offset:20
; GCN: buffer_load_dword [[LOAD6:v[0-9]+]], off, s[0:3], s34 offset:24
; GCN: buffer_load_dword [[LOAD7:v[0-9]+]], off, s[0:3], s34 offset:28

; GCN: s_waitcnt vmcnt(0)
; GCN-DAG: buffer_store_dword [[LOAD4]], off, s[0:3], s32 offset:16
; GCN-DAG: buffer_store_dword [[LOAD5]], off, s[0:3], s32 offset:20
; GCN-DAG: buffer_store_dword [[LOAD6]], off, s[0:3], s32 offset:24
; GCN-DAG: buffer_store_dword [[LOAD7]], off, s[0:3], s32 offset:28

; GCN: s_swappc_b64
; GCN-NOT: v_readlane_b32 s32
; GCN: v_readlane_b32
; GCN-NOT: v_readlane_b32 s32

; GCN-NOT: s_sub_u32 s32, s32, 0x800

; GCN: s_sub_u32 s32, s32, 0xc00{{$}}
; GCN: v_readlane_b32 s34, v
; GCN: s_waitcnt
; GCN-NEXT: s_setpc_b64
define void @call_void_func_byval_struct_align8_func() #0 {
entry:
%arg0 = alloca %struct.ByValStruct, align 8, addrspace(5)
%arg1 = alloca %struct.ByValStruct, align 8, addrspace(5)
%tmp = bitcast %struct.ByValStruct addrspace(5)* %arg0 to i8 addrspace(5)*
call void @llvm.lifetime.start.p5i8(i64 32, i8 addrspace(5)* %tmp)
%tmp1 = bitcast %struct.ByValStruct addrspace(5)* %arg1 to i8 addrspace(5)*
call void @llvm.lifetime.start.p5i8(i64 32, i8 addrspace(5)* %tmp1)
%arrayidx = getelementptr inbounds %struct.ByValStruct, %struct.ByValStruct addrspace(5)* %arg0, i32 0, i32 0, i32 0
store volatile i32 9, i32 addrspace(5)* %arrayidx, align 8
%arrayidx2 = getelementptr inbounds %struct.ByValStruct, %struct.ByValStruct addrspace(5)* %arg1, i32 0, i32 0, i32 0
store volatile i32 13, i32 addrspace(5)* %arrayidx2, align 8
call void @void_func_byval_struct_align8(%struct.ByValStruct addrspace(5)* byval nonnull align 8 %arg0, %struct.ByValStruct addrspace(5)* byval nonnull align 8 %arg1)
call void @llvm.lifetime.end.p5i8(i64 32, i8 addrspace(5)* %tmp1)
call void @llvm.lifetime.end.p5i8(i64 32, i8 addrspace(5)* %tmp)
ret void
}

; GCN-LABEL: {{^}}call_void_func_byval_struct_kernel_no_frame_pointer_elim:
define amdgpu_kernel void @call_void_func_byval_struct_kernel_no_frame_pointer_elim() #2 {
entry:
%arg0 = alloca %struct.ByValStruct, align 4, addrspace(5)
%arg1 = alloca %struct.ByValStruct, align 4, addrspace(5)
%tmp = bitcast %struct.ByValStruct addrspace(5)* %arg0 to i8 addrspace(5)*
call void @llvm.lifetime.start.p5i8(i64 32, i8 addrspace(5)* %tmp)
%tmp1 = bitcast %struct.ByValStruct addrspace(5)* %arg1 to i8 addrspace(5)*
call void @llvm.lifetime.start.p5i8(i64 32, i8 addrspace(5)* %tmp1)
%arrayidx = getelementptr inbounds %struct.ByValStruct, %struct.ByValStruct addrspace(5)* %arg0, i32 0, i32 0, i32 0
store volatile i32 9, i32 addrspace(5)* %arrayidx, align 4
%arrayidx2 = getelementptr inbounds %struct.ByValStruct, %struct.ByValStruct addrspace(5)* %arg1, i32 0, i32 0, i32 0
store volatile i32 13, i32 addrspace(5)* %arrayidx2, align 4
call void @void_func_byval_struct(%struct.ByValStruct addrspace(5)* byval nonnull align 4 %arg0, %struct.ByValStruct addrspace(5)* byval nonnull align 4 %arg1)
call void @llvm.lifetime.end.p5i8(i64 32, i8 addrspace(5)* %tmp1)
call void @llvm.lifetime.end.p5i8(i64 32, i8 addrspace(5)* %tmp)
ret void
}

declare hidden void @external_void_func_void() #0		declare hidden void @external_void_func_void() #0

declare void @llvm.lifetime.start.p5i8(i64, i8 addrspace(5)* nocapture) #3		declare void @llvm.lifetime.start.p5i8(i64, i8 addrspace(5)* nocapture) #3
declare void @llvm.lifetime.end.p5i8(i64, i8 addrspace(5)* nocapture) #3		declare void @llvm.lifetime.end.p5i8(i64, i8 addrspace(5)* nocapture) #3

attributes #0 = { nounwind }		attributes #0 = { nounwind }
attributes #1 = { noinline norecurse nounwind }		attributes #1 = { noinline norecurse nounwind }
attributes #2 = { nounwind norecurse "frame-pointer"="all" }		attributes #2 = { nounwind norecurse "frame-pointer"="all" }
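The stack-pointer immediates in these checks (`s_add_u32 s32, s32, 0xc00` and the negated `0x800` patterns) read more easily as wave-scaled sizes. The sketch below assumes that the stack pointer (`s32` here) holds a byte offset into the wave's scratch allocation, so it moves by the per-lane frame size multiplied by the wavefront size. Fiji and Hawaii run 64-lane wavefronts. Under this assumption, `0x800` corresponds to a 32-byte per-lane frame and `0xc00` to a 48-byte one. The function names are hypothetical, and the sketch does not try to explain how the frame layout itself is chosen.

```python
# fiji and hawaii execute 64-lane wavefronts.
WAVEFRONT_SIZE = 64


def sp_adjustment(per_lane_frame_bytes: int, wave_size: int = WAVEFRONT_SIZE) -> int:
    """Scale a per-lane frame size to the wave-level amount added to the SP (hypothetical)."""
    return per_lane_frame_bytes * wave_size


def per_lane_frame_bytes(sp_imm: int, wave_size: int = WAVEFRONT_SIZE) -> int:
    """Inverse: recover the per-lane frame size from an s_add_u32/s_sub_u32 immediate."""
    if sp_imm % wave_size != 0:
        raise ValueError("SP immediate should be a multiple of the wavefront size")
    return sp_imm // wave_size


print(hex(sp_adjustment(32)))  # 0x800
print(hex(sp_adjustment(48)))  # 0xc00
```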

llvm/test/CodeGen/AMDGPU/call-argument-types.ll

; GCN: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}		; GCN: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
define amdgpu_kernel void @test_call_external_void_func_i1_imm() #0 {		define amdgpu_kernel void @test_call_external_void_func_i1_imm() #0 {
call void @external_void_func_i1(i1 true)		call void @external_void_func_i1(i1 true)
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_call_external_void_func_i1_signext:		; GCN-LABEL: {{^}}test_call_external_void_func_i1_signext:
; MESA: s_mov_b32 s33, s3{{$}}
; HSA: s_mov_b32 s33, s9{{$}}

; HSA: buffer_load_ubyte [[VAR:v[0-9]+]]		; HSA: buffer_load_ubyte [[VAR:v[0-9]+]]
; HSA: s_mov_b32 s32, s33		; HSA: s_mov_b32 s32, 0
; MESA-DAG: buffer_load_ubyte [[VAR:v[0-9]+]]		; MESA-DAG: buffer_load_ubyte [[VAR:v[0-9]+]]
; MESA-DAG: s_mov_b32 s32, s33{{$}}		; MESA-DAG: s_mov_b32 s32, 0{{$}}


; GCN: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}		; GCN: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}
; GCN-NEXT: s_add_u32 s[[PC_LO]], s[[PC_LO]], external_void_func_i1_signext@rel32@lo+4		; GCN-NEXT: s_add_u32 s[[PC_LO]], s[[PC_LO]], external_void_func_i1_signext@rel32@lo+4
; GCN-NEXT: s_addc_u32 s[[PC_HI]], s[[PC_HI]], external_void_func_i1_signext@rel32@hi+4		; GCN-NEXT: s_addc_u32 s[[PC_HI]], s[[PC_HI]], external_void_func_i1_signext@rel32@hi+4

; GCN: s_waitcnt vmcnt(0)		; GCN: s_waitcnt vmcnt(0)
; GCN-NEXT: v_bfe_i32 v0, v0, 0, 1		; GCN-NEXT: v_bfe_i32 v0, v0, 0, 1
; GCN-NEXT: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}		; GCN-NEXT: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
define amdgpu_kernel void @test_call_external_void_func_i1_signext(i32) #0 {		define amdgpu_kernel void @test_call_external_void_func_i1_signext(i32) #0 {
%var = load volatile i1, i1 addrspace(1)* undef		%var = load volatile i1, i1 addrspace(1)* undef
call void @external_void_func_i1_signext(i1 %var)		call void @external_void_func_i1_signext(i1 %var)
ret void		ret void
}		}

; FIXME: load should be scheduled before getpc		; FIXME: load should be scheduled before getpc
; GCN-LABEL: {{^}}test_call_external_void_func_i1_zeroext:		; GCN-LABEL: {{^}}test_call_external_void_func_i1_zeroext:
; MESA: s_mov_b32 s33, s3{{$}}

; HSA: buffer_load_ubyte v0		; HSA: buffer_load_ubyte v0
; HSA-DAG: s_mov_b32 s32, s33{{$}}		; HSA-DAG: s_mov_b32 s32, 0{{$}}

; MESA: buffer_load_ubyte v0		; MESA: buffer_load_ubyte v0
; MESA-DAG: s_mov_b32 s32, s33{{$}}		; MESA-DAG: s_mov_b32 s32, 0{{$}}

; GCN: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}		; GCN: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}
; GCN-NEXT: s_add_u32 s[[PC_LO]], s[[PC_LO]], external_void_func_i1_zeroext@rel32@lo+4		; GCN-NEXT: s_add_u32 s[[PC_LO]], s[[PC_LO]], external_void_func_i1_zeroext@rel32@lo+4
; GCN-NEXT: s_addc_u32 s[[PC_HI]], s[[PC_HI]], external_void_func_i1_zeroext@rel32@hi+4		; GCN-NEXT: s_addc_u32 s[[PC_HI]], s[[PC_HI]], external_void_func_i1_zeroext@rel32@hi+4


; GCN: s_waitcnt vmcnt(0)		; GCN: s_waitcnt vmcnt(0)
; GCN-NEXT: v_and_b32_e32 v0, 1, v0		; GCN-NEXT: v_and_b32_e32 v0, 1, v0
; GCN-NEXT: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}		; GCN-NEXT: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
define amdgpu_kernel void @test_call_external_void_func_i1_zeroext(i32) #0 {		define amdgpu_kernel void @test_call_external_void_func_i1_zeroext(i32) #0 {
%var = load volatile i1, i1 addrspace(1)* undef		%var = load volatile i1, i1 addrspace(1)* undef
call void @external_void_func_i1_zeroext(i1 %var)		call void @external_void_func_i1_zeroext(i1 %var)
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_call_external_void_func_i8_imm:		; GCN-LABEL: {{^}}test_call_external_void_func_i8_imm:
; MESA-DAG: s_mov_b32 s33, s3{{$}}

; GCN-DAG: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}		; GCN-DAG: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}
; GCN-DAG: s_add_u32 s[[PC_LO]], s[[PC_LO]], external_void_func_i8@rel32@lo+4		; GCN-DAG: s_add_u32 s[[PC_LO]], s[[PC_LO]], external_void_func_i8@rel32@lo+4
; GCN-DAG: s_addc_u32 s[[PC_HI]], s[[PC_HI]], external_void_func_i8@rel32@hi+4		; GCN-DAG: s_addc_u32 s[[PC_HI]], s[[PC_HI]], external_void_func_i8@rel32@hi+4
; GCN-DAG: v_mov_b32_e32 v0, 0x7b		; GCN-DAG: v_mov_b32_e32 v0, 0x7b

; GCN-DAG: s_mov_b32 s32, s33{{$}}		; GCN-DAG: s_mov_b32 s32, 0{{$}}

; GCN: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}		; GCN: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
define amdgpu_kernel void @test_call_external_void_func_i8_imm(i32) #0 {		define amdgpu_kernel void @test_call_external_void_func_i8_imm(i32) #0 {
call void @external_void_func_i8(i8 123)		call void @external_void_func_i8(i8 123)
ret void		ret void
}		}

; FIXME: don't wait before call		; FIXME: don't wait before call
; GCN-LABEL: {{^}}test_call_external_void_func_i8_signext:		; GCN-LABEL: {{^}}test_call_external_void_func_i8_signext:
; HSA-DAG: s_mov_b32 s33, s9{{$}}
; MESA-DAG: s_mov_b32 s33, s3{{$}}

; GCN-DAG: buffer_load_sbyte v0		; GCN-DAG: buffer_load_sbyte v0
; GCN-DAG: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}		; GCN-DAG: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}
; GCN-DAG: s_add_u32 s[[PC_LO]], s[[PC_LO]], external_void_func_i8_signext@rel32@lo+4		; GCN-DAG: s_add_u32 s[[PC_LO]], s[[PC_LO]], external_void_func_i8_signext@rel32@lo+4
; GCN-DAG: s_addc_u32 s[[PC_HI]], s[[PC_HI]], external_void_func_i8_signext@rel32@hi+4		; GCN-DAG: s_addc_u32 s[[PC_HI]], s[[PC_HI]], external_void_func_i8_signext@rel32@hi+4

; GCN-DAG: s_mov_b32 s32, s3		; GCN-DAG: s_mov_b32 s32, 0

; GCN-NOT: s_waitcnt		; GCN-NOT: s_waitcnt
; GCN-NEXT: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}		; GCN-NEXT: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
define amdgpu_kernel void @test_call_external_void_func_i8_signext(i32) #0 {		define amdgpu_kernel void @test_call_external_void_func_i8_signext(i32) #0 {
%var = load volatile i8, i8 addrspace(1)* undef		%var = load volatile i8, i8 addrspace(1)* undef
call void @external_void_func_i8_signext(i8 %var)		call void @external_void_func_i8_signext(i8 %var)
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_call_external_void_func_i8_zeroext:		; GCN-LABEL: {{^}}test_call_external_void_func_i8_zeroext:
; MESA-DAG: s_mov_b32 s33, s3{{$}}
; HSA-DAG: s_mov_b32 s33, s9{{$}}

; GCN-DAG: buffer_load_ubyte v0		; GCN-DAG: buffer_load_ubyte v0
; GCN-DAG: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}		; GCN-DAG: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}
; GCN-DAG: s_add_u32 s[[PC_LO]], s[[PC_LO]], external_void_func_i8_zeroext@rel32@lo+4		; GCN-DAG: s_add_u32 s[[PC_LO]], s[[PC_LO]], external_void_func_i8_zeroext@rel32@lo+4
; GCN-DAG: s_addc_u32 s[[PC_HI]], s[[PC_HI]], external_void_func_i8_zeroext@rel32@hi+4		; GCN-DAG: s_addc_u32 s[[PC_HI]], s[[PC_HI]], external_void_func_i8_zeroext@rel32@hi+4

; GCN-DAG: s_mov_b32 s32, s33		; GCN-DAG: s_mov_b32 s32, 0

; GCN-NOT: s_waitcnt		; GCN-NOT: s_waitcnt
; GCN-NEXT: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}		; GCN-NEXT: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
define amdgpu_kernel void @test_call_external_void_func_i8_zeroext(i32) #0 {		define amdgpu_kernel void @test_call_external_void_func_i8_zeroext(i32) #0 {
%var = load volatile i8, i8 addrspace(1)* undef		%var = load volatile i8, i8 addrspace(1)* undef
call void @external_void_func_i8_zeroext(i8 %var)		call void @external_void_func_i8_zeroext(i8 %var)
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_call_external_void_func_i16_imm:		; GCN-LABEL: {{^}}test_call_external_void_func_i16_imm:
; GCN-DAG: v_mov_b32_e32 v0, 0x7b{{$}}		; GCN-DAG: v_mov_b32_e32 v0, 0x7b{{$}}

; GCN-DAG: s_mov_b32 s32, s33		; GCN-DAG: s_mov_b32 s32, 0

; GCN: s_swappc_b64		; GCN: s_swappc_b64
define amdgpu_kernel void @test_call_external_void_func_i16_imm() #0 {		define amdgpu_kernel void @test_call_external_void_func_i16_imm() #0 {
call void @external_void_func_i16(i16 123)		call void @external_void_func_i16(i16 123)
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_call_external_void_func_i16_signext:		; GCN-LABEL: {{^}}test_call_external_void_func_i16_signext:
; MESA-DAG: s_mov_b32 s33, s3{{$}}

; GCN-DAG: buffer_load_sshort v0		; GCN-DAG: buffer_load_sshort v0
; GCN-DAG: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}		; GCN-DAG: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}
; GCN-DAG: s_add_u32 s[[PC_LO]], s[[PC_LO]], external_void_func_i16_signext@rel32@lo+4		; GCN-DAG: s_add_u32 s[[PC_LO]], s[[PC_LO]], external_void_func_i16_signext@rel32@lo+4
; GCN-DAG: s_addc_u32 s[[PC_HI]], s[[PC_HI]], external_void_func_i16_signext@rel32@hi+4		; GCN-DAG: s_addc_u32 s[[PC_HI]], s[[PC_HI]], external_void_func_i16_signext@rel32@hi+4

; GCN-DAG: s_mov_b32 s32, s33		; GCN-DAG: s_mov_b32 s32, 0

; GCN-NOT: s_waitcnt		; GCN-NOT: s_waitcnt
; GCN-NEXT: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}		; GCN-NEXT: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
define amdgpu_kernel void @test_call_external_void_func_i16_signext(i32) #0 {		define amdgpu_kernel void @test_call_external_void_func_i16_signext(i32) #0 {
%var = load volatile i16, i16 addrspace(1)* undef		%var = load volatile i16, i16 addrspace(1)* undef
call void @external_void_func_i16_signext(i16 %var)		call void @external_void_func_i16_signext(i16 %var)
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_call_external_void_func_i16_zeroext:		; GCN-LABEL: {{^}}test_call_external_void_func_i16_zeroext:
; MESA-DAG: s_mov_b32 s33, s3{{$}}


; GCN-DAG: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}		; GCN-DAG: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}
; GCN-DAG: s_add_u32 s[[PC_LO]], s[[PC_LO]], external_void_func_i16_zeroext@rel32@lo+4		; GCN-DAG: s_add_u32 s[[PC_LO]], s[[PC_LO]], external_void_func_i16_zeroext@rel32@lo+4
; GCN-DAG: s_addc_u32 s[[PC_HI]], s[[PC_HI]], external_void_func_i16_zeroext@rel32@hi+4		; GCN-DAG: s_addc_u32 s[[PC_HI]], s[[PC_HI]], external_void_func_i16_zeroext@rel32@hi+4

; GCN-DAG: s_mov_b32 s32, s33		; GCN-DAG: s_mov_b32 s32, 0

; GCN-NOT: s_waitcnt		; GCN-NOT: s_waitcnt
; GCN-NEXT: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}		; GCN-NEXT: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
define amdgpu_kernel void @test_call_external_void_func_i16_zeroext(i32) #0 {		define amdgpu_kernel void @test_call_external_void_func_i16_zeroext(i32) #0 {
%var = load volatile i16, i16 addrspace(1)* undef		%var = load volatile i16, i16 addrspace(1)* undef
call void @external_void_func_i16_zeroext(i16 %var)		call void @external_void_func_i16_zeroext(i16 %var)
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_call_external_void_func_i32_imm:		; GCN-LABEL: {{^}}test_call_external_void_func_i32_imm:
; MESA-DAG: s_mov_b32 s33, s3{{$}}

; GCN-DAG: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}		; GCN-DAG: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}
; GCN-DAG: s_add_u32 s[[PC_LO]], s[[PC_LO]], external_void_func_i32@rel32@lo+4		; GCN-DAG: s_add_u32 s[[PC_LO]], s[[PC_LO]], external_void_func_i32@rel32@lo+4
; GCN-DAG: s_addc_u32 s[[PC_HI]], s[[PC_HI]], external_void_func_i32@rel32@hi+4		; GCN-DAG: s_addc_u32 s[[PC_HI]], s[[PC_HI]], external_void_func_i32@rel32@hi+4
; GCN-DAG: v_mov_b32_e32 v0, 42		; GCN-DAG: v_mov_b32_e32 v0, 42
; GCN-DAG: s_mov_b32 s32, s33		; GCN-DAG: s_mov_b32 s32, 0

; GCN: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}		; GCN: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
define amdgpu_kernel void @test_call_external_void_func_i32_imm(i32) #0 {		define amdgpu_kernel void @test_call_external_void_func_i32_imm(i32) #0 {
call void @external_void_func_i32(i32 42)		call void @external_void_func_i32(i32 42)
ret void		ret void
}		}

; GCN-DAG: v_mov_b32_e32 v0, 1		; GCN-DAG: v_mov_b32_e32 v0, 1
; GCN-DAG: v_mov_b32_e32 v1, 2		; GCN-DAG: v_mov_b32_e32 v1, 2
; GCN: s_swappc_b64		; GCN: s_swappc_b64
define amdgpu_kernel void @test_call_external_void_func_v2i32_imm() #0 {		define amdgpu_kernel void @test_call_external_void_func_v2i32_imm() #0 {
call void @external_void_func_v2i32(<2 x i32> <i32 1, i32 2>)		call void @external_void_func_v2i32(<2 x i32> <i32 1, i32 2>)
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_call_external_void_func_v3i32_imm:		; GCN-LABEL: {{^}}test_call_external_void_func_v3i32_imm: {{.*}}
; HSA-DAG: s_mov_b32 s33, s9
; MESA-DAG: s_mov_b32 s33, s3{{$}}

; GCN-NOT: v3		; GCN-NOT: v3
; GCN-DAG: v_mov_b32_e32 v0, 3		; GCN-DAG: v_mov_b32_e32 v0, 3
; GCN-DAG: v_mov_b32_e32 v1, 4		; GCN-DAG: v_mov_b32_e32 v1, 4
; GCN-DAG: v_mov_b32_e32 v2, 5		; GCN-DAG: v_mov_b32_e32 v2, 5

; GCN: s_swappc_b64		; GCN: s_swappc_b64
define amdgpu_kernel void @test_call_external_void_func_v3i32_imm(i32) #0 {		define amdgpu_kernel void @test_call_external_void_func_v3i32_imm(i32) #0 {
define amdgpu_kernel void @test_call_external_void_func_v32i32() #0 {		define amdgpu_kernel void @test_call_external_void_func_v32i32() #0 {
%ptr = load <32 x i32> addrspace(1)*, <32 x i32> addrspace(1)* addrspace(4)* undef		%ptr = load <32 x i32> addrspace(1)*, <32 x i32> addrspace(1)* addrspace(4)* undef
%val = load <32 x i32>, <32 x i32> addrspace(1)* %ptr		%val = load <32 x i32>, <32 x i32> addrspace(1)* %ptr
call void @external_void_func_v32i32(<32 x i32> %val)		call void @external_void_func_v32i32(<32 x i32> %val)
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_call_external_void_func_v32i32_i32:		; GCN-LABEL: {{^}}test_call_external_void_func_v32i32_i32:
; HSA-DAG: s_mov_b32 s33, s9
; HSA-NOT: s_add_u32 s32		; HSA-NOT: s_add_u32 s32

; MESA-DAG: s_mov_b32 s33, s3{{$}}
; MESA-NOT: s_add_u32 s32		; MESA-NOT: s_add_u32 s32

; GCN-DAG: buffer_load_dword [[VAL1:v[0-9]+]], off, s[{{[0-9]+}}:{{[0-9]+}}], 0{{$}}		; GCN-DAG: buffer_load_dword [[VAL1:v[0-9]+]], off, s[{{[0-9]+}}:{{[0-9]+}}], 0{{$}}
; GCN-DAG: buffer_load_dwordx4 v[0:3], off		; GCN-DAG: buffer_load_dwordx4 v[0:3], off
; GCN-DAG: buffer_load_dwordx4 v[4:7], off		; GCN-DAG: buffer_load_dwordx4 v[4:7], off
; GCN-DAG: buffer_load_dwordx4 v[8:11], off		; GCN-DAG: buffer_load_dwordx4 v[8:11], off
; GCN-DAG: buffer_load_dwordx4 v[12:15], off		; GCN-DAG: buffer_load_dwordx4 v[12:15], off
; GCN-DAG: buffer_load_dwordx4 v[16:19], off		; GCN-DAG: buffer_load_dwordx4 v[16:19], off
define amdgpu_kernel void @test_call_external_void_func_struct_i8_i32() #0 {
%val = load { i8, i32 }, { i8, i32 } addrspace(1)* %ptr0		%val = load { i8, i32 }, { i8, i32 } addrspace(1)* %ptr0
call void @external_void_func_struct_i8_i32({ i8, i32 } %val)		call void @external_void_func_struct_i8_i32({ i8, i32 } %val)
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_call_external_void_func_byval_struct_i8_i32:		; GCN-LABEL: {{^}}test_call_external_void_func_byval_struct_i8_i32:
; GCN-DAG: v_mov_b32_e32 [[VAL0:v[0-9]+]], 3		; GCN-DAG: v_mov_b32_e32 [[VAL0:v[0-9]+]], 3
; GCN-DAG: v_mov_b32_e32 [[VAL1:v[0-9]+]], 8		; GCN-DAG: v_mov_b32_e32 [[VAL1:v[0-9]+]], 8
; MESA-DAG: buffer_store_byte [[VAL0]], off, s[36:39], s33 offset:8		; MESA-DAG: buffer_store_byte [[VAL0]], off, s[36:39], 0 offset:8
; MESA-DAG: buffer_store_dword [[VAL1]], off, s[36:39], s33 offset:12		; MESA-DAG: buffer_store_dword [[VAL1]], off, s[36:39], 0 offset:12

; HSA-DAG: buffer_store_byte [[VAL0]], off, s[0:3], s33 offset:8		; HSA-DAG: buffer_store_byte [[VAL0]], off, s[0:3], 0 offset:8
; HSA-DAG: buffer_store_dword [[VAL1]], off, s[0:3], s33 offset:12		; HSA-DAG: buffer_store_dword [[VAL1]], off, s[0:3], 0 offset:12

; HSA: buffer_load_dword [[RELOAD_VAL0:v[0-9]+]], off, s[0:3], s33 offset:8		; HSA: buffer_load_dword [[RELOAD_VAL0:v[0-9]+]], off, s[0:3], 0 offset:8
; HSA: buffer_load_dword [[RELOAD_VAL1:v[0-9]+]], off, s[0:3], s33 offset:12		; HSA: buffer_load_dword [[RELOAD_VAL1:v[0-9]+]], off, s[0:3], 0 offset:12

; MESA: buffer_load_dword [[RELOAD_VAL0:v[0-9]+]], off, s[36:39], s33 offset:8		; MESA: buffer_load_dword [[RELOAD_VAL0:v[0-9]+]], off, s[36:39], 0 offset:8
; MESA: buffer_load_dword [[RELOAD_VAL1:v[0-9]+]], off, s[36:39], s33 offset:12		; MESA: buffer_load_dword [[RELOAD_VAL1:v[0-9]+]], off, s[36:39], 0 offset:12

; GCN-DAG: s_add_u32 [[SP:s[0-9]+]], s33, 0x400{{$}}		; GCN-DAG: s_movk_i32 [[SP:s[0-9]+]], 0x400{{$}}

; HSA-DAG: buffer_store_dword [[RELOAD_VAL0]], off, s[0:3], [[SP]]{{$}}		; HSA-DAG: buffer_store_dword [[RELOAD_VAL0]], off, s[0:3], [[SP]]{{$}}
; HSA-DAG: buffer_store_dword [[RELOAD_VAL1]], off, s[0:3], [[SP]] offset:4		; HSA-DAG: buffer_store_dword [[RELOAD_VAL1]], off, s[0:3], [[SP]] offset:4

; MESA-DAG: buffer_store_dword [[RELOAD_VAL0]], off, s[36:39], [[SP]]{{$}}		; MESA-DAG: buffer_store_dword [[RELOAD_VAL0]], off, s[36:39], [[SP]]{{$}}
; MESA-DAG: buffer_store_dword [[RELOAD_VAL1]], off, s[36:39], [[SP]] offset:4		; MESA-DAG: buffer_store_dword [[RELOAD_VAL1]], off, s[36:39], [[SP]] offset:4

; GCN-NEXT: s_swappc_b64		; GCN-NEXT: s_swappc_b64
; GCN-NOT: [[SP]]		; GCN-NOT: [[SP]]
define amdgpu_kernel void @test_call_external_void_func_byval_struct_i8_i32() #0 {		define amdgpu_kernel void @test_call_external_void_func_byval_struct_i8_i32() #0 {
%val = alloca { i8, i32 }, align 4, addrspace(5)		%val = alloca { i8, i32 }, align 4, addrspace(5)
%gep0 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %val, i32 0, i32 0		%gep0 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %val, i32 0, i32 0
%gep1 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %val, i32 0, i32 1		%gep1 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %val, i32 0, i32 1
store i8 3, i8 addrspace(5)* %gep0		store i8 3, i8 addrspace(5)* %gep0
store i32 8, i32 addrspace(5)* %gep1		store i32 8, i32 addrspace(5)* %gep1
call void @external_void_func_byval_struct_i8_i32({ i8, i32 } addrspace(5)* %val)		call void @external_void_func_byval_struct_i8_i32({ i8, i32 } addrspace(5)* %val)
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32:		; GCN-LABEL: {{^}}test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32:
; MESA-DAG: s_add_u32 [[SP:s[0-9]+]], [[FP_REG:s[0-9]+]], 0x800{{$}}		; GCN-DAG: s_movk_i32 [[SP:s[0-9]+]], 0x800{{$}}
; HSA-DAG: s_add_u32 [[SP:s[0-9]+]], [[FP_REG:s[0-9]+]], 0x800{{$}}

; GCN-DAG: v_mov_b32_e32 [[VAL0:v[0-9]+]], 3		; GCN-DAG: v_mov_b32_e32 [[VAL0:v[0-9]+]], 3
; GCN-DAG: v_mov_b32_e32 [[VAL1:v[0-9]+]], 8		; GCN-DAG: v_mov_b32_e32 [[VAL1:v[0-9]+]], 8
; GCN-DAG: buffer_store_byte [[VAL0]], off, s{{\[[0-9]+:[0-9]+\]}}, [[FP_REG]] offset:8		; GCN-DAG: buffer_store_byte [[VAL0]], off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:8
; GCN-DAG: buffer_store_dword [[VAL1]], off, s{{\[[0-9]+:[0-9]+\]}}, [[FP_REG]] offset:12		; GCN-DAG: buffer_store_dword [[VAL1]], off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:12

; GCN-DAG: buffer_load_dword [[RELOAD_VAL0:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, [[FP_REG]] offset:8		; GCN-DAG: buffer_load_dword [[RELOAD_VAL0:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:8
; GCN-DAG: buffer_load_dword [[RELOAD_VAL1:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, [[FP_REG]] offset:12		; GCN-DAG: buffer_load_dword [[RELOAD_VAL1:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:12

; GCN-NOT: s_add_u32 [[SP]]		; GCN-NOT: s_add_u32 [[SP]]
; GCN-DAG: buffer_store_dword [[RELOAD_VAL0]], off, s{{\[[0-9]+:[0-9]+\]}}, [[SP]]{{$}}		; GCN-DAG: buffer_store_dword [[RELOAD_VAL0]], off, s{{\[[0-9]+:[0-9]+\]}}, [[SP]]{{$}}
; GCN-DAG: buffer_store_dword [[RELOAD_VAL1]], off, s{{\[[0-9]+:[0-9]+\]}}, [[SP]] offset:4		; GCN-DAG: buffer_store_dword [[RELOAD_VAL1]], off, s{{\[[0-9]+:[0-9]+\]}}, [[SP]] offset:4
; GCN: s_swappc_b64		; GCN: s_swappc_b64
; GCN-DAG: buffer_load_ubyte [[LOAD_OUT_VAL0:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, [[FP_REG]] offset:16		; GCN-DAG: buffer_load_ubyte [[LOAD_OUT_VAL0:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:16
; GCN-DAG: buffer_load_dword [[LOAD_OUT_VAL1:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, [[FP_REG]] offset:20		; GCN-DAG: buffer_load_dword [[LOAD_OUT_VAL1:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:20
; GCN-NOT: s_sub_u32 [[SP]]		; GCN-NOT: s_sub_u32 [[SP]]

; GCN: buffer_store_byte [[LOAD_OUT_VAL0]], off		; GCN: buffer_store_byte [[LOAD_OUT_VAL0]], off
; GCN: buffer_store_dword [[LOAD_OUT_VAL1]], off		; GCN: buffer_store_dword [[LOAD_OUT_VAL1]], off
define amdgpu_kernel void @test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32(i32) #0 {		define amdgpu_kernel void @test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32(i32) #0 {
%in.val = alloca { i8, i32 }, align 4, addrspace(5)		%in.val = alloca { i8, i32 }, align 4, addrspace(5)
%out.val = alloca { i8, i32 }, align 4, addrspace(5)		%out.val = alloca { i8, i32 }, align 4, addrspace(5)
%in.gep0 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %in.val, i32 0, i32 0		%in.gep0 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %in.val, i32 0, i32 0

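In the byval and sret tests above, the checks show the outgoing-argument stack pointer changing from `s_add_u32 [[SP]], s33, 0x400` (and `0x800`) to a plain `s_movk_i32 [[SP]], 0x400` (`0x800`): the kernel's frame is now addressed from 0 rather than from the wave offset in s33, while the constants stay the same. One way to read those constants, assuming the stack pointer is a wave-scaled byte offset (per-lane frame size multiplied by a wavefront size of 64) and assuming 16-byte stack alignment, is that 0x400 and 0x800 correspond to 16 and 32 bytes of frame per lane. That matches the highest frame offsets the checks touch (a dword at offset 12, and a dword at offset 20). The sketch below only models that arithmetic; `wave_scaled_sp` and `align_to` are hypothetical names, not LLVM functions.

```python
# Hypothetical model of a wave-scaled stack pointer value; not LLVM code.
WAVEFRONT_SIZE = 64  # assumption: wave64 targets


def align_to(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def wave_scaled_sp(per_lane_frame_bytes: int, stack_align: int = 16,
                   wavefront_size: int = WAVEFRONT_SIZE) -> int:
    """Assumed model: SP = per-lane frame size, aligned up, times wavefront size."""
    return align_to(per_lane_frame_bytes, stack_align) * wavefront_size


# byval test: highest access is the dword at offset 12, so 16 bytes per lane
print(hex(wave_scaled_sp(16)))  # 0x400
# sret + byval test: highest access is the dword at offset 20 (24 bytes), aligned up to 32
print(hex(wave_scaled_sp(24)))  # 0x800
```

Under these assumptions, a wave32 target would need half the value for the same per-lane frame, which is why the sketch keeps the wavefront size as a parameter.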
llvm/test/CodeGen/AMDGPU/call-constant.ll

	; RUN: llc -mtriple=amdgcn-amd-amdhsa < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa < %s | FileCheck -check-prefix=GCN %s

	; FIXME: Emitting unnecessary flat_scratch setup			; FIXME: Emitting unnecessary flat_scratch setup

	; GCN-LABEL: {{^}}test_call_undef:			; GCN-LABEL: {{^}}test_call_undef:
	; GCN: s_mov_b32 s8, s7
	; GCN: s_mov_b32 flat_scratch_lo, s5			; GCN: s_mov_b32 flat_scratch_lo, s5
	; GCN: s_add_u32 s4, s4, s8			; GCN: s_add_u32 s4, s4, s7
	; GCN: s_lshr_b32			; GCN: s_lshr_b32
	; GCN: s_endpgm			; GCN: s_endpgm
	define amdgpu_kernel void @test_call_undef() #0 {			define amdgpu_kernel void @test_call_undef() #0 {
	%val = call i32 undef(i32 1)			%val = call i32 undef(i32 1)
	%op = add i32 %val, 1			%op = add i32 %val, 1
	store volatile i32 %op, i32 addrspace(1)* undef			store volatile i32 %op, i32 addrspace(1)* undef
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}test_tail_call_undef:			; GCN-LABEL: {{^}}test_tail_call_undef:
	; GCN: s_waitcnt			; GCN: s_waitcnt
	; GCN-NEXT: .Lfunc_end			; GCN-NEXT: .Lfunc_end
	define i32 @test_tail_call_undef() #0 {			define i32 @test_tail_call_undef() #0 {
	%call = tail call i32 undef(i32 1)			%call = tail call i32 undef(i32 1)
	ret i32 %call			ret i32 %call
	}			}

	; GCN-LABEL: {{^}}test_call_null:			; GCN-LABEL: {{^}}test_call_null:
	; GCN: s_mov_b32 s8, s7
	; GCN: s_mov_b32 flat_scratch_lo, s5			; GCN: s_mov_b32 flat_scratch_lo, s5
	; GCN: s_add_u32 s4, s4, s8			; GCN: s_add_u32 s4, s4, s7
	; GCN: s_lshr_b32			; GCN: s_lshr_b32
	; GCN: s_endpgm			; GCN: s_endpgm
	define amdgpu_kernel void @test_call_null() #0 {			define amdgpu_kernel void @test_call_null() #0 {
	%val = call i32 null(i32 1)			%val = call i32 null(i32 1)
	%op = add i32 %val, 1			%op = add i32 %val, 1
	store volatile i32 %op, i32 addrspace(1)* null			store volatile i32 %op, i32 addrspace(1)* null
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}test_tail_call_null:			; GCN-LABEL: {{^}}test_tail_call_null:
	; GCN: s_waitcnt			; GCN: s_waitcnt
	; GCN-NEXT: .Lfunc_end			; GCN-NEXT: .Lfunc_end
	define i32 @test_tail_call_null() #0 {			define i32 @test_tail_call_null() #0 {
	%call = tail call i32 null(i32 1)			%call = tail call i32 null(i32 1)
	ret i32 %call			ret i32 %call
	}			}

llvm/test/CodeGen/AMDGPU/call-preserved-registers.ll

; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=fiji -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s		; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=fiji -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s		; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s		; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

declare hidden void @external_void_func_void() #0		declare hidden void @external_void_func_void() #0

; GCN-LABEL: {{^}}test_kernel_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void:		; GCN-LABEL: {{^}}test_kernel_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void:
; GCN: s_mov_b32 s33, s7
; GCN: s_getpc_b64 s[34:35]		; GCN: s_getpc_b64 s[34:35]
; GCN-NEXT: s_add_u32 s34, s34,		; GCN-NEXT: s_add_u32 s34, s34,
; GCN-NEXT: s_addc_u32 s35, s35,		; GCN-NEXT: s_addc_u32 s35, s35,
; GCN-NEXT: s_mov_b32 s32, s33		; GCN-NEXT: s_mov_b32 s32, 0
; GCN: s_swappc_b64 s[30:31], s[34:35]		; GCN: s_swappc_b64 s[30:31], s[34:35]

; GCN-NEXT: #ASMSTART		; GCN-NEXT: #ASMSTART
; GCN-NEXT: #ASMEND		; GCN-NEXT: #ASMEND
; GCN-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GCN-NEXT: s_swappc_b64 s[30:31], s[34:35]
define amdgpu_kernel void @test_kernel_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void() #0 {		define amdgpu_kernel void @test_kernel_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void() #0 {
call void @external_void_func_void()		call void @external_void_func_void()
call void asm sideeffect "", ""() #0		call void asm sideeffect "", ""() #0
define amdgpu_kernel void @test_call_void_func_void_clobber_vcc(i32 addrspace(1)* %out) #0 {
call void @void_func_void_clobber_vcc()		call void @void_func_void_clobber_vcc()
%val0 = load volatile i32, i32 addrspace(1)* undef		%val0 = load volatile i32, i32 addrspace(1)* undef
%val1 = load volatile i32, i32 addrspace(1)* undef		%val1 = load volatile i32, i32 addrspace(1)* undef
call void asm sideeffect "; use $0", "{vcc}"(i64 %vcc)		call void asm sideeffect "; use $0", "{vcc}"(i64 %vcc)
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_call_void_func_void_mayclobber_s31:		; GCN-LABEL: {{^}}test_call_void_func_void_mayclobber_s31:
; GCN: s_mov_b32 s34, s31		; GCN: s_mov_b32 s33, s31
; GCN-NEXT: s_swappc_b64		; GCN-NEXT: s_swappc_b64
; GCN-NEXT: s_mov_b32 s31, s34		; GCN-NEXT: s_mov_b32 s31, s33
define amdgpu_kernel void @test_call_void_func_void_mayclobber_s31(i32 addrspace(1)* %out) #0 {		define amdgpu_kernel void @test_call_void_func_void_mayclobber_s31(i32 addrspace(1)* %out) #0 {
%s31 = call i32 asm sideeffect "; def $0", "={s31}"()		%s31 = call i32 asm sideeffect "; def $0", "={s31}"()
call void @external_void_func_void()		call void @external_void_func_void()
call void asm sideeffect "; use $0", "{s31}"(i32 %s31)		call void asm sideeffect "; use $0", "{s31}"(i32 %s31)
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_call_void_func_void_mayclobber_v31:		; GCN-LABEL: {{^}}test_call_void_func_void_mayclobber_v31:
; GCN: v_mov_b32_e32 v32, v31		; GCN: v_mov_b32_e32 v32, v31
; GCN-NEXT: s_swappc_b64		; GCN-NEXT: s_swappc_b64
; GCN-NEXT: v_mov_b32_e32 v31, v32		; GCN-NEXT: v_mov_b32_e32 v31, v32
define amdgpu_kernel void @test_call_void_func_void_mayclobber_v31(i32 addrspace(1)* %out) #0 {		define amdgpu_kernel void @test_call_void_func_void_mayclobber_v31(i32 addrspace(1)* %out) #0 {
%v31 = call i32 asm sideeffect "; def $0", "={v31}"()		%v31 = call i32 asm sideeffect "; def $0", "={v31}"()
call void @external_void_func_void()		call void @external_void_func_void()
call void asm sideeffect "; use $0", "{v31}"(i32 %v31)		call void asm sideeffect "; use $0", "{v31}"(i32 %v31)
ret void		ret void
}		}

; FIXME: What is the expected behavior for reserved registers here?

; GCN-LABEL: {{^}}test_call_void_func_void_preserves_s33:		; GCN-LABEL: {{^}}test_call_void_func_void_preserves_s33:
; GCN: s_mov_b32 s33, s9
; GCN: s_mov_b32 s32, s33
; GCN: s_getpc_b64 s[4:5]		; GCN: s_getpc_b64 s[4:5]
; GCN-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4		; GCN-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
; GCN-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+4		; GCN-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+4
		; GCN: s_mov_b32 s32, 0
; GCN: #ASMSTART		; GCN: #ASMSTART
; GCN-NEXT: ; def s33		; GCN-NEXT: ; def s33
; GCN-NEXT: #ASMEND		; GCN-NEXT: #ASMEND
; GCN: s_swappc_b64 s[30:31], s[4:5]		; GCN: s_swappc_b64 s[30:31], s[4:5]
; GCN: ;;#ASMSTART		; GCN: ;;#ASMSTART
; GCN-NEXT: ; use s33		; GCN-NEXT: ; use s33
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NOT: s33		; GCN-NOT: s33
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
define amdgpu_kernel void @test_call_void_func_void_preserves_s33(i32 addrspace(1)* %out) #0 {		define amdgpu_kernel void @test_call_void_func_void_preserves_s33(i32 addrspace(1)* %out) #0 {
%s33 = call i32 asm sideeffect "; def $0", "={s33}"()		%s33 = call i32 asm sideeffect "; def $0", "={s33}"()
call void @external_void_func_void()		call void @external_void_func_void()
call void asm sideeffect "; use $0", "{s33}"(i32 %s33)		call void asm sideeffect "; use $0", "{s33}"(i32 %s33)
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_call_void_func_void_preserves_s34:		; FIXME: What is the expected behavior for reserved registers here?
; GCN: s_mov_b32 s33, s9
; GCN-NOT: s34		; GCN-LABEL: {{^}}test_call_void_func_void_preserves_s34: {{.*}}
; GCN-NOT: s34		; GCN-NOT: s34

; GCN: s_getpc_b64 s[4:5]		; GCN: s_getpc_b64 s[4:5]
; GCN-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4		; GCN-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
; GCN-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+4		; GCN-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+4
		; GCN: s_mov_b32 s32, 0

; GCN-NOT: s34		; GCN-NOT: s34
; GCN: ;;#ASMSTART		; GCN: ;;#ASMSTART
; GCN-NEXT: ; def s34		; GCN-NEXT: ; def s34
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND

; GCN-NOT: s34		; GCN-NOT: s34
; GCN: s_swappc_b64 s[30:31], s[4:5]		; GCN: s_swappc_b64 s[30:31], s[4:5]

; GCN-NOT: s34		; GCN-NOT: s34

; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s34		; GCN-NEXT: ; use s34
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
define amdgpu_kernel void @test_call_void_func_void_preserves_s34(i32 addrspace(1)* %out) #0 {		define amdgpu_kernel void @test_call_void_func_void_preserves_s34(i32 addrspace(1)* %out) #0 {
%s34 = call i32 asm sideeffect "; def $0", "={s34}"()		%s34 = call i32 asm sideeffect "; def $0", "={s34}"()
call void @external_void_func_void()		call void @external_void_func_void()
call void asm sideeffect "; use $0", "{s34}"(i32 %s34)		call void asm sideeffect "; use $0", "{s34}"(i32 %s34)
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_call_void_func_void_preserves_v32:		; GCN-LABEL: {{^}}test_call_void_func_void_preserves_v32: {{.*}}
; GCN: s_mov_b32 s33, s9

; GCN-NOT: v32		; GCN-NOT: v32
; GCN: s_getpc_b64 s[4:5]		; GCN: s_getpc_b64 s[4:5]
; GCN-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4		; GCN-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
; GCN-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+4		; GCN-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+4
		; GCN: s_mov_b32 s32, 0
; GCN-NOT: v32		; GCN-NOT: v32
; GCN-DAG: s_mov_b32 s32, s33

; GCN: ;;#ASMSTART		; GCN: ;;#ASMSTART
; GCN-NEXT: ; def v32		; GCN-NEXT: ; def v32
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND

; GCN: s_swappc_b64 s[30:31], s[4:5]		; GCN: s_swappc_b64 s[30:31], s[4:5]

; GCN-NOT: v32		; GCN-NOT: v32
; GCN-NEXT: v_readlane_b32 s34, v0, 0		; GCN-NEXT: v_readlane_b32 s34, v0, 0
; GCN: s_setpc_b64		; GCN: s_setpc_b64
define hidden void @void_func_void_clobber_s34() #2 {		define hidden void @void_func_void_clobber_s34() #2 {
call void asm sideeffect "; clobber", "~{s34}"() #0		call void asm sideeffect "; clobber", "~{s34}"() #0
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_call_void_func_void_clobber_s33:		; GCN-LABEL: {{^}}test_call_void_func_void_clobber_s33:
; GCN: s_mov_b32 s33, s7

; GCN: s_getpc_b64		; GCN: s_getpc_b64
; GCN-NEXT: s_add_u32		; GCN-NEXT: s_add_u32
; GCN-NEXT: s_addc_u32		; GCN-NEXT: s_addc_u32
; GCN-NEXT: s_mov_b32 s32, s33		; GCN-NEXT: s_mov_b32 s32, 0
; GCN: s_swappc_b64		; GCN: s_swappc_b64
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
define amdgpu_kernel void @test_call_void_func_void_clobber_s33() #0 {		define amdgpu_kernel void @test_call_void_func_void_clobber_s33() #0 {
call void @void_func_void_clobber_s33()		call void @void_func_void_clobber_s33()
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_call_void_func_void_clobber_s34:		; GCN-LABEL: {{^}}test_call_void_func_void_clobber_s34:
; GCN: s_mov_b32 s33, s7
; GCN: s_getpc_b64		; GCN: s_getpc_b64
; GCN-NEXT: s_add_u32		; GCN-NEXT: s_add_u32
; GCN-NEXT: s_addc_u32		; GCN-NEXT: s_addc_u32
; GCN-NEXT: s_mov_b32 s32, s33		; GCN-NEXT: s_mov_b32 s32, 0
; GCN: s_swappc_b64		; GCN: s_swappc_b64
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
define amdgpu_kernel void @test_call_void_func_void_clobber_s34() #0 {		define amdgpu_kernel void @test_call_void_func_void_clobber_s34() #0 {
call void @void_func_void_clobber_s34()		call void @void_func_void_clobber_s34()
ret void		ret void
}		}

; GCN-LABEL: {{^}}callee_saved_sgpr_func:		; GCN-LABEL: {{^}}callee_saved_sgpr_func:

llvm/test/CodeGen/AMDGPU/call-waitcnt.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck -enable-var-scope -check-prefix=GCN %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck -enable-var-scope -check-prefix=GCN %s

	; Load argument depends on waitcnt which should be skipped.			; Load argument depends on waitcnt which should be skipped.
	define amdgpu_kernel void @call_memory_arg_load(i32 addrspace(3)* %ptr, i32) #0 {			define amdgpu_kernel void @call_memory_arg_load(i32 addrspace(3)* %ptr, i32) #0 {
	; GCN-LABEL: call_memory_arg_load:			; GCN-LABEL: call_memory_arg_load:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_load_dword s4, s[4:5], 0x0			; GCN-NEXT: s_load_dword s4, s[4:5], 0x0
	; GCN-NEXT: s_mov_b32 s33, s9			; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s9
	; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s33
	; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0			; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0
	; GCN-NEXT: s_mov_b32 s32, s33			; GCN-NEXT: s_add_u32 s0, s0, s9
				; GCN-NEXT: s_addc_u32 s1, s1, 0
	; GCN-NEXT: s_waitcnt lgkmcnt(0)			; GCN-NEXT: s_waitcnt lgkmcnt(0)
	; GCN-NEXT: v_mov_b32_e32 v0, s4			; GCN-NEXT: v_mov_b32_e32 v0, s4
	; GCN-NEXT: ds_read_b32 v0, v0			; GCN-NEXT: ds_read_b32 v0, v0
	; GCN-NEXT: s_getpc_b64 s[4:5]			; GCN-NEXT: s_getpc_b64 s[4:5]
	; GCN-NEXT: s_add_u32 s4, s4, func@rel32@lo+4			; GCN-NEXT: s_add_u32 s4, s4, func@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s5, s5, func@rel32@hi+4			; GCN-NEXT: s_addc_u32 s5, s5, func@rel32@hi+4
				; GCN-NEXT: s_mov_b32 s32, 0
	; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	%vgpr = load volatile i32, i32 addrspace(3)* %ptr			%vgpr = load volatile i32, i32 addrspace(3)* %ptr
	call void @func(i32 %vgpr)			call void @func(i32 %vgpr)
	ret void			ret void
	}			}

	; Memory waitcnt with no register dependence on the call			; Memory waitcnt with no register dependence on the call
	define amdgpu_kernel void @call_memory_no_dep(i32 addrspace(1)* %ptr, i32) #0 {			define amdgpu_kernel void @call_memory_no_dep(i32 addrspace(1)* %ptr, i32) #0 {
	; GCN-LABEL: call_memory_no_dep:			; GCN-LABEL: call_memory_no_dep:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GCN-NEXT: s_mov_b32 s33, s9			; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s9
	; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s33
	; GCN-NEXT: v_mov_b32_e32 v2, 0
	; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0			; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0
				; GCN-NEXT: s_add_u32 s0, s0, s9
				; GCN-NEXT: v_mov_b32_e32 v2, 0
	; GCN-NEXT: s_waitcnt lgkmcnt(0)			; GCN-NEXT: s_waitcnt lgkmcnt(0)
	; GCN-NEXT: v_mov_b32_e32 v0, s4			; GCN-NEXT: v_mov_b32_e32 v0, s4
	; GCN-NEXT: v_mov_b32_e32 v1, s5			; GCN-NEXT: v_mov_b32_e32 v1, s5
				; GCN-NEXT: s_addc_u32 s1, s1, 0
	; GCN-NEXT: global_store_dword v[0:1], v2, off			; GCN-NEXT: global_store_dword v[0:1], v2, off
	; GCN-NEXT: v_mov_b32_e32 v0, 0			; GCN-NEXT: v_mov_b32_e32 v0, 0
	; GCN-NEXT: s_getpc_b64 s[6:7]			; GCN-NEXT: s_getpc_b64 s[6:7]
	; GCN-NEXT: s_add_u32 s6, s6, func@rel32@lo+4			; GCN-NEXT: s_add_u32 s6, s6, func@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s7, s7, func@rel32@hi+4			; GCN-NEXT: s_addc_u32 s7, s7, func@rel32@hi+4
	; GCN-NEXT: s_mov_b32 s32, s33			; GCN-NEXT: s_mov_b32 s32, 0
	; GCN-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GCN-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	store i32 0, i32 addrspace(1)* %ptr			store i32 0, i32 addrspace(1)* %ptr
	call void @func(i32 0)			call void @func(i32 0)
	ret void			ret void
	}			}

	; Should not wait after the call before memory			; Should not wait after the call before memory
	define amdgpu_kernel void @call_no_wait_after_call(i32 addrspace(1)* %ptr, i32) #0 {			define amdgpu_kernel void @call_no_wait_after_call(i32 addrspace(1)* %ptr, i32) #0 {
	; GCN-LABEL: call_no_wait_after_call:			; GCN-LABEL: call_no_wait_after_call:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
				; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s9
	; GCN-NEXT: s_load_dwordx2 s[34:35], s[4:5], 0x0			; GCN-NEXT: s_load_dwordx2 s[34:35], s[4:5], 0x0
	; GCN-NEXT: s_mov_b32 s33, s9
	; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s33
	; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0			; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0
				; GCN-NEXT: s_add_u32 s0, s0, s9
				; GCN-NEXT: s_addc_u32 s1, s1, 0
	; GCN-NEXT: v_mov_b32_e32 v0, 0			; GCN-NEXT: v_mov_b32_e32 v0, 0
	; GCN-NEXT: s_getpc_b64 s[4:5]			; GCN-NEXT: s_getpc_b64 s[4:5]
	; GCN-NEXT: s_add_u32 s4, s4, func@rel32@lo+4			; GCN-NEXT: s_add_u32 s4, s4, func@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s5, s5, func@rel32@hi+4			; GCN-NEXT: s_addc_u32 s5, s5, func@rel32@hi+4
	; GCN-NEXT: s_mov_b32 s32, s33			; GCN-NEXT: s_mov_b32 s32, 0
	; GCN-NEXT: v_mov_b32_e32 v32, 0			; GCN-NEXT: v_mov_b32_e32 v32, 0
	; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GCN-NEXT: v_mov_b32_e32 v0, s34			; GCN-NEXT: v_mov_b32_e32 v0, s34
	; GCN-NEXT: v_mov_b32_e32 v1, s35			; GCN-NEXT: v_mov_b32_e32 v1, s35
	; GCN-NEXT: global_store_dword v[0:1], v32, off			; GCN-NEXT: global_store_dword v[0:1], v32, off
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	call void @func(i32 0)			call void @func(i32 0)
	store i32 0, i32 addrspace(1)* %ptr			store i32 0, i32 addrspace(1)* %ptr
	ret void			ret void
	}			}

	define amdgpu_kernel void @call_no_wait_after_call_return_val(i32 addrspace(1)* %ptr, i32) #0 {			define amdgpu_kernel void @call_no_wait_after_call_return_val(i32 addrspace(1)* %ptr, i32) #0 {
	; GCN-LABEL: call_no_wait_after_call_return_val:			; GCN-LABEL: call_no_wait_after_call_return_val:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
				; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s9
	; GCN-NEXT: s_load_dwordx2 s[34:35], s[4:5], 0x0			; GCN-NEXT: s_load_dwordx2 s[34:35], s[4:5], 0x0
	; GCN-NEXT: s_mov_b32 s33, s9
	; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s33
	; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0			; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0
				; GCN-NEXT: s_add_u32 s0, s0, s9
				; GCN-NEXT: s_addc_u32 s1, s1, 0
	; GCN-NEXT: v_mov_b32_e32 v0, 0			; GCN-NEXT: v_mov_b32_e32 v0, 0
	; GCN-NEXT: s_getpc_b64 s[4:5]			; GCN-NEXT: s_getpc_b64 s[4:5]
	; GCN-NEXT: s_add_u32 s4, s4, func.return@rel32@lo+4			; GCN-NEXT: s_add_u32 s4, s4, func.return@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s5, s5, func.return@rel32@hi+4			; GCN-NEXT: s_addc_u32 s5, s5, func.return@rel32@hi+4
	; GCN-NEXT: s_mov_b32 s32, s33			; GCN-NEXT: s_mov_b32 s32, 0
	; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GCN-NEXT: v_mov_b32_e32 v1, s34			; GCN-NEXT: v_mov_b32_e32 v1, s34
	; GCN-NEXT: v_mov_b32_e32 v2, s35			; GCN-NEXT: v_mov_b32_e32 v2, s35
	; GCN-NEXT: global_store_dword v[1:2], v0, off			; GCN-NEXT: global_store_dword v[1:2], v0, off
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	%rv = call i32 @func.return(i32 0)			%rv = call i32 @func.return(i32 0)
	store i32 %rv, i32 addrspace(1)* %ptr			store i32 %rv, i32 addrspace(1)* %ptr
	ret void			ret void
	}			}

	; Need to wait for the address dependency			; Need to wait for the address dependency
	define amdgpu_kernel void @call_got_load(i32 addrspace(1)* %ptr, i32) #0 {			define amdgpu_kernel void @call_got_load(i32 addrspace(1)* %ptr, i32) #0 {
	; GCN-LABEL: call_got_load:			; GCN-LABEL: call_got_load:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_mov_b32 s33, s9			; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s9
	; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s33
	; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0			; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0
				; GCN-NEXT: s_add_u32 s0, s0, s9
				; GCN-NEXT: s_addc_u32 s1, s1, 0
	; GCN-NEXT: s_getpc_b64 s[4:5]			; GCN-NEXT: s_getpc_b64 s[4:5]
	; GCN-NEXT: s_add_u32 s4, s4, got.func@gotpcrel32@lo+4			; GCN-NEXT: s_add_u32 s4, s4, got.func@gotpcrel32@lo+4
	; GCN-NEXT: s_addc_u32 s5, s5, got.func@gotpcrel32@hi+4			; GCN-NEXT: s_addc_u32 s5, s5, got.func@gotpcrel32@hi+4
	; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GCN-NEXT: v_mov_b32_e32 v0, 0			; GCN-NEXT: v_mov_b32_e32 v0, 0
	; GCN-NEXT: s_mov_b32 s32, s33			; GCN-NEXT: s_mov_b32 s32, 0
	; GCN-NEXT: s_waitcnt lgkmcnt(0)			; GCN-NEXT: s_waitcnt lgkmcnt(0)
	; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	call void @got.func(i32 0)			call void @got.func(i32 0)
	ret void			ret void
	}			}

	; Need to wait for the address dependency			; Need to wait for the address dependency

llvm/test/CodeGen/AMDGPU/callee-special-input-sgprs.ll

define hidden void @use_workgroup_id_yz() #1 {		define hidden void @use_workgroup_id_yz() #1 {
%val0 = call i32 @llvm.amdgcn.workgroup.id.y()		%val0 = call i32 @llvm.amdgcn.workgroup.id.y()
%val1 = call i32 @llvm.amdgcn.workgroup.id.z()		%val1 = call i32 @llvm.amdgcn.workgroup.id.z()
call void asm sideeffect "; use $0", "s"(i32 %val0)		call void asm sideeffect "; use $0", "s"(i32 %val0)
call void asm sideeffect "; use $0", "s"(i32 %val1)		call void asm sideeffect "; use $0", "s"(i32 %val1)
ret void		ret void
}		}

		; FIXME: Include use of scratch wave offset in these tests?

; GCN-LABEL: {{^}}kern_indirect_use_workgroup_id_x:		; GCN-LABEL: {{^}}kern_indirect_use_workgroup_id_x:
; GCN: enable_sgpr_workgroup_id_x = 1		; GCN: enable_sgpr_workgroup_id_x = 1
; GCN: enable_sgpr_workgroup_id_y = 0		; GCN: enable_sgpr_workgroup_id_y = 0
; GCN: enable_sgpr_workgroup_id_z = 0		; GCN: enable_sgpr_workgroup_id_z = 0

; GCN-NOT: s6		; GCN-NOT: s6
; GCN: s_mov_b32 s4, s6		; GCN: s_mov_b32 s4, s6
; GCN-NEXT: s_getpc_b64 s[6:7]		; GCN-NEXT: s_getpc_b64 s[6:7]
; GCN-NEXT: s_add_u32 s6, s6, use_workgroup_id_x@rel32@lo+4		; GCN-NEXT: s_add_u32 s6, s6, use_workgroup_id_x@rel32@lo+4
; GCN-NEXT: s_addc_u32 s7, s7, use_workgroup_id_x@rel32@hi+4		; GCN-NEXT: s_addc_u32 s7, s7, use_workgroup_id_x@rel32@hi+4
; GCN: s_mov_b32 s32, s33		; GCN: s_mov_b32 s32, 0
; GCN: s_swappc_b64		; GCN: s_swappc_b64
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
define amdgpu_kernel void @kern_indirect_use_workgroup_id_x() #1 {		define amdgpu_kernel void @kern_indirect_use_workgroup_id_x() #1 {
call void @use_workgroup_id_x()		call void @use_workgroup_id_x()
ret void		ret void
}		}

; GCN-LABEL: {{^}}kern_indirect_use_workgroup_id_y:		; GCN-LABEL: {{^}}kern_indirect_use_workgroup_id_y:
; GCN: enable_sgpr_workgroup_id_x = 1		; GCN: enable_sgpr_workgroup_id_x = 1
; GCN: enable_sgpr_workgroup_id_y = 1		; GCN: enable_sgpr_workgroup_id_y = 1
; GCN: enable_sgpr_workgroup_id_z = 0		; GCN: enable_sgpr_workgroup_id_z = 0

; GCN: s_mov_b32 s33, s8		; GCN: s_mov_b32 s4, s7
; GCN-DAG: s_mov_b32 s4, s7		; GCN: s_mov_b32 s32, 0
; GCN: s_mov_b32 s32, s33
; GCN: s_swappc_b64		; GCN: s_swappc_b64
define amdgpu_kernel void @kern_indirect_use_workgroup_id_y() #1 {		define amdgpu_kernel void @kern_indirect_use_workgroup_id_y() #1 {
call void @use_workgroup_id_y()		call void @use_workgroup_id_y()
ret void		ret void
}		}

; GCN-LABEL: {{^}}kern_indirect_use_workgroup_id_z:		; GCN-LABEL: {{^}}kern_indirect_use_workgroup_id_z:
; GCN: enable_sgpr_workgroup_id_x = 1		; GCN: enable_sgpr_workgroup_id_x = 1
; GCN: enable_sgpr_workgroup_id_y = 0		; GCN: enable_sgpr_workgroup_id_y = 0
; GCN: enable_sgpr_workgroup_id_z = 1		; GCN: enable_sgpr_workgroup_id_z = 1
; GCN: s_mov_b32 s33, s8
; GCN: s_mov_b32 s4, s7		; GCN: s_mov_b32 s4, s7

		; GCN: s_mov_b32 s32, 0
; GCN: s_swappc_b64		; GCN: s_swappc_b64
define amdgpu_kernel void @kern_indirect_use_workgroup_id_z() #1 {		define amdgpu_kernel void @kern_indirect_use_workgroup_id_z() #1 {
call void @use_workgroup_id_z()		call void @use_workgroup_id_z()
ret void		ret void
}		}

; GCN-LABEL: {{^}}kern_indirect_use_workgroup_id_xy:		; GCN-LABEL: {{^}}kern_indirect_use_workgroup_id_xy:
; GCN: enable_sgpr_workgroup_id_x = 1		; GCN: enable_sgpr_workgroup_id_x = 1
; GCN: enable_sgpr_workgroup_id_y = 1		; GCN: enable_sgpr_workgroup_id_y = 1
; GCN: enable_sgpr_workgroup_id_z = 0		; GCN: enable_sgpr_workgroup_id_z = 0

; GCN: s_mov_b32 s33, s8

; GCN: s_mov_b32 s5, s7		; GCN: s_mov_b32 s5, s7
; GCN: s_mov_b32 s4, s6		; GCN: s_mov_b32 s4, s6
; GCN: s_mov_b32 s32, s33
		; GCN: s_mov_b32 s32, 0
; GCN: s_swappc_b64		; GCN: s_swappc_b64
define amdgpu_kernel void @kern_indirect_use_workgroup_id_xy() #1 {		define amdgpu_kernel void @kern_indirect_use_workgroup_id_xy() #1 {
call void @use_workgroup_id_xy()		call void @use_workgroup_id_xy()
ret void		ret void
}		}

; GCN-LABEL: {{^}}kern_indirect_use_workgroup_id_xyz:		; GCN-LABEL: {{^}}kern_indirect_use_workgroup_id_xyz:
; GCN: enable_sgpr_workgroup_id_x = 1		; GCN: enable_sgpr_workgroup_id_x = 1
; GCN: enable_sgpr_workgroup_id_y = 1		; GCN: enable_sgpr_workgroup_id_y = 1
; GCN: enable_sgpr_workgroup_id_z = 1		; GCN: enable_sgpr_workgroup_id_z = 1

; GCN: s_mov_b32 s33, s9

; GCN: s_mov_b32 s4, s6		; GCN: s_mov_b32 s4, s6
; GCN: s_mov_b32 s5, s7		; GCN: s_mov_b32 s5, s7
; GCN: s_mov_b32 s6, s8		; GCN: s_mov_b32 s6, s8

; GCN: s_mov_b32 s32, s33		; GCN: s_mov_b32 s32, 0
; GCN: s_swappc_b64		; GCN: s_swappc_b64
define amdgpu_kernel void @kern_indirect_use_workgroup_id_xyz() #1 {		define amdgpu_kernel void @kern_indirect_use_workgroup_id_xyz() #1 {
call void @use_workgroup_id_xyz()		call void @use_workgroup_id_xyz()
ret void		ret void
}		}

; GCN-LABEL: {{^}}kern_indirect_use_workgroup_id_xz:		; GCN-LABEL: {{^}}kern_indirect_use_workgroup_id_xz:
; GCN: enable_sgpr_workgroup_id_x = 1		; GCN: enable_sgpr_workgroup_id_x = 1
; GCN: enable_sgpr_workgroup_id_y = 0		; GCN: enable_sgpr_workgroup_id_y = 0
; GCN: enable_sgpr_workgroup_id_z = 1		; GCN: enable_sgpr_workgroup_id_z = 1

; GCN: s_mov_b32 s33, s8
; GCN: s_mov_b32 s5, s7		; GCN: s_mov_b32 s5, s7
; GCN: s_mov_b32 s4, s6		; GCN: s_mov_b32 s4, s6

; GCN: s_mov_b32 s32, s33		; GCN: s_mov_b32 s32, 0

; GCN: s_swappc_b64		; GCN: s_swappc_b64
define amdgpu_kernel void @kern_indirect_use_workgroup_id_xz() #1 {		define amdgpu_kernel void @kern_indirect_use_workgroup_id_xz() #1 {
call void @use_workgroup_id_xz()		call void @use_workgroup_id_xz()
ret void		ret void
}		}

; GCN-LABEL: {{^}}kern_indirect_use_workgroup_id_yz:		; GCN-LABEL: {{^}}kern_indirect_use_workgroup_id_yz:
; GCN: enable_sgpr_workgroup_id_x = 1		; GCN: enable_sgpr_workgroup_id_x = 1
; GCN: enable_sgpr_workgroup_id_y = 1		; GCN: enable_sgpr_workgroup_id_y = 1
; GCN: enable_sgpr_workgroup_id_z = 1		; GCN: enable_sgpr_workgroup_id_z = 1

; GCN: s_mov_b32 s33, s9
; GCN: s_mov_b32 s4, s7		; GCN: s_mov_b32 s4, s7
; GCN: s_mov_b32 s5, s8		; GCN: s_mov_b32 s5, s8
; GCN: s_mov_b32 s32, s33
		; GCN: s_mov_b32 s32, 0
; GCN: s_swappc_b64		; GCN: s_swappc_b64
define amdgpu_kernel void @kern_indirect_use_workgroup_id_yz() #1 {		define amdgpu_kernel void @kern_indirect_use_workgroup_id_yz() #1 {
call void @use_workgroup_id_yz()		call void @use_workgroup_id_yz()
ret void		ret void
}		}

; Argument is in right place already		; Argument is in right place already
; GCN-LABEL: {{^}}func_indirect_use_workgroup_id_x:		; GCN-LABEL: {{^}}func_indirect_use_workgroup_id_x:
define hidden void @other_arg_use_workgroup_id_z(i32 %arg0) #1 {
ret void		ret void
}		}

; GCN-LABEL: {{^}}kern_indirect_other_arg_use_workgroup_id_x:		; GCN-LABEL: {{^}}kern_indirect_other_arg_use_workgroup_id_x:
; GCN: enable_sgpr_workgroup_id_x = 1		; GCN: enable_sgpr_workgroup_id_x = 1
; GCN: enable_sgpr_workgroup_id_y = 0		; GCN: enable_sgpr_workgroup_id_y = 0
; GCN: enable_sgpr_workgroup_id_z = 0		; GCN: enable_sgpr_workgroup_id_z = 0

; GCN-DAG: s_mov_b32 s33, s7
; GCN-DAG: v_mov_b32_e32 v0, 0x22b		; GCN-DAG: v_mov_b32_e32 v0, 0x22b
; GCN-DAG: s_mov_b32 s4, s6		; GCN-DAG: s_mov_b32 s4, s6
; GCN-DAG: s_mov_b32 s32, s33
		; GCN-DAG: s_mov_b32 s32, 0
; GCN-NOT: s4		; GCN-NOT: s4
; GCN: s_swappc_b64		; GCN: s_swappc_b64
define amdgpu_kernel void @kern_indirect_other_arg_use_workgroup_id_x() #1 {		define amdgpu_kernel void @kern_indirect_other_arg_use_workgroup_id_x() #1 {
call void @other_arg_use_workgroup_id_x(i32 555)		call void @other_arg_use_workgroup_id_x(i32 555)
ret void		ret void
}		}

; GCN-LABEL: {{^}}kern_indirect_other_arg_use_workgroup_id_y:		; GCN-LABEL: {{^}}kern_indirect_other_arg_use_workgroup_id_y:
; GCN: enable_sgpr_workgroup_id_x = 1		; GCN: enable_sgpr_workgroup_id_x = 1
; GCN: enable_sgpr_workgroup_id_y = 1		; GCN: enable_sgpr_workgroup_id_y = 1
; GCN: enable_sgpr_workgroup_id_z = 0		; GCN: enable_sgpr_workgroup_id_z = 0

; GCN-DAG: s_mov_b32 s33, s8
; GCN-DAG: v_mov_b32_e32 v0, 0x22b		; GCN-DAG: v_mov_b32_e32 v0, 0x22b
; GCN-DAG: s_mov_b32 s4, s7		; GCN-DAG: s_mov_b32 s4, s7

; GCN-DAG: s_mov_b32 s32, s33		; GCN-DAG: s_mov_b32 s32, 0
; GCN: s_swappc_b64		; GCN: s_swappc_b64
define amdgpu_kernel void @kern_indirect_other_arg_use_workgroup_id_y() #1 {		define amdgpu_kernel void @kern_indirect_other_arg_use_workgroup_id_y() #1 {
call void @other_arg_use_workgroup_id_y(i32 555)		call void @other_arg_use_workgroup_id_y(i32 555)
ret void		ret void
}		}

; GCN-LABEL: {{^}}kern_indirect_other_arg_use_workgroup_id_z:		; GCN-LABEL: {{^}}kern_indirect_other_arg_use_workgroup_id_z:
; GCN: enable_sgpr_workgroup_id_x = 1		; GCN: enable_sgpr_workgroup_id_x = 1
; GCN: enable_sgpr_workgroup_id_y = 0		; GCN: enable_sgpr_workgroup_id_y = 0
; GCN: enable_sgpr_workgroup_id_z = 1		; GCN: enable_sgpr_workgroup_id_z = 1

; GCN-DAG: s_mov_b32 s33, s8
; GCN-DAG: v_mov_b32_e32 v0, 0x22b		; GCN-DAG: v_mov_b32_e32 v0, 0x22b

; GCN: s_mov_b32 s32, s33		; GCN: s_mov_b32 s32, 0
; GCN: s_swappc_b64		; GCN: s_swappc_b64
define amdgpu_kernel void @kern_indirect_other_arg_use_workgroup_id_z() #1 {		define amdgpu_kernel void @kern_indirect_other_arg_use_workgroup_id_z() #1 {
call void @other_arg_use_workgroup_id_z(i32 555)		call void @other_arg_use_workgroup_id_z(i32 555)
ret void		ret void
}		}

; GCN-LABEL: {{^}}use_every_sgpr_input:		; GCN-LABEL: {{^}}use_every_sgpr_input:
; GCN: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s32{{$}}		; GCN: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s32{{$}}

; GCN: enable_sgpr_private_segment_buffer = 1		; GCN: enable_sgpr_private_segment_buffer = 1
; GCN: enable_sgpr_dispatch_ptr = 1		; GCN: enable_sgpr_dispatch_ptr = 1
; GCN: enable_sgpr_queue_ptr = 1		; GCN: enable_sgpr_queue_ptr = 1
; GCN: enable_sgpr_kernarg_segment_ptr = 1		; GCN: enable_sgpr_kernarg_segment_ptr = 1
; GCN: enable_sgpr_dispatch_id = 1		; GCN: enable_sgpr_dispatch_id = 1
; GCN: enable_sgpr_flat_scratch_init = 1		; GCN: enable_sgpr_flat_scratch_init = 1

; GCN: s_mov_b32 s33, s17
; GCN: s_mov_b32 s12, s14		; GCN: s_mov_b32 s12, s14
; GCN: s_mov_b32 s13, s15		; GCN: s_mov_b32 s13, s15
; GCN: s_mov_b32 s14, s16		; GCN: s_mov_b32 s14, s16
; GCN: s_mov_b32 s32, s33		; GCN: s_mov_b32 s32, 0
; GCN: s_swappc_b64		; GCN: s_swappc_b64
define amdgpu_kernel void @kern_indirect_use_every_sgpr_input() #1 {		define amdgpu_kernel void @kern_indirect_use_every_sgpr_input() #1 {
call void @use_every_sgpr_input()		call void @use_every_sgpr_input()
ret void		ret void
}		}

; GCN-LABEL: {{^}}func_indirect_use_every_sgpr_input:		; GCN-LABEL: {{^}}func_indirect_use_every_sgpr_input:
; GCN-NOT: s6		; GCN-NOT: s6

llvm/test/CodeGen/AMDGPU/callee-special-input-vgprs.ll

define void @too_many_args_use_workitem_id_x(
store volatile i32 %arg31, i32 addrspace(1)* undef		store volatile i32 %arg31, i32 addrspace(1)* undef

ret void		ret void
}		}

; GCN-LABEL: {{^}}kern_call_too_many_args_use_workitem_id_x:		; GCN-LABEL: {{^}}kern_call_too_many_args_use_workitem_id_x:
; GCN: enable_vgpr_workitem_id = 0		; GCN: enable_vgpr_workitem_id = 0

; GCN: s_mov_b32 s33, s7		; GCN: s_mov_b32 s32, 0
; GCN: s_mov_b32 s32, s33
; GCN: buffer_store_dword v0, off, s[0:3], s32{{$}}		; GCN: buffer_store_dword v0, off, s[0:3], s32{{$}}
; GCN: s_swappc_b64		; GCN: s_swappc_b64
define amdgpu_kernel void @kern_call_too_many_args_use_workitem_id_x() #1 {		define amdgpu_kernel void @kern_call_too_many_args_use_workitem_id_x() #1 {
call void @too_many_args_use_workitem_id_x(		call void @too_many_args_use_workitem_id_x(
i32 10, i32 20, i32 30, i32 40,		i32 10, i32 20, i32 30, i32 40,
i32 50, i32 60, i32 70, i32 80,		i32 50, i32 60, i32 70, i32 80,
i32 90, i32 100, i32 110, i32 120,		i32 90, i32 100, i32 110, i32 120,
i32 130, i32 140, i32 150, i32 160,		i32 130, i32 140, i32 150, i32 160,
}		}

; sp[0] = byval		; sp[0] = byval
; sp[1] = ??		; sp[1] = ??
; sp[2] = stack passed workitem ID x		; sp[2] = stack passed workitem ID x

; GCN-LABEL: {{^}}kern_call_too_many_args_use_workitem_id_x_byval:		; GCN-LABEL: {{^}}kern_call_too_many_args_use_workitem_id_x_byval:
; GCN: enable_vgpr_workitem_id = 0		; GCN: enable_vgpr_workitem_id = 0
; GCN-DAG: s_mov_b32 s33, s7		; GCN: v_mov_b32_e32 [[K:v[0-9]+]], 0x3e7{{$}}
; GCN-DAG: v_mov_b32_e32 [[K:v[0-9]+]], 0x3e7{{$}}		; GCN: s_movk_i32 s32, 0x400{{$}}
; GCN: buffer_store_dword [[K]], off, s[0:3], s33 offset:4		; GCN: buffer_store_dword [[K]], off, s[0:3], 0 offset:4
; GCN: buffer_load_dword [[RELOAD_BYVAL:v[0-9]+]], off, s[0:3], s33 offset:4		; GCN: buffer_store_dword v0, off, s[0:3], s32 offset:4
; GCN: s_add_u32 s32, s33, 0x400{{$}}		; GCN: buffer_load_dword [[RELOAD_BYVAL:v[0-9]+]], off, s[0:3], 0 offset:4

; GCN-NOT: s32		; GCN-NOT: s32
; GCN: buffer_store_dword v0, off, s[0:3], s32 offset:4

; GCN: buffer_store_dword [[RELOAD_BYVAL]], off, s[0:3], s32{{$}}		; GCN: buffer_store_dword [[RELOAD_BYVAL]], off, s[0:3], s32{{$}}
; GCN: v_mov_b32_e32 [[RELOAD_BYVAL]],		; GCN: v_mov_b32_e32 [[RELOAD_BYVAL]],
; GCN: s_swappc_b64		; GCN: s_swappc_b64
define amdgpu_kernel void @kern_call_too_many_args_use_workitem_id_x_byval() #1 {		define amdgpu_kernel void @kern_call_too_many_args_use_workitem_id_x_byval() #1 {
%alloca = alloca i32, align 4, addrspace(5)		%alloca = alloca i32, align 4, addrspace(5)
store volatile i32 999, i32 addrspace(5)* %alloca		store volatile i32 999, i32 addrspace(5)* %alloca
call void @too_many_args_use_workitem_id_x_byval(		call void @too_many_args_use_workitem_id_x_byval(
define void @too_many_args_use_workitem_id_xyz(
ret void		ret void
}		}

; frame[0] = ID { Z, Y, X }		; frame[0] = ID { Z, Y, X }

; GCN-LABEL: {{^}}kern_call_too_many_args_use_workitem_id_xyz:		; GCN-LABEL: {{^}}kern_call_too_many_args_use_workitem_id_xyz:
; GCN: enable_vgpr_workitem_id = 2		; GCN: enable_vgpr_workitem_id = 2

; GCN-DAG: s_mov_b32 s33, s7		; GCN-DAG: s_mov_b32 s32, 0
; GCN-DAG: s_mov_b32 s32, s33

; GCN-DAG: v_lshlrev_b32_e32 v1, 10, v1		; GCN-DAG: v_lshlrev_b32_e32 v1, 10, v1
; GCN-DAG: v_or_b32_e32 v0, v0, v1		; GCN-DAG: v_or_b32_e32 v0, v0, v1
; GCN-DAG: v_lshlrev_b32_e32 v2, 20, v2		; GCN-DAG: v_lshlrev_b32_e32 v2, 20, v2
; GCN-DAG: v_or_b32_e32 v0, v0, v2		; GCN-DAG: v_or_b32_e32 v0, v0, v2
; GCN: buffer_store_dword v0, off, s[0:3], s32{{$}}		; GCN: buffer_store_dword v0, off, s[0:3], s32{{$}}
; GCN: s_swappc_b64		; GCN: s_swappc_b64
define amdgpu_kernel void @kern_call_too_many_args_use_workitem_id_xyz() #1 {		define amdgpu_kernel void @kern_call_too_many_args_use_workitem_id_xyz() #1 {
define void @too_many_args_use_workitem_id_x_stack_yz(
store volatile i32 %arg30, i32 addrspace(1)* undef		store volatile i32 %arg30, i32 addrspace(1)* undef

ret void		ret void
}		}

; GCN-LABEL: {{^}}kern_call_too_many_args_use_workitem_id_x_stack_yz:		; GCN-LABEL: {{^}}kern_call_too_many_args_use_workitem_id_x_stack_yz:
; GCN: enable_vgpr_workitem_id = 2		; GCN: enable_vgpr_workitem_id = 2

; GCN: s_mov_b32 s33, s7

; GCN-NOT: v0		; GCN-NOT: v0
; GCN-DAG: v_lshlrev_b32_e32 v1, 10, v1		; GCN-DAG: v_lshlrev_b32_e32 v1, 10, v1
; GCN-DAG: v_or_b32_e32 v0, v0, v1		; GCN-DAG: v_or_b32_e32 v0, v0, v1
; GCN-DAG: v_lshlrev_b32_e32 v2, 20, v2		; GCN-DAG: v_lshlrev_b32_e32 v2, 20, v2
; GCN-DAG: v_or_b32_e32 v31, v0, v2		; GCN-DAG: v_or_b32_e32 v31, v0, v2

; GCN: s_mov_b32 s32, s33		; GCN: s_mov_b32 s32, 0
; GCN: s_swappc_b64		; GCN: s_swappc_b64
define amdgpu_kernel void @kern_call_too_many_args_use_workitem_id_x_stack_yz() #1 {		define amdgpu_kernel void @kern_call_too_many_args_use_workitem_id_x_stack_yz() #1 {
call void @too_many_args_use_workitem_id_x_stack_yz(		call void @too_many_args_use_workitem_id_x_stack_yz(
i32 10, i32 20, i32 30, i32 40,		i32 10, i32 20, i32 30, i32 40,
i32 50, i32 60, i32 70, i32 80,		i32 50, i32 60, i32 70, i32 80,
i32 90, i32 100, i32 110, i32 120,		i32 90, i32 100, i32 110, i32 120,
i32 130, i32 140, i32 150, i32 160,		i32 130, i32 140, i32 150, i32 160,
i32 170, i32 180, i32 190, i32 200,		i32 170, i32 180, i32 190, i32 200,

llvm/test/CodeGen/AMDGPU/captured-frame-index.ll

define amdgpu_kernel void @stored_fi_to_lds(float addrspace(5)* addrspace(3)* %ptr) #0 {
store float 4.0, float addrspace(5)*%tmp		store float 4.0, float addrspace(5)*%tmp
store float addrspace(5)* %tmp, float addrspace(5)* addrspace(3)* %ptr		store float addrspace(5)* %tmp, float addrspace(5)* addrspace(3)* %ptr
ret void		ret void
}		}

; Offset is applied		; Offset is applied
; GCN-LABEL: {{^}}stored_fi_to_lds_2_small_objects:		; GCN-LABEL: {{^}}stored_fi_to_lds_2_small_objects:
; GCN-DAG: v_mov_b32_e32 [[ZERO:v[0-9]+]], 4{{$}}		; GCN-DAG: v_mov_b32_e32 [[ZERO:v[0-9]+]], 4{{$}}
; GCN-DAG: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:4{{$}}		; GCN-DAG: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:4{{$}}
; GCN-DAG: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:8{{$}}		; GCN-DAG: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:8{{$}}

; GCN-DAG: s_load_dword [[LDSPTR:s[0-9]+]]		; GCN-DAG: s_load_dword [[LDSPTR:s[0-9]+]]

; GCN-DAG: v_mov_b32_e32 [[VLDSPTR:v[0-9]+]], [[LDSPTR]]		; GCN-DAG: v_mov_b32_e32 [[VLDSPTR:v[0-9]+]], [[LDSPTR]]
; GCN: ds_write_b32 [[VLDSPTR]], [[ZERO]]		; GCN: ds_write_b32 [[VLDSPTR]], [[ZERO]]

; GCN-DAG: v_mov_b32_e32 [[FI1:v[0-9]+]], 8{{$}}		; GCN-DAG: v_mov_b32_e32 [[FI1:v[0-9]+]], 8{{$}}
; GCN: ds_write_b32 [[VLDSPTR]], [[FI1]]		; GCN: ds_write_b32 [[VLDSPTR]], [[FI1]]
define amdgpu_kernel void @stored_fi_to_lds_2_small_objects(float addrspace(5)* addrspace(3)* %ptr) #0 {		define amdgpu_kernel void @stored_fi_to_lds_2_small_objects(float addrspace(5)* addrspace(3)* %ptr) #0 {
%tmp0 = alloca float, addrspace(5)		%tmp0 = alloca float, addrspace(5)
%tmp1 = alloca float, addrspace(5)		%tmp1 = alloca float, addrspace(5)
store float 4.0, float addrspace(5)* %tmp0		store float 4.0, float addrspace(5)* %tmp0
store float 4.0, float addrspace(5)* %tmp1		store float 4.0, float addrspace(5)* %tmp1
store volatile float addrspace(5)* %tmp0, float addrspace(5)* addrspace(3)* %ptr		store volatile float addrspace(5)* %tmp0, float addrspace(5)* addrspace(3)* %ptr
store volatile float addrspace(5)* %tmp1, float addrspace(5)* addrspace(3)* %ptr		store volatile float addrspace(5)* %tmp1, float addrspace(5)* addrspace(3)* %ptr
ret void		ret void
}		}

; Same frame index is used multiple times in the store		; Same frame index is used multiple times in the store
; GCN-LABEL: {{^}}stored_fi_to_self:		; GCN-LABEL: {{^}}stored_fi_to_self:
; GCN-DAG: v_mov_b32_e32 [[K:v[0-9]+]], 0x4d2{{$}}		; GCN-DAG: v_mov_b32_e32 [[K:v[0-9]+]], 0x4d2{{$}}
; GCN: buffer_store_dword [[K]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:4{{$}}		; GCN: buffer_store_dword [[K]], off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:4{{$}}
; GCN-DAG: v_mov_b32_e32 [[ZERO:v[0-9]+]], 4{{$}}		; GCN-DAG: v_mov_b32_e32 [[ZERO:v[0-9]+]], 4{{$}}
; GCN: buffer_store_dword [[ZERO]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:4{{$}}		; GCN: buffer_store_dword [[ZERO]], off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:4{{$}}
define amdgpu_kernel void @stored_fi_to_self() #0 {		define amdgpu_kernel void @stored_fi_to_self() #0 {
%tmp = alloca i32 addrspace(5)*, addrspace(5)		%tmp = alloca i32 addrspace(5)*, addrspace(5)

; Avoid optimizing everything out		; Avoid optimizing everything out
store volatile i32 addrspace(5)* inttoptr (i32 1234 to i32 addrspace(5)), i32 addrspace(5) addrspace(5)* %tmp		store volatile i32 addrspace(5)* inttoptr (i32 1234 to i32 addrspace(5)), i32 addrspace(5) addrspace(5)* %tmp
%bitcast = bitcast i32 addrspace(5)* addrspace(5)* %tmp to i32 addrspace(5)*		%bitcast = bitcast i32 addrspace(5)* addrspace(5)* %tmp to i32 addrspace(5)*
store volatile i32 addrspace(5)* %bitcast, i32 addrspace(5)* addrspace(5)* %tmp		store volatile i32 addrspace(5)* %bitcast, i32 addrspace(5)* addrspace(5)* %tmp
ret void		ret void
}		}

; GCN-LABEL: {{^}}stored_fi_to_self_offset:		; GCN-LABEL: {{^}}stored_fi_to_self_offset:
; GCN-DAG: v_mov_b32_e32 [[K0:v[0-9]+]], 32{{$}}		; GCN-DAG: v_mov_b32_e32 [[K0:v[0-9]+]], 32{{$}}
; GCN: buffer_store_dword [[K0]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:4{{$}}		; GCN: buffer_store_dword [[K0]], off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:4{{$}}

; GCN-DAG: v_mov_b32_e32 [[K1:v[0-9]+]], 0x4d2{{$}}		; GCN-DAG: v_mov_b32_e32 [[K1:v[0-9]+]], 0x4d2{{$}}
; GCN: buffer_store_dword [[K1]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:2052{{$}}		; GCN: buffer_store_dword [[K1]], off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:2052{{$}}

; GCN: v_mov_b32_e32 [[OFFSETK:v[0-9]+]], 0x804{{$}}		; GCN: v_mov_b32_e32 [[OFFSETK:v[0-9]+]], 0x804{{$}}
; GCN: buffer_store_dword [[OFFSETK]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:2052{{$}}		; GCN: buffer_store_dword [[OFFSETK]], off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:2052{{$}}
define amdgpu_kernel void @stored_fi_to_self_offset() #0 {		define amdgpu_kernel void @stored_fi_to_self_offset() #0 {
%tmp0 = alloca [512 x i32], addrspace(5)		%tmp0 = alloca [512 x i32], addrspace(5)
%tmp1 = alloca i32 addrspace(5)*, addrspace(5)		%tmp1 = alloca i32 addrspace(5)*, addrspace(5)

; Avoid optimizing everything out		; Avoid optimizing everything out
%tmp0.cast = bitcast [512 x i32] addrspace(5)* %tmp0 to i32 addrspace(5)*		%tmp0.cast = bitcast [512 x i32] addrspace(5)* %tmp0 to i32 addrspace(5)*
store volatile i32 32, i32 addrspace(5)* %tmp0.cast		store volatile i32 32, i32 addrspace(5)* %tmp0.cast

store volatile i32 addrspace(5)* inttoptr (i32 1234 to i32 addrspace(5)), i32 addrspace(5) addrspace(5)* %tmp1		store volatile i32 addrspace(5)* inttoptr (i32 1234 to i32 addrspace(5)), i32 addrspace(5) addrspace(5)* %tmp1

%bitcast = bitcast i32 addrspace(5)* addrspace(5)* %tmp1 to i32 addrspace(5)*		%bitcast = bitcast i32 addrspace(5)* addrspace(5)* %tmp1 to i32 addrspace(5)*
store volatile i32 addrspace(5)* %bitcast, i32 addrspace(5)* addrspace(5)* %tmp1		store volatile i32 addrspace(5)* %bitcast, i32 addrspace(5)* addrspace(5)* %tmp1
ret void		ret void
}		}

; GCN-LABEL: {{^}}stored_fi_to_fi:		; GCN-LABEL: {{^}}stored_fi_to_fi:
; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:4{{$}}		; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:4{{$}}
; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:8{{$}}		; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:8{{$}}
; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:12{{$}}		; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:12{{$}}

; GCN: v_mov_b32_e32 [[FI1:v[0-9]+]], 8{{$}}		; GCN: v_mov_b32_e32 [[FI1:v[0-9]+]], 8{{$}}
; GCN: buffer_store_dword [[FI1]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:12{{$}}		; GCN: buffer_store_dword [[FI1]], off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:12{{$}}

; GCN: v_mov_b32_e32 [[FI2:v[0-9]+]], 12{{$}}		; GCN: v_mov_b32_e32 [[FI2:v[0-9]+]], 12{{$}}
; GCN: buffer_store_dword [[FI2]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:8{{$}}		; GCN: buffer_store_dword [[FI2]], off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:8{{$}}
define amdgpu_kernel void @stored_fi_to_fi() #0 {		define amdgpu_kernel void @stored_fi_to_fi() #0 {
%tmp0 = alloca i32 addrspace(5)*, addrspace(5)		%tmp0 = alloca i32 addrspace(5)*, addrspace(5)
%tmp1 = alloca i32 addrspace(5)*, addrspace(5)		%tmp1 = alloca i32 addrspace(5)*, addrspace(5)
%tmp2 = alloca i32 addrspace(5)*, addrspace(5)		%tmp2 = alloca i32 addrspace(5)*, addrspace(5)
store volatile i32 addrspace(5)* inttoptr (i32 1234 to i32 addrspace(5)), i32 addrspace(5) addrspace(5)* %tmp0		store volatile i32 addrspace(5)* inttoptr (i32 1234 to i32 addrspace(5)), i32 addrspace(5) addrspace(5)* %tmp0
store volatile i32 addrspace(5)* inttoptr (i32 5678 to i32 addrspace(5)), i32 addrspace(5) addrspace(5)* %tmp1		store volatile i32 addrspace(5)* inttoptr (i32 5678 to i32 addrspace(5)), i32 addrspace(5) addrspace(5)* %tmp1
store volatile i32 addrspace(5)* inttoptr (i32 9999 to i32 addrspace(5)), i32 addrspace(5) addrspace(5)* %tmp2		store volatile i32 addrspace(5)* inttoptr (i32 9999 to i32 addrspace(5)), i32 addrspace(5) addrspace(5)* %tmp2

%bitcast1 = bitcast i32 addrspace(5)* addrspace(5)* %tmp1 to i32 addrspace(5)*		%bitcast1 = bitcast i32 addrspace(5)* addrspace(5)* %tmp1 to i32 addrspace(5)*
%bitcast2 = bitcast i32 addrspace(5)* addrspace(5)* %tmp2 to i32 addrspace(5)* ; at offset 8		%bitcast2 = bitcast i32 addrspace(5)* addrspace(5)* %tmp2 to i32 addrspace(5)* ; at offset 8

store volatile i32 addrspace(5)* %bitcast1, i32 addrspace(5)* addrspace(5)* %tmp2 ; store offset 4 at offset 8		store volatile i32 addrspace(5)* %bitcast1, i32 addrspace(5)* addrspace(5)* %tmp2 ; store offset 4 at offset 8
store volatile i32 addrspace(5)* %bitcast2, i32 addrspace(5)* addrspace(5)* %tmp1 ; store offset 8 at offset 4		store volatile i32 addrspace(5)* %bitcast2, i32 addrspace(5)* addrspace(5)* %tmp1 ; store offset 8 at offset 4
ret void		ret void
}		}

; GCN-LABEL: {{^}}stored_fi_to_global:		; GCN-LABEL: {{^}}stored_fi_to_global:
; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:4{{$}}		; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:4{{$}}
; GCN: v_mov_b32_e32 [[FI:v[0-9]+]], 4{{$}}		; GCN: v_mov_b32_e32 [[FI:v[0-9]+]], 4{{$}}
; GCN: buffer_store_dword [[FI]]		; GCN: buffer_store_dword [[FI]]
define amdgpu_kernel void @stored_fi_to_global(float addrspace(5)* addrspace(1)* %ptr) #0 {		define amdgpu_kernel void @stored_fi_to_global(float addrspace(5)* addrspace(1)* %ptr) #0 {
%tmp = alloca float, addrspace(5)		%tmp = alloca float, addrspace(5)
store float 0.0, float addrspace(5)*%tmp		store float 0.0, float addrspace(5)*%tmp
store float addrspace(5)* %tmp, float addrspace(5)* addrspace(1)* %ptr		store float addrspace(5)* %tmp, float addrspace(5)* addrspace(1)* %ptr
ret void		ret void
}		}

; Offset is applied		; Offset is applied
; GCN-LABEL: {{^}}stored_fi_to_global_2_small_objects:		; GCN-LABEL: {{^}}stored_fi_to_global_2_small_objects:
; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:4{{$}}		; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:4{{$}}
; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:8{{$}}		; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:8{{$}}
; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:12{{$}}		; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:12{{$}}

; GCN: v_mov_b32_e32 [[FI1:v[0-9]+]], 8{{$}}		; GCN: v_mov_b32_e32 [[FI1:v[0-9]+]], 8{{$}}
; GCN: buffer_store_dword [[FI1]], off, s{{\[[0-9]+:[0-9]+\]}}, 0{{$}}		; GCN: buffer_store_dword [[FI1]], off, s{{\[[0-9]+:[0-9]+\]}}, 0{{$}}

; GCN-DAG: v_mov_b32_e32 [[FI2:v[0-9]+]], 12{{$}}		; GCN-DAG: v_mov_b32_e32 [[FI2:v[0-9]+]], 12{{$}}
; GCN: buffer_store_dword [[FI2]], off, s{{\[[0-9]+:[0-9]+\]}}, 0{{$}}		; GCN: buffer_store_dword [[FI2]], off, s{{\[[0-9]+:[0-9]+\]}}, 0{{$}}
define amdgpu_kernel void @stored_fi_to_global_2_small_objects(float addrspace(5)* addrspace(1)* %ptr) #0 {		define amdgpu_kernel void @stored_fi_to_global_2_small_objects(float addrspace(5)* addrspace(1)* %ptr) #0 {
%tmp0 = alloca float, addrspace(5)		%tmp0 = alloca float, addrspace(5)
%tmp1 = alloca float, addrspace(5)		%tmp1 = alloca float, addrspace(5)
%tmp2 = alloca float, addrspace(5)		%tmp2 = alloca float, addrspace(5)
store volatile float 0.0, float addrspace(5)*%tmp0		store volatile float 0.0, float addrspace(5)*%tmp0
store volatile float 0.0, float addrspace(5)*%tmp1		store volatile float 0.0, float addrspace(5)*%tmp1
store volatile float 0.0, float addrspace(5)*%tmp2		store volatile float 0.0, float addrspace(5)*%tmp2
store volatile float addrspace(5)* %tmp1, float addrspace(5)* addrspace(1)* %ptr		store volatile float addrspace(5)* %tmp1, float addrspace(5)* addrspace(1)* %ptr
store volatile float addrspace(5)* %tmp2, float addrspace(5)* addrspace(1)* %ptr		store volatile float addrspace(5)* %tmp2, float addrspace(5)* addrspace(1)* %ptr
ret void		ret void
}		}

; GCN-LABEL: {{^}}stored_fi_to_global_huge_frame_offset:		; GCN-LABEL: {{^}}stored_fi_to_global_huge_frame_offset:
; GCN: v_mov_b32_e32 [[BASE_0:v[0-9]+]], 0{{$}}		; GCN: v_mov_b32_e32 [[BASE_0:v[0-9]+]], 0{{$}}
; GCN: buffer_store_dword [[BASE_0]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:4{{$}}		; GCN: buffer_store_dword [[BASE_0]], off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:4{{$}}

; FIXME: Re-initialize		; FIXME: Re-initialize
; GCN: v_mov_b32_e32 [[BASE_0_1:v[0-9]+]], 4{{$}}		; GCN: v_mov_b32_e32 [[BASE_0_1:v[0-9]+]], 4{{$}}

; GCN-DAG: v_mov_b32_e32 [[K:v[0-9]+]], 0x3e7{{$}}		; GCN-DAG: v_mov_b32_e32 [[K:v[0-9]+]], 0x3e7{{$}}
; GCN-DAG: v_add_i32_e32 [[BASE_1_OFF_1:v[0-9]+]], vcc, 0x3ffc, [[BASE_0_1]]		; GCN-DAG: v_add_i32_e32 [[BASE_1_OFF_1:v[0-9]+]], vcc, 0x3ffc, [[BASE_0_1]]


; GCN: v_add_i32_e32 [[BASE_1_OFF_2:v[0-9]+]], vcc, 56, [[BASE_0_1]]		; GCN: v_add_i32_e32 [[BASE_1_OFF_2:v[0-9]+]], vcc, 56, [[BASE_0_1]]
; GCN: buffer_store_dword [[K]], [[BASE_1_OFF_1]], s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offen{{$}}		; GCN: buffer_store_dword [[K]], [[BASE_1_OFF_1]], s{{\[[0-9]+:[0-9]+\]}}, 0 offen{{$}}

; GCN: buffer_store_dword [[BASE_1_OFF_2]], off, s{{\[[0-9]+:[0-9]+\]}}, 0{{$}}		; GCN: buffer_store_dword [[BASE_1_OFF_2]], off, s{{\[[0-9]+:[0-9]+\]}}, 0{{$}}
define amdgpu_kernel void @stored_fi_to_global_huge_frame_offset(i32 addrspace(5)* addrspace(1)* %ptr) #0 {		define amdgpu_kernel void @stored_fi_to_global_huge_frame_offset(i32 addrspace(5)* addrspace(1)* %ptr) #0 {
%tmp0 = alloca [4096 x i32], addrspace(5)		%tmp0 = alloca [4096 x i32], addrspace(5)
%tmp1 = alloca [4096 x i32], addrspace(5)		%tmp1 = alloca [4096 x i32], addrspace(5)
%gep0.tmp0 = getelementptr [4096 x i32], [4096 x i32] addrspace(5)* %tmp0, i32 0, i32 0		%gep0.tmp0 = getelementptr [4096 x i32], [4096 x i32] addrspace(5)* %tmp0, i32 0, i32 0
store volatile i32 0, i32 addrspace(5)* %gep0.tmp0		store volatile i32 0, i32 addrspace(5)* %gep0.tmp0
%gep1.tmp0 = getelementptr [4096 x i32], [4096 x i32] addrspace(5)* %tmp0, i32 0, i32 4095		%gep1.tmp0 = getelementptr [4096 x i32], [4096 x i32] addrspace(5)* %tmp0, i32 0, i32 4095

llvm/test/CodeGen/AMDGPU/cc-update.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx803 < %s | FileCheck --check-prefix=GFX803 %s
				; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck --check-prefix=GFX900 %s
				; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 < %s | FileCheck --check-prefix=GFX1010 %s

				define amdgpu_kernel void @test_kern_empty() local_unnamed_addr #0 {
				; GFX803-LABEL: test_kern_empty:
				; GFX803: ; %bb.0: ; %entry
				; GFX803-NEXT: s_endpgm
				;
				; GFX900-LABEL: test_kern_empty:
				; GFX900: ; %bb.0: ; %entry
				; GFX900-NEXT: s_endpgm
				;
				; GFX1010-LABEL: test_kern_empty:
				; GFX1010: ; %bb.0: ; %entry
				; GFX1010-NEXT: s_endpgm
				entry:
				ret void
				}

				define amdgpu_kernel void @test_kern_stack() local_unnamed_addr #0 {
				; GFX803-LABEL: test_kern_stack:
				; GFX803: ; %bb.0: ; %entry
				; GFX803-NEXT: s_add_u32 s4, s4, s7
				; GFX803-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8
				; GFX803-NEXT: s_add_u32 s0, s0, s7
				; GFX803-NEXT: s_addc_u32 s1, s1, 0
				; GFX803-NEXT: v_mov_b32_e32 v0, 0
				; GFX803-NEXT: s_mov_b32 flat_scratch_lo, s5
				; GFX803-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:4
				; GFX803-NEXT: s_endpgm
				;
				; GFX900-LABEL: test_kern_stack:
				; GFX900: ; %bb.0: ; %entry
				; GFX900-NEXT: s_add_u32 flat_scratch_lo, s4, s7
				; GFX900-NEXT: s_addc_u32 flat_scratch_hi, s5, 0
				; GFX900-NEXT: s_add_u32 s0, s0, s7
				; GFX900-NEXT: s_addc_u32 s1, s1, 0
				; GFX900-NEXT: v_mov_b32_e32 v0, 0
				; GFX900-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:4
				; GFX900-NEXT: s_endpgm
				;
				; GFX1010-LABEL: test_kern_stack:
				; GFX1010: ; %bb.0: ; %entry
				; GFX1010-NEXT: s_add_u32 s4, s4, s7
				; GFX1010-NEXT: s_addc_u32 s5, s5, 0
				; GFX1010-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s4
				; GFX1010-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s5
				; GFX1010-NEXT: s_add_u32 s0, s0, s7
				; GFX1010-NEXT: s_addc_u32 s1, s1, 0
				; GFX1010-NEXT: v_mov_b32_e32 v0, 0
				; GFX1010-NEXT: ; implicit-def: $vcc_hi
				; GFX1010-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:4
				; GFX1010-NEXT: s_endpgm
				entry:
				%x = alloca i32, align 4, addrspace(5)
				store volatile i32 0, i32 addrspace(5)* %x, align 4
				ret void
				}

				define amdgpu_kernel void @test_kern_call() local_unnamed_addr #0 {
				; GFX803-LABEL: test_kern_call:
				; GFX803: ; %bb.0: ; %entry
				; GFX803-NEXT: s_add_u32 s4, s4, s7
				; GFX803-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8
				; GFX803-NEXT: s_add_u32 s0, s0, s7
				; GFX803-NEXT: s_addc_u32 s1, s1, 0
				; GFX803-NEXT: s_mov_b32 flat_scratch_lo, s5
				; GFX803-NEXT: s_getpc_b64 s[4:5]
				; GFX803-NEXT: s_add_u32 s4, s4, ex@rel32@lo+4
				; GFX803-NEXT: s_addc_u32 s5, s5, ex@rel32@hi+4
				; GFX803-NEXT: s_mov_b32 s32, 0
				; GFX803-NEXT: s_swappc_b64 s[30:31], s[4:5]
				; GFX803-NEXT: s_endpgm
				;
				; GFX900-LABEL: test_kern_call:
				; GFX900: ; %bb.0: ; %entry
				; GFX900-NEXT: s_add_u32 flat_scratch_lo, s4, s7
				; GFX900-NEXT: s_addc_u32 flat_scratch_hi, s5, 0
				; GFX900-NEXT: s_add_u32 s0, s0, s7
				; GFX900-NEXT: s_addc_u32 s1, s1, 0
				; GFX900-NEXT: s_getpc_b64 s[4:5]
				; GFX900-NEXT: s_add_u32 s4, s4, ex@rel32@lo+4
				; GFX900-NEXT: s_addc_u32 s5, s5, ex@rel32@hi+4
				; GFX900-NEXT: s_mov_b32 s32, 0
				; GFX900-NEXT: s_swappc_b64 s[30:31], s[4:5]
				; GFX900-NEXT: s_endpgm
				;
				; GFX1010-LABEL: test_kern_call:
				; GFX1010: ; %bb.0: ; %entry
				; GFX1010-NEXT: s_add_u32 s4, s4, s7
				; GFX1010-NEXT: s_mov_b32 s32, 0
				; GFX1010-NEXT: s_addc_u32 s5, s5, 0
				; GFX1010-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s4
				; GFX1010-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s5
				; GFX1010-NEXT: s_add_u32 s0, s0, s7
				; GFX1010-NEXT: s_addc_u32 s1, s1, 0
				; GFX1010-NEXT: s_getpc_b64 s[4:5]
				; GFX1010-NEXT: s_add_u32 s4, s4, ex@rel32@lo+4
				; GFX1010-NEXT: s_addc_u32 s5, s5, ex@rel32@hi+4
				; GFX1010-NEXT: ; implicit-def: $vcc_hi
				; GFX1010-NEXT: s_swappc_b64 s[30:31], s[4:5]
				; GFX1010-NEXT: s_endpgm
				entry:
				tail call void @ex() #0
				ret void
				}

				define amdgpu_kernel void @test_kern_stack_and_call() local_unnamed_addr #0 {
				; GFX803-LABEL: test_kern_stack_and_call:
				; GFX803: ; %bb.0: ; %entry
				; GFX803-NEXT: s_add_u32 s4, s4, s7
				; GFX803-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8
				; GFX803-NEXT: s_add_u32 s0, s0, s7
				; GFX803-NEXT: s_addc_u32 s1, s1, 0
				; GFX803-NEXT: v_mov_b32_e32 v0, 0
				; GFX803-NEXT: s_mov_b32 flat_scratch_lo, s5
				; GFX803-NEXT: s_getpc_b64 s[4:5]
				; GFX803-NEXT: s_add_u32 s4, s4, ex@rel32@lo+4
				; GFX803-NEXT: s_addc_u32 s5, s5, ex@rel32@hi+4
				; GFX803-NEXT: s_movk_i32 s32, 0x400
				; GFX803-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:4
				; GFX803-NEXT: s_swappc_b64 s[30:31], s[4:5]
				; GFX803-NEXT: s_endpgm
				;
				; GFX900-LABEL: test_kern_stack_and_call:
				; GFX900: ; %bb.0: ; %entry
				; GFX900-NEXT: s_add_u32 flat_scratch_lo, s4, s7
				; GFX900-NEXT: s_addc_u32 flat_scratch_hi, s5, 0
				; GFX900-NEXT: s_add_u32 s0, s0, s7
				; GFX900-NEXT: s_addc_u32 s1, s1, 0
				; GFX900-NEXT: v_mov_b32_e32 v0, 0
				; GFX900-NEXT: s_getpc_b64 s[4:5]
				; GFX900-NEXT: s_add_u32 s4, s4, ex@rel32@lo+4
				; GFX900-NEXT: s_addc_u32 s5, s5, ex@rel32@hi+4
				; GFX900-NEXT: s_movk_i32 s32, 0x400
				; GFX900-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:4
				; GFX900-NEXT: s_swappc_b64 s[30:31], s[4:5]
				; GFX900-NEXT: s_endpgm
				;
				; GFX1010-LABEL: test_kern_stack_and_call:
				; GFX1010: ; %bb.0: ; %entry
				; GFX1010-NEXT: s_add_u32 s4, s4, s7
				; GFX1010-NEXT: s_movk_i32 s32, 0x200
				; GFX1010-NEXT: s_addc_u32 s5, s5, 0
				; GFX1010-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s4
				; GFX1010-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s5
				; GFX1010-NEXT: s_add_u32 s0, s0, s7
				; GFX1010-NEXT: s_addc_u32 s1, s1, 0
				; GFX1010-NEXT: v_mov_b32_e32 v0, 0
				; GFX1010-NEXT: s_getpc_b64 s[4:5]
				; GFX1010-NEXT: s_add_u32 s4, s4, ex@rel32@lo+4
				; GFX1010-NEXT: s_addc_u32 s5, s5, ex@rel32@hi+4
				; GFX1010-NEXT: ; implicit-def: $vcc_hi
				; GFX1010-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:4
				; GFX1010-NEXT: s_swappc_b64 s[30:31], s[4:5]
				; GFX1010-NEXT: s_endpgm
				entry:
				%x = alloca i32, align 4, addrspace(5)
				store volatile i32 0, i32 addrspace(5)* %x, align 4
				tail call void @ex() #0
				ret void
				}

				define amdgpu_kernel void @test_sgpr_offset_kernel() #1 {
				; GFX803-LABEL: test_sgpr_offset_kernel:
				; GFX803: ; %bb.0: ; %entry
				; GFX803-NEXT: s_add_u32 s4, s4, s7
				; GFX803-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8
				; GFX803-NEXT: s_add_u32 s0, s0, s7
				; GFX803-NEXT: s_addc_u32 s1, s1, 0
				; GFX803-NEXT: buffer_load_dword v0, off, s[0:3], 0 offset:8
				; GFX803-NEXT: s_mov_b32 s4, 0x40000
				; GFX803-NEXT: s_mov_b32 flat_scratch_lo, s5
				; GFX803-NEXT: s_waitcnt vmcnt(0)
				; GFX803-NEXT: buffer_store_dword v0, off, s[0:3], s4 ; 4-byte Folded Spill
				; GFX803-NEXT: ;;#ASMSTART
				; GFX803-NEXT: ;;#ASMEND
				; GFX803-NEXT: s_mov_b32 s4, 0x40000
				; GFX803-NEXT: buffer_load_dword v0, off, s[0:3], s4 ; 4-byte Folded Reload
				; GFX803-NEXT: s_waitcnt vmcnt(0)
				; GFX803-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:8
				; GFX803-NEXT: s_endpgm
				;
				; GFX900-LABEL: test_sgpr_offset_kernel:
				; GFX900: ; %bb.0: ; %entry
				; GFX900-NEXT: s_add_u32 flat_scratch_lo, s4, s7
				; GFX900-NEXT: s_addc_u32 flat_scratch_hi, s5, 0
				; GFX900-NEXT: s_add_u32 s0, s0, s7
				; GFX900-NEXT: s_addc_u32 s1, s1, 0
				; GFX900-NEXT: buffer_load_dword v0, off, s[0:3], 0 offset:8
				; GFX900-NEXT: s_mov_b32 s6, 0x40000
				; GFX900-NEXT: s_waitcnt vmcnt(0)
				; GFX900-NEXT: buffer_store_dword v0, off, s[0:3], s6 ; 4-byte Folded Spill
				; GFX900-NEXT: ;;#ASMSTART
				; GFX900-NEXT: ;;#ASMEND
				; GFX900-NEXT: s_mov_b32 s6, 0x40000
				; GFX900-NEXT: buffer_load_dword v0, off, s[0:3], s6 ; 4-byte Folded Reload
				; GFX900-NEXT: s_waitcnt vmcnt(0)
				; GFX900-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:8
				; GFX900-NEXT: s_endpgm
				;
				; GFX1010-LABEL: test_sgpr_offset_kernel:
				; GFX1010: ; %bb.0: ; %entry
				; GFX1010-NEXT: s_add_u32 s4, s4, s7
				; GFX1010-NEXT: s_addc_u32 s5, s5, 0
				; GFX1010-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s4
				; GFX1010-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s5
				; GFX1010-NEXT: s_add_u32 s0, s0, s7
				; GFX1010-NEXT: s_addc_u32 s1, s1, 0
				; GFX1010-NEXT: s_mov_b32 s6, 0x20000
				; GFX1010-NEXT: ; implicit-def: $vcc_hi
				; GFX1010-NEXT: buffer_load_dword v0, off, s[0:3], 0 offset:8
				; GFX1010-NEXT: s_waitcnt vmcnt(0)
				; GFX1010-NEXT: buffer_store_dword v0, off, s[0:3], s6 ; 4-byte Folded Spill
				; GFX1010-NEXT: v_nop
				; GFX1010-NEXT: s_mov_b32 s6, 0x20000
				; GFX1010-NEXT: ;;#ASMSTART
				; GFX1010-NEXT: ;;#ASMEND
				; GFX1010-NEXT: buffer_load_dword v0, off, s[0:3], s6 ; 4-byte Folded Reload
				; GFX1010-NEXT: s_waitcnt vmcnt(0)
				; GFX1010-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:8
				; GFX1010-NEXT: s_endpgm
				entry:
				; Occupy 4096 bytes of scratch, so the offset of the spill of %a does not
				; fit in the instruction, and has to live in the SGPR offset.
				%alloca = alloca i8, i32 4092, align 4, addrspace(5)
				%buf = bitcast i8 addrspace(5)* %alloca to i32 addrspace(5)*

				%aptr = getelementptr i32, i32 addrspace(5)* %buf, i32 1
				; 0x40000 / 64 = 4096 (for wave64)
				; CHECK: s_add_u32 s6, s7, 0x40000
				; CHECK: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s6 ; 4-byte Folded Spill
				%a = load volatile i32, i32 addrspace(5)* %aptr

				; Force %a to spill
				call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()

				%outptr = getelementptr i32, i32 addrspace(5)* %buf, i32 1
				store volatile i32 %a, i32 addrspace(5)* %outptr

				ret void
				}

				declare hidden void @ex() local_unnamed_addr #0

				attributes #0 = { nounwind }
				attributes #1 = { nounwind "amdgpu-num-vgpr"="8" }
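Several SGPR values in cc-update.ll are wave-level scratch offsets: a per-lane byte offset multiplied by the wavefront size. The test's own comment gives 0x40000 / 64 = 4096 for the wave64 spill slot, and the GFX1010 (wave32) checks use 0x20000 instead. Similarly, in test_kern_stack_and_call the stack pointer s32 is 0x400 on GFX803/GFX900 and 0x200 on GFX1010, which is consistent with a 16-byte per-lane frame. The sketch below only reproduces this scaling. The 16-byte frame size is inferred from the checks, not taken from SIFrameLowering.

```python
# Scale per-lane scratch byte offsets to the wave-level values that appear in
# the SGPR operands (s32, soffset) of the cc-update.ll checks.
def wave_scaled(per_lane_bytes: int, wave_size: int) -> int:
    return per_lane_bytes * wave_size


TARGETS = {"GFX803": 64, "GFX900": 64, "GFX1010": 32}  # default wave sizes

# Assumed per-lane sizes, inferred from the checks rather than the compiler:
FRAME_BYTES = 16       # frame of test_kern_stack_and_call (the s32 value)
SPILL_OFFSET = 4096    # spill slot placed past 4096 bytes of scratch

for name, wave in TARGETS.items():
    print(name,
          "s32 =", hex(wave_scaled(FRAME_BYTES, wave)),
          "spill soffset =", hex(wave_scaled(SPILL_OFFSET, wave)))
```

With no stack objects (test_kern_call), the per-lane frame is 0, so s32 is 0 on every target.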

llvm/test/CodeGen/AMDGPU/cgp-addressing-modes.ll


	; OPT-LABEL: @test_sink_scratch_small_offset_i32(			; OPT-LABEL: @test_sink_scratch_small_offset_i32(
	; OPT-NOT: getelementptr [512 x i32]			; OPT-NOT: getelementptr [512 x i32]
	; OPT: br i1			; OPT: br i1
	; OPT: getelementptr i8,			; OPT: getelementptr i8,

	; GCN-LABEL: {{^}}test_sink_scratch_small_offset_i32:			; GCN-LABEL: {{^}}test_sink_scratch_small_offset_i32:
	; GCN: s_and_saveexec_b64			; GCN: s_and_saveexec_b64
	; GCN: buffer_store_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, {{s[0-9]+}} offset:4092{{$}}			; GCN: buffer_store_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, 0 offset:4092{{$}}
	; GCN: buffer_load_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, {{s[0-9]+}} offset:4092{{$}}			; GCN: buffer_load_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, 0 offset:4092{{$}}
	; GCN: {{^}}BB4_2:			; GCN: {{^}}BB4_2:
	define amdgpu_kernel void @test_sink_scratch_small_offset_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %in, i32 %arg) {			define amdgpu_kernel void @test_sink_scratch_small_offset_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %in, i32 %arg) {
	entry:			entry:
	%alloca = alloca [512 x i32], align 4, addrspace(5)			%alloca = alloca [512 x i32], align 4, addrspace(5)
	%out.gep.0 = getelementptr i32, i32 addrspace(1)* %out, i64 999998			%out.gep.0 = getelementptr i32, i32 addrspace(1)* %out, i64 999998
	%out.gep.1 = getelementptr i32, i32 addrspace(1)* %out, i64 999999			%out.gep.1 = getelementptr i32, i32 addrspace(1)* %out, i64 999999
	%add.arg = add i32 %arg, 8			%add.arg = add i32 %arg, 8
	%alloca.gep = getelementptr [512 x i32], [512 x i32] addrspace(5)* %alloca, i32 0, i32 1022			%alloca.gep = getelementptr [512 x i32], [512 x i32] addrspace(5)* %alloca, i32 0, i32 1022
	; OPT-LABEL: @test_sink_scratch_small_offset_i32_reserved(			; OPT-LABEL: @test_sink_scratch_small_offset_i32_reserved(
	; OPT-NOT: getelementptr [512 x i32]			; OPT-NOT: getelementptr [512 x i32]
	; OPT: br i1			; OPT: br i1
	; OPT: getelementptr i8,			; OPT: getelementptr i8,

	; GCN-LABEL: {{^}}test_sink_scratch_small_offset_i32_reserved:			; GCN-LABEL: {{^}}test_sink_scratch_small_offset_i32_reserved:
	; GCN: s_and_saveexec_b64			; GCN: s_and_saveexec_b64
	; GCN: v_mov_b32_e32 [[BASE_FI0:v[0-9]+]], 4			; GCN: v_mov_b32_e32 [[BASE_FI0:v[0-9]+]], 4
	; GCN: buffer_store_dword {{v[0-9]+}}, [[BASE_FI0]], {{s\[[0-9]+:[0-9]+\]}}, {{s[0-9]+}} offen offset:4092{{$}}			; GCN: buffer_store_dword {{v[0-9]+}}, [[BASE_FI0]], {{s\[[0-9]+:[0-9]+\]}}, 0 offen offset:4092{{$}}
	; GCN: v_mov_b32_e32 [[BASE_FI1:v[0-9]+]], 4			; GCN: v_mov_b32_e32 [[BASE_FI1:v[0-9]+]], 4
	; GCN: buffer_load_dword {{v[0-9]+}}, [[BASE_FI1]], {{s\[[0-9]+:[0-9]+\]}}, {{s[0-9]+}} offen offset:4092{{$}}			; GCN: buffer_load_dword {{v[0-9]+}}, [[BASE_FI1]], {{s\[[0-9]+:[0-9]+\]}}, 0 offen offset:4092{{$}}
	; GCN: {{^BB[0-9]+}}_2:			; GCN: {{^BB[0-9]+}}_2:

	define amdgpu_kernel void @test_sink_scratch_small_offset_i32_reserved(i32 addrspace(1)* %out, i32 addrspace(1)* %in, i32 %arg) {			define amdgpu_kernel void @test_sink_scratch_small_offset_i32_reserved(i32 addrspace(1)* %out, i32 addrspace(1)* %in, i32 %arg) {
	entry:			entry:
	%alloca = alloca [512 x i32], align 4, addrspace(5)			%alloca = alloca [512 x i32], align 4, addrspace(5)
	%out.gep.0 = getelementptr i32, i32 addrspace(1)* %out, i64 999998			%out.gep.0 = getelementptr i32, i32 addrspace(1)* %out, i64 999998
	%out.gep.1 = getelementptr i32, i32 addrspace(1)* %out, i64 999999			%out.gep.1 = getelementptr i32, i32 addrspace(1)* %out, i64 999999
	%add.arg = add i32 %arg, 8			%add.arg = add i32 %arg, 8

	; OPT-LABEL: @test_no_sink_scratch_large_offset_i32(			; OPT-LABEL: @test_no_sink_scratch_large_offset_i32(
	; OPT: %alloca.gep = getelementptr [512 x i32], [512 x i32] addrspace(5)* %alloca, i32 0, i32 1024			; OPT: %alloca.gep = getelementptr [512 x i32], [512 x i32] addrspace(5)* %alloca, i32 0, i32 1024
	; OPT: br i1			; OPT: br i1
	; OPT-NOT: ptrtoint			; OPT-NOT: ptrtoint

	; GCN-LABEL: {{^}}test_no_sink_scratch_large_offset_i32:			; GCN-LABEL: {{^}}test_no_sink_scratch_large_offset_i32:
	; GCN: s_and_saveexec_b64			; GCN: s_and_saveexec_b64
	; GCN: buffer_store_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, {{s[0-9]+}} offen{{$}}			; GCN: buffer_store_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen{{$}}
	; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, {{s[0-9]+}} offen{{$}}			; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen{{$}}
	; GCN: {{^BB[0-9]+}}_2:			; GCN: {{^BB[0-9]+}}_2:
	define amdgpu_kernel void @test_no_sink_scratch_large_offset_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %in, i32 %arg) {			define amdgpu_kernel void @test_no_sink_scratch_large_offset_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %in, i32 %arg) {
	entry:			entry:
	%alloca = alloca [512 x i32], align 4, addrspace(5)			%alloca = alloca [512 x i32], align 4, addrspace(5)
	%out.gep.0 = getelementptr i32, i32 addrspace(1)* %out, i64 999998			%out.gep.0 = getelementptr i32, i32 addrspace(1)* %out, i64 999998
	%out.gep.1 = getelementptr i32, i32 addrspace(1)* %out, i64 999999			%out.gep.1 = getelementptr i32, i32 addrspace(1)* %out, i64 999999
	%add.arg = add i32 %arg, 8			%add.arg = add i32 %arg, 8
	%alloca.gep = getelementptr [512 x i32], [512 x i32] addrspace(5)* %alloca, i32 0, i32 1024			%alloca.gep = getelementptr [512 x i32], [512 x i32] addrspace(5)* %alloca, i32 0, i32 1024
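The cgp-addressing-modes.ll cases depend on whether a constant scratch offset fits the MUBUF immediate offset field, which is a 12-bit unsigned value (0 to 4095). The _reserved test materializes a base of 4, so the alloca starts at offset 4. Element 1022 of the i32 array therefore lands at 4 + 1022 * 4 = 4092 and folds into offset:4092. Element 1024 lands at 4100 and is left as an offen access through a VGPR. The Python sketch below is a hypothetical helper that checks only this encoding limit; the real decision also involves CodeGenPrepare's addressing-mode query.

```python
# Hypothetical check: can a constant private-memory byte offset be encoded in
# a MUBUF instruction's 12-bit unsigned immediate "offset:" field?
MUBUF_MAX_IMM_OFFSET = (1 << 12) - 1  # 4095


def scratch_byte_offset(object_base: int, index: int, elt_size: int = 4) -> int:
    """Byte offset of element `index` of an array starting at object_base."""
    return object_base + index * elt_size


def fits_mubuf_imm(offset: int) -> bool:
    return 0 <= offset <= MUBUF_MAX_IMM_OFFSET


OBJECT_BASE = 4  # assumption: base of the alloca, as in the _reserved checks
for index in (1022, 1024):
    off = scratch_byte_offset(OBJECT_BASE, index)
    verdict = "fits offset:" if fits_mubuf_imm(off) else "needs a VGPR (offen)"
    print(index, off, verdict)
```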

llvm/test/CodeGen/AMDGPU/chain-hi-to-lo.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX900 %s		; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX900 %s

define <2 x half> @chain_hi_to_lo_private() {		define <2 x half> @chain_hi_to_lo_private() {
; GCN-LABEL: chain_hi_to_lo_private:		; GCN-LABEL: chain_hi_to_lo_private:
; GCN: ; %bb.0: ; %bb		; GCN: ; %bb.0: ; %bb
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: buffer_load_ushort v0, off, s[0:3], s33 offset:2		; GCN-NEXT: buffer_load_ushort v0, off, s[0:3], 0 offset:2
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: buffer_load_short_d16_hi v0, off, s[0:3], s33		; GCN-NEXT: buffer_load_short_d16_hi v0, off, s[0:3], 0
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
bb:		bb:
%gep_lo = getelementptr inbounds half, half addrspace(5)* null, i64 1		%gep_lo = getelementptr inbounds half, half addrspace(5)* null, i64 1
%load_lo = load half, half addrspace(5)* %gep_lo		%load_lo = load half, half addrspace(5)* %gep_lo
%gep_hi = getelementptr inbounds half, half addrspace(5)* null, i64 0		%gep_hi = getelementptr inbounds half, half addrspace(5)* null, i64 0
%load_hi = load half, half addrspace(5)* %gep_hi		%load_hi = load half, half addrspace(5)* %gep_hi

%temp = insertelement <2 x half> undef, half %load_lo, i32 0		%temp = insertelement <2 x half> undef, half %load_lo, i32 0
%result = insertelement <2 x half> %temp, half %load_hi, i32 1		%result = insertelement <2 x half> %temp, half %load_hi, i32 1

ret <2 x half> %result		ret <2 x half> %result
}		}

define <2 x half> @chain_hi_to_lo_private_different_bases(half addrspace(5)* %base_lo, half addrspace(5)* %base_hi) {		define <2 x half> @chain_hi_to_lo_private_different_bases(half addrspace(5)* %base_lo, half addrspace(5)* %base_hi) {
; GCN-LABEL: chain_hi_to_lo_private_different_bases:		; GCN-LABEL: chain_hi_to_lo_private_different_bases:
; GCN: ; %bb.0: ; %bb		; GCN: ; %bb.0: ; %bb
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: buffer_load_ushort v0, v0, s[0:3], s33 offen		; GCN-NEXT: buffer_load_ushort v0, v0, s[0:3], 0 offen
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: buffer_load_short_d16_hi v0, v1, s[0:3], s33 offen		; GCN-NEXT: buffer_load_short_d16_hi v0, v1, s[0:3], 0 offen
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
bb:		bb:
%load_lo = load half, half addrspace(5)* %base_lo		%load_lo = load half, half addrspace(5)* %base_lo
%load_hi = load half, half addrspace(5)* %base_hi		%load_hi = load half, half addrspace(5)* %base_hi

%temp = insertelement <2 x half> undef, half %load_lo, i32 0		%temp = insertelement <2 x half> undef, half %load_lo, i32 0
%result = insertelement <2 x half> %temp, half %load_hi, i32 1		%result = insertelement <2 x half> %temp, half %load_hi, i32 1

ret <2 x half> %result		ret <2 x half> %result
}		}

define <2 x half> @chain_hi_to_lo_arithmatic(half addrspace(5)* %base, half %in) {		define <2 x half> @chain_hi_to_lo_arithmatic(half addrspace(5)* %base, half %in) {
; GCN-LABEL: chain_hi_to_lo_arithmatic:		; GCN-LABEL: chain_hi_to_lo_arithmatic:
; GCN: ; %bb.0: ; %bb		; GCN: ; %bb.0: ; %bb
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: v_add_f16_e32 v1, 1.0, v1		; GCN-NEXT: v_add_f16_e32 v1, 1.0, v1
; GCN-NEXT: buffer_load_short_d16_hi v1, v0, s[0:3], s33 offen		; GCN-NEXT: buffer_load_short_d16_hi v1, v0, s[0:3], 0 offen
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: v_mov_b32_e32 v0, v1		; GCN-NEXT: v_mov_b32_e32 v0, v1
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
bb:		bb:
%arith_lo = fadd half %in, 1.0		%arith_lo = fadd half %in, 1.0
%load_hi = load half, half addrspace(5)* %base		%load_hi = load half, half addrspace(5)* %base

%temp = insertelement <2 x half> undef, half %arith_lo, i32 0		%temp = insertelement <2 x half> undef, half %arith_lo, i32 0

; Make sure we don't lose any of the private stores.		; Make sure we don't lose any of the private stores.
define amdgpu_kernel void @vload2_private(i16 addrspace(1)* nocapture readonly %in, <2 x i16> addrspace(1)* nocapture %out) #0 {		define amdgpu_kernel void @vload2_private(i16 addrspace(1)* nocapture readonly %in, <2 x i16> addrspace(1)* nocapture %out) #0 {
; GCN-LABEL: vload2_private:		; GCN-LABEL: vload2_private:
; GCN: ; %bb.0: ; %entry		; GCN: ; %bb.0: ; %entry
; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s9		; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s9
; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0		; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0
; GCN-NEXT: s_load_dwordx4 s[4:7], s[4:5], 0x0		; GCN-NEXT: s_load_dwordx4 s[4:7], s[4:5], 0x0
		; GCN-NEXT: s_add_u32 s0, s0, s9
		; GCN-NEXT: s_addc_u32 s1, s1, 0
; GCN-NEXT: s_waitcnt lgkmcnt(0)		; GCN-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: v_mov_b32_e32 v2, s4		; GCN-NEXT: v_mov_b32_e32 v2, s4
; GCN-NEXT: v_mov_b32_e32 v3, s5		; GCN-NEXT: v_mov_b32_e32 v3, s5
; GCN-NEXT: global_load_ushort v4, v[2:3], off		; GCN-NEXT: global_load_ushort v4, v[2:3], off
; GCN-NEXT: v_mov_b32_e32 v0, s6		; GCN-NEXT: v_mov_b32_e32 v0, s6
; GCN-NEXT: v_mov_b32_e32 v1, s7		; GCN-NEXT: v_mov_b32_e32 v1, s7
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: buffer_store_short v4, off, s[0:3], s9 offset:4		; GCN-NEXT: buffer_store_short v4, off, s[0:3], 0 offset:4
; GCN-NEXT: global_load_ushort v4, v[2:3], off offset:2		; GCN-NEXT: global_load_ushort v4, v[2:3], off offset:2
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: buffer_store_short v4, off, s[0:3], s9 offset:6		; GCN-NEXT: buffer_store_short v4, off, s[0:3], 0 offset:6
; GCN-NEXT: global_load_ushort v2, v[2:3], off offset:4		; GCN-NEXT: global_load_ushort v2, v[2:3], off offset:4
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: buffer_store_short v2, off, s[0:3], s9 offset:8		; GCN-NEXT: buffer_store_short v2, off, s[0:3], 0 offset:8
; GCN-NEXT: buffer_load_ushort v2, off, s[0:3], s9 offset:4		; GCN-NEXT: buffer_load_ushort v2, off, s[0:3], 0 offset:4
; GCN-NEXT: buffer_load_ushort v4, off, s[0:3], s9 offset:6		; GCN-NEXT: buffer_load_ushort v4, off, s[0:3], 0 offset:6
; GCN-NEXT: s_waitcnt vmcnt(1)		; GCN-NEXT: s_waitcnt vmcnt(1)
; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v2		; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v2
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: v_mov_b32_e32 v3, v4		; GCN-NEXT: v_mov_b32_e32 v3, v4
; GCN-NEXT: buffer_load_short_d16_hi v3, off, s[0:3], s9 offset:8		; GCN-NEXT: buffer_load_short_d16_hi v3, off, s[0:3], 0 offset:8
; GCN-NEXT: v_lshl_or_b32 v2, v4, 16, v2		; GCN-NEXT: v_lshl_or_b32 v2, v4, 16, v2
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: global_store_dwordx2 v[0:1], v[2:3], off		; GCN-NEXT: global_store_dwordx2 v[0:1], v[2:3], off
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
entry:		entry:
%loc = alloca [3 x i16], align 2, addrspace(5)		%loc = alloca [3 x i16], align 2, addrspace(5)
%loc.0.sroa_cast1 = bitcast [3 x i16] addrspace(5)* %loc to i8 addrspace(5)*		%loc.0.sroa_cast1 = bitcast [3 x i16] addrspace(5)* %loc to i8 addrspace(5)*
%tmp = load i16, i16 addrspace(1)* %in, align 2		%tmp = load i16, i16 addrspace(1)* %in, align 2
bb:
%result = insertelement <2 x i16> %op.hi, i16 %load_lo, i32 0		%result = insertelement <2 x i16> %op.hi, i16 %load_lo, i32 0
ret <2 x i16> %result		ret <2 x i16> %result
}		}

define <2 x i16> @chain_hi_to_lo_private_other_dep(i16 addrspace(5)* %ptr) {		define <2 x i16> @chain_hi_to_lo_private_other_dep(i16 addrspace(5)* %ptr) {
; GCN-LABEL: chain_hi_to_lo_private_other_dep:		; GCN-LABEL: chain_hi_to_lo_private_other_dep:
; GCN: ; %bb.0: ; %bb		; GCN: ; %bb.0: ; %bb
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: buffer_load_short_d16_hi v1, v0, s[0:3], s33 offen		; GCN-NEXT: buffer_load_short_d16_hi v1, v0, s[0:3], 0 offen
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: v_pk_sub_u16 v1, v1, -12 op_sel_hi:[1,0]		; GCN-NEXT: v_pk_sub_u16 v1, v1, -12 op_sel_hi:[1,0]
; GCN-NEXT: buffer_load_short_d16 v1, v0, s[0:3], s33 offen offset:2		; GCN-NEXT: buffer_load_short_d16 v1, v0, s[0:3], 0 offen offset:2
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: v_mov_b32_e32 v0, v1		; GCN-NEXT: v_mov_b32_e32 v0, v1
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
bb:		bb:
%gep_lo = getelementptr inbounds i16, i16 addrspace(5)* %ptr, i64 1		%gep_lo = getelementptr inbounds i16, i16 addrspace(5)* %ptr, i64 1
%load_lo = load i16, i16 addrspace(5)* %gep_lo		%load_lo = load i16, i16 addrspace(5)* %gep_lo
%gep_hi = getelementptr inbounds i16, i16 addrspace(5)* %ptr, i64 0		%gep_hi = getelementptr inbounds i16, i16 addrspace(5)* %ptr, i64 0
%load_hi = load i16, i16 addrspace(5)* %gep_hi		%load_hi = load i16, i16 addrspace(5)* %gep_hi
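Across these files, the change replaces the SGPR soffset operand (s33, s9, s7) of private buffer accesses with the inline constant 0. Kernels instead add the wave offset into the resource descriptor base with s_add_u32 s0, s0, s9 / s_addc_u32 s1, s1, 0, as in the two lines added to vload2_private. Under a simplified address model (base + soffset + VGPR offset + immediate, ignoring swizzling, stride fields and bounds checks), both forms reach the same byte. The sketch below checks this, including a carry out of the low dword. Treating descriptor words 0 and 1 as a plain 64-bit base is an assumption, and the example values are arbitrary.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF


def add_wave_offset(base_lo: int, base_hi: int, wave_offset: int) -> Tuple[int, int]:
    """Model s_add_u32 s0, s0, sW followed by s_addc_u32 s1, s1, 0."""
    total = base_lo + wave_offset
    carry = total >> 32
    return total & MASK32, (base_hi + carry) & MASK32


def mubuf_address(base: int, soffset: int, imm_offset: int, vaddr: int = 0) -> int:
    """Simplified MUBUF address: no swizzle, stride or range checking."""
    return base + soffset + vaddr + imm_offset


base_lo, base_hi, wave_offset = 0xFFFFFF00, 0x1, 0x200  # arbitrary example values

# Old form: wave offset passed as the SGPR soffset operand.
old = mubuf_address((base_hi << 32) | base_lo, soffset=wave_offset, imm_offset=4)

# New form: wave offset folded into the descriptor base, soffset is 0.
new_lo, new_hi = add_wave_offset(base_lo, base_hi, wave_offset)
new = mubuf_address((new_hi << 32) | new_lo, soffset=0, imm_offset=4)

print(hex(old), hex(new), old == new)
```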

llvm/test/CodeGen/AMDGPU/collapse-endcf.ll

	; GCN: s_or_b64 exec, exec, s{{\[[0-9]+:[0-9]+\]}}			; GCN: s_or_b64 exec, exec, s{{\[[0-9]+:[0-9]+\]}}
	; GCN: s_andn2_b64			; GCN: s_andn2_b64
	; GCN-NEXT: s_cbranch_execz			; GCN-NEXT: s_cbranch_execz

	; GCN: [[BB1_LOOP:BB[0-9]+_[0-9]+]]:			; GCN: [[BB1_LOOP:BB[0-9]+_[0-9]+]]:
	; GCN: s_andn2_b64 exec, exec,			; GCN: s_andn2_b64 exec, exec,
	; GCN-NEXT: s_cbranch_execnz [[BB1_LOOP]]			; GCN-NEXT: s_cbranch_execnz [[BB1_LOOP]]

	; GCN: buffer_load_dword v{{[0-9]+}}, v{{[0-9]+}}, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offen			; GCN: buffer_load_dword v{{[0-9]+}}, v{{[0-9]+}}, s{{\[[0-9]+:[0-9]+\]}}, 0 offen
	; GCN: s_and_b64 exec, exec, {{vcc|s\[[0-9:]+\]}}			; GCN: s_and_b64 exec, exec, {{vcc|s\[[0-9:]+\]}}

	; GCN-NOT: s_or_b64 exec, exec			; GCN-NOT: s_or_b64 exec, exec

	; GCN: s_or_b64 exec, exec, s{{\[[0-9]+:[0-9]+\]}}			; GCN: s_or_b64 exec, exec, s{{\[[0-9]+:[0-9]+\]}}
	; GCN: buffer_store_dword			; GCN: buffer_store_dword
	; GCN: buffer_store_dword			; GCN: buffer_store_dword
	; GCN: buffer_store_dword			; GCN: buffer_store_dword

llvm/test/CodeGen/AMDGPU/control-flow-fastregalloc.ll

	; GCN: s_mov_b32 m0, -1			; GCN: s_mov_b32 m0, -1
	; GCN: ds_read_b32 [[LOAD0:v[0-9]+]]			; GCN: ds_read_b32 [[LOAD0:v[0-9]+]]

	; GCN: v_cmp_eq_u32_e64 [[CMP0:s\[[0-9]+:[0-9]\]]], s{{[0-9]+}}, v0			; GCN: v_cmp_eq_u32_e64 [[CMP0:s\[[0-9]+:[0-9]\]]], s{{[0-9]+}}, v0
	; GCN: s_mov_b64 s{{\[}}[[SAVEEXEC_LO:[0-9]+]]:[[SAVEEXEC_HI:[0-9]+]]{{\]}}, exec			; GCN: s_mov_b64 s{{\[}}[[SAVEEXEC_LO:[0-9]+]]:[[SAVEEXEC_HI:[0-9]+]]{{\]}}, exec
	; GCN: s_and_b64 s{{\[}}[[ANDEXEC_LO:[0-9]+]]:[[ANDEXEC_HI:[0-9]+]]{{\]}}, s{{\[}}[[SAVEEXEC_LO]]:[[SAVEEXEC_HI]]{{\]}}, [[CMP0]]			; GCN: s_and_b64 s{{\[}}[[ANDEXEC_LO:[0-9]+]]:[[ANDEXEC_HI:[0-9]+]]{{\]}}, s{{\[}}[[SAVEEXEC_LO]]:[[SAVEEXEC_HI]]{{\]}}, [[CMP0]]

	; Spill load			; Spill load
	; GCN: buffer_store_dword [[LOAD0]], off, s[0:3], s7 offset:[[LOAD0_OFFSET:[0-9]+]] ; 4-byte Folded Spill			; GCN: buffer_store_dword [[LOAD0]], off, s[0:3], 0 offset:[[LOAD0_OFFSET:[0-9]+]] ; 4-byte Folded Spill

	; Spill saved exec			; Spill saved exec
	; VGPR: v_writelane_b32 [[SPILL_VGPR:v[0-9]+]], s[[SAVEEXEC_LO]], [[SAVEEXEC_LO_LANE:[0-9]+]]			; VGPR: v_writelane_b32 [[SPILL_VGPR:v[0-9]+]], s[[SAVEEXEC_LO]], [[SAVEEXEC_LO_LANE:[0-9]+]]
	; VGPR: v_writelane_b32 [[SPILL_VGPR]], s[[SAVEEXEC_HI]], [[SAVEEXEC_HI_LANE:[0-9]+]]			; VGPR: v_writelane_b32 [[SPILL_VGPR]], s[[SAVEEXEC_HI]], [[SAVEEXEC_HI_LANE:[0-9]+]]

	; VMEM: v_mov_b32_e32 v[[V_SAVEEXEC_LO:[0-9]+]], s[[SAVEEXEC_LO]]			; VMEM: v_mov_b32_e32 v[[V_SAVEEXEC_LO:[0-9]+]], s[[SAVEEXEC_LO]]
	; VMEM: buffer_store_dword v[[V_SAVEEXEC_LO]], off, s[0:3], s7 offset:20 ; 4-byte Folded Spill			; VMEM: buffer_store_dword v[[V_SAVEEXEC_LO]], off, s[0:3], 0 offset:20 ; 4-byte Folded Spill
	; VMEM: v_mov_b32_e32 v[[V_SAVEEXEC_HI:[0-9]+]], s[[SAVEEXEC_HI]]			; VMEM: v_mov_b32_e32 v[[V_SAVEEXEC_HI:[0-9]+]], s[[SAVEEXEC_HI]]
	; VMEM: buffer_store_dword v[[V_SAVEEXEC_HI]], off, s[0:3], s7 offset:24 ; 4-byte Folded Spill			; VMEM: buffer_store_dword v[[V_SAVEEXEC_HI]], off, s[0:3], 0 offset:24 ; 4-byte Folded Spill

	; GCN: s_mov_b64 exec, s{{\[}}[[ANDEXEC_LO]]:[[ANDEXEC_HI]]{{\]}}			; GCN: s_mov_b64 exec, s{{\[}}[[ANDEXEC_LO]]:[[ANDEXEC_HI]]{{\]}}

	; GCN: s_cbranch_execz [[ENDIF:BB[0-9]+_[0-9]+]]			; GCN: s_cbranch_execz [[ENDIF:BB[0-9]+_[0-9]+]]

	; GCN: ; %bb.{{[0-9]+}}: ; %if			; GCN: ; %bb.{{[0-9]+}}: ; %if
	; GCN: s_mov_b32 m0, -1			; GCN: s_mov_b32 m0, -1
	; GCN: ds_read_b32 [[LOAD1:v[0-9]+]]			; GCN: ds_read_b32 [[LOAD1:v[0-9]+]]
	; GCN: buffer_load_dword [[RELOAD_LOAD0:v[0-9]+]], off, s[0:3], s7 offset:[[LOAD0_OFFSET]] ; 4-byte Folded Reload			; GCN: buffer_load_dword [[RELOAD_LOAD0:v[0-9]+]], off, s[0:3], 0 offset:[[LOAD0_OFFSET]] ; 4-byte Folded Reload
	; GCN: s_waitcnt vmcnt(0) lgkmcnt(0)			; GCN: s_waitcnt vmcnt(0) lgkmcnt(0)


	; Spill val register			; Spill val register
	; GCN: v_add_i32_e32 [[VAL:v[0-9]+]], vcc, [[LOAD1]], [[RELOAD_LOAD0]]			; GCN: v_add_i32_e32 [[VAL:v[0-9]+]], vcc, [[LOAD1]], [[RELOAD_LOAD0]]
	; GCN: buffer_store_dword [[VAL]], off, s[0:3], s7 offset:[[VAL_OFFSET:[0-9]+]] ; 4-byte Folded Spill			; GCN: buffer_store_dword [[VAL]], off, s[0:3], 0 offset:[[VAL_OFFSET:[0-9]+]] ; 4-byte Folded Spill

	; VMEM: [[ENDIF]]:			; VMEM: [[ENDIF]]:

	; Reload and restore exec mask			; Reload and restore exec mask
	; VGPR: v_readlane_b32 s[[S_RELOAD_SAVEEXEC_LO:[0-9]+]], [[SPILL_VGPR]], [[SAVEEXEC_LO_LANE]]			; VGPR: v_readlane_b32 s[[S_RELOAD_SAVEEXEC_LO:[0-9]+]], [[SPILL_VGPR]], [[SAVEEXEC_LO_LANE]]
	; VGPR: v_readlane_b32 s[[S_RELOAD_SAVEEXEC_HI:[0-9]+]], [[SPILL_VGPR]], [[SAVEEXEC_HI_LANE]]			; VGPR: v_readlane_b32 s[[S_RELOAD_SAVEEXEC_HI:[0-9]+]], [[SPILL_VGPR]], [[SAVEEXEC_HI_LANE]]



	; VMEM: buffer_load_dword v[[V_RELOAD_SAVEEXEC_LO:[0-9]+]], off, s[0:3], s7 offset:20 ; 4-byte Folded Reload			; VMEM: buffer_load_dword v[[V_RELOAD_SAVEEXEC_LO:[0-9]+]], off, s[0:3], 0 offset:20 ; 4-byte Folded Reload
	; VMEM: s_waitcnt vmcnt(0)			; VMEM: s_waitcnt vmcnt(0)
	; VMEM: v_readfirstlane_b32 s[[S_RELOAD_SAVEEXEC_LO:[0-9]+]], v[[V_RELOAD_SAVEEXEC_LO]]			; VMEM: v_readfirstlane_b32 s[[S_RELOAD_SAVEEXEC_LO:[0-9]+]], v[[V_RELOAD_SAVEEXEC_LO]]

	; VMEM: buffer_load_dword v[[V_RELOAD_SAVEEXEC_HI:[0-9]+]], off, s[0:3], s7 offset:24 ; 4-byte Folded Reload			; VMEM: buffer_load_dword v[[V_RELOAD_SAVEEXEC_HI:[0-9]+]], off, s[0:3], 0 offset:24 ; 4-byte Folded Reload
	; VMEM: s_waitcnt vmcnt(0)			; VMEM: s_waitcnt vmcnt(0)
	; VMEM: v_readfirstlane_b32 s[[S_RELOAD_SAVEEXEC_HI:[0-9]+]], v[[V_RELOAD_SAVEEXEC_HI]]			; VMEM: v_readfirstlane_b32 s[[S_RELOAD_SAVEEXEC_HI:[0-9]+]], v[[V_RELOAD_SAVEEXEC_HI]]

	; GCN: s_or_b64 exec, exec, s{{\[}}[[S_RELOAD_SAVEEXEC_LO]]:[[S_RELOAD_SAVEEXEC_HI]]{{\]}}			; GCN: s_or_b64 exec, exec, s{{\[}}[[S_RELOAD_SAVEEXEC_LO]]:[[S_RELOAD_SAVEEXEC_HI]]{{\]}}

	; Restore val			; Restore val
	; GCN: buffer_load_dword [[RELOAD_VAL:v[0-9]+]], off, s[0:3], s7 offset:[[VAL_OFFSET]] ; 4-byte Folded Reload			; GCN: buffer_load_dword [[RELOAD_VAL:v[0-9]+]], off, s[0:3], 0 offset:[[VAL_OFFSET]] ; 4-byte Folded Reload

	; GCN: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[RELOAD_VAL]]			; GCN: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[RELOAD_VAL]]
	define amdgpu_kernel void @divergent_if_endif(i32 addrspace(1)* %out) #0 {			define amdgpu_kernel void @divergent_if_endif(i32 addrspace(1)* %out) #0 {
	entry:			entry:
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%load0 = load volatile i32, i32 addrspace(3)* undef			%load0 = load volatile i32, i32 addrspace(3)* undef
	%cmp0 = icmp eq i32 %tid, 0			%cmp0 = icmp eq i32 %tid, 0
	br i1 %cmp0, label %if, label %endif			br i1 %cmp0, label %if, label %endif
	; GCN: ds_read_b32 [[LOAD0:v[0-9]+]]			; GCN: ds_read_b32 [[LOAD0:v[0-9]+]]

	; GCN: v_cmp_eq_u32_e64 [[CMP0:s\[[0-9]+:[0-9]\]]], s{{[0-9]+}}, v0			; GCN: v_cmp_eq_u32_e64 [[CMP0:s\[[0-9]+:[0-9]\]]], s{{[0-9]+}}, v0

	; GCN: s_mov_b64 s{{\[}}[[SAVEEXEC_LO:[0-9]+]]:[[SAVEEXEC_HI:[0-9]+]]{{\]}}, exec			; GCN: s_mov_b64 s{{\[}}[[SAVEEXEC_LO:[0-9]+]]:[[SAVEEXEC_HI:[0-9]+]]{{\]}}, exec
	; GCN: s_and_b64 s{{\[}}[[ANDEXEC_LO:[0-9]+]]:[[ANDEXEC_HI:[0-9]+]]{{\]}}, s{{\[}}[[SAVEEXEC_LO:[0-9]+]]:[[SAVEEXEC_HI:[0-9]+]]{{\]}}, [[CMP0]]			; GCN: s_and_b64 s{{\[}}[[ANDEXEC_LO:[0-9]+]]:[[ANDEXEC_HI:[0-9]+]]{{\]}}, s{{\[}}[[SAVEEXEC_LO:[0-9]+]]:[[SAVEEXEC_HI:[0-9]+]]{{\]}}, [[CMP0]]

	; Spill load			; Spill load
	; GCN: buffer_store_dword [[LOAD0]], off, s[0:3], s7 offset:[[LOAD0_OFFSET:[0-9]+]] ; 4-byte Folded Spill			; GCN: buffer_store_dword [[LOAD0]], off, s[0:3], 0 offset:[[LOAD0_OFFSET:[0-9]+]] ; 4-byte Folded Spill

	; Spill saved exec			; Spill saved exec
	; VGPR: v_writelane_b32 [[SPILL_VGPR:v[0-9]+]], s[[SAVEEXEC_LO]], [[SAVEEXEC_LO_LANE:[0-9]+]]			; VGPR: v_writelane_b32 [[SPILL_VGPR:v[0-9]+]], s[[SAVEEXEC_LO]], [[SAVEEXEC_LO_LANE:[0-9]+]]
	; VGPR: v_writelane_b32 [[SPILL_VGPR]], s[[SAVEEXEC_HI]], [[SAVEEXEC_HI_LANE:[0-9]+]]			; VGPR: v_writelane_b32 [[SPILL_VGPR]], s[[SAVEEXEC_HI]], [[SAVEEXEC_HI_LANE:[0-9]+]]


	; VMEM: v_mov_b32_e32 v[[V_SAVEEXEC_LO:[0-9]+]], s[[SAVEEXEC_LO]]			; VMEM: v_mov_b32_e32 v[[V_SAVEEXEC_LO:[0-9]+]], s[[SAVEEXEC_LO]]
	; VMEM: buffer_store_dword v[[V_SAVEEXEC_LO]], off, s[0:3], s7 offset:24 ; 4-byte Folded Spill			; VMEM: buffer_store_dword v[[V_SAVEEXEC_LO]], off, s[0:3], 0 offset:24 ; 4-byte Folded Spill
	; VMEM: v_mov_b32_e32 v[[V_SAVEEXEC_HI:[0-9]+]], s[[SAVEEXEC_HI]]			; VMEM: v_mov_b32_e32 v[[V_SAVEEXEC_HI:[0-9]+]], s[[SAVEEXEC_HI]]
	; VMEM: buffer_store_dword v[[V_SAVEEXEC_HI]], off, s[0:3], s7 offset:28 ; 4-byte Folded Spill			; VMEM: buffer_store_dword v[[V_SAVEEXEC_HI]], off, s[0:3], 0 offset:28 ; 4-byte Folded Spill

	; GCN: s_mov_b64 exec, s{{\[}}[[ANDEXEC_LO]]:[[ANDEXEC_HI]]{{\]}}			; GCN: s_mov_b64 exec, s{{\[}}[[ANDEXEC_LO]]:[[ANDEXEC_HI]]{{\]}}

	; GCN-NEXT: s_cbranch_execz [[END:BB[0-9]+_[0-9]+]]			; GCN-NEXT: s_cbranch_execz [[END:BB[0-9]+_[0-9]+]]


	; GCN: [[LOOP:BB[0-9]+_[0-9]+]]:			; GCN: [[LOOP:BB[0-9]+_[0-9]+]]:
	; GCN: buffer_load_dword v[[VAL_LOOP_RELOAD:[0-9]+]], off, s[0:3], s7 offset:[[LOAD0_OFFSET]] ; 4-byte Folded Reload			; GCN: buffer_load_dword v[[VAL_LOOP_RELOAD:[0-9]+]], off, s[0:3], 0 offset:[[LOAD0_OFFSET]] ; 4-byte Folded Reload
	; GCN: v_subrev_i32_e32 [[VAL_LOOP:v[0-9]+]], vcc, v{{[0-9]+}}, v[[VAL_LOOP_RELOAD]]			; GCN: v_subrev_i32_e32 [[VAL_LOOP:v[0-9]+]], vcc, v{{[0-9]+}}, v[[VAL_LOOP_RELOAD]]
	; GCN: s_cmp_lg_u32			; GCN: s_cmp_lg_u32
	; GCN: buffer_store_dword [[VAL_LOOP]], off, s[0:3], s7 offset:[[VAL_SUB_OFFSET:[0-9]+]] ; 4-byte Folded Spill			; GCN: buffer_store_dword [[VAL_LOOP]], off, s[0:3], 0 offset:[[VAL_SUB_OFFSET:[0-9]+]] ; 4-byte Folded Spill
	; GCN-NEXT: s_cbranch_scc1 [[LOOP]]			; GCN-NEXT: s_cbranch_scc1 [[LOOP]]


	; GCN: [[END]]:			; GCN: [[END]]:
	; VGPR: v_readlane_b32 s[[S_RELOAD_SAVEEXEC_LO:[0-9]+]], [[SPILL_VGPR]], [[SAVEEXEC_LO_LANE]]			; VGPR: v_readlane_b32 s[[S_RELOAD_SAVEEXEC_LO:[0-9]+]], [[SPILL_VGPR]], [[SAVEEXEC_LO_LANE]]
	; VGPR: v_readlane_b32 s[[S_RELOAD_SAVEEXEC_HI:[0-9]+]], [[SPILL_VGPR]], [[SAVEEXEC_HI_LANE]]			; VGPR: v_readlane_b32 s[[S_RELOAD_SAVEEXEC_HI:[0-9]+]], [[SPILL_VGPR]], [[SAVEEXEC_HI_LANE]]

	; VMEM: buffer_load_dword v[[V_RELOAD_SAVEEXEC_LO:[0-9]+]], off, s[0:3], s7 offset:24 ; 4-byte Folded Reload			; VMEM: buffer_load_dword v[[V_RELOAD_SAVEEXEC_LO:[0-9]+]], off, s[0:3], 0 offset:24 ; 4-byte Folded Reload
	; VMEM: s_waitcnt vmcnt(0)			; VMEM: s_waitcnt vmcnt(0)
	; VMEM: v_readfirstlane_b32 s[[S_RELOAD_SAVEEXEC_LO:[0-9]+]], v[[V_RELOAD_SAVEEXEC_LO]]			; VMEM: v_readfirstlane_b32 s[[S_RELOAD_SAVEEXEC_LO:[0-9]+]], v[[V_RELOAD_SAVEEXEC_LO]]

	; VMEM: buffer_load_dword v[[V_RELOAD_SAVEEXEC_HI:[0-9]+]], off, s[0:3], s7 offset:28 ; 4-byte Folded Reload			; VMEM: buffer_load_dword v[[V_RELOAD_SAVEEXEC_HI:[0-9]+]], off, s[0:3], 0 offset:28 ; 4-byte Folded Reload
	; VMEM: s_waitcnt vmcnt(0)			; VMEM: s_waitcnt vmcnt(0)
	; VMEM: v_readfirstlane_b32 s[[S_RELOAD_SAVEEXEC_HI:[0-9]+]], v[[V_RELOAD_SAVEEXEC_HI]]			; VMEM: v_readfirstlane_b32 s[[S_RELOAD_SAVEEXEC_HI:[0-9]+]], v[[V_RELOAD_SAVEEXEC_HI]]

	; GCN: s_or_b64 exec, exec, s{{\[}}[[S_RELOAD_SAVEEXEC_LO]]:[[S_RELOAD_SAVEEXEC_HI]]{{\]}}			; GCN: s_or_b64 exec, exec, s{{\[}}[[S_RELOAD_SAVEEXEC_LO]]:[[S_RELOAD_SAVEEXEC_HI]]{{\]}}
	; GCN: buffer_load_dword v[[VAL_END:[0-9]+]], off, s[0:3], s7 offset:[[VAL_SUB_OFFSET]] ; 4-byte Folded Reload			; GCN: buffer_load_dword v[[VAL_END:[0-9]+]], off, s[0:3], 0 offset:[[VAL_SUB_OFFSET]] ; 4-byte Folded Reload

	; GCN: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, v[[VAL_END]]			; GCN: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, v[[VAL_END]]
	define amdgpu_kernel void @divergent_loop(i32 addrspace(1)* %out) #0 {			define amdgpu_kernel void @divergent_loop(i32 addrspace(1)* %out) #0 {
	entry:			entry:
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%load0 = load volatile i32, i32 addrspace(3)* undef			%load0 = load volatile i32, i32 addrspace(3)* undef
	%cmp0 = icmp eq i32 %tid, 0			%cmp0 = icmp eq i32 %tid, 0
	br i1 %cmp0, label %loop, label %end			br i1 %cmp0, label %loop, label %end
	; GCN: s_mov_b32 [[ZERO:s[0-9]+]], 0			; GCN: s_mov_b32 [[ZERO:s[0-9]+]], 0
	; GCN: v_cmp_ne_u32_e64 [[CMP0:s\[[0-9]+:[0-9]\]]], [[ZERO]], v0			; GCN: v_cmp_ne_u32_e64 [[CMP0:s\[[0-9]+:[0-9]\]]], [[ZERO]], v0

	; GCN: s_mov_b64 s{{\[}}[[SAVEEXEC_LO:[0-9]+]]:[[SAVEEXEC_HI:[0-9]+]]{{\]}}, exec			; GCN: s_mov_b64 s{{\[}}[[SAVEEXEC_LO:[0-9]+]]:[[SAVEEXEC_HI:[0-9]+]]{{\]}}, exec
	; GCN: s_and_b64 s{{\[}}[[ANDEXEC_LO:[0-9]+]]:[[ANDEXEC_HI:[0-9]+]]{{\]}}, s{{\[}}[[SAVEEXEC_LO:[0-9]+]]:[[SAVEEXEC_HI:[0-9]+]]{{\]}}, [[CMP0]]			; GCN: s_and_b64 s{{\[}}[[ANDEXEC_LO:[0-9]+]]:[[ANDEXEC_HI:[0-9]+]]{{\]}}, s{{\[}}[[SAVEEXEC_LO:[0-9]+]]:[[SAVEEXEC_HI:[0-9]+]]{{\]}}, [[CMP0]]
	; GCN: s_xor_b64 s{{\[}}[[SAVEEXEC_LO]]:[[SAVEEXEC_HI]]{{\]}}, s{{\[}}[[ANDEXEC_LO]]:[[ANDEXEC_HI]]{{\]}}, s{{\[}}[[SAVEEXEC_LO]]:[[SAVEEXEC_HI]]{{\]}}			; GCN: s_xor_b64 s{{\[}}[[SAVEEXEC_LO]]:[[SAVEEXEC_HI]]{{\]}}, s{{\[}}[[ANDEXEC_LO]]:[[ANDEXEC_HI]]{{\]}}, s{{\[}}[[SAVEEXEC_LO]]:[[SAVEEXEC_HI]]{{\]}}

	; Spill load			; Spill load
	; GCN: buffer_store_dword [[LOAD0]], off, s[0:3], s7 offset:[[LOAD0_OFFSET:[0-9]+]] ; 4-byte Folded Spill			; GCN: buffer_store_dword [[LOAD0]], off, s[0:3], 0 offset:[[LOAD0_OFFSET:[0-9]+]] ; 4-byte Folded Spill

	; Spill saved exec			; Spill saved exec
	; VGPR: v_writelane_b32 [[SPILL_VGPR:v[0-9]+]], s[[SAVEEXEC_LO]], [[SAVEEXEC_LO_LANE:[0-9]+]]			; VGPR: v_writelane_b32 [[SPILL_VGPR:v[0-9]+]], s[[SAVEEXEC_LO]], [[SAVEEXEC_LO_LANE:[0-9]+]]
	; VGPR: v_writelane_b32 [[SPILL_VGPR]], s[[SAVEEXEC_HI]], [[SAVEEXEC_HI_LANE:[0-9]+]]			; VGPR: v_writelane_b32 [[SPILL_VGPR]], s[[SAVEEXEC_HI]], [[SAVEEXEC_HI_LANE:[0-9]+]]

	; VMEM: v_mov_b32_e32 v[[V_SAVEEXEC_LO:[0-9]+]], s[[SAVEEXEC_LO]]			; VMEM: v_mov_b32_e32 v[[V_SAVEEXEC_LO:[0-9]+]], s[[SAVEEXEC_LO]]
	; VMEM: buffer_store_dword v[[V_SAVEEXEC_LO]], off, s[0:3], s7 offset:[[SAVEEXEC_LO_OFFSET:[0-9]+]] ; 4-byte Folded Spill			; VMEM: buffer_store_dword v[[V_SAVEEXEC_LO]], off, s[0:3], 0 offset:[[SAVEEXEC_LO_OFFSET:[0-9]+]] ; 4-byte Folded Spill
	; VMEM: v_mov_b32_e32 v[[V_SAVEEXEC_HI:[0-9]+]], s[[SAVEEXEC_HI]]			; VMEM: v_mov_b32_e32 v[[V_SAVEEXEC_HI:[0-9]+]], s[[SAVEEXEC_HI]]
	; VMEM: buffer_store_dword v[[V_SAVEEXEC_HI]], off, s[0:3], s7 offset:[[SAVEEXEC_HI_OFFSET:[0-9]+]] ; 4-byte Folded Spill			; VMEM: buffer_store_dword v[[V_SAVEEXEC_HI]], off, s[0:3], 0 offset:[[SAVEEXEC_HI_OFFSET:[0-9]+]] ; 4-byte Folded Spill

	; GCN: s_mov_b64 exec, [[CMP0]]			; GCN: s_mov_b64 exec, [[CMP0]]

	; FIXME: It makes no sense to put this skip here			; FIXME: It makes no sense to put this skip here
	; GCN: s_cbranch_execz [[FLOW:BB[0-9]+_[0-9]+]]			; GCN: s_cbranch_execz [[FLOW:BB[0-9]+_[0-9]+]]
	; GCN-NEXT: s_branch [[ELSE:BB[0-9]+_[0-9]+]]			; GCN-NEXT: s_branch [[ELSE:BB[0-9]+_[0-9]+]]

	; GCN: [[FLOW]]: ; %Flow			; GCN: [[FLOW]]: ; %Flow
	; VGPR: v_readlane_b32 s[[FLOW_S_RELOAD_SAVEEXEC_LO:[0-9]+]], [[SPILL_VGPR]], [[SAVEEXEC_LO_LANE]]			; VGPR: v_readlane_b32 s[[FLOW_S_RELOAD_SAVEEXEC_LO:[0-9]+]], [[SPILL_VGPR]], [[SAVEEXEC_LO_LANE]]
	; VGPR: v_readlane_b32 s[[FLOW_S_RELOAD_SAVEEXEC_HI:[0-9]+]], [[SPILL_VGPR]], [[SAVEEXEC_HI_LANE]]			; VGPR: v_readlane_b32 s[[FLOW_S_RELOAD_SAVEEXEC_HI:[0-9]+]], [[SPILL_VGPR]], [[SAVEEXEC_HI_LANE]]


	; VMEM: buffer_load_dword v[[FLOW_V_RELOAD_SAVEEXEC_LO:[0-9]+]], off, s[0:3], s7 offset:[[SAVEEXEC_LO_OFFSET]]			; VMEM: buffer_load_dword v[[FLOW_V_RELOAD_SAVEEXEC_LO:[0-9]+]], off, s[0:3], 0 offset:[[SAVEEXEC_LO_OFFSET]]
	; VMEM: s_waitcnt vmcnt(0)			; VMEM: s_waitcnt vmcnt(0)
	; VMEM: v_readfirstlane_b32 s[[FLOW_S_RELOAD_SAVEEXEC_LO:[0-9]+]], v[[FLOW_V_RELOAD_SAVEEXEC_LO]]			; VMEM: v_readfirstlane_b32 s[[FLOW_S_RELOAD_SAVEEXEC_LO:[0-9]+]], v[[FLOW_V_RELOAD_SAVEEXEC_LO]]

	; VMEM: buffer_load_dword v[[FLOW_V_RELOAD_SAVEEXEC_HI:[0-9]+]], off, s[0:3], s7 offset:[[SAVEEXEC_HI_OFFSET]] ; 4-byte Folded Reload			; VMEM: buffer_load_dword v[[FLOW_V_RELOAD_SAVEEXEC_HI:[0-9]+]], off, s[0:3], 0 offset:[[SAVEEXEC_HI_OFFSET]] ; 4-byte Folded Reload
	; VMEM: s_waitcnt vmcnt(0)			; VMEM: s_waitcnt vmcnt(0)
	; VMEM: v_readfirstlane_b32 s[[FLOW_S_RELOAD_SAVEEXEC_HI:[0-9]+]], v[[FLOW_V_RELOAD_SAVEEXEC_HI]]			; VMEM: v_readfirstlane_b32 s[[FLOW_S_RELOAD_SAVEEXEC_HI:[0-9]+]], v[[FLOW_V_RELOAD_SAVEEXEC_HI]]

	; GCN: s_or_saveexec_b64 s{{\[}}[[FLOW_S_RELOAD_SAVEEXEC_LO]]:[[FLOW_S_RELOAD_SAVEEXEC_HI]]{{\]}}, s{{\[}}[[FLOW_S_RELOAD_SAVEEXEC_LO]]:[[FLOW_S_RELOAD_SAVEEXEC_HI]]{{\]}}			; GCN: s_or_saveexec_b64 s{{\[}}[[FLOW_S_RELOAD_SAVEEXEC_LO]]:[[FLOW_S_RELOAD_SAVEEXEC_HI]]{{\]}}, s{{\[}}[[FLOW_S_RELOAD_SAVEEXEC_LO]]:[[FLOW_S_RELOAD_SAVEEXEC_HI]]{{\]}}

	; Regular spill value restored after exec modification			; Regular spill value restored after exec modification
	; GCN: buffer_load_dword [[FLOW_VAL:v[0-9]+]], off, s[0:3], s7 offset:[[FLOW_VAL_OFFSET:[0-9]+]] ; 4-byte Folded Reload			; GCN: buffer_load_dword [[FLOW_VAL:v[0-9]+]], off, s[0:3], 0 offset:[[FLOW_VAL_OFFSET:[0-9]+]] ; 4-byte Folded Reload


	; Spill saved exec			; Spill saved exec
	; VGPR: v_writelane_b32 [[SPILL_VGPR]], s[[FLOW_S_RELOAD_SAVEEXEC_LO]], [[FLOW_SAVEEXEC_LO_LANE:[0-9]+]]			; VGPR: v_writelane_b32 [[SPILL_VGPR]], s[[FLOW_S_RELOAD_SAVEEXEC_LO]], [[FLOW_SAVEEXEC_LO_LANE:[0-9]+]]
	; VGPR: v_writelane_b32 [[SPILL_VGPR]], s[[FLOW_S_RELOAD_SAVEEXEC_HI]], [[FLOW_SAVEEXEC_HI_LANE:[0-9]+]]			; VGPR: v_writelane_b32 [[SPILL_VGPR]], s[[FLOW_S_RELOAD_SAVEEXEC_HI]], [[FLOW_SAVEEXEC_HI_LANE:[0-9]+]]


	; VMEM: v_mov_b32_e32 v[[FLOW_V_SAVEEXEC_LO:[0-9]+]], s[[FLOW_S_RELOAD_SAVEEXEC_LO]]			; VMEM: v_mov_b32_e32 v[[FLOW_V_SAVEEXEC_LO:[0-9]+]], s[[FLOW_S_RELOAD_SAVEEXEC_LO]]
	; VMEM: buffer_store_dword v[[FLOW_V_SAVEEXEC_LO]], off, s[0:3], s7 offset:[[FLOW_SAVEEXEC_LO_OFFSET:[0-9]+]] ; 4-byte Folded Spill			; VMEM: buffer_store_dword v[[FLOW_V_SAVEEXEC_LO]], off, s[0:3], 0 offset:[[FLOW_SAVEEXEC_LO_OFFSET:[0-9]+]] ; 4-byte Folded Spill
	; VMEM: v_mov_b32_e32 v[[FLOW_V_SAVEEXEC_HI:[0-9]+]], s[[FLOW_S_RELOAD_SAVEEXEC_HI]]			; VMEM: v_mov_b32_e32 v[[FLOW_V_SAVEEXEC_HI:[0-9]+]], s[[FLOW_S_RELOAD_SAVEEXEC_HI]]
	; VMEM: buffer_store_dword v[[FLOW_V_SAVEEXEC_HI]], off, s[0:3], s7 offset:[[FLOW_SAVEEXEC_HI_OFFSET:[0-9]+]] ; 4-byte Folded Spill			; VMEM: buffer_store_dword v[[FLOW_V_SAVEEXEC_HI]], off, s[0:3], 0 offset:[[FLOW_SAVEEXEC_HI_OFFSET:[0-9]+]] ; 4-byte Folded Spill

	; GCN: buffer_store_dword [[FLOW_VAL]], off, s[0:3], s7 offset:[[RESULT_OFFSET:[0-9]+]] ; 4-byte Folded Spill			; GCN: buffer_store_dword [[FLOW_VAL]], off, s[0:3], 0 offset:[[RESULT_OFFSET:[0-9]+]] ; 4-byte Folded Spill
	; GCN: s_xor_b64 exec, exec, s{{\[}}[[FLOW_S_RELOAD_SAVEEXEC_LO]]:[[FLOW_S_RELOAD_SAVEEXEC_HI]]{{\]}}			; GCN: s_xor_b64 exec, exec, s{{\[}}[[FLOW_S_RELOAD_SAVEEXEC_LO]]:[[FLOW_S_RELOAD_SAVEEXEC_HI]]{{\]}}
	; GCN-NEXT: s_cbranch_execz [[ENDIF:BB[0-9]+_[0-9]+]]			; GCN-NEXT: s_cbranch_execz [[ENDIF:BB[0-9]+_[0-9]+]]


	; GCN: ; %bb.{{[0-9]+}}: ; %if			; GCN: ; %bb.{{[0-9]+}}: ; %if
	; GCN: ds_read_b32			; GCN: ds_read_b32
	; GCN: buffer_load_dword v[[LOAD0_RELOAD:[0-9]+]], off, s[0:3], s7 offset:[[LOAD0_OFFSET]] ; 4-byte Folded Reload			; GCN: buffer_load_dword v[[LOAD0_RELOAD:[0-9]+]], off, s[0:3], 0 offset:[[LOAD0_OFFSET]] ; 4-byte Folded Reload
	; GCN: v_add_i32_e32 [[ADD:v[0-9]+]], vcc, v{{[0-9]+}}, v[[LOAD0_RELOAD]]			; GCN: v_add_i32_e32 [[ADD:v[0-9]+]], vcc, v{{[0-9]+}}, v[[LOAD0_RELOAD]]
	; GCN: buffer_store_dword [[ADD]], off, s[0:3], s7 offset:[[RESULT_OFFSET]] ; 4-byte Folded Spill			; GCN: buffer_store_dword [[ADD]], off, s[0:3], 0 offset:[[RESULT_OFFSET]] ; 4-byte Folded Spill
	; GCN-NEXT: s_branch [[ENDIF:BB[0-9]+_[0-9]+]]			; GCN-NEXT: s_branch [[ENDIF:BB[0-9]+_[0-9]+]]

	; GCN: [[ELSE]]: ; %else			; GCN: [[ELSE]]: ; %else
	; GCN: buffer_load_dword v[[LOAD0_RELOAD:[0-9]+]], off, s[0:3], s7 offset:[[LOAD0_OFFSET]] ; 4-byte Folded Reload			; GCN: buffer_load_dword v[[LOAD0_RELOAD:[0-9]+]], off, s[0:3], 0 offset:[[LOAD0_OFFSET]] ; 4-byte Folded Reload
	; GCN: v_subrev_i32_e32 [[SUB:v[0-9]+]], vcc, v{{[0-9]+}}, v[[LOAD0_RELOAD]]			; GCN: v_subrev_i32_e32 [[SUB:v[0-9]+]], vcc, v{{[0-9]+}}, v[[LOAD0_RELOAD]]
	; GCN: buffer_store_dword [[ADD]], off, s[0:3], s7 offset:[[FLOW_RESULT_OFFSET:[0-9]+]] ; 4-byte Folded Spill			; GCN: buffer_store_dword [[ADD]], off, s[0:3], 0 offset:[[FLOW_RESULT_OFFSET:[0-9]+]] ; 4-byte Folded Spill
	; GCN-NEXT: s_branch [[FLOW]]			; GCN-NEXT: s_branch [[FLOW]]

	; GCN: [[ENDIF]]:			; GCN: [[ENDIF]]:
	; VGPR: v_readlane_b32 s[[S_RELOAD_SAVEEXEC_LO:[0-9]+]], [[SPILL_VGPR]], [[FLOW_SAVEEXEC_LO_LANE]]			; VGPR: v_readlane_b32 s[[S_RELOAD_SAVEEXEC_LO:[0-9]+]], [[SPILL_VGPR]], [[FLOW_SAVEEXEC_LO_LANE]]
	; VGPR: v_readlane_b32 s[[S_RELOAD_SAVEEXEC_HI:[0-9]+]], [[SPILL_VGPR]], [[FLOW_SAVEEXEC_HI_LANE]]			; VGPR: v_readlane_b32 s[[S_RELOAD_SAVEEXEC_HI:[0-9]+]], [[SPILL_VGPR]], [[FLOW_SAVEEXEC_HI_LANE]]


	; VMEM: buffer_load_dword v[[V_RELOAD_SAVEEXEC_LO:[0-9]+]], off, s[0:3], s7 offset:[[FLOW_SAVEEXEC_LO_OFFSET]] ; 4-byte Folded Reload			; VMEM: buffer_load_dword v[[V_RELOAD_SAVEEXEC_LO:[0-9]+]], off, s[0:3], 0 offset:[[FLOW_SAVEEXEC_LO_OFFSET]] ; 4-byte Folded Reload
	; VMEM: s_waitcnt vmcnt(0)			; VMEM: s_waitcnt vmcnt(0)
	; VMEM: v_readfirstlane_b32 s[[S_RELOAD_SAVEEXEC_LO:[0-9]+]], v[[V_RELOAD_SAVEEXEC_LO]]			; VMEM: v_readfirstlane_b32 s[[S_RELOAD_SAVEEXEC_LO:[0-9]+]], v[[V_RELOAD_SAVEEXEC_LO]]

	; VMEM: buffer_load_dword v[[V_RELOAD_SAVEEXEC_HI:[0-9]+]], off, s[0:3], s7 offset:[[FLOW_SAVEEXEC_HI_OFFSET]] ; 4-byte Folded Reload			; VMEM: buffer_load_dword v[[V_RELOAD_SAVEEXEC_HI:[0-9]+]], off, s[0:3], 0 offset:[[FLOW_SAVEEXEC_HI_OFFSET]] ; 4-byte Folded Reload
	; VMEM: s_waitcnt vmcnt(0)			; VMEM: s_waitcnt vmcnt(0)
	; VMEM: v_readfirstlane_b32 s[[S_RELOAD_SAVEEXEC_HI:[0-9]+]], v[[V_RELOAD_SAVEEXEC_HI]]			; VMEM: v_readfirstlane_b32 s[[S_RELOAD_SAVEEXEC_HI:[0-9]+]], v[[V_RELOAD_SAVEEXEC_HI]]

	; GCN: s_or_b64 exec, exec, s{{\[}}[[S_RELOAD_SAVEEXEC_LO]]:[[S_RELOAD_SAVEEXEC_HI]]{{\]}}			; GCN: s_or_b64 exec, exec, s{{\[}}[[S_RELOAD_SAVEEXEC_LO]]:[[S_RELOAD_SAVEEXEC_HI]]{{\]}}

	; GCN: buffer_load_dword v[[RESULT:[0-9]+]], off, s[0:3], s7 offset:[[RESULT_OFFSET]] ; 4-byte Folded Reload			; GCN: buffer_load_dword v[[RESULT:[0-9]+]], off, s[0:3], 0 offset:[[RESULT_OFFSET]] ; 4-byte Folded Reload
	; GCN: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, v[[RESULT]]			; GCN: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, v[[RESULT]]
	define amdgpu_kernel void @divergent_if_else_endif(i32 addrspace(1)* %out) #0 {			define amdgpu_kernel void @divergent_if_else_endif(i32 addrspace(1)* %out) #0 {
	entry:			entry:
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%load0 = load volatile i32, i32 addrspace(3)* undef			%load0 = load volatile i32, i32 addrspace(3)* undef
	%cmp0 = icmp eq i32 %tid, 0			%cmp0 = icmp eq i32 %tid, 0
	br i1 %cmp0, label %if, label %else			br i1 %cmp0, label %if, label %else


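The GCN checks for divergent_loop above follow the usual exec-mask pattern for a divergent region. The current exec is saved with s_mov_b64 and ANDed with the compare result, and the result is written back to exec. The region is skipped with s_cbranch_execz when no lane remains active, and s_or_b64 re-enables the saved lanes at the join block. The Python sketch below models only that mask arithmetic on integers, assuming wave64 (one bit per lane). The helper names enter_if, leave_if and skip_region are hypothetical. The sketch ignores the spills and reloads of the saved mask, which are what the test actually checks.

```python
from typing import Tuple

# Assumption: wave64, so exec is a 64-bit lane mask (bit i = lane i active).
MASK64 = (1 << 64) - 1


def enter_if(exec_mask: int, cond: int) -> Tuple[int, int]:
    """Hypothetical model of:
         s_mov_b64 saved, exec
         s_and_b64 tmp, saved, cond
         s_mov_b64 exec, tmp
    Returns (new exec, saved exec)."""
    saved = exec_mask & MASK64
    new_exec = saved & cond & MASK64
    return new_exec, saved


def leave_if(exec_mask: int, saved: int) -> int:
    """Hypothetical model of s_or_b64 exec, exec, saved at the join block:
    lanes that skipped the region are switched back on."""
    return (exec_mask | saved) & MASK64


def skip_region(exec_mask: int) -> bool:
    """Hypothetical model of s_cbranch_execz: branch over the region
    when no lane is active."""
    return exec_mask == 0


if __name__ == "__main__":
    # All lanes active; only lane 0 (tid == 0) takes the divergent branch.
    inner, saved = enter_if(MASK64, 1)
    print(hex(inner), skip_region(inner), hex(leave_if(inner, saved)))
```

Because the saved mask is a superset of the exec value inside the region, the OR at the join simply restores the original mask.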
llvm/test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll

bb1:
%ins1 = insertvalue { i32, half } %ins0, half %extract1, 1		%ins1 = insertvalue { i32, half } %ins0, half %extract1, 1
ret { i32, half } %ins1		ret { i32, half } %ins1
}		}

define amdgpu_kernel void @v3i16_registers(i1 %cond) #0 {		define amdgpu_kernel void @v3i16_registers(i1 %cond) #0 {
; GCN-LABEL: v3i16_registers:		; GCN-LABEL: v3i16_registers:
; GCN: ; %bb.0: ; %entry		; GCN: ; %bb.0: ; %entry
; GCN-NEXT: s_load_dword s4, s[4:5], 0x0		; GCN-NEXT: s_load_dword s4, s[4:5], 0x0
; GCN-NEXT: s_mov_b32 s33, s9		; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s9
; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s33
; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0		; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0
; GCN-NEXT: s_mov_b32 s32, s33		; GCN-NEXT: s_add_u32 s0, s0, s9
		; GCN-NEXT: s_addc_u32 s1, s1, 0
; GCN-NEXT: s_waitcnt lgkmcnt(0)		; GCN-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: s_and_b32 s4, 1, s4		; GCN-NEXT: s_and_b32 s4, 1, s4
; GCN-NEXT: v_cmp_eq_u32_e64 s[4:5], s4, 1		; GCN-NEXT: v_cmp_eq_u32_e64 s[4:5], s4, 1
; GCN-NEXT: s_and_b64 vcc, exec, s[4:5]		; GCN-NEXT: s_and_b64 vcc, exec, s[4:5]
		; GCN-NEXT: s_mov_b32 s32, 0
; GCN-NEXT: s_cbranch_vccz BB4_2		; GCN-NEXT: s_cbranch_vccz BB4_2
; GCN-NEXT: ; %bb.1:		; GCN-NEXT: ; %bb.1:
; GCN-NEXT: s_mov_b32 s4, 0		; GCN-NEXT: s_mov_b32 s4, 0
; GCN-NEXT: s_mov_b32 s5, s4		; GCN-NEXT: s_mov_b32 s5, s4
; GCN-NEXT: v_mov_b32_e32 v0, s4		; GCN-NEXT: v_mov_b32_e32 v0, s4
; GCN-NEXT: v_mov_b32_e32 v1, s5		; GCN-NEXT: v_mov_b32_e32 v1, s5
; GCN-NEXT: s_branch BB4_3		; GCN-NEXT: s_branch BB4_3
; GCN-NEXT: BB4_2: ; %if.else		; GCN-NEXT: BB4_2: ; %if.else
if.end: ; preds = %if.else, %if.then
store <3 x i16> %call6.sink, <3 x i16> addrspace(1)* undef		store <3 x i16> %call6.sink, <3 x i16> addrspace(1)* undef
ret void		ret void
}		}

define amdgpu_kernel void @v3f16_registers(i1 %cond) #0 {		define amdgpu_kernel void @v3f16_registers(i1 %cond) #0 {
; GCN-LABEL: v3f16_registers:		; GCN-LABEL: v3f16_registers:
; GCN: ; %bb.0: ; %entry		; GCN: ; %bb.0: ; %entry
; GCN-NEXT: s_load_dword s4, s[4:5], 0x0		; GCN-NEXT: s_load_dword s4, s[4:5], 0x0
; GCN-NEXT: s_mov_b32 s33, s9		; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s9
; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s33
; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0		; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0
; GCN-NEXT: s_mov_b32 s32, s33		; GCN-NEXT: s_add_u32 s0, s0, s9
		; GCN-NEXT: s_addc_u32 s1, s1, 0
; GCN-NEXT: s_waitcnt lgkmcnt(0)		; GCN-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: s_and_b32 s4, 1, s4		; GCN-NEXT: s_and_b32 s4, 1, s4
; GCN-NEXT: v_cmp_eq_u32_e64 s[4:5], s4, 1		; GCN-NEXT: v_cmp_eq_u32_e64 s[4:5], s4, 1
; GCN-NEXT: s_and_b64 vcc, exec, s[4:5]		; GCN-NEXT: s_and_b64 vcc, exec, s[4:5]
		; GCN-NEXT: s_mov_b32 s32, 0
; GCN-NEXT: s_cbranch_vccz BB5_2		; GCN-NEXT: s_cbranch_vccz BB5_2
; GCN-NEXT: ; %bb.1:		; GCN-NEXT: ; %bb.1:
; GCN-NEXT: s_mov_b32 s4, 0		; GCN-NEXT: s_mov_b32 s4, 0
; GCN-NEXT: s_mov_b32 s5, s4		; GCN-NEXT: s_mov_b32 s5, s4
; GCN-NEXT: v_mov_b32_e32 v0, s4		; GCN-NEXT: v_mov_b32_e32 v0, s4
; GCN-NEXT: v_mov_b32_e32 v1, s5		; GCN-NEXT: v_mov_b32_e32 v1, s5
; GCN-NEXT: s_branch BB5_3		; GCN-NEXT: s_branch BB5_3
; GCN-NEXT: BB5_2: ; %if.else		; GCN-NEXT: BB5_2: ; %if.else

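In the new (right-hand) column of the v3i16_registers and v3f16_registers prologues, s9 is no longer copied into s33. Instead it is added into s[0:1] with an s_add_u32/s_addc_u32 pair, and elsewhere in the diff the scratch buffer accesses pass 0 instead of s7 or s33 as the soffset operand. The sketch below models that 64-bit add split into two 32-bit halves, with the carry (SCC) from the low half fed into the high half. It assumes s9 holds the per-wave scratch offset and treats s[0:1] as a plain 64-bit base. The real buffer resource descriptor packs other fields into the upper bits of s1, which this sketch ignores. Function names mirror the instruction mnemonics but are illustrative only.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF


def s_add_u32(a: int, b: int) -> Tuple[int, int]:
    """Illustrative model of s_add_u32: 32-bit result plus carry-out (SCC)."""
    total = (a & MASK32) + (b & MASK32)
    return total & MASK32, total >> 32


def s_addc_u32(a: int, b: int, carry_in: int) -> Tuple[int, int]:
    """Illustrative model of s_addc_u32: adds the incoming SCC carry."""
    total = (a & MASK32) + (b & MASK32) + carry_in
    return total & MASK32, total >> 32


def add_wave_offset(base_lo: int, base_hi: int, wave_offset: int) -> Tuple[int, int]:
    """Sketch of:
         s_add_u32  s0, s0, s9
         s_addc_u32 s1, s1, 0
    Assumption: s[0:1] is treated as a plain 64-bit base and s9 as the
    per-wave scratch offset; other descriptor fields in s1 are ignored."""
    lo, scc = s_add_u32(base_lo, wave_offset)
    hi, _ = s_addc_u32(base_hi, 0, scc)
    return lo, hi


if __name__ == "__main__":
    # A low-half overflow propagates one into the high half.
    print([hex(x) for x in add_wave_offset(0xFFFFFFF0, 0x1, 0x20)])
```

Folding the offset into the base once in the prologue is what lets each later scratch access use an immediate 0 soffset rather than reserving an SGPR for it.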
llvm/test/CodeGen/AMDGPU/extload-private.ll

	; RUN: llc -march=amdgcn -mattr=-promote-alloca -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mattr=-promote-alloca -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
	; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-promote-alloca -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-promote-alloca -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s

	; FUNC-LABEL: {{^}}load_i8_sext_private:			; FUNC-LABEL: {{^}}load_i8_sext_private:
	; SI: buffer_load_sbyte v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:4{{$}}			; SI: buffer_load_sbyte v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], 0 offset:4{{$}}
	define amdgpu_kernel void @load_i8_sext_private(i32 addrspace(1)* %out) {			define amdgpu_kernel void @load_i8_sext_private(i32 addrspace(1)* %out) {
	entry:			entry:
	%tmp0 = alloca i8, addrspace(5)			%tmp0 = alloca i8, addrspace(5)
	%tmp1 = load i8, i8 addrspace(5)* %tmp0			%tmp1 = load i8, i8 addrspace(5)* %tmp0
	%tmp2 = sext i8 %tmp1 to i32			%tmp2 = sext i8 %tmp1 to i32
	store i32 %tmp2, i32 addrspace(1)* %out			store i32 %tmp2, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}load_i8_zext_private:			; FUNC-LABEL: {{^}}load_i8_zext_private:
	; SI: buffer_load_ubyte v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:4{{$}}			; SI: buffer_load_ubyte v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], 0 offset:4{{$}}
	define amdgpu_kernel void @load_i8_zext_private(i32 addrspace(1)* %out) {			define amdgpu_kernel void @load_i8_zext_private(i32 addrspace(1)* %out) {
	entry:			entry:
	%tmp0 = alloca i8, addrspace(5)			%tmp0 = alloca i8, addrspace(5)
	%tmp1 = load i8, i8 addrspace(5)* %tmp0			%tmp1 = load i8, i8 addrspace(5)* %tmp0
	%tmp2 = zext i8 %tmp1 to i32			%tmp2 = zext i8 %tmp1 to i32
	store i32 %tmp2, i32 addrspace(1)* %out			store i32 %tmp2, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}load_i16_sext_private:			; FUNC-LABEL: {{^}}load_i16_sext_private:
	; SI: buffer_load_sshort v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:4{{$}}			; SI: buffer_load_sshort v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], 0 offset:4{{$}}
	define amdgpu_kernel void @load_i16_sext_private(i32 addrspace(1)* %out) {			define amdgpu_kernel void @load_i16_sext_private(i32 addrspace(1)* %out) {
	entry:			entry:
	%tmp0 = alloca i16, addrspace(5)			%tmp0 = alloca i16, addrspace(5)
	%tmp1 = load i16, i16 addrspace(5)* %tmp0			%tmp1 = load i16, i16 addrspace(5)* %tmp0
	%tmp2 = sext i16 %tmp1 to i32			%tmp2 = sext i16 %tmp1 to i32
	store i32 %tmp2, i32 addrspace(1)* %out			store i32 %tmp2, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}load_i16_zext_private:			; FUNC-LABEL: {{^}}load_i16_zext_private:
	; SI: buffer_load_ushort v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:4{{$}}			; SI: buffer_load_ushort v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], 0 offset:4{{$}}
	define amdgpu_kernel void @load_i16_zext_private(i32 addrspace(1)* %out) {			define amdgpu_kernel void @load_i16_zext_private(i32 addrspace(1)* %out) {
	entry:			entry:
	%tmp0 = alloca i16, addrspace(5)			%tmp0 = alloca i16, addrspace(5)
	%tmp1 = load volatile i16, i16 addrspace(5)* %tmp0			%tmp1 = load volatile i16, i16 addrspace(5)* %tmp0
	%tmp2 = zext i16 %tmp1 to i32			%tmp2 = zext i16 %tmp1 to i32
	store i32 %tmp2, i32 addrspace(1)* %out			store i32 %tmp2, i32 addrspace(1)* %out
	ret void			ret void
	}			}

llvm/test/CodeGen/AMDGPU/fast-unaligned-load-store.private.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -mattr=-unaligned-scratch-access < %s | FileCheck -check-prefixes=GCN,GFX7-ALIGNED %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -mattr=-unaligned-scratch-access < %s | FileCheck -check-prefixes=GCN,GFX7-ALIGNED %s
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -mattr=+unaligned-scratch-access < %s | FileCheck -check-prefixes=GCN,GFX7-UNALIGNED %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -mattr=+unaligned-scratch-access < %s | FileCheck -check-prefixes=GCN,GFX7-UNALIGNED %s
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=+unaligned-scratch-access < %s | FileCheck -check-prefixes=GCN,GFX9 %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=+unaligned-scratch-access < %s | FileCheck -check-prefixes=GCN,GFX9 %s

	; Should not merge this to a dword load			; Should not merge this to a dword load
	define i32 @private_load_2xi16_align2(i16 addrspace(5)* %p) #0 {			define i32 @private_load_2xi16_align2(i16 addrspace(5)* %p) #0 {
	; GFX7-ALIGNED-LABEL: private_load_2xi16_align2:			; GFX7-ALIGNED-LABEL: private_load_2xi16_align2:
	; GFX7-ALIGNED: ; %bb.0:			; GFX7-ALIGNED: ; %bb.0:
	; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-ALIGNED-NEXT: v_add_i32_e32 v1, vcc, 2, v0			; GFX7-ALIGNED-NEXT: v_add_i32_e32 v1, vcc, 2, v0
	; GFX7-ALIGNED-NEXT: buffer_load_ushort v0, v0, s[0:3], s33 offen			; GFX7-ALIGNED-NEXT: buffer_load_ushort v0, v0, s[0:3], 0 offen
	; GFX7-ALIGNED-NEXT: buffer_load_ushort v1, v1, s[0:3], s33 offen			; GFX7-ALIGNED-NEXT: buffer_load_ushort v1, v1, s[0:3], 0 offen
	; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0)			; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0)
	; GFX7-ALIGNED-NEXT: v_lshlrev_b32_e32 v1, 16, v1			; GFX7-ALIGNED-NEXT: v_lshlrev_b32_e32 v1, 16, v1
	; GFX7-ALIGNED-NEXT: v_or_b32_e32 v0, v0, v1			; GFX7-ALIGNED-NEXT: v_or_b32_e32 v0, v0, v1
	; GFX7-ALIGNED-NEXT: s_setpc_b64 s[30:31]			; GFX7-ALIGNED-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX7-UNALIGNED-LABEL: private_load_2xi16_align2:			; GFX7-UNALIGNED-LABEL: private_load_2xi16_align2:
	; GFX7-UNALIGNED: ; %bb.0:			; GFX7-UNALIGNED: ; %bb.0:
	; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-UNALIGNED-NEXT: v_add_i32_e32 v1, vcc, 2, v0			; GFX7-UNALIGNED-NEXT: v_add_i32_e32 v1, vcc, 2, v0
	; GFX7-UNALIGNED-NEXT: buffer_load_ushort v0, v0, s[0:3], s33 offen			; GFX7-UNALIGNED-NEXT: buffer_load_ushort v0, v0, s[0:3], 0 offen
	; GFX7-UNALIGNED-NEXT: buffer_load_ushort v1, v1, s[0:3], s33 offen			; GFX7-UNALIGNED-NEXT: buffer_load_ushort v1, v1, s[0:3], 0 offen
	; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0)			; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0)
	; GFX7-UNALIGNED-NEXT: v_lshlrev_b32_e32 v1, 16, v1			; GFX7-UNALIGNED-NEXT: v_lshlrev_b32_e32 v1, 16, v1
	; GFX7-UNALIGNED-NEXT: v_or_b32_e32 v0, v0, v1			; GFX7-UNALIGNED-NEXT: v_or_b32_e32 v0, v0, v1
	; GFX7-UNALIGNED-NEXT: s_setpc_b64 s[30:31]			; GFX7-UNALIGNED-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-LABEL: private_load_2xi16_align2:			; GFX9-LABEL: private_load_2xi16_align2:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: buffer_load_ushort v1, v0, s[0:3], s33 offen			; GFX9-NEXT: buffer_load_ushort v1, v0, s[0:3], 0 offen
	; GFX9-NEXT: buffer_load_ushort v0, v0, s[0:3], s33 offen offset:2			; GFX9-NEXT: buffer_load_ushort v0, v0, s[0:3], 0 offen offset:2
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_lshl_or_b32 v0, v0, 16, v1			; GFX9-NEXT: v_lshl_or_b32 v0, v0, 16, v1
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	%gep.p = getelementptr i16, i16 addrspace(5)* %p, i64 1			%gep.p = getelementptr i16, i16 addrspace(5)* %p, i64 1
	%p.0 = load i16, i16 addrspace(5)* %p, align 2			%p.0 = load i16, i16 addrspace(5)* %p, align 2
	%p.1 = load i16, i16 addrspace(5)* %gep.p, align 2			%p.1 = load i16, i16 addrspace(5)* %gep.p, align 2
	%zext.0 = zext i16 %p.0 to i32			%zext.0 = zext i16 %p.0 to i32
	%zext.1 = zext i16 %p.1 to i32			%zext.1 = zext i16 %p.1 to i32
	%shl.1 = shl i32 %zext.1, 16			%shl.1 = shl i32 %zext.1, 16
	%or = or i32 %zext.0, %shl.1			%or = or i32 %zext.0, %shl.1
	ret i32 %or			ret i32 %or
	}			}

	; Should not merge this to a dword store			; Should not merge this to a dword store
	define void @private_store_2xi16_align2(i16 addrspace(5)* %p, i16 addrspace(5)* %r) #0 {			define void @private_store_2xi16_align2(i16 addrspace(5)* %p, i16 addrspace(5)* %r) #0 {
	; GFX7-ALIGNED-LABEL: private_store_2xi16_align2:			; GFX7-ALIGNED-LABEL: private_store_2xi16_align2:
	; GFX7-ALIGNED: ; %bb.0:			; GFX7-ALIGNED: ; %bb.0:
	; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-ALIGNED-NEXT: v_mov_b32_e32 v3, 1			; GFX7-ALIGNED-NEXT: v_mov_b32_e32 v3, 1
	; GFX7-ALIGNED-NEXT: v_mov_b32_e32 v0, 2			; GFX7-ALIGNED-NEXT: v_mov_b32_e32 v0, 2
	; GFX7-ALIGNED-NEXT: v_add_i32_e32 v2, vcc, 2, v1			; GFX7-ALIGNED-NEXT: v_add_i32_e32 v2, vcc, 2, v1
	; GFX7-ALIGNED-NEXT: buffer_store_short v3, v1, s[0:3], s33 offen			; GFX7-ALIGNED-NEXT: buffer_store_short v3, v1, s[0:3], 0 offen
	; GFX7-ALIGNED-NEXT: buffer_store_short v0, v2, s[0:3], s33 offen			; GFX7-ALIGNED-NEXT: buffer_store_short v0, v2, s[0:3], 0 offen
	; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0)			; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0)
	; GFX7-ALIGNED-NEXT: s_setpc_b64 s[30:31]			; GFX7-ALIGNED-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX7-UNALIGNED-LABEL: private_store_2xi16_align2:			; GFX7-UNALIGNED-LABEL: private_store_2xi16_align2:
	; GFX7-UNALIGNED: ; %bb.0:			; GFX7-UNALIGNED: ; %bb.0:
	; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-UNALIGNED-NEXT: v_mov_b32_e32 v3, 1			; GFX7-UNALIGNED-NEXT: v_mov_b32_e32 v3, 1
	; GFX7-UNALIGNED-NEXT: v_mov_b32_e32 v0, 2			; GFX7-UNALIGNED-NEXT: v_mov_b32_e32 v0, 2
	; GFX7-UNALIGNED-NEXT: v_add_i32_e32 v2, vcc, 2, v1			; GFX7-UNALIGNED-NEXT: v_add_i32_e32 v2, vcc, 2, v1
	; GFX7-UNALIGNED-NEXT: buffer_store_short v3, v1, s[0:3], s33 offen			; GFX7-UNALIGNED-NEXT: buffer_store_short v3, v1, s[0:3], 0 offen
	; GFX7-UNALIGNED-NEXT: buffer_store_short v0, v2, s[0:3], s33 offen			; GFX7-UNALIGNED-NEXT: buffer_store_short v0, v2, s[0:3], 0 offen
	; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0)			; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0)
	; GFX7-UNALIGNED-NEXT: s_setpc_b64 s[30:31]			; GFX7-UNALIGNED-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-LABEL: private_store_2xi16_align2:			; GFX9-LABEL: private_store_2xi16_align2:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: v_mov_b32_e32 v0, 1			; GFX9-NEXT: v_mov_b32_e32 v0, 1
	; GFX9-NEXT: buffer_store_short v0, v1, s[0:3], s33 offen			; GFX9-NEXT: v_mov_b32_e32 v2, 2
	; GFX9-NEXT: v_mov_b32_e32 v0, 2			; GFX9-NEXT: buffer_store_short v0, v1, s[0:3], 0 offen
	; GFX9-NEXT: buffer_store_short v0, v1, s[0:3], s33 offen offset:2			; GFX9-NEXT: buffer_store_short v2, v1, s[0:3], 0 offen offset:2
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	%gep.r = getelementptr i16, i16 addrspace(5)* %r, i64 1			%gep.r = getelementptr i16, i16 addrspace(5)* %r, i64 1
	store i16 1, i16 addrspace(5)* %r, align 2			store i16 1, i16 addrspace(5)* %r, align 2
	store i16 2, i16 addrspace(5)* %gep.r, align 2			store i16 2, i16 addrspace(5)* %gep.r, align 2
	ret void			ret void
	}			}

	; Should produce align 1 dword when legal			; Should produce align 1 dword when legal
	define i32 @private_load_2xi16_align1(i16 addrspace(5)* %p) #0 {			define i32 @private_load_2xi16_align1(i16 addrspace(5)* %p) #0 {
	; GFX7-ALIGNED-LABEL: private_load_2xi16_align1:			; GFX7-ALIGNED-LABEL: private_load_2xi16_align1:
	; GFX7-ALIGNED: ; %bb.0:			; GFX7-ALIGNED: ; %bb.0:
	; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-ALIGNED-NEXT: v_add_i32_e32 v2, vcc, 1, v0
	; GFX7-ALIGNED-NEXT: buffer_load_ubyte v2, v2, s[0:3], s33 offen
	; GFX7-ALIGNED-NEXT: v_add_i32_e32 v1, vcc, 2, v0			; GFX7-ALIGNED-NEXT: v_add_i32_e32 v1, vcc, 2, v0
	; GFX7-ALIGNED-NEXT: v_add_i32_e32 v3, vcc, 3, v0			; GFX7-ALIGNED-NEXT: v_add_i32_e32 v2, vcc, 1, v0
	; GFX7-ALIGNED-NEXT: buffer_load_ubyte v3, v3, s[0:3], s33 offen			; GFX7-ALIGNED-NEXT: buffer_load_ubyte v3, v0, s[0:3], 0 offen
	; GFX7-ALIGNED-NEXT: buffer_load_ubyte v1, v1, s[0:3], s33 offen			; GFX7-ALIGNED-NEXT: v_add_i32_e32 v0, vcc, 3, v0
	; GFX7-ALIGNED-NEXT: buffer_load_ubyte v0, v0, s[0:3], s33 offen			; GFX7-ALIGNED-NEXT: buffer_load_ubyte v0, v0, s[0:3], 0 offen
	; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(3)			; GFX7-ALIGNED-NEXT: buffer_load_ubyte v2, v2, s[0:3], 0 offen
	; GFX7-ALIGNED-NEXT: v_lshlrev_b32_e32 v2, 8, v2			; GFX7-ALIGNED-NEXT: buffer_load_ubyte v1, v1, s[0:3], 0 offen
	; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(2)			; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(2)
	; GFX7-ALIGNED-NEXT: v_lshlrev_b32_e32 v3, 8, v3			; GFX7-ALIGNED-NEXT: v_lshlrev_b32_e32 v0, 8, v0
	; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(1)			; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(1)
	; GFX7-ALIGNED-NEXT: v_or_b32_e32 v1, v3, v1			; GFX7-ALIGNED-NEXT: v_lshlrev_b32_e32 v2, 8, v2
	; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0)			; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0)
	; GFX7-ALIGNED-NEXT: v_or_b32_e32 v0, v2, v0
	; GFX7-ALIGNED-NEXT: v_lshlrev_b32_e32 v1, 16, v1
	; GFX7-ALIGNED-NEXT: v_or_b32_e32 v0, v0, v1			; GFX7-ALIGNED-NEXT: v_or_b32_e32 v0, v0, v1
				; GFX7-ALIGNED-NEXT: v_or_b32_e32 v2, v2, v3
				; GFX7-ALIGNED-NEXT: v_lshlrev_b32_e32 v0, 16, v0
				; GFX7-ALIGNED-NEXT: v_or_b32_e32 v0, v2, v0
	; GFX7-ALIGNED-NEXT: s_setpc_b64 s[30:31]			; GFX7-ALIGNED-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX7-UNALIGNED-LABEL: private_load_2xi16_align1:			; GFX7-UNALIGNED-LABEL: private_load_2xi16_align1:
	; GFX7-UNALIGNED: ; %bb.0:			; GFX7-UNALIGNED: ; %bb.0:
	; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-UNALIGNED-NEXT: buffer_load_dword v0, v0, s[0:3], s33 offen			; GFX7-UNALIGNED-NEXT: buffer_load_dword v0, v0, s[0:3], 0 offen
	; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0)			; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0)
	; GFX7-UNALIGNED-NEXT: s_setpc_b64 s[30:31]			; GFX7-UNALIGNED-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-LABEL: private_load_2xi16_align1:			; GFX9-LABEL: private_load_2xi16_align1:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: buffer_load_dword v0, v0, s[0:3], s33 offen			; GFX9-NEXT: buffer_load_dword v0, v0, s[0:3], 0 offen
	; GFX9-NEXT: v_mov_b32_e32 v1, 0xffff			; GFX9-NEXT: v_mov_b32_e32 v1, 0xffff
	; GFX9-NEXT: s_mov_b32 s4, 0xffff			; GFX9-NEXT: s_mov_b32 s4, 0xffff
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_bfi_b32 v1, v1, 0, v0			; GFX9-NEXT: v_bfi_b32 v1, v1, 0, v0
	; GFX9-NEXT: v_and_or_b32 v0, v0, s4, v1			; GFX9-NEXT: v_and_or_b32 v0, v0, s4, v1
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	%gep.p = getelementptr i16, i16 addrspace(5)* %p, i64 1			%gep.p = getelementptr i16, i16 addrspace(5)* %p, i64 1
	%p.0 = load i16, i16 addrspace(5)* %p, align 1			%p.0 = load i16, i16 addrspace(5)* %p, align 1
	%p.1 = load i16, i16 addrspace(5)* %gep.p, align 1			%p.1 = load i16, i16 addrspace(5)* %gep.p, align 1
	%zext.0 = zext i16 %p.0 to i32			%zext.0 = zext i16 %p.0 to i32
	%zext.1 = zext i16 %p.1 to i32			%zext.1 = zext i16 %p.1 to i32
	%shl.1 = shl i32 %zext.1, 16			%shl.1 = shl i32 %zext.1, 16
	%or = or i32 %zext.0, %shl.1			%or = or i32 %zext.0, %shl.1
	ret i32 %or			ret i32 %or
	}			}

	; Should produce align 1 dword when legal			; Should produce align 1 dword when legal
	define void @private_store_2xi16_align1(i16 addrspace(5)* %p, i16 addrspace(5)* %r) #0 {			define void @private_store_2xi16_align1(i16 addrspace(5)* %p, i16 addrspace(5)* %r) #0 {
	; GFX7-ALIGNED-LABEL: private_store_2xi16_align1:			; GFX7-ALIGNED-LABEL: private_store_2xi16_align1:
	; GFX7-ALIGNED: ; %bb.0:			; GFX7-ALIGNED: ; %bb.0:
	; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-ALIGNED-NEXT: v_mov_b32_e32 v3, 1			; GFX7-ALIGNED-NEXT: v_mov_b32_e32 v3, 1
	; GFX7-ALIGNED-NEXT: buffer_store_byte v3, v1, s[0:3], s33 offen
	; GFX7-ALIGNED-NEXT: v_add_i32_e32 v2, vcc, 2, v1			; GFX7-ALIGNED-NEXT: v_add_i32_e32 v2, vcc, 2, v1
	; GFX7-ALIGNED-NEXT: v_add_i32_e32 v3, vcc, 1, v1			; GFX7-ALIGNED-NEXT: v_add_i32_e32 v4, vcc, 1, v1
	; GFX7-ALIGNED-NEXT: v_mov_b32_e32 v4, 0			; GFX7-ALIGNED-NEXT: v_mov_b32_e32 v5, 0
				; GFX7-ALIGNED-NEXT: buffer_store_byte v3, v1, s[0:3], 0 offen
				; GFX7-ALIGNED-NEXT: buffer_store_byte v5, v4, s[0:3], 0 offen
	; GFX7-ALIGNED-NEXT: v_add_i32_e32 v1, vcc, 3, v1			; GFX7-ALIGNED-NEXT: v_add_i32_e32 v1, vcc, 3, v1
	; GFX7-ALIGNED-NEXT: v_mov_b32_e32 v0, 2			; GFX7-ALIGNED-NEXT: v_mov_b32_e32 v0, 2
	; GFX7-ALIGNED-NEXT: buffer_store_byte v4, v3, s[0:3], s33 offen			; GFX7-ALIGNED-NEXT: buffer_store_byte v5, v1, s[0:3], 0 offen
	; GFX7-ALIGNED-NEXT: buffer_store_byte v4, v1, s[0:3], s33 offen			; GFX7-ALIGNED-NEXT: buffer_store_byte v0, v2, s[0:3], 0 offen
	; GFX7-ALIGNED-NEXT: buffer_store_byte v0, v2, s[0:3], s33 offen
	; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0)			; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0)
	; GFX7-ALIGNED-NEXT: s_setpc_b64 s[30:31]			; GFX7-ALIGNED-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX7-UNALIGNED-LABEL: private_store_2xi16_align1:			; GFX7-UNALIGNED-LABEL: private_store_2xi16_align1:
	; GFX7-UNALIGNED: ; %bb.0:			; GFX7-UNALIGNED: ; %bb.0:
	; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-UNALIGNED-NEXT: v_mov_b32_e32 v0, 0x20001			; GFX7-UNALIGNED-NEXT: v_mov_b32_e32 v0, 0x20001
	; GFX7-UNALIGNED-NEXT: buffer_store_dword v0, v1, s[0:3], s33 offen			; GFX7-UNALIGNED-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
	; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0)			; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0)
	; GFX7-UNALIGNED-NEXT: s_setpc_b64 s[30:31]			; GFX7-UNALIGNED-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-LABEL: private_store_2xi16_align1:			; GFX9-LABEL: private_store_2xi16_align1:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x20001			; GFX9-NEXT: v_mov_b32_e32 v0, 0x20001
	; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], s33 offen			; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	%gep.r = getelementptr i16, i16 addrspace(5)* %r, i64 1			%gep.r = getelementptr i16, i16 addrspace(5)* %r, i64 1
	store i16 1, i16 addrspace(5)* %r, align 1			store i16 1, i16 addrspace(5)* %r, align 1
	store i16 2, i16 addrspace(5)* %gep.r, align 1			store i16 2, i16 addrspace(5)* %gep.r, align 1
	ret void			ret void
	}			}

	; Should merge this to a dword load			; Should merge this to a dword load
	define i32 @private_load_2xi16_align4(i16 addrspace(5)* %p) #0 {			define i32 @private_load_2xi16_align4(i16 addrspace(5)* %p) #0 {
	; GFX7-LABEL: load_2xi16_align4:			; GFX7-LABEL: load_2xi16_align4:
	; GFX7: ; %bb.0:			; GFX7: ; %bb.0:
	; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-NEXT: flat_load_dword v0, v[0:1]			; GFX7-NEXT: flat_load_dword v0, v[0:1]
	; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX7-NEXT: s_setpc_b64 s[30:31]			; GFX7-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX7-ALIGNED-LABEL: private_load_2xi16_align4:			; GFX7-ALIGNED-LABEL: private_load_2xi16_align4:
	; GFX7-ALIGNED: ; %bb.0:			; GFX7-ALIGNED: ; %bb.0:
	; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-ALIGNED-NEXT: buffer_load_dword v0, v0, s[0:3], s33 offen			; GFX7-ALIGNED-NEXT: buffer_load_dword v0, v0, s[0:3], 0 offen
	; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0)			; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0)
	; GFX7-ALIGNED-NEXT: s_setpc_b64 s[30:31]			; GFX7-ALIGNED-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX7-UNALIGNED-LABEL: private_load_2xi16_align4:			; GFX7-UNALIGNED-LABEL: private_load_2xi16_align4:
	; GFX7-UNALIGNED: ; %bb.0:			; GFX7-UNALIGNED: ; %bb.0:
	; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-UNALIGNED-NEXT: buffer_load_dword v0, v0, s[0:3], s33 offen			; GFX7-UNALIGNED-NEXT: buffer_load_dword v0, v0, s[0:3], 0 offen
	; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0)			; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0)
	; GFX7-UNALIGNED-NEXT: s_setpc_b64 s[30:31]			; GFX7-UNALIGNED-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-LABEL: private_load_2xi16_align4:			; GFX9-LABEL: private_load_2xi16_align4:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: buffer_load_dword v0, v0, s[0:3], s33 offen			; GFX9-NEXT: buffer_load_dword v0, v0, s[0:3], 0 offen
	; GFX9-NEXT: v_mov_b32_e32 v1, 0xffff			; GFX9-NEXT: v_mov_b32_e32 v1, 0xffff
	; GFX9-NEXT: s_mov_b32 s4, 0xffff			; GFX9-NEXT: s_mov_b32 s4, 0xffff
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_bfi_b32 v1, v1, 0, v0			; GFX9-NEXT: v_bfi_b32 v1, v1, 0, v0
	; GFX9-NEXT: v_and_or_b32 v0, v0, s4, v1			; GFX9-NEXT: v_and_or_b32 v0, v0, s4, v1
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	%gep.p = getelementptr i16, i16 addrspace(5)* %p, i64 1			%gep.p = getelementptr i16, i16 addrspace(5)* %p, i64 1
	%p.0 = load i16, i16 addrspace(5)* %p, align 4			%p.0 = load i16, i16 addrspace(5)* %p, align 4
	; GFX7-NEXT: v_mov_b32_e32 v1, s1			; GFX7-NEXT: v_mov_b32_e32 v1, s1
	; GFX7-NEXT: flat_store_dword v[0:1], v2			; GFX7-NEXT: flat_store_dword v[0:1], v2
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GCN-LABEL: private_store_2xi16_align4:			; GCN-LABEL: private_store_2xi16_align4:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: v_mov_b32_e32 v0, 0x20001			; GCN-NEXT: v_mov_b32_e32 v0, 0x20001
	; GCN-NEXT: buffer_store_dword v0, v1, s[0:3], s33 offen			; GCN-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	%gep.r = getelementptr i16, i16 addrspace(5)* %r, i64 1			%gep.r = getelementptr i16, i16 addrspace(5)* %r, i64 1
	store i16 1, i16 addrspace(5)* %r, align 4			store i16 1, i16 addrspace(5)* %r, align 4
	store i16 2, i16 addrspace(5)* %gep.r, align 2			store i16 2, i16 addrspace(5)* %gep.r, align 2
	ret void			ret void
	}			}
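The merged constant in these checks can be verified by hand. The two i16 stores of 1 and 2 land in one dword as `0x20001` because scratch is little-endian. The GFX7-ALIGNED byte-by-byte load path rebuilds the same value with shifts by 8 and 16, which is also what the IR computes with `zext` plus `shl 16`. The sketch below is a minimal Python model that makes three simplifying assumptions: scratch is a flat `bytearray`, swizzling and alignment faults are ignored, and the helper names (`store_i16`, `load_dword`, `load_bytes_and_combine`) are hypothetical, not LLVM APIs.

```python
# Hypothetical helpers (not LLVM APIs) modelling a 4-byte private slot as
# flat little-endian memory; swizzling and alignment rules are ignored.


def store_i16(mem: bytearray, addr: int, value: int) -> None:
    """Store a 16-bit value little-endian, like an i16 store."""
    mem[addr:addr + 2] = (value & 0xFFFF).to_bytes(2, "little")


def load_dword(mem: bytearray, addr: int) -> int:
    """Load 32 bits little-endian, like a single buffer_load_dword."""
    return int.from_bytes(mem[addr:addr + 4], "little")


def load_bytes_and_combine(mem: bytearray, addr: int) -> int:
    """Mirror the GFX7-ALIGNED path: four ubyte loads, shifted and or'ed."""
    b0, b1, b2, b3 = (mem[addr + i] for i in range(4))
    lo = b0 | (b1 << 8)  # first i16
    hi = b2 | (b3 << 8)  # second i16
    return lo | (hi << 16)


mem = bytearray(4)
store_i16(mem, 0, 1)  # store i16 1, %r
store_i16(mem, 2, 2)  # store i16 2, %gep.r
merged = load_dword(mem, 0)
print(hex(merged))  # 0x20001

# The IR's zext(p.0) | (zext(p.1) << 16) gives the same dword.
p0 = int.from_bytes(mem[0:2], "little")
p1 = int.from_bytes(mem[2:4], "little")
assert (p0 | (p1 << 16)) == merged == load_bytes_and_combine(mem, 0)
```

This explains why the combined store can be a single `v_mov_b32 v0, 0x20001` followed by one `buffer_store_dword` whenever a dword access is legal.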

llvm/test/CodeGen/AMDGPU/fold-fi-mubuf.mir

	# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	# RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -run-pass si-fold-operands,dead-mi-elimination %s -o - | FileCheck -check-prefix=GCN %s			# RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -run-pass si-fold-operands,dead-mi-elimination %s -o - | FileCheck -check-prefix=GCN %s

				# Kernels have no FP
	---			---
	name: no_fold_fi_non_stack_rsrc_soffset			name: kernel_no_fold_fi_non_stack_rsrc_and_soffset
	tracksRegLiveness: true			tracksRegLiveness: true
	frameInfo:			frameInfo:
	maxAlignment: 4			maxAlignment: 4
	localFrameSize: 4			localFrameSize: 4
	stack:			stack:
	- { id: 0, size: 4, alignment: 4, local-offset: 0 }			- { id: 0, size: 4, alignment: 4, local-offset: 0 }
	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: true			isEntryFunction: true
	scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'			scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'
	scratchWaveOffsetReg: '$sgpr6'
	frameOffsetReg: '$sgpr6'
	stackPtrOffsetReg: '$sgpr6'
	body: |			body: |
	bb.0:			bb.0:
	liveins: $sgpr12_sgpr13_sgpr14_sgpr15			liveins: $sgpr12_sgpr13_sgpr14_sgpr15

	; GCN-LABEL: name: no_fold_fi_non_stack_rsrc_soffset			; GCN-LABEL: name: kernel_no_fold_fi_non_stack_rsrc_and_soffset
	; GCN: liveins: $sgpr12_sgpr13_sgpr14_sgpr15			; GCN: liveins: $sgpr12_sgpr13_sgpr14_sgpr15
	; GCN: [[COPY:%[0-9]+]]:sgpr_128 = COPY $sgpr12_sgpr13_sgpr14_sgpr15			; GCN: [[COPY:%[0-9]+]]:sgpr_128 = COPY $sgpr12_sgpr13_sgpr14_sgpr15
	; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec			; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
	; GCN: [[BUFFER_LOAD_DWORD_IDXEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN [[V_MOV_B32_e32_]], [[COPY]], 0, 0, 0, 0, 0, 0, 0, implicit $exec			; GCN: [[BUFFER_LOAD_DWORD_IDXEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN [[V_MOV_B32_e32_]], [[COPY]], 0, 0, 0, 0, 0, 0, 0, implicit $exec
	; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_IDXEN]]			; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_IDXEN]]
	; GCN: SI_RETURN_TO_EPILOG $vgpr0			; GCN: SI_RETURN_TO_EPILOG $vgpr0
	%0:sgpr_128 = COPY $sgpr12_sgpr13_sgpr14_sgpr15			%0:sgpr_128 = COPY $sgpr12_sgpr13_sgpr14_sgpr15
	%1:sreg_32_xm0 = S_MOV_B32 0			%1:sreg_32_xm0 = S_MOV_B32 0
	%2:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec			%2:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
	%3:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN %2, %0, %1, 0, 0, 0, 0, 0, 0, implicit $exec			%3:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN %2, %0, %1, 0, 0, 0, 0, 0, 0, implicit $exec
	$vgpr0 = COPY %3			$vgpr0 = COPY %3
	SI_RETURN_TO_EPILOG $vgpr0			SI_RETURN_TO_EPILOG $vgpr0

	...			...

	---			---
	name: no_fold_fi_non_stack_rsrc			name: kernel_no_fold_fi_non_stack_rsrc
	tracksRegLiveness: true			tracksRegLiveness: true
	frameInfo:			frameInfo:
	maxAlignment: 4			maxAlignment: 4
	localFrameSize: 4			localFrameSize: 4
	stack:			stack:
	- { id: 0, size: 4, alignment: 4, local-offset: 0 }			- { id: 0, size: 4, alignment: 4, local-offset: 0 }
	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: true			isEntryFunction: true
	scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'			scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'
	scratchWaveOffsetReg: '$sgpr6'			body: |
	frameOffsetReg: '$sgpr6'			bb.0:
				liveins: $sgpr12_sgpr13_sgpr14_sgpr15

				; GCN-LABEL: name: kernel_no_fold_fi_non_stack_rsrc
				; GCN: liveins: $sgpr12_sgpr13_sgpr14_sgpr15
				; GCN: [[COPY:%[0-9]+]]:sgpr_128 = COPY $sgpr12_sgpr13_sgpr14_sgpr15
				; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
				; GCN: [[BUFFER_LOAD_DWORD_IDXEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN [[V_MOV_B32_e32_]], [[COPY]], 0, 0, 0, 0, 0, 0, 0, implicit $exec
				; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_IDXEN]]
				; GCN: SI_RETURN_TO_EPILOG $vgpr0
				%0:sgpr_128 = COPY $sgpr12_sgpr13_sgpr14_sgpr15
				%2:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
				%3:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN %2, %0, 0, 0, 0, 0, 0, 0, 0, implicit $exec
				$vgpr0 = COPY %3
				SI_RETURN_TO_EPILOG $vgpr0

				...

				---
				name: kernel_no_fold_fi_non_stack_soffset
				tracksRegLiveness: true
				frameInfo:
				maxAlignment: 4
				localFrameSize: 4
				stack:
				- { id: 0, size: 4, alignment: 4, local-offset: 0 }
				machineFunctionInfo:
				isEntryFunction: true
				scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
				body: |
				bb.0:

				; GCN-LABEL: name: kernel_no_fold_fi_non_stack_soffset
				; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
				; GCN: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 7, implicit $exec
				; GCN: BUFFER_STORE_DWORD_OFFEN [[V_MOV_B32_e32_1]], [[V_MOV_B32_e32_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
				; GCN: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN [[V_MOV_B32_e32_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
				; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFEN]]
				; GCN: S_ENDPGM 0, implicit $vgpr0
				%0:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
				%1:vgpr_32 = V_MOV_B32_e32 7, implicit $exec
				%2:sreg_32_xm0 = S_MOV_B32 0

				BUFFER_STORE_DWORD_OFFEN %1:vgpr_32, %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, %2, 0, 0, 0, 0, 0, 0, implicit $exec
				%3:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, %2, 0, 0, 0, 0, 0, 0, implicit $exec
				$vgpr0 = COPY %3
				S_ENDPGM 0, implicit $vgpr0

				...

				---
				name: kernel_fold_fi_mubuf
				tracksRegLiveness: true
				frameInfo:
				maxAlignment: 4
				localFrameSize: 4
				stack:
				- { id: 0, size: 4, alignment: 4, local-offset: 0 }
				machineFunctionInfo:
				isEntryFunction: true
				scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
				body: |
				bb.0:

				; GCN-LABEL: name: kernel_fold_fi_mubuf
				; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 7, implicit $exec
				; GCN: BUFFER_STORE_DWORD_OFFEN [[V_MOV_B32_e32_]], %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
				; GCN: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
				; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFEN]]
				; GCN: S_ENDPGM 0, implicit $vgpr0
				%0:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
				%1:vgpr_32 = V_MOV_B32_e32 7, implicit $exec

				BUFFER_STORE_DWORD_OFFEN %1:vgpr_32, %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
				%2:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
				$vgpr0 = COPY %2
				S_ENDPGM 0, implicit $vgpr0

				...


				# Functions have an unswizzled SP/FP relative to the wave offset
				---
				name: function_no_fold_fi_non_stack_rsrc_and_soffset
				tracksRegLiveness: true
				frameInfo:
				maxAlignment: 4
				localFrameSize: 4
				stack:
				- { id: 0, size: 4, alignment: 4, local-offset: 0 }
				machineFunctionInfo:
				isEntryFunction: false
				scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'
				frameOffsetReg: '$sgpr32'
				stackPtrOffsetReg: '$sgpr32'
				body: |
				bb.0:
				liveins: $sgpr12_sgpr13_sgpr14_sgpr15

				; GCN-LABEL: name: function_no_fold_fi_non_stack_rsrc_and_soffset
				; GCN: liveins: $sgpr12_sgpr13_sgpr14_sgpr15
				; GCN: [[COPY:%[0-9]+]]:sgpr_128 = COPY $sgpr12_sgpr13_sgpr14_sgpr15
				; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
				; GCN: [[BUFFER_LOAD_DWORD_IDXEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN [[V_MOV_B32_e32_]], [[COPY]], 0, 0, 0, 0, 0, 0, 0, implicit $exec
				; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_IDXEN]]
				; GCN: SI_RETURN_TO_EPILOG $vgpr0
				%0:sgpr_128 = COPY $sgpr12_sgpr13_sgpr14_sgpr15
				%1:sreg_32_xm0 = S_MOV_B32 0
				%2:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
				%3:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN %2, %0, %1, 0, 0, 0, 0, 0, 0, implicit $exec
				$vgpr0 = COPY %3
				SI_RETURN_TO_EPILOG $vgpr0

				...

				---
				name: function_no_fold_fi_non_stack_rsrc
				tracksRegLiveness: true
				frameInfo:
				maxAlignment: 4
				localFrameSize: 4
				stack:
				- { id: 0, size: 4, alignment: 4, local-offset: 0 }
				machineFunctionInfo:
				isEntryFunction: false
				scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'
				frameOffsetReg: '$sgpr32'
	stackPtrOffsetReg: '$sgpr32'			stackPtrOffsetReg: '$sgpr32'
	body: |			body: |
	bb.0:			bb.0:
	liveins: $sgpr12_sgpr13_sgpr14_sgpr15			liveins: $sgpr12_sgpr13_sgpr14_sgpr15

	; GCN-LABEL: name: no_fold_fi_non_stack_rsrc			; GCN-LABEL: name: function_no_fold_fi_non_stack_rsrc
	; GCN: liveins: $sgpr12_sgpr13_sgpr14_sgpr15			; GCN: liveins: $sgpr12_sgpr13_sgpr14_sgpr15
	; GCN: [[COPY:%[0-9]+]]:sgpr_128 = COPY $sgpr12_sgpr13_sgpr14_sgpr15			; GCN: [[COPY:%[0-9]+]]:sgpr_128 = COPY $sgpr12_sgpr13_sgpr14_sgpr15
	; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec			; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
	; GCN: [[BUFFER_LOAD_DWORD_IDXEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN [[V_MOV_B32_e32_]], [[COPY]], $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec			; GCN: [[BUFFER_LOAD_DWORD_IDXEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN [[V_MOV_B32_e32_]], [[COPY]], $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec
	; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_IDXEN]]			; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_IDXEN]]
	; GCN: SI_RETURN_TO_EPILOG $vgpr0			; GCN: SI_RETURN_TO_EPILOG $vgpr0
	%0:sgpr_128 = COPY $sgpr12_sgpr13_sgpr14_sgpr15			%0:sgpr_128 = COPY $sgpr12_sgpr13_sgpr14_sgpr15
	%2:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec			%2:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
	%3:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN %2, %0, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec			%3:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN %2, %0, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec
	$vgpr0 = COPY %3			$vgpr0 = COPY %3
	SI_RETURN_TO_EPILOG $vgpr0			SI_RETURN_TO_EPILOG $vgpr0

	...			...

	# Offset is from global scratch wave offset.
	---			---
	name: fold_fi_mubuf_scratch_scratch_wave_offset			name: function_no_fold_fi_non_stack_soffset
	tracksRegLiveness: true			tracksRegLiveness: true
	frameInfo:			frameInfo:
	maxAlignment: 4			maxAlignment: 4
	localFrameSize: 4			localFrameSize: 4
	stack:			stack:
	- { id: 0, size: 4, alignment: 4, local-offset: 0 }			- { id: 0, size: 4, alignment: 4, local-offset: 0 }
	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: true			isEntryFunction: false
	scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'			scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
	scratchWaveOffsetReg: '$sgpr33'			frameOffsetReg: '$sgpr32'
	stackPtrOffsetReg: '$sgpr32'			stackPtrOffsetReg: '$sgpr32'
	body: |			body: |
	bb.0:			bb.0:

	; GCN-LABEL: name: fold_fi_mubuf_scratch_scratch_wave_offset			; GCN-LABEL: name: function_no_fold_fi_non_stack_soffset
	; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 7, implicit $exec			; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 7, implicit $exec
	; GCN: BUFFER_STORE_DWORD_OFFEN [[V_MOV_B32_e32_]], %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec			; GCN: BUFFER_STORE_DWORD_OFFEN [[V_MOV_B32_e32_]], %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec
	; GCN: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec			; GCN: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec
	; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFEN]]			; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFEN]]
	; GCN: S_ENDPGM 0, implicit $vgpr0			; GCN: S_ENDPGM 0, implicit $vgpr0
	%0:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec			%0:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
	%1:vgpr_32 = V_MOV_B32_e32 7, implicit $exec			%1:vgpr_32 = V_MOV_B32_e32 7, implicit $exec

	BUFFER_STORE_DWORD_OFFEN %1:vgpr_32, %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr33, 0, 0, 0, 0, 0, 0, implicit $exec			BUFFER_STORE_DWORD_OFFEN %1:vgpr_32, %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
	%2:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr33, 0, 0, 0, 0, 0, 0, implicit $exec			%2:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
	$vgpr0 = COPY %2			$vgpr0 = COPY %2
	S_ENDPGM 0, implicit $vgpr0			S_ENDPGM 0, implicit $vgpr0

	...			...

	---			---
	name: no_fold_fi_mubuf_scratch_sp_offset			name: function_fold_fi_mubuf_wave_relative
	tracksRegLiveness: true			tracksRegLiveness: true
	frameInfo:			frameInfo:
	maxAlignment: 4			maxAlignment: 4
	localFrameSize: 4			localFrameSize: 4
	stack:			stack:
	- { id: 0, size: 4, alignment: 4, local-offset: 0 }			- { id: 0, size: 4, alignment: 4, local-offset: 0 }
	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: true			isEntryFunction: false
				scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
				frameOffsetReg: '$sgpr32'
				stackPtrOffsetReg: '$sgpr32'
				body: |
				bb.0:

				; GCN-LABEL: name: function_fold_fi_mubuf_wave_relative
				; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 7, implicit $exec
				; GCN: BUFFER_STORE_DWORD_OFFEN [[V_MOV_B32_e32_]], %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec
				; GCN: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec
				; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFEN]]
				; GCN: S_ENDPGM 0, implicit $vgpr0
				%0:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
				%1:vgpr_32 = V_MOV_B32_e32 7, implicit $exec

				BUFFER_STORE_DWORD_OFFEN %1:vgpr_32, %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
				%2:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
				$vgpr0 = COPY %2
				S_ENDPGM 0, implicit $vgpr0

				...

				---
				name: function_fold_fi_mubuf_stack_relative
				tracksRegLiveness: true
				frameInfo:
				maxAlignment: 4
				localFrameSize: 4
				stack:
				- { id: 0, size: 4, alignment: 4, local-offset: 0 }
				machineFunctionInfo:
				isEntryFunction: false
	scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'			scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
	scratchWaveOffsetReg: '$sgpr33'			frameOffsetReg: '$sgpr32'
	stackPtrOffsetReg: '$sgpr32'			stackPtrOffsetReg: '$sgpr32'
	body: |			body: |
	bb.0:			bb.0:

	; GCN-LABEL: name: no_fold_fi_mubuf_scratch_sp_offset			; GCN-LABEL: name: function_fold_fi_mubuf_stack_relative
	; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 7, implicit $exec			; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 7, implicit $exec
	; GCN: BUFFER_STORE_DWORD_OFFEN [[V_MOV_B32_e32_]], %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec			; GCN: BUFFER_STORE_DWORD_OFFEN [[V_MOV_B32_e32_]], %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec
	; GCN: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec			; GCN: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec
	; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFEN]]			; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFEN]]
	; GCN: S_ENDPGM 0, implicit $vgpr0			; GCN: S_ENDPGM 0, implicit $vgpr0
	%0:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec			%0:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
	%1:vgpr_32 = V_MOV_B32_e32 7, implicit $exec			%1:vgpr_32 = V_MOV_B32_e32 7, implicit $exec

	BUFFER_STORE_DWORD_OFFEN %1:vgpr_32, %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec			BUFFER_STORE_DWORD_OFFEN %1:vgpr_32, %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec
	%2:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec			%2:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec
	$vgpr0 = COPY %2			$vgpr0 = COPY %2
	S_ENDPGM 0, implicit $vgpr0			S_ENDPGM 0, implicit $vgpr0

	...			...

llvm/test/CodeGen/AMDGPU/frame-index-elimination.ll

; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=kaveri -mattr=-promote-alloca -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,CI %s		; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=kaveri -mattr=-promote-alloca -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,CI %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=-promote-alloca -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX9 %s		; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=-promote-alloca -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX9 %s

; Test that non-entry function frame indices are expanded properly to		; Test that non-entry function frame indices are expanded properly to
; give an index relative to the scratch wave offset register		; give an index relative to the scratch wave offset register

; Materialize into a mov. Make sure there isn't an unnecessary copy.		; Materialize into a mov. Make sure there isn't an unnecessary copy.
; GCN-LABEL: {{^}}func_mov_fi_i32:		; GCN-LABEL: {{^}}func_mov_fi_i32:
; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN: s_sub_u32 [[SUB:s[0-9]+|vcc_lo|vcc_hi]], s32, s33

; CI-NEXT: v_lshr_b32_e64 v0, [[SUB]], 6		; CI-NEXT: v_lshr_b32_e64 v0, s32, 6
; GFX9-NEXT: v_lshrrev_b32_e64 v0, 6, [[SUB]]		; GFX9-NEXT: v_lshrrev_b32_e64 v0, 6, s32

; GCN-NOT: v_mov		; GCN-NOT: v_mov
; GCN: ds_write_b32 v0, v0		; GCN: ds_write_b32 v0, v0
define void @func_mov_fi_i32() #0 {		define void @func_mov_fi_i32() #0 {
%alloca = alloca i32, addrspace(5)		%alloca = alloca i32, addrspace(5)
store volatile i32 addrspace(5)* %alloca, i32 addrspace(5)* addrspace(3)* undef		store volatile i32 addrspace(5)* %alloca, i32 addrspace(5)* addrspace(3)* undef
ret void		ret void
}		}

; Offset due to different objects		; Offset due to different objects
; GCN-LABEL: {{^}}func_mov_fi_i32_offset:		; GCN-LABEL: {{^}}func_mov_fi_i32_offset:
; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)

; CI: s_sub_u32 [[SUB0:s[0-9]+|vcc_lo|vcc_hi]], s32, s33		; CI-DAG: v_lshr_b32_e64 v0, s32, 6
; CI-NEXT: s_sub_u32 [[SUB1:s[0-9]+|vcc_lo|vcc_hi]], s32, s33		; CI-DAG: v_lshr_b32_e64 [[SCALED:v[0-9]+]], s32, 6
; CI-DAG: v_lshr_b32_e64 v0, [[SUB0]], 6
; CI-DAG: v_lshr_b32_e64 [[SCALED:v[0-9]+]], [[SUB1]], 6
; CI-NOT: v_mov		; CI-NOT: v_mov
; CI: ds_write_b32 v0, v0		; CI: ds_write_b32 v0, v0
; CI-NEXT: v_add_i32_e64 v0, s{{\[[0-9]+:[0-9]+\]}}, 4, [[SCALED]]		; CI-NEXT: v_add_i32_e{{32|64}} v0, {{s\[[0-9]+:[0-9]+\]|vcc}}, 4, [[SCALED]]
; CI-NEXT: ds_write_b32 v0, v0		; CI-NEXT: ds_write_b32 v0, v0

; GFX9: s_sub_u32 [[SUB0:s[0-9]+|vcc_lo|vcc_hi]], s32, s33		; GFX9: v_lshrrev_b32_e64 v0, 6, s32
; GFX9-NEXT: s_sub_u32 [[SUB1:s[0-9]+|vcc_lo|vcc_hi]], s32, s33		; GFX9-NEXT: v_lshrrev_b32_e64 [[SCALED:v[0-9]+]], 6, s32
; GFX9-NEXT: v_lshrrev_b32_e64 v0, 6, [[SUB0]]
; GFX9-NEXT: v_lshrrev_b32_e64 [[SCALED:v[0-9]+]], 6, [[SUB1]]
; GFX9-DAG: ds_write_b32 v0, v0		; GFX9-DAG: ds_write_b32 v0, v0
; GFX9-NEXT: v_add_u32_e32 v0, 4, [[SCALED]]		; GFX9-NEXT: v_add_u32_e32 v0, 4, [[SCALED]]
; GFX9-NEXT: ds_write_b32 v0, v0		; GFX9-NEXT: ds_write_b32 v0, v0
define void @func_mov_fi_i32_offset() #0 {		define void @func_mov_fi_i32_offset() #0 {
%alloca0 = alloca i32, addrspace(5)		%alloca0 = alloca i32, addrspace(5)
%alloca1 = alloca i32, addrspace(5)		%alloca1 = alloca i32, addrspace(5)
store volatile i32 addrspace(5)* %alloca0, i32 addrspace(5)* addrspace(3)* undef		store volatile i32 addrspace(5)* %alloca0, i32 addrspace(5)* addrspace(3)* undef
store volatile i32 addrspace(5)* %alloca1, i32 addrspace(5)* addrspace(3)* undef		store volatile i32 addrspace(5)* %alloca1, i32 addrspace(5)* addrspace(3)* undef
ret void		ret void
}		}
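The frame-index checks in this file come down to simple arithmetic. The `>> 6` in both the CI and GFX9 lines divides by the wave64 wavefront size, which turns a wave-scaled stack offset into a per-lane address. The old lines first subtracted the scratch wave offset (`s33`), while the new lines shift `s32` directly. The Python sketch below assumes, as the removal of `s33` suggests, that SP is now measured from the wave's own scratch base. All concrete values and function names in it are hypothetical.

```python
# Wave64: log2(64) == 6, matching the "6" shift amount in the checks.
WAVEFRONT_SIZE_LOG2 = 6


def lane_address_old(sp: int, wave_offset: int) -> int:
    """Old sequence: s_sub_u32 SUB, s32, s33, then shift right by 6."""
    return (sp - wave_offset) >> WAVEFRONT_SIZE_LOG2


def lane_address_new(sp: int) -> int:
    """New sequence: shift s32 right by 6 with no subtraction."""
    return sp >> WAVEFRONT_SIZE_LOG2


# Hypothetical values: this wave's scratch starts 0x10000 bytes in, and
# the frame sits 16 bytes per lane past that base.
wave_offset = 0x10000
per_lane_frame = 16

# Old scheme (assumed): SP includes the wave offset.
sp_old = wave_offset + (per_lane_frame << WAVEFRONT_SIZE_LOG2)
# New scheme (assumed): SP is already relative to the wave's scratch base.
sp_new = sp_old - wave_offset

addr0 = lane_address_new(sp_new)  # %alloca0: v_lshrrev_b32 v0, 6, s32
addr1 = addr0 + 4                 # %alloca1: v_add_u32 v0, 4, SCALED

assert addr0 == lane_address_old(sp_old, wave_offset)
print(addr0, addr1)  # 16 20
```

The two formulas agree only under that assumption about SP. If SP still included the wave offset, dropping the subtraction would shift every materialized frame address by `wave_offset >> 6`.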

; Materialize into an add of a constant offset from the FI.		; Materialize into an add of a constant offset from the FI.
; FIXME: Should be able to merge adds		; FIXME: Should be able to merge adds

; GCN-LABEL: {{^}}func_add_constant_to_fi_i32:		; GCN-LABEL: {{^}}func_add_constant_to_fi_i32:
; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN: s_sub_u32 [[SUB:s[0-9]+|vcc_lo|vcc_hi]], s32, s33

; CI-NEXT: v_lshr_b32_e64 [[SCALED:v[0-9]+]], [[SUB]], 6		; CI: v_lshr_b32_e64 [[SCALED:v[0-9]+]], s32, 6
; CI-NEXT: v_add_i32_e32 v0, vcc, 4, [[SCALED]]		; CI-NEXT: v_add_i32_e32 v0, vcc, 4, [[SCALED]]

; GFX9-NEXT: v_lshrrev_b32_e64 [[SCALED:v[0-9]+]], 6, [[SUB]]		; GFX9: v_lshrrev_b32_e64 [[SCALED:v[0-9]+]], 6, s32
; GFX9-NEXT: v_add_u32_e32 v0, 4, [[SCALED]]		; GFX9-NEXT: v_add_u32_e32 v0, 4, [[SCALED]]


; GCN-NOT: v_mov		; GCN-NOT: v_mov
; GCN: ds_write_b32 v0, v0		; GCN: ds_write_b32 v0, v0
define void @func_add_constant_to_fi_i32() #0 {		define void @func_add_constant_to_fi_i32() #0 {
%alloca = alloca [2 x i32], align 4, addrspace(5)		%alloca = alloca [2 x i32], align 4, addrspace(5)
%gep0 = getelementptr inbounds [2 x i32], [2 x i32] addrspace(5)* %alloca, i32 0, i32 1		%gep0 = getelementptr inbounds [2 x i32], [2 x i32] addrspace(5)* %alloca, i32 0, i32 1
store volatile i32 addrspace(5)* %gep0, i32 addrspace(5)* addrspace(3)* undef		store volatile i32 addrspace(5)* %gep0, i32 addrspace(5)* addrspace(3)* undef
ret void		ret void
}		}
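As a rough model of what these checks encode (a sketch, not the backend's code): assuming wave64 and that s32 holds a wave-scaled byte offset, a frame address is materialized per lane by shifting right by 6 (log2 of 64). The old side first subtracts s33, which the old checks use as the scratch wave offset; the new side shifts s32 directly, on the assumption that s32 is already relative to the wave's scratch base. The trailing add of 4 is the byte offset of element 1 of `[2 x i32]`. The function names and register values below are hypothetical.

```python
# Sketch only: models the frame-index arithmetic in the checks above, assuming wave64.
WAVE_SIZE_LOG2 = 6  # matches the shift amount "6" in the v_lshr checks
MASK32 = 0xFFFFFFFF


def materialize_fi_old(s32: int, s33: int, offset: int) -> int:
    """Old lowering: s_sub_u32 tmp, s32, s33; v_lshrrev_b32 tmp >> 6; v_add offset."""
    diff = (s32 - s33) & MASK32  # s_sub_u32 wraps modulo 2**32
    return ((diff >> WAVE_SIZE_LOG2) + offset) & MASK32


def materialize_fi_new(s32: int, offset: int) -> int:
    """New lowering: v_lshrrev_b32 s32 >> 6; v_add offset (no wave offset subtracted)."""
    return ((s32 >> WAVE_SIZE_LOG2) + offset) & MASK32


# Hypothetical values: the frame sits 0x400 wave-scaled bytes above the wave's
# scratch base, i.e. at per-lane byte 0x400 / 64 = 16.
print(materialize_fi_old(s32=0x1400, s33=0x1000, offset=4))  # 20
print(materialize_fi_new(s32=0x400, offset=4))  # 20
```

Under that assumption both forms produce the same per-lane address, which is consistent with the new checks dropping the `s_sub_u32`.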

; A user the materialized frame index can't be meaningfully folded		; A user the materialized frame index can't be meaningfully folded
; into.		; into.

; GCN-LABEL: {{^}}func_other_fi_user_i32:		; GCN-LABEL: {{^}}func_other_fi_user_i32:
; GCN: s_sub_u32 [[SUB:s[0-9]+|vcc_lo|vcc_hi]], s32, s33

; CI-NEXT: v_lshr_b32_e64 v0, [[SUB]], 6		; CI: v_lshr_b32_e64 v0, s32, 6

; GFX9-NEXT: v_lshrrev_b32_e64 v0, 6, [[SUB]]		; GFX9: v_lshrrev_b32_e64 v0, 6, s32

; GCN-NEXT: v_mul_u32_u24_e32 v0, 9, v0		; GCN-NEXT: v_mul_u32_u24_e32 v0, 9, v0
; GCN-NOT: v_mov		; GCN-NOT: v_mov
; GCN: ds_write_b32 v0, v0		; GCN: ds_write_b32 v0, v0
define void @func_other_fi_user_i32() #0 {		define void @func_other_fi_user_i32() #0 {
%alloca = alloca [2 x i32], align 4, addrspace(5)		%alloca = alloca [2 x i32], align 4, addrspace(5)
%ptrtoint = ptrtoint [2 x i32] addrspace(5)* %alloca to i32		%ptrtoint = ptrtoint [2 x i32] addrspace(5)* %alloca to i32
%mul = mul i32 %ptrtoint, 9		%mul = mul i32 %ptrtoint, 9
store volatile i32 %mul, i32 addrspace(3)* undef		store volatile i32 %mul, i32 addrspace(3)* undef
ret void		ret void
}		}
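In `func_other_fi_user_i32` the scaled frame address feeds a multiply instead of a memory operand, so there is nothing to fold it into. The checks expect `v_mul_u32_u24`, which multiplies only the low 24 bits of each operand; presumably this is legal because the compiler knows the frame address fits in 24 bits (an assumption here). A minimal Python model of that instruction, with a hypothetical stack pointer value:

```python
MASK24 = 0xFFFFFF
MASK32 = 0xFFFFFFFF


def v_mul_u32_u24(a: int, b: int) -> int:
    """Multiply the low 24 bits of each operand; keep the low 32 bits of the product."""
    return ((a & MASK24) * (b & MASK24)) & MASK32


# Hypothetical wave-scaled stack pointer; per-lane frame address is s32 >> 6 (wave64 assumed).
s32 = 0x400
frame_addr = s32 >> 6  # 16
print(v_mul_u32_u24(frame_addr, 9))  # 144
```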

; GCN-LABEL: {{^}}func_store_private_arg_i32_ptr:		; GCN-LABEL: {{^}}func_store_private_arg_i32_ptr:
; GCN: v_mov_b32_e32 v1, 15{{$}}		; GCN: v_mov_b32_e32 v1, 15{{$}}
; GCN: buffer_store_dword v1, v0, s[0:3], s33 offen{{$}}		; GCN: buffer_store_dword v1, v0, s[0:3], 0 offen{{$}}
define void @func_store_private_arg_i32_ptr(i32 addrspace(5)* %ptr) #0 {		define void @func_store_private_arg_i32_ptr(i32 addrspace(5)* %ptr) #0 {
store volatile i32 15, i32 addrspace(5)* %ptr		store volatile i32 15, i32 addrspace(5)* %ptr
ret void		ret void
}		}

; GCN-LABEL: {{^}}func_load_private_arg_i32_ptr:		; GCN-LABEL: {{^}}func_load_private_arg_i32_ptr:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT: buffer_load_dword v0, v0, s[0:3], s33 offen{{$}}		; GCN-NEXT: buffer_load_dword v0, v0, s[0:3], 0 offen{{$}}
define void @func_load_private_arg_i32_ptr(i32 addrspace(5)* %ptr) #0 {		define void @func_load_private_arg_i32_ptr(i32 addrspace(5)* %ptr) #0 {
%val = load volatile i32, i32 addrspace(5)* %ptr		%val = load volatile i32, i32 addrspace(5)* %ptr
ret void		ret void
}		}

; GCN-LABEL: {{^}}void_func_byval_struct_i8_i32_ptr:		; GCN-LABEL: {{^}}void_func_byval_struct_i8_i32_ptr:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT: s_sub_u32 [[SUB_OFFSET:s[0-9]+|vcc_lo|vcc_hi]], s32, s33

; CI-NEXT: v_lshr_b32_e64 [[SHIFT:v[0-9]+]], [[SUB_OFFSET]], 6		; CI: v_lshr_b32_e64 [[SHIFT:v[0-9]+]], s32, 6
; CI-NEXT: v_or_b32_e32 v0, 4, [[SHIFT]]		; CI-NEXT: v_or_b32_e32 v0, 4, [[SHIFT]]

; GFX9-NEXT: v_lshrrev_b32_e64 [[SHIFT:v[0-9]+]], 6, [[SUB_OFFSET]]		; GFX9: v_lshrrev_b32_e64 [[SHIFT:v[0-9]+]], 6, s32
; GFX9-NEXT: v_or_b32_e32 v0, 4, [[SHIFT]]		; GFX9-NEXT: v_or_b32_e32 v0, 4, [[SHIFT]]

; GCN-NOT: v_mov		; GCN-NOT: v_mov
; GCN: ds_write_b32 v0, v0		; GCN: ds_write_b32 v0, v0
define void @void_func_byval_struct_i8_i32_ptr({ i8, i32 } addrspace(5)* byval %arg0) #0 {		define void @void_func_byval_struct_i8_i32_ptr({ i8, i32 } addrspace(5)* byval %arg0) #0 {
%gep0 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %arg0, i32 0, i32 0		%gep0 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %arg0, i32 0, i32 0
%gep1 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %arg0, i32 0, i32 1		%gep1 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %arg0, i32 0, i32 1
%load1 = load i32, i32 addrspace(5)* %gep1		%load1 = load i32, i32 addrspace(5)* %gep1
		define void @void_func_byval_struct_i8_i32_ptr_value({ i8, i32 } addrspace(5)* byval %arg0) #0 {
%load0 = load i8, i8 addrspace(5)* %gep0		%load0 = load i8, i8 addrspace(5)* %gep0
%load1 = load i32, i32 addrspace(5)* %gep1		%load1 = load i32, i32 addrspace(5)* %gep1
store volatile i8 %load0, i8 addrspace(3)* undef		store volatile i8 %load0, i8 addrspace(3)* undef
store volatile i32 %load1, i32 addrspace(3)* undef		store volatile i32 %load1, i32 addrspace(3)* undef
ret void		ret void
}		}
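The `void_func_byval_struct_i8_i32_ptr` checks expect `v_or_b32_e32 v0, 4, ...` rather than an add when forming the address of field 1 (byte offset 4 in `{ i8, i32 }`). OR and ADD agree whenever the two operands share no set bits, so the rewrite is sound if the compiler can prove bit 2 of the scaled base is zero; which alignment fact it relies on is an assumption here. The helper below is a hypothetical illustration of that condition:

```python
def add_as_or_is_safe(base: int, offset: int) -> bool:
    """ADD can be replaced by OR when the operands have no set bits in common."""
    return (base & offset) == 0


for base in range(0, 64, 8):  # hypothetical 8-byte-aligned bases
    assert add_as_or_is_safe(base, 4)
    assert base | 4 == base + 4

# A base with bit 2 set breaks the equivalence: 4 | 4 == 4, but 4 + 4 == 8.
print(add_as_or_is_safe(4, 4), 4 | 4, 4 + 4)  # False 4 8
```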

; GCN-LABEL: {{^}}void_func_byval_struct_i8_i32_ptr_nonentry_block:		; GCN-LABEL: {{^}}void_func_byval_struct_i8_i32_ptr_nonentry_block:
; GCN: s_sub_u32 [[SUB_OFFSET:s[0-9]+]], s32, s33

; CI: v_lshr_b32_e64 [[SHIFT:v[0-9]+]], [[SUB_OFFSET]], 6		; CI: v_lshr_b32_e64 [[SHIFT:v[0-9]+]], s32, 6

; GFX9: v_lshrrev_b32_e64 [[SHIFT:v[0-9]+]], 6, [[SUB_OFFSET]]		; GFX9: v_lshrrev_b32_e64 [[SHIFT:v[0-9]+]], 6, s32

; GCN: s_and_saveexec_b64		; GCN: s_and_saveexec_b64

; CI: v_add_i32_e32 [[GEP:v[0-9]+]], vcc, 4, [[SHIFT]]		; CI: v_add_i32_e32 [[GEP:v[0-9]+]], vcc, 4, [[SHIFT]]
; CI: buffer_load_dword v{{[0-9]+}}, off, s[0:3], s32 offset:4{{$}}		; CI: buffer_load_dword v{{[0-9]+}}, off, s[0:3], s32 offset:4{{$}}

; GFX9: v_add_u32_e32 [[GEP:v[0-9]+]], 4, [[SHIFT]]		; GFX9: v_add_u32_e32 [[GEP:v[0-9]+]], 4, [[SHIFT]]
; GFX9: buffer_load_dword v{{[0-9]+}}, off, s[0:3], s32 offset:4{{$}}		; GFX9: buffer_load_dword v{{[0-9]+}}, off, s[0:3], s32 offset:4{{$}}
		bb:
br label %ret		br label %ret

ret:		ret:
ret void		ret void
}		}

; Added offset can't be used with VOP3 add		; Added offset can't be used with VOP3 add
; GCN-LABEL: {{^}}func_other_fi_user_non_inline_imm_offset_i32:		; GCN-LABEL: {{^}}func_other_fi_user_non_inline_imm_offset_i32:
; GCN: s_sub_u32 [[SUB:s[0-9]+|vcc_lo|vcc_hi]], s32, s33
; CI-DAG: s_movk_i32 [[K:s[0-9]+|vcc_lo|vcc_hi]], 0x200

; CI-DAG: v_lshr_b32_e64 [[SCALED:v[0-9]+]], [[SUB]], 6		; CI-DAG: s_movk_i32 [[K:s[0-9]+|vcc_lo|vcc_hi]], 0x200
		; CI-DAG: v_lshr_b32_e64 [[SCALED:v[0-9]+]], s32, 6
; CI: v_add_i32_e32 [[VZ:v[0-9]+]], vcc, [[K]], [[SCALED]]		; CI: v_add_i32_e32 [[VZ:v[0-9]+]], vcc, [[K]], [[SCALED]]

; GFX9-DAG: v_lshrrev_b32_e64 [[SCALED:v[0-9]+]], 6, [[SUB]]		; GFX9-DAG: v_lshrrev_b32_e64 [[SCALED:v[0-9]+]], 6, s32
; GFX9: v_add_u32_e32 [[VZ:v[0-9]+]], 0x200, [[SCALED]]		; GFX9: v_add_u32_e32 [[VZ:v[0-9]+]], 0x200, [[SCALED]]

; GCN: v_mul_u32_u24_e32 [[VZ]], 9, [[VZ]]		; GCN: v_mul_u32_u24_e32 [[VZ]], 9, [[VZ]]
; GCN: ds_write_b32 v0, [[VZ]]		; GCN: ds_write_b32 v0, [[VZ]]
define void @func_other_fi_user_non_inline_imm_offset_i32() #0 {		define void @func_other_fi_user_non_inline_imm_offset_i32() #0 {
%alloca0 = alloca [128 x i32], align 4, addrspace(5)		%alloca0 = alloca [128 x i32], align 4, addrspace(5)
%alloca1 = alloca [8 x i32], align 4, addrspace(5)		%alloca1 = alloca [8 x i32], align 4, addrspace(5)
%gep0 = getelementptr inbounds [128 x i32], [128 x i32] addrspace(5)* %alloca0, i32 0, i32 65		%gep0 = getelementptr inbounds [128 x i32], [128 x i32] addrspace(5)* %alloca0, i32 0, i32 65
%gep1 = getelementptr inbounds [8 x i32], [8 x i32] addrspace(5)* %alloca1, i32 0, i32 0		%gep1 = getelementptr inbounds [8 x i32], [8 x i32] addrspace(5)* %alloca1, i32 0, i32 0
store volatile i32 7, i32 addrspace(5)* %gep0		store volatile i32 7, i32 addrspace(5)* %gep0
%ptrtoint = ptrtoint i32 addrspace(5)* %gep1 to i32		%ptrtoint = ptrtoint i32 addrspace(5)* %gep1 to i32
%mul = mul i32 %ptrtoint, 9		%mul = mul i32 %ptrtoint, 9
store volatile i32 %mul, i32 addrspace(3)* undef		store volatile i32 %mul, i32 addrspace(3)* undef
ret void		ret void
}		}
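The 0x200 in these checks is the byte distance to `%alloca1`: assuming `%alloca0` (`[128 x i32]`, 512 bytes) is laid out first with no padding, `%gep1` sits 128 * 4 = 0x200 bytes past the frame base. AMDGPU encodes integers in the range -16..64 as inline constants, and 0x200 falls outside it, so it needs either a literal or a register: the CI checks materialize it with `s_movk_i32`, while the GFX9 checks use it as a literal operand of `v_add_u32_e32`. A small Python check of that reasoning (helper names are hypothetical):

```python
INLINE_INT_MIN, INLINE_INT_MAX = -16, 64  # AMDGPU inline integer constant range


def is_inline_int(value: int) -> bool:
    """True if the integer can be encoded as an inline constant (no literal needed)."""
    return INLINE_INT_MIN <= value <= INLINE_INT_MAX


# Assumption: %alloca0 = [128 x i32] is placed first, so %alloca1 starts right after it.
off = 128 * 4
print(hex(off), is_inline_int(off))  # 0x200 False
print(is_inline_int(4))  # True: the small "4" offsets elsewhere in this file are inline
```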

; GCN-LABEL: {{^}}func_other_fi_user_non_inline_imm_offset_i32_vcc_live:		; GCN-LABEL: {{^}}func_other_fi_user_non_inline_imm_offset_i32_vcc_live:
; GCN: s_sub_u32 [[DIFF:s[0-9]+]], s32, s33
; CI-DAG: s_movk_i32 [[OFFSET:s[0-9]+]], 0x200

; CI-DAG: v_lshr_b32_e64 [[SCALED:v[0-9]+]], [[DIFF]], 6		; CI-DAG: s_movk_i32 [[OFFSET:s[0-9]+]], 0x200
		; CI-DAG: v_lshr_b32_e64 [[SCALED:v[0-9]+]], s32, 6
; CI: v_add_i32_e64 [[VZ:v[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, [[OFFSET]], [[SCALED]]		; CI: v_add_i32_e64 [[VZ:v[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, [[OFFSET]], [[SCALED]]

; GFX9-DAG: v_lshrrev_b32_e64 [[SCALED:v[0-9]+]], 6, [[DIFF]]		; GFX9-DAG: v_lshrrev_b32_e64 [[SCALED:v[0-9]+]], 6, s32
; GFX9: v_add_u32_e32 [[VZ:v[0-9]+]], 0x200, [[SCALED]]		; GFX9: v_add_u32_e32 [[VZ:v[0-9]+]], 0x200, [[SCALED]]

; GCN: v_mul_u32_u24_e32 [[VZ]], 9, [[VZ]]		; GCN: v_mul_u32_u24_e32 [[VZ]], 9, [[VZ]]
; GCN: ds_write_b32 v0, [[VZ]]		; GCN: ds_write_b32 v0, [[VZ]]
define void @func_other_fi_user_non_inline_imm_offset_i32_vcc_live() #0 {		define void @func_other_fi_user_non_inline_imm_offset_i32_vcc_live() #0 {
%alloca0 = alloca [128 x i32], align 4, addrspace(5)		%alloca0 = alloca [128 x i32], align 4, addrspace(5)
%alloca1 = alloca [8 x i32], align 4, addrspace(5)		%alloca1 = alloca [8 x i32], align 4, addrspace(5)
%vcc = call i64 asm sideeffect "; def $0", "={vcc}"()		%vcc = call i64 asm sideeffect "; def $0", "={vcc}"()

bb5:		bb5:
ret void		ret void
}		}

; GCN-LABEL: {{^}}alloca_ptr_nonentry_block:		; GCN-LABEL: {{^}}alloca_ptr_nonentry_block:
; GCN: s_and_saveexec_b64		; GCN: s_and_saveexec_b64
; GCN: buffer_load_dword v{{[0-9]+}}, off, s[0:3], s32 offset:4		; GCN: buffer_load_dword v{{[0-9]+}}, off, s[0:3], s32 offset:4
; GCN: s_sub_u32 [[SUB_OFFSET:s[0-9]+|vcc_lo|vcc_hi]], s32, s33

; CI: v_lshr_b32_e64 [[SHIFT:v[0-9]+]], [[SUB_OFFSET]], 6		; CI: v_lshr_b32_e64 [[SHIFT:v[0-9]+]], s32, 6
; CI-NEXT: v_or_b32_e32 [[PTR:v[0-9]+]], 4, [[SHIFT]]		; CI-NEXT: v_or_b32_e32 [[PTR:v[0-9]+]], 4, [[SHIFT]]

; GFX9: v_lshrrev_b32_e64 [[SHIFT:v[0-9]+]], 6, [[SUB_OFFSET]]		; GFX9: v_lshrrev_b32_e64 [[SHIFT:v[0-9]+]], 6, s32
; GFX9-NEXT: v_or_b32_e32 [[PTR:v[0-9]+]], 4, [[SHIFT]]		; GFX9-NEXT: v_or_b32_e32 [[PTR:v[0-9]+]], 4, [[SHIFT]]

; GCN: ds_write_b32 v{{[0-9]+}}, [[PTR]]		; GCN: ds_write_b32 v{{[0-9]+}}, [[PTR]]
define void @alloca_ptr_nonentry_block(i32 %arg0) #0 {		define void @alloca_ptr_nonentry_block(i32 %arg0) #0 {
%alloca0 = alloca { i8, i32 }, align 4, addrspace(5)		%alloca0 = alloca { i8, i32 }, align 4, addrspace(5)
%cmp = icmp eq i32 %arg0, 0		%cmp = icmp eq i32 %arg0, 0
br i1 %cmp, label %bb, label %ret		br i1 %cmp, label %bb, label %ret


llvm/test/CodeGen/AMDGPU/frame-lowering-entry-all-sgpr-used.mir

		liveins:
- { reg: '$sgpr9' }		- { reg: '$sgpr9' }
machineFunctionInfo:		machineFunctionInfo:
explicitKernArgSize: 84		explicitKernArgSize: 84
maxKernArgAlign: 8		maxKernArgAlign: 8
ldsSize: 20496		ldsSize: 20496
isEntryFunction: true		isEntryFunction: true
waveLimiter: true		waveLimiter: true
scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'		scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'
scratchWaveOffsetReg: '$sgpr101'
frameOffsetReg: '$sgpr101'		frameOffsetReg: '$sgpr101'
stackPtrOffsetReg: '$sgpr32'		stackPtrOffsetReg: '$sgpr32'
argumentInfo:		argumentInfo:
privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }		privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
dispatchPtr: { reg: '$sgpr4_sgpr5' }		dispatchPtr: { reg: '$sgpr4_sgpr5' }
kernargSegmentPtr: { reg: '$sgpr6_sgpr7' }		kernargSegmentPtr: { reg: '$sgpr6_sgpr7' }
workGroupIDX: { reg: '$sgpr8' }		workGroupIDX: { reg: '$sgpr8' }
workGroupIDY: { reg: '$sgpr9' }		workGroupIDY: { reg: '$sgpr9' }

llvm/test/CodeGen/AMDGPU/frame-lowering-fp-adjusted.mir

	stack:			stack:
	- { id: 0, type: spill-slot, size: 4, alignment: 4 }			- { id: 0, type: spill-slot, size: 4, alignment: 4 }
	machineFunctionInfo:			machineFunctionInfo:
	explicitKernArgSize: 660			explicitKernArgSize: 660
	maxKernArgAlign: 4			maxKernArgAlign: 4
	isEntryFunction: true			isEntryFunction: true
	waveLimiter: true			waveLimiter: true
	scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'			scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'
	scratchWaveOffsetReg: '$sgpr101'
	frameOffsetReg: '$sgpr101'			frameOffsetReg: '$sgpr101'
	stackPtrOffsetReg: '$sgpr32'			stackPtrOffsetReg: '$sgpr32'
	argumentInfo:			argumentInfo:
	privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }			privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	dispatchPtr: { reg: '$sgpr4_sgpr5' }			dispatchPtr: { reg: '$sgpr4_sgpr5' }
	kernargSegmentPtr: { reg: '$sgpr6_sgpr7' }			kernargSegmentPtr: { reg: '$sgpr6_sgpr7' }
	workGroupIDX: { reg: '$sgpr8' }			workGroupIDX: { reg: '$sgpr8' }
	privateSegmentWaveByteOffset: { reg: '$sgpr9' }			privateSegmentWaveByteOffset: { reg: '$sgpr9' }
	body: |			body: |
	bb.0:			bb.0:
	successors: %bb.1			successors: %bb.1
	liveins: $sgpr8, $vgpr0, $sgpr4_sgpr5, $sgpr6_sgpr7			liveins: $sgpr8, $vgpr0, $sgpr4_sgpr5, $sgpr6_sgpr7

	bb.1:			bb.1:
	liveins: $sgpr4, $sgpr5, $sgpr9, $sgpr22, $vgpr0, $sgpr6_sgpr7			liveins: $sgpr4, $sgpr5, $sgpr9, $sgpr22, $vgpr0, $sgpr6_sgpr7

	renamable $vgpr2 = IMPLICIT_DEF			renamable $vgpr2 = IMPLICIT_DEF
	SI_SPILL_V32_SAVE killed $vgpr2, %stack.0, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr32, 0, implicit $exec :: (store 4 into %stack.0, addrspace 5)			SI_SPILL_V32_SAVE killed $vgpr2, %stack.0, $sgpr96_sgpr97_sgpr98_sgpr99, 0, 0, implicit $exec :: (store 4 into %stack.0, addrspace 5)

llvm/test/CodeGen/AMDGPU/function-returns.ll

	define {i8, i32} @struct_i8_i32_func_void() #0 {			define {i8, i32} @struct_i8_i32_func_void() #0 {
	%val = load { i8, i32 }, { i8, i32 } addrspace(1)* undef			%val = load { i8, i32 }, { i8, i32 } addrspace(1)* undef
	ret { i8, i32 } %val			ret { i8, i32 } %val
	}			}

	; GCN-LABEL: {{^}}void_func_sret_struct_i8_i32:			; GCN-LABEL: {{^}}void_func_sret_struct_i8_i32:
	; GCN: buffer_load_ubyte [[VAL0:v[0-9]+]]			; GCN: buffer_load_ubyte [[VAL0:v[0-9]+]]
	; GCN: buffer_load_dword [[VAL1:v[0-9]+]]			; GCN: buffer_load_dword [[VAL1:v[0-9]+]]
	; GCN: buffer_store_byte [[VAL0]], v0, s[0:3], s33 offen{{$}}			; GCN: buffer_store_byte [[VAL0]], v0, s[0:3], 0 offen{{$}}
	; GCN: buffer_store_dword [[VAL1]], v0, s[0:3], s33 offen offset:4{{$}}			; GCN: buffer_store_dword [[VAL1]], v0, s[0:3], 0 offen offset:4{{$}}
	define void @void_func_sret_struct_i8_i32({ i8, i32 } addrspace(5)* sret %arg0) #0 {			define void @void_func_sret_struct_i8_i32({ i8, i32 } addrspace(5)* sret %arg0) #0 {
	%val0 = load volatile i8, i8 addrspace(1)* undef			%val0 = load volatile i8, i8 addrspace(1)* undef
	%val1 = load volatile i32, i32 addrspace(1)* undef			%val1 = load volatile i32, i32 addrspace(1)* undef
	%gep0 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %arg0, i32 0, i32 0			%gep0 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %arg0, i32 0, i32 0
	%gep1 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %arg0, i32 0, i32 1			%gep1 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %arg0, i32 0, i32 1
	store i8 %val0, i8 addrspace(5)* %gep0			store i8 %val0, i8 addrspace(5)* %gep0
	store i32 %val1, i32 addrspace(5)* %gep1			store i32 %val1, i32 addrspace(5)* %gep1
	ret void			ret void
	}			}

	; FIXME: Should be able to fold offsets in all of these pre-gfx9. Call			; FIXME: Should be able to fold offsets in all of these pre-gfx9. Call
	; lowering introduces an extra CopyToReg/CopyFromReg obscuring the			; lowering introduces an extra CopyToReg/CopyFromReg obscuring the
	; AssertZext inserted. Not using it introduces the spills.			; AssertZext inserted. Not using it introduces the spills.

	; GCN-LABEL: {{^}}v33i32_func_void:			; GCN-LABEL: {{^}}v33i32_func_void:
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:4{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:4{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:8{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:8{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:12{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:12{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:16{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:16{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:20{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:20{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:24{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:24{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:28{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:28{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:32{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:32{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:36{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:36{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:40{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:40{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:44{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:44{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:48{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:48{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:52{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:52{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:56{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:56{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:60{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:60{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:64{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:64{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:68{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:68{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:72{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:72{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:76{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:76{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:80{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:80{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:84{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:84{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:88{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:88{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:92{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:92{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:96{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:96{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:100{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:100{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:104{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:104{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:108{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:108{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:112{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:112{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:116{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:116{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:120{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:120{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:124{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:124{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:128{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:128{{$}}
	; GFX9: s_waitcnt vmcnt(0)			; GFX9: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64			; GFX9-NEXT: s_setpc_b64
	define <33 x i32> @v33i32_func_void() #0 {			define <33 x i32> @v33i32_func_void() #0 {
	%ptr = load volatile <33 x i32> addrspace(1), <33 x i32> addrspace(1) addrspace(4)* undef			%ptr = load volatile <33 x i32> addrspace(1), <33 x i32> addrspace(1) addrspace(4)* undef
	%val = load <33 x i32>, <33 x i32> addrspace(1)* %ptr			%val = load <33 x i32>, <33 x i32> addrspace(1)* %ptr
	ret <33 x i32> %val			ret <33 x i32> %val
	}			}
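The 33 store checks above differ only in the dword offset, and the patch changes each one's soffset operand from `s33` to `0`. A small, hypothetical Python generator for such lines can make this kind of mechanical rewrite less error-prone; it is not part of LLVM's tooling, only a sketch that reproduces the strings shown in this diff.

```python
def expected_store_checks(num_dwords: int, soffset: str) -> list:
    """Build FileCheck lines for storing num_dwords consecutive dwords through v0."""
    lines = []
    for i in range(num_dwords):
        suffix = "" if i == 0 else f" offset:{4 * i}"  # first store has no offset field
        # Quadrupled braces in the f-string emit FileCheck's literal {{...}} regex blocks.
        lines.append(
            f"; GFX9-DAG: buffer_store_dword v{{{{[0-9]+}}}}, v0, s[0:3], {soffset} offen{suffix}{{{{$}}}}"
        )
    return lines


for line in expected_store_checks(33, "0")[:2]:
    print(line)
```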

	; GCN-LABEL: {{^}}struct_v32i32_i32_func_void:			; GCN-LABEL: {{^}}struct_v32i32_i32_func_void:
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:4{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:4{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:8{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:8{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:12{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:12{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:16{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:16{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:20{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:20{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:24{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:24{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:28{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:28{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:32{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:32{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:36{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:36{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:40{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:40{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:44{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:44{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:48{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:48{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:52{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:52{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:56{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:56{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:60{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:60{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:64{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:64{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:68{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:68{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:72{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:72{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:76{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:76{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:80{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:80{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:84{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:84{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:88{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:88{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:92{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:92{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:96{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:96{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:100{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:100{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:104{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:104{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:108{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:108{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:112{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:112{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:116{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:116{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:120{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:120{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:124{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:124{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:128{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:128{{$}}
	; GFX9: s_waitcnt vmcnt(0)			; GFX9: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64			; GFX9-NEXT: s_setpc_b64
	define { <32 x i32>, i32 } @struct_v32i32_i32_func_void() #0 {			define { <32 x i32>, i32 } @struct_v32i32_i32_func_void() #0 {
	%ptr = load volatile { <32 x i32>, i32 } addrspace(1), { <32 x i32>, i32 } addrspace(1) addrspace(4)* undef			%ptr = load volatile { <32 x i32>, i32 } addrspace(1), { <32 x i32>, i32 } addrspace(1) addrspace(4)* undef
	%val = load { <32 x i32>, i32 }, { <32 x i32>, i32 } addrspace(1)* %ptr			%val = load { <32 x i32>, i32 }, { <32 x i32>, i32 } addrspace(1)* %ptr
	ret { <32 x i32>, i32 }%val			ret { <32 x i32>, i32 }%val
	}			}

	; GCN-LABEL: {{^}}struct_i32_v32i32_func_void:			; GCN-LABEL: {{^}}struct_i32_v32i32_func_void:
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:128{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:128{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:132{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:132{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:136{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:136{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:140{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:140{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:144{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:144{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:148{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:148{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:152{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:152{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:156{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:156{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:160{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:160{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:164{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:164{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:168{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:168{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:172{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:172{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:176{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:176{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:180{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:180{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:184{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:184{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:188{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:188{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:192{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:192{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:196{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:196{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:200{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:200{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:204{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:204{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:208{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:208{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:212{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:212{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:216{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:216{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:220{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:220{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:224{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:224{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:228{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:228{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:232{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:232{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:236{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:236{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:240{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:240{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:244{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:244{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:248{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:248{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:252{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:252{{$}}
	; GFX9: s_waitcnt vmcnt(0)			; GFX9: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64			; GFX9-NEXT: s_setpc_b64
	define { i32, <32 x i32> } @struct_i32_v32i32_func_void() #0 {			define { i32, <32 x i32> } @struct_i32_v32i32_func_void() #0 {
	%ptr = load volatile { i32, <32 x i32> } addrspace(1)*, { i32, <32 x i32> } addrspace(1)* addrspace(4)* undef			%ptr = load volatile { i32, <32 x i32> } addrspace(1)*, { i32, <32 x i32> } addrspace(1)* addrspace(4)* undef
	%val = load { i32, <32 x i32> }, { i32, <32 x i32> } addrspace(1)* %ptr			%val = load { i32, <32 x i32> }, { i32, <32 x i32> } addrspace(1)* %ptr
	ret { i32, <32 x i32> }%val			ret { i32, <32 x i32> }%val
	}			}

llvm/test/CodeGen/AMDGPU/hsa-metadata-kernel-code-props-v3.ll

entry:		entry:
%a.val = load half, half addrspace(1)* %a		%a.val = load half, half addrspace(1)* %a
%b.val = load half, half addrspace(1)* %b		%b.val = load half, half addrspace(1)* %b
%r.val = fadd half %a.val, %b.val		%r.val = fadd half %a.val, %b.val
store half %r.val, half addrspace(1)* %r		store half %r.val, half addrspace(1)* %r
ret void		ret void
}		}

; CHECK: .name: num_spilled_sgprs		; CHECK: .name: num_spilled_sgprs
; GFX700: .sgpr_spill_count: 40		; GFX700: .sgpr_spill_count: 38
; GFX803: .sgpr_spill_count: 24		; GFX803: .sgpr_spill_count: 22
; GFX900: .sgpr_spill_count: 24		; GFX900: .sgpr_spill_count: 22
; GFX1010: .sgpr_spill_count: 24		; GFX1010: .sgpr_spill_count: 22
; CHECK: .symbol: num_spilled_sgprs.kd		; CHECK: .symbol: num_spilled_sgprs.kd
define amdgpu_kernel void @num_spilled_sgprs(		define amdgpu_kernel void @num_spilled_sgprs(
i32 addrspace(1)* %out0, i32 addrspace(1)* %out1, [8 x i32],		i32 addrspace(1)* %out0, i32 addrspace(1)* %out1, [8 x i32],
i32 addrspace(1)* %out2, i32 addrspace(1)* %out3, [8 x i32],		i32 addrspace(1)* %out2, i32 addrspace(1)* %out3, [8 x i32],
i32 addrspace(1)* %out4, i32 addrspace(1)* %out5, [8 x i32],		i32 addrspace(1)* %out4, i32 addrspace(1)* %out5, [8 x i32],
i32 addrspace(1)* %out6, i32 addrspace(1)* %out7, [8 x i32],		i32 addrspace(1)* %out6, i32 addrspace(1)* %out7, [8 x i32],
i32 addrspace(1)* %out8, i32 addrspace(1)* %out9, [8 x i32],		i32 addrspace(1)* %out8, i32 addrspace(1)* %out9, [8 x i32],
i32 addrspace(1)* %outa, i32 addrspace(1)* %outb, [8 x i32],		i32 addrspace(1)* %outa, i32 addrspace(1)* %outb, [8 x i32],

llvm/test/CodeGen/AMDGPU/hsa-metadata-kernel-code-props.ll

entry:		entry:
%r.val = fadd half %a.val, %b.val		%r.val = fadd half %a.val, %b.val
store half %r.val, half addrspace(1)* %r		store half %r.val, half addrspace(1)* %r
ret void		ret void
}		}

; CHECK-LABEL: - Name: num_spilled_sgprs		; CHECK-LABEL: - Name: num_spilled_sgprs
; CHECK: SymbolName: 'num_spilled_sgprs@kd'		; CHECK: SymbolName: 'num_spilled_sgprs@kd'
; CHECK: CodeProps:		; CHECK: CodeProps:
; GFX700: NumSpilledSGPRs: 40		; GFX700: NumSpilledSGPRs: 38
; GFX803: NumSpilledSGPRs: 24		; GFX803: NumSpilledSGPRs: 22
; GFX900: NumSpilledSGPRs: 24		; GFX900: NumSpilledSGPRs: 22
define amdgpu_kernel void @num_spilled_sgprs(		define amdgpu_kernel void @num_spilled_sgprs(
i32 addrspace(1)* %out0, i32 addrspace(1)* %out1, [8 x i32],		i32 addrspace(1)* %out0, i32 addrspace(1)* %out1, [8 x i32],
i32 addrspace(1)* %out2, i32 addrspace(1)* %out3, [8 x i32],		i32 addrspace(1)* %out2, i32 addrspace(1)* %out3, [8 x i32],
i32 addrspace(1)* %out4, i32 addrspace(1)* %out5, [8 x i32],		i32 addrspace(1)* %out4, i32 addrspace(1)* %out5, [8 x i32],
i32 addrspace(1)* %out6, i32 addrspace(1)* %out7, [8 x i32],		i32 addrspace(1)* %out6, i32 addrspace(1)* %out7, [8 x i32],
i32 addrspace(1)* %out8, i32 addrspace(1)* %out9, [8 x i32],		i32 addrspace(1)* %out8, i32 addrspace(1)* %out9, [8 x i32],
i32 addrspace(1)* %outa, i32 addrspace(1)* %outb, [8 x i32],		i32 addrspace(1)* %outa, i32 addrspace(1)* %outb, [8 x i32],
i32 addrspace(1)* %outc, i32 addrspace(1)* %outd, [8 x i32],		i32 addrspace(1)* %outc, i32 addrspace(1)* %outd, [8 x i32],

llvm/test/CodeGen/AMDGPU/idot8s.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=amdgcn -mcpu=gfx700 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX7 %s		; RUN: llc -mtriple=amdgcn -mcpu=gfx700 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX7 %s
; RUN: llc -mtriple=amdgcn -mcpu=gfx803 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX8 %s		; RUN: llc -mtriple=amdgcn -mcpu=gfx803 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX8 %s
; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX9 %s		; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX9 %s
; RUN: llc -mtriple=amdgcn -mcpu=gfx906 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX9-DL %s		; RUN: llc -mtriple=amdgcn -mcpu=gfx906 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX9-DL %s
; RUN: llc -mtriple=amdgcn -mcpu=gfx1011 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10-DL %s		; RUN: llc -mtriple=amdgcn -mcpu=gfx1011 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10-DL %s
; RUN: llc -mtriple=amdgcn -mcpu=gfx1012 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10-DL %s		; RUN: llc -mtriple=amdgcn -mcpu=gfx1012 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10-DL %s

define amdgpu_kernel void @idot8_acc32(<8 x i4> addrspace(1)* %src1,		define amdgpu_kernel void @idot8_acc32(<8 x i4> addrspace(1)* %src1,
; GFX7-LABEL: idot8_acc32:		; GFX7-LABEL: idot8_acc32:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s7, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s6, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0		; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX7-NEXT: s_load_dword s1, s[10:11], 0x0		; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0
; GFX7-NEXT: s_load_dword s21, s[4:5], 0x0		; GFX7-NEXT: s_load_dword s20, s[0:1], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_bfe_i32 s2, s0, 0x40000		; GFX7-NEXT: s_bfe_i32 s6, s4, 0x40000
; GFX7-NEXT: s_bfe_i32 s8, s1, 0x40000		; GFX7-NEXT: s_bfe_i32 s7, s5, 0x40000
; GFX7-NEXT: s_bfe_i32 s10, s1, 0x40004		; GFX7-NEXT: s_bfe_i32 s9, s5, 0x40004
; GFX7-NEXT: v_mov_b32_e32 v0, s8		; GFX7-NEXT: v_mov_b32_e32 v0, s7
; GFX7-NEXT: v_mov_b32_e32 v1, s21
; GFX7-NEXT: v_mad_i32_i24 v0, s2, v0, v1
; GFX7-NEXT: s_bfe_i32 s9, s0, 0x40004
; GFX7-NEXT: v_mov_b32_e32 v1, s10
; GFX7-NEXT: s_bfe_i32 s12, s1, 0x40008
; GFX7-NEXT: v_mad_i32_i24 v0, s9, v1, v0
; GFX7-NEXT: s_bfe_i32 s11, s0, 0x40008
; GFX7-NEXT: v_mov_b32_e32 v1, s12
; GFX7-NEXT: s_bfe_i32 s14, s1, 0x4000c
; GFX7-NEXT: v_mad_i32_i24 v0, s11, v1, v0
; GFX7-NEXT: s_bfe_i32 s13, s0, 0x4000c
; GFX7-NEXT: v_mov_b32_e32 v1, s14
; GFX7-NEXT: s_bfe_i32 s16, s1, 0x40010
; GFX7-NEXT: v_mad_i32_i24 v0, s13, v1, v0
; GFX7-NEXT: s_bfe_i32 s15, s0, 0x40010
; GFX7-NEXT: v_mov_b32_e32 v1, s16
; GFX7-NEXT: s_bfe_i32 s18, s1, 0x40014
; GFX7-NEXT: s_bfe_i32 s20, s1, 0x40018
; GFX7-NEXT: v_mad_i32_i24 v0, s15, v1, v0
; GFX7-NEXT: s_bfe_i32 s17, s0, 0x40014
; GFX7-NEXT: v_mov_b32_e32 v1, s18
; GFX7-NEXT: s_bfe_i32 s19, s0, 0x40018
; GFX7-NEXT: v_mad_i32_i24 v0, s17, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s20		; GFX7-NEXT: v_mov_b32_e32 v1, s20
; GFX7-NEXT: s_ashr_i32 s1, s1, 28		; GFX7-NEXT: v_mad_i32_i24 v0, s6, v0, v1
; GFX7-NEXT: v_mad_i32_i24 v0, s19, v1, v0		; GFX7-NEXT: s_bfe_i32 s8, s4, 0x40004
; GFX7-NEXT: s_ashr_i32 s0, s0, 28		; GFX7-NEXT: v_mov_b32_e32 v1, s9
; GFX7-NEXT: v_mov_b32_e32 v1, s1		; GFX7-NEXT: s_bfe_i32 s11, s5, 0x40008
; GFX7-NEXT: v_mad_i32_i24 v0, s0, v1, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s8, v1, v0
; GFX7-NEXT: buffer_store_dword v0, off, s[4:7], 0		; GFX7-NEXT: s_bfe_i32 s10, s4, 0x40008
		; GFX7-NEXT: v_mov_b32_e32 v1, s11
		; GFX7-NEXT: s_bfe_i32 s13, s5, 0x4000c
		; GFX7-NEXT: v_mad_i32_i24 v0, s10, v1, v0
		; GFX7-NEXT: s_bfe_i32 s12, s4, 0x4000c
		; GFX7-NEXT: v_mov_b32_e32 v1, s13
		; GFX7-NEXT: s_bfe_i32 s15, s5, 0x40010
		; GFX7-NEXT: v_mad_i32_i24 v0, s12, v1, v0
		; GFX7-NEXT: s_bfe_i32 s14, s4, 0x40010
		; GFX7-NEXT: v_mov_b32_e32 v1, s15
		; GFX7-NEXT: s_bfe_i32 s17, s5, 0x40014
		; GFX7-NEXT: s_bfe_i32 s19, s5, 0x40018
		; GFX7-NEXT: v_mad_i32_i24 v0, s14, v1, v0
		; GFX7-NEXT: s_bfe_i32 s16, s4, 0x40014
		; GFX7-NEXT: v_mov_b32_e32 v1, s17
		; GFX7-NEXT: s_bfe_i32 s18, s4, 0x40018
		; GFX7-NEXT: v_mad_i32_i24 v0, s16, v1, v0
		; GFX7-NEXT: v_mov_b32_e32 v1, s19
		; GFX7-NEXT: s_ashr_i32 s5, s5, 28
		; GFX7-NEXT: v_mad_i32_i24 v0, s18, v1, v0
		; GFX7-NEXT: s_ashr_i32 s4, s4, 28
		; GFX7-NEXT: v_mov_b32_e32 v1, s5
		; GFX7-NEXT: v_mad_i32_i24 v0, s4, v1, v0
		; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: idot8_acc32:		; GFX8-LABEL: idot8_acc32:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX8-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX8-NEXT: s_load_dword s4, s[6:7], 0x0		; GFX8-NEXT: s_load_dword s3, s[6:7], 0x0
; GFX8-NEXT: s_load_dword s19, s[0:1], 0x0		; GFX8-NEXT: s_load_dword s18, s[0:1], 0x0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_bfe_i32 s5, s2, 0x40000		; GFX8-NEXT: s_bfe_i32 s4, s2, 0x40000
; GFX8-NEXT: s_bfe_i32 s6, s4, 0x40000		; GFX8-NEXT: s_bfe_i32 s5, s3, 0x40000
; GFX8-NEXT: s_bfe_i32 s8, s4, 0x40004		; GFX8-NEXT: s_bfe_i32 s7, s3, 0x40004
; GFX8-NEXT: v_mov_b32_e32 v0, s6		; GFX8-NEXT: v_mov_b32_e32 v0, s5
; GFX8-NEXT: v_mov_b32_e32 v1, s19
; GFX8-NEXT: v_mad_i32_i24 v0, s5, v0, v1
; GFX8-NEXT: s_bfe_i32 s7, s2, 0x40004
; GFX8-NEXT: v_mov_b32_e32 v1, s8
; GFX8-NEXT: s_bfe_i32 s10, s4, 0x40008
; GFX8-NEXT: v_mad_i32_i24 v0, s7, v1, v0
; GFX8-NEXT: s_bfe_i32 s9, s2, 0x40008
; GFX8-NEXT: v_mov_b32_e32 v1, s10
; GFX8-NEXT: s_bfe_i32 s12, s4, 0x4000c
; GFX8-NEXT: v_mad_i32_i24 v0, s9, v1, v0
; GFX8-NEXT: s_bfe_i32 s11, s2, 0x4000c
; GFX8-NEXT: v_mov_b32_e32 v1, s12
; GFX8-NEXT: s_bfe_i32 s14, s4, 0x40010
; GFX8-NEXT: v_mad_i32_i24 v0, s11, v1, v0
; GFX8-NEXT: s_bfe_i32 s13, s2, 0x40010
; GFX8-NEXT: v_mov_b32_e32 v1, s14
; GFX8-NEXT: s_bfe_i32 s16, s4, 0x40014
; GFX8-NEXT: s_bfe_i32 s18, s4, 0x40018
; GFX8-NEXT: v_mad_i32_i24 v0, s13, v1, v0
; GFX8-NEXT: s_bfe_i32 s15, s2, 0x40014
; GFX8-NEXT: v_mov_b32_e32 v1, s16
; GFX8-NEXT: s_bfe_i32 s17, s2, 0x40018
; GFX8-NEXT: v_mad_i32_i24 v0, s15, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v1, s18		; GFX8-NEXT: v_mov_b32_e32 v1, s18
; GFX8-NEXT: s_ashr_i32 s4, s4, 28		; GFX8-NEXT: v_mad_i32_i24 v0, s4, v0, v1
; GFX8-NEXT: v_mad_i32_i24 v0, s17, v1, v0		; GFX8-NEXT: s_bfe_i32 s6, s2, 0x40004
		; GFX8-NEXT: v_mov_b32_e32 v1, s7
		; GFX8-NEXT: s_bfe_i32 s9, s3, 0x40008
		; GFX8-NEXT: v_mad_i32_i24 v0, s6, v1, v0
		; GFX8-NEXT: s_bfe_i32 s8, s2, 0x40008
		; GFX8-NEXT: v_mov_b32_e32 v1, s9
		; GFX8-NEXT: s_bfe_i32 s11, s3, 0x4000c
		; GFX8-NEXT: v_mad_i32_i24 v0, s8, v1, v0
		; GFX8-NEXT: s_bfe_i32 s10, s2, 0x4000c
		; GFX8-NEXT: v_mov_b32_e32 v1, s11
		; GFX8-NEXT: s_bfe_i32 s13, s3, 0x40010
		; GFX8-NEXT: v_mad_i32_i24 v0, s10, v1, v0
		; GFX8-NEXT: s_bfe_i32 s12, s2, 0x40010
		; GFX8-NEXT: v_mov_b32_e32 v1, s13
		; GFX8-NEXT: s_bfe_i32 s15, s3, 0x40014
		; GFX8-NEXT: s_bfe_i32 s17, s3, 0x40018
		; GFX8-NEXT: v_mad_i32_i24 v0, s12, v1, v0
		; GFX8-NEXT: s_bfe_i32 s14, s2, 0x40014
		; GFX8-NEXT: v_mov_b32_e32 v1, s15
		; GFX8-NEXT: s_bfe_i32 s16, s2, 0x40018
		; GFX8-NEXT: v_mad_i32_i24 v0, s14, v1, v0
		; GFX8-NEXT: v_mov_b32_e32 v1, s17
		; GFX8-NEXT: s_ashr_i32 s3, s3, 28
		; GFX8-NEXT: v_mad_i32_i24 v0, s16, v1, v0
; GFX8-NEXT: s_ashr_i32 s2, s2, 28		; GFX8-NEXT: s_ashr_i32 s2, s2, 28
; GFX8-NEXT: v_mov_b32_e32 v1, s4		; GFX8-NEXT: v_mov_b32_e32 v1, s3
; GFX8-NEXT: v_mad_i32_i24 v2, s2, v1, v0		; GFX8-NEXT: v_mad_i32_i24 v2, s2, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_store_dword v[0:1], v2		; GFX8-NEXT: flat_store_dword v[0:1], v2
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: idot8_acc32:		; GFX9-LABEL: idot8_acc32:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX9-NEXT: s_load_dword s4, s[6:7], 0x0		; GFX9-NEXT: s_load_dword s3, s[6:7], 0x0
; GFX9-NEXT: s_load_dword s19, s[0:1], 0x0		; GFX9-NEXT: s_load_dword s18, s[0:1], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_bfe_i32 s5, s2, 0x40000		; GFX9-NEXT: s_bfe_i32 s4, s2, 0x40000
; GFX9-NEXT: s_bfe_i32 s6, s4, 0x40000		; GFX9-NEXT: s_bfe_i32 s5, s3, 0x40000
; GFX9-NEXT: s_bfe_i32 s8, s4, 0x40004		; GFX9-NEXT: s_bfe_i32 s7, s3, 0x40004
; GFX9-NEXT: v_mov_b32_e32 v0, s6		; GFX9-NEXT: v_mov_b32_e32 v0, s5
; GFX9-NEXT: v_mov_b32_e32 v1, s19
; GFX9-NEXT: v_mad_i32_i24 v0, s5, v0, v1
; GFX9-NEXT: s_bfe_i32 s7, s2, 0x40004
; GFX9-NEXT: v_mov_b32_e32 v1, s8
; GFX9-NEXT: s_bfe_i32 s10, s4, 0x40008
; GFX9-NEXT: v_mad_i32_i24 v0, s7, v1, v0
; GFX9-NEXT: s_bfe_i32 s9, s2, 0x40008
; GFX9-NEXT: v_mov_b32_e32 v1, s10
; GFX9-NEXT: s_bfe_i32 s12, s4, 0x4000c
; GFX9-NEXT: v_mad_i32_i24 v0, s9, v1, v0
; GFX9-NEXT: s_bfe_i32 s11, s2, 0x4000c
; GFX9-NEXT: v_mov_b32_e32 v1, s12
; GFX9-NEXT: s_bfe_i32 s14, s4, 0x40010
; GFX9-NEXT: v_mad_i32_i24 v0, s11, v1, v0
; GFX9-NEXT: s_bfe_i32 s13, s2, 0x40010
; GFX9-NEXT: v_mov_b32_e32 v1, s14
; GFX9-NEXT: s_bfe_i32 s16, s4, 0x40014
; GFX9-NEXT: s_bfe_i32 s18, s4, 0x40018
; GFX9-NEXT: v_mad_i32_i24 v0, s13, v1, v0
; GFX9-NEXT: s_bfe_i32 s15, s2, 0x40014
; GFX9-NEXT: v_mov_b32_e32 v1, s16
; GFX9-NEXT: s_bfe_i32 s17, s2, 0x40018
; GFX9-NEXT: v_mad_i32_i24 v0, s15, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v1, s18		; GFX9-NEXT: v_mov_b32_e32 v1, s18
; GFX9-NEXT: s_ashr_i32 s4, s4, 28		; GFX9-NEXT: v_mad_i32_i24 v0, s4, v0, v1
; GFX9-NEXT: v_mad_i32_i24 v0, s17, v1, v0		; GFX9-NEXT: s_bfe_i32 s6, s2, 0x40004
		; GFX9-NEXT: v_mov_b32_e32 v1, s7
		; GFX9-NEXT: s_bfe_i32 s9, s3, 0x40008
		; GFX9-NEXT: v_mad_i32_i24 v0, s6, v1, v0
		; GFX9-NEXT: s_bfe_i32 s8, s2, 0x40008
		; GFX9-NEXT: v_mov_b32_e32 v1, s9
		; GFX9-NEXT: s_bfe_i32 s11, s3, 0x4000c
		; GFX9-NEXT: v_mad_i32_i24 v0, s8, v1, v0
		; GFX9-NEXT: s_bfe_i32 s10, s2, 0x4000c
		; GFX9-NEXT: v_mov_b32_e32 v1, s11
		; GFX9-NEXT: s_bfe_i32 s13, s3, 0x40010
		; GFX9-NEXT: v_mad_i32_i24 v0, s10, v1, v0
		; GFX9-NEXT: s_bfe_i32 s12, s2, 0x40010
		; GFX9-NEXT: v_mov_b32_e32 v1, s13
		; GFX9-NEXT: s_bfe_i32 s15, s3, 0x40014
		; GFX9-NEXT: s_bfe_i32 s17, s3, 0x40018
		; GFX9-NEXT: v_mad_i32_i24 v0, s12, v1, v0
		; GFX9-NEXT: s_bfe_i32 s14, s2, 0x40014
		; GFX9-NEXT: v_mov_b32_e32 v1, s15
		; GFX9-NEXT: s_bfe_i32 s16, s2, 0x40018
		; GFX9-NEXT: v_mad_i32_i24 v0, s14, v1, v0
		; GFX9-NEXT: v_mov_b32_e32 v1, s17
		; GFX9-NEXT: s_ashr_i32 s3, s3, 28
		; GFX9-NEXT: v_mad_i32_i24 v0, s16, v1, v0
; GFX9-NEXT: s_ashr_i32 s2, s2, 28		; GFX9-NEXT: s_ashr_i32 s2, s2, 28
; GFX9-NEXT: v_mov_b32_e32 v1, s4		; GFX9-NEXT: v_mov_b32_e32 v1, s3
; GFX9-NEXT: v_mad_i32_i24 v2, s2, v1, v0		; GFX9-NEXT: v_mad_i32_i24 v2, s2, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: global_store_dword v[0:1], v2, off		; GFX9-NEXT: global_store_dword v[0:1], v2, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX9-DL-LABEL: idot8_acc32:		; GFX9-DL-LABEL: idot8_acc32:
; GFX9-DL: ; %bb.0: ; %entry		; GFX9-DL: ; %bb.0: ; %entry
; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_load_dword s2, s[6:7], 0x0		; GFX9-DL-NEXT: s_load_dword s2, s[6:7], 0x0
; GFX9-DL-NEXT: s_load_dword s6, s[0:1], 0x0		; GFX9-DL-NEXT: s_load_dword s3, s[0:1], 0x0
; GFX9-DL-NEXT: s_load_dword s4, s[4:5], 0x0		; GFX9-DL-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s2		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s2
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s6		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s3
; GFX9-DL-NEXT: v_dot8_i32_i4 v2, s4, v0, v1		; GFX9-DL-NEXT: v_dot8_i32_i4 v2, s4, v0, v1
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off		; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: idot8_acc32:		; GFX10-DL-LABEL: idot8_acc32:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_load_dword s0, s[8:9], 0x0		; GFX10-DL-NEXT: s_load_dword s6, s[4:5], 0x0
; GFX10-DL-NEXT: s_load_dword s1, s[4:5], 0x0		; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
; GFX10-DL-NEXT: s_load_dword s2, s[6:7], 0x0		; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s6
; GFX10-DL-NEXT: v_dot8_i32_i4 v2, s1, s2, v0		; GFX10-DL-NEXT: v_dot8_i32_i4 v2, s0, s1, v0
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s8		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s4
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s9		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s5
; GFX10-DL-NEXT: global_store_dword v[0:1], v2, off		; GFX10-DL-NEXT: global_store_dword v[0:1], v2, off
; GFX10-DL-NEXT: s_endpgm		; GFX10-DL-NEXT: s_endpgm
<8 x i4> addrspace(1)* %src2,		<8 x i4> addrspace(1)* %src2,
i32 addrspace(1)* nocapture %dst) {		i32 addrspace(1)* nocapture %dst) {
entry:		entry:
%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1		%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1
%vec2 = load <8 x i4>, <8 x i4> addrspace(1)* %src2		%vec2 = load <8 x i4>, <8 x i4> addrspace(1)* %src2

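For readers checking the idot8_acc32 expectations above: under the GFX7, GFX8 and GFX9 prefixes there is no v_dot8_i32_i4, so each 4-bit lane is pulled out with s_bfe_i32, whose second operand packs the bit offset in bits [4:0] and the field width in bits [22:16] (0x40004 is a 4-bit field at offset 4). The products are then accumulated with v_mad_i32_i24, and the top lane is taken with s_ashr_i32 by 28 instead. The following is a minimal Python sketch of the arithmetic those checks (and v_dot8_i32_i4 under GFX9-DL/GFX10-DL) appear to compute, assuming lane 0 sits in the low nibble and the accumulator is the i32 loaded from %dst, as the loads and stores in the checks suggest. The helper names to_i32, bfe_i32 and idot8_i32_i4 are hypothetical and are not part of LLVM.

```python
def to_i32(value: int) -> int:
    """Wrap an arbitrary Python int to a signed 32-bit value."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def bfe_i32(src: int, control: int) -> int:
    """Model of s_bfe_i32: signed bitfield extract from a 32-bit source.

    control[4:0] is the bit offset and control[22:16] is the field width,
    so 0x40004 selects 4 bits starting at bit 4.
    """
    offset = control & 0x1F
    width = (control >> 16) & 0x7F
    if width == 0:
        return 0
    field = ((src & 0xFFFFFFFF) >> offset) & ((1 << width) - 1)
    # Sign-extend the extracted field from `width` bits.
    if field & (1 << (width - 1)):
        field -= 1 << width
    return field


def idot8_i32_i4(a: int, b: int, acc: int) -> int:
    """acc plus the sum of the eight signed 4-bit lane products, as an i32."""
    total = acc
    for lane in range(8):
        # Lane i lives at bit offset 4*i. For lane 7 (offset 28) the checks
        # use s_ashr_i32 ..., 28, which yields the same sign-extended value.
        control = (4 << 16) | (4 * lane)
        total += bfe_i32(a, control) * bfe_i32(b, control)
    return to_i32(total)


# Every lane of a is -1 and every lane of b is 1: 8 * (-1 * 1) = -8.
print(idot8_i32_i4(0xFFFFFFFF, 0x11111111, 0))  # -8
```

Accumulating in Python's unbounded int and wrapping once at the end gives the same result as wrapping after every v_mad_i32_i24, because addition modulo 2^32 is associative.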
entry:		entry:
ret void		ret void
}		}

; TODO: Once the unnecessary zero extensions of the elements are removed,		; TODO: Once the unnecessary zero extensions of the elements are removed,
; the pattern recognizer will kick in.		; the pattern recognizer will kick in.
define amdgpu_kernel void @idot8_acc16(<8 x i4> addrspace(1)* %src1,		define amdgpu_kernel void @idot8_acc16(<8 x i4> addrspace(1)* %src1,
; GFX7-LABEL: idot8_acc16:		; GFX7-LABEL: idot8_acc16:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s7, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s6, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_mov_b32 s0, 0xffff		; GFX7-NEXT: s_mov_b32 s8, 0xffff
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: buffer_load_ushort v0, off, s[4:7], 0		; GFX7-NEXT: buffer_load_ushort v0, off, s[0:3], 0
; GFX7-NEXT: s_load_dword s1, s[8:9], 0x0		; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX7-NEXT: s_load_dword s2, s[10:11], 0x0		; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_bfe_i32 s8, s1, 0x40000		; GFX7-NEXT: s_bfe_i32 s6, s4, 0x40000
; GFX7-NEXT: s_bfe_i32 s9, s2, 0x40000		; GFX7-NEXT: s_bfe_i32 s7, s5, 0x40000
; GFX7-NEXT: s_bfe_i32 s11, s2, 0x40004		; GFX7-NEXT: s_bfe_i32 s10, s5, 0x40004
; GFX7-NEXT: s_and_b32 s9, s9, s0		; GFX7-NEXT: s_and_b32 s7, s7, s8
; GFX7-NEXT: s_bfe_i32 s10, s1, 0x40004		; GFX7-NEXT: s_bfe_i32 s9, s4, 0x40004
; GFX7-NEXT: s_bfe_i32 s13, s2, 0x40008		; GFX7-NEXT: s_bfe_i32 s12, s5, 0x40008
; GFX7-NEXT: s_and_b32 s11, s11, s0		; GFX7-NEXT: s_and_b32 s10, s10, s8
; GFX7-NEXT: s_and_b32 s8, s8, s0		; GFX7-NEXT: s_and_b32 s6, s6, s8
; GFX7-NEXT: v_mov_b32_e32 v1, s9		; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: s_bfe_i32 s12, s1, 0x40008		; GFX7-NEXT: s_bfe_i32 s11, s4, 0x40008
; GFX7-NEXT: s_bfe_i32 s15, s2, 0x4000c		; GFX7-NEXT: s_bfe_i32 s14, s5, 0x4000c
; GFX7-NEXT: s_and_b32 s13, s13, s0		; GFX7-NEXT: s_and_b32 s12, s12, s8
; GFX7-NEXT: s_and_b32 s10, s10, s0		; GFX7-NEXT: s_and_b32 s9, s9, s8
; GFX7-NEXT: v_mov_b32_e32 v2, s11		; GFX7-NEXT: v_mov_b32_e32 v2, s10
; GFX7-NEXT: s_bfe_i32 s14, s1, 0x4000c		; GFX7-NEXT: s_bfe_i32 s13, s4, 0x4000c
; GFX7-NEXT: s_bfe_i32 s17, s2, 0x40010		; GFX7-NEXT: s_bfe_i32 s16, s5, 0x40010
; GFX7-NEXT: s_and_b32 s15, s15, s0		; GFX7-NEXT: s_and_b32 s14, s14, s8
; GFX7-NEXT: s_and_b32 s12, s12, s0		; GFX7-NEXT: s_and_b32 s11, s11, s8
; GFX7-NEXT: v_mov_b32_e32 v3, s13		; GFX7-NEXT: v_mov_b32_e32 v3, s12
; GFX7-NEXT: s_bfe_i32 s16, s1, 0x40010		; GFX7-NEXT: s_bfe_i32 s15, s4, 0x40010
; GFX7-NEXT: s_bfe_i32 s19, s2, 0x40014		; GFX7-NEXT: s_bfe_i32 s18, s5, 0x40014
; GFX7-NEXT: s_and_b32 s17, s17, s0		; GFX7-NEXT: s_and_b32 s16, s16, s8
; GFX7-NEXT: s_and_b32 s14, s14, s0		; GFX7-NEXT: s_and_b32 s13, s13, s8
; GFX7-NEXT: v_mov_b32_e32 v4, s15		; GFX7-NEXT: v_mov_b32_e32 v4, s14
; GFX7-NEXT: s_bfe_i32 s21, s2, 0x40018		; GFX7-NEXT: s_bfe_i32 s20, s5, 0x40018
; GFX7-NEXT: s_bfe_i32 s18, s1, 0x40014		; GFX7-NEXT: s_bfe_i32 s17, s4, 0x40014
; GFX7-NEXT: s_and_b32 s19, s19, s0		; GFX7-NEXT: s_and_b32 s18, s18, s8
; GFX7-NEXT: s_and_b32 s16, s16, s0		; GFX7-NEXT: s_and_b32 s15, s15, s8
; GFX7-NEXT: v_mov_b32_e32 v5, s17		; GFX7-NEXT: v_mov_b32_e32 v5, s16
; GFX7-NEXT: s_bfe_i32 s20, s1, 0x40018		; GFX7-NEXT: s_bfe_i32 s19, s4, 0x40018
; GFX7-NEXT: s_ashr_i32 s2, s2, 28		; GFX7-NEXT: s_ashr_i32 s5, s5, 28
; GFX7-NEXT: s_and_b32 s21, s21, s0		; GFX7-NEXT: s_and_b32 s20, s20, s8
; GFX7-NEXT: s_and_b32 s18, s18, s0		; GFX7-NEXT: s_and_b32 s17, s17, s8
; GFX7-NEXT: v_mov_b32_e32 v6, s19		; GFX7-NEXT: v_mov_b32_e32 v6, s18
; GFX7-NEXT: s_ashr_i32 s1, s1, 28		; GFX7-NEXT: s_ashr_i32 s4, s4, 28
; GFX7-NEXT: s_and_b32 s20, s20, s0		; GFX7-NEXT: s_and_b32 s19, s19, s8
; GFX7-NEXT: s_and_b32 s2, s2, s0		; GFX7-NEXT: s_and_b32 s5, s5, s8
; GFX7-NEXT: v_mov_b32_e32 v7, s21		; GFX7-NEXT: v_mov_b32_e32 v7, s20
; GFX7-NEXT: s_and_b32 s0, s1, s0		; GFX7-NEXT: s_and_b32 s4, s4, s8
; GFX7-NEXT: s_waitcnt vmcnt(0)		; GFX7-NEXT: s_waitcnt vmcnt(0)
; GFX7-NEXT: v_mad_u32_u24 v0, s8, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s6, v1, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s10, v2, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s9, v2, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s12, v3, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s11, v3, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s14, v4, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s13, v4, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s16, v5, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s15, v5, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s18, v6, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s17, v6, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s20, v7, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s19, v7, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s2		; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mad_u32_u24 v0, s0, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s4, v1, v0
; GFX7-NEXT: buffer_store_short v0, off, s[4:7], 0		; GFX7-NEXT: buffer_store_short v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: idot8_acc16:		; GFX8-LABEL: idot8_acc16:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_load_ushort v2, v[0:1]		; GFX8-NEXT: flat_load_ushort v2, v[0:1]
; GFX8-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX8-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX8-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX8-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_bfe_i32 s5, s0, 0x40000		; GFX8-NEXT: s_bfe_i32 s4, s0, 0x40000
; GFX8-NEXT: s_bfe_i32 s6, s1, 0x40000		; GFX8-NEXT: s_bfe_i32 s5, s1, 0x40000
; GFX8-NEXT: s_bfe_i32 s8, s1, 0x40004		; GFX8-NEXT: s_bfe_i32 s7, s1, 0x40004
; GFX8-NEXT: s_bfe_i32 s10, s1, 0x40008		; GFX8-NEXT: s_bfe_i32 s9, s1, 0x40008
; GFX8-NEXT: v_mov_b32_e32 v6, s6		; GFX8-NEXT: v_mov_b32_e32 v6, s5
; GFX8-NEXT: s_lshr_b32 s2, s0, 12		; GFX8-NEXT: s_lshr_b32 s2, s0, 12
; GFX8-NEXT: s_lshr_b32 s4, s1, 12		; GFX8-NEXT: s_lshr_b32 s3, s1, 12
; GFX8-NEXT: s_bfe_i32 s7, s0, 0x40004		; GFX8-NEXT: s_bfe_i32 s6, s0, 0x40004
; GFX8-NEXT: s_bfe_i32 s9, s0, 0x40008		; GFX8-NEXT: s_bfe_i32 s8, s0, 0x40008
; GFX8-NEXT: v_mov_b32_e32 v3, s10		; GFX8-NEXT: v_mov_b32_e32 v3, s9
; GFX8-NEXT: v_mov_b32_e32 v7, s8		; GFX8-NEXT: v_mov_b32_e32 v7, s7
; GFX8-NEXT: v_lshlrev_b16_e64 v4, 12, s2		; GFX8-NEXT: v_lshlrev_b16_e64 v4, 12, s2
; GFX8-NEXT: v_lshlrev_b16_e64 v5, 12, s4		; GFX8-NEXT: v_lshlrev_b16_e64 v5, 12, s3
; GFX8-NEXT: v_mul_i32_i24_e32 v3, s9, v3		; GFX8-NEXT: v_mul_i32_i24_e32 v3, s8, v3
; GFX8-NEXT: s_bfe_i32 s12, s1, 0x40010		; GFX8-NEXT: s_bfe_i32 s11, s1, 0x40010
; GFX8-NEXT: v_ashrrev_i16_e32 v4, 12, v4		; GFX8-NEXT: v_ashrrev_i16_e32 v4, 12, v4
; GFX8-NEXT: v_ashrrev_i16_e32 v5, 12, v5		; GFX8-NEXT: v_ashrrev_i16_e32 v5, 12, v5
; GFX8-NEXT: s_bfe_i32 s14, s1, 0x40014		; GFX8-NEXT: s_bfe_i32 s13, s1, 0x40014
; GFX8-NEXT: s_bfe_i32 s11, s0, 0x40010		; GFX8-NEXT: s_bfe_i32 s10, s0, 0x40010
; GFX8-NEXT: v_mov_b32_e32 v8, s12		; GFX8-NEXT: v_mov_b32_e32 v8, s11
; GFX8-NEXT: s_bfe_i32 s16, s1, 0x40018		; GFX8-NEXT: s_bfe_i32 s15, s1, 0x40018
; GFX8-NEXT: s_bfe_i32 s13, s0, 0x40014		; GFX8-NEXT: s_bfe_i32 s12, s0, 0x40014
; GFX8-NEXT: v_mov_b32_e32 v9, s14		; GFX8-NEXT: v_mov_b32_e32 v9, s13
; GFX8-NEXT: s_bfe_i32 s15, s0, 0x40018		; GFX8-NEXT: s_bfe_i32 s14, s0, 0x40018
; GFX8-NEXT: s_ashr_i32 s1, s1, 28		; GFX8-NEXT: s_ashr_i32 s1, s1, 28
; GFX8-NEXT: v_mov_b32_e32 v10, s16		; GFX8-NEXT: v_mov_b32_e32 v10, s15
; GFX8-NEXT: s_ashr_i32 s0, s0, 28		; GFX8-NEXT: s_ashr_i32 s0, s0, 28
; GFX8-NEXT: s_waitcnt vmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0)
; GFX8-NEXT: v_mad_i32_i24 v2, s5, v6, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s4, v6, v2
; GFX8-NEXT: v_mad_i32_i24 v2, s7, v7, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s6, v7, v2
; GFX8-NEXT: v_add_u32_sdwa v2, vcc, v3, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:WORD_0		; GFX8-NEXT: v_add_u32_sdwa v2, vcc, v3, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:WORD_0
; GFX8-NEXT: v_mad_u32_u24 v2, v4, v5, v2		; GFX8-NEXT: v_mad_u32_u24 v2, v4, v5, v2
; GFX8-NEXT: v_mad_i32_i24 v2, s11, v8, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s10, v8, v2
; GFX8-NEXT: v_mad_i32_i24 v2, s13, v9, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s12, v9, v2
; GFX8-NEXT: v_mad_i32_i24 v2, s15, v10, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s14, v10, v2
; GFX8-NEXT: v_mov_b32_e32 v3, s1		; GFX8-NEXT: v_mov_b32_e32 v3, s1
; GFX8-NEXT: v_mad_i32_i24 v2, s0, v3, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s0, v3, v2
; GFX8-NEXT: flat_store_short v[0:1], v2		; GFX8-NEXT: flat_store_short v[0:1], v2
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: idot8_acc16:		; GFX9-LABEL: idot8_acc16:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: global_load_ushort v2, v[0:1], off		; GFX9-NEXT: global_load_ushort v2, v[0:1], off
; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX9-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_bfe_i32 s5, s0, 0x40000		; GFX9-NEXT: s_bfe_i32 s4, s0, 0x40000
; GFX9-NEXT: s_bfe_i32 s6, s1, 0x40000		; GFX9-NEXT: s_bfe_i32 s5, s1, 0x40000
; GFX9-NEXT: s_bfe_i32 s8, s1, 0x40004		; GFX9-NEXT: s_bfe_i32 s7, s1, 0x40004
; GFX9-NEXT: s_bfe_i32 s10, s1, 0x40008		; GFX9-NEXT: s_bfe_i32 s9, s1, 0x40008
; GFX9-NEXT: v_mov_b32_e32 v6, s6		; GFX9-NEXT: v_mov_b32_e32 v6, s5
; GFX9-NEXT: s_lshr_b32 s2, s0, 12		; GFX9-NEXT: s_lshr_b32 s2, s0, 12
; GFX9-NEXT: s_lshr_b32 s4, s1, 12		; GFX9-NEXT: s_lshr_b32 s3, s1, 12
; GFX9-NEXT: s_bfe_i32 s7, s0, 0x40004		; GFX9-NEXT: s_bfe_i32 s6, s0, 0x40004
; GFX9-NEXT: s_bfe_i32 s9, s0, 0x40008		; GFX9-NEXT: s_bfe_i32 s8, s0, 0x40008
; GFX9-NEXT: v_mov_b32_e32 v3, s10		; GFX9-NEXT: v_mov_b32_e32 v3, s9
; GFX9-NEXT: v_mov_b32_e32 v7, s8		; GFX9-NEXT: v_mov_b32_e32 v7, s7
; GFX9-NEXT: v_lshlrev_b16_e64 v4, 12, s2		; GFX9-NEXT: v_lshlrev_b16_e64 v4, 12, s2
; GFX9-NEXT: v_lshlrev_b16_e64 v5, 12, s4		; GFX9-NEXT: v_lshlrev_b16_e64 v5, 12, s3
; GFX9-NEXT: v_mul_i32_i24_e32 v3, s9, v3		; GFX9-NEXT: v_mul_i32_i24_e32 v3, s8, v3
; GFX9-NEXT: s_bfe_i32 s12, s1, 0x40010		; GFX9-NEXT: s_bfe_i32 s11, s1, 0x40010
; GFX9-NEXT: v_ashrrev_i16_e32 v4, 12, v4		; GFX9-NEXT: v_ashrrev_i16_e32 v4, 12, v4
; GFX9-NEXT: v_ashrrev_i16_e32 v5, 12, v5		; GFX9-NEXT: v_ashrrev_i16_e32 v5, 12, v5
; GFX9-NEXT: s_bfe_i32 s14, s1, 0x40014		; GFX9-NEXT: s_bfe_i32 s13, s1, 0x40014
; GFX9-NEXT: s_bfe_i32 s11, s0, 0x40010		; GFX9-NEXT: s_bfe_i32 s10, s0, 0x40010
; GFX9-NEXT: v_mov_b32_e32 v8, s12		; GFX9-NEXT: v_mov_b32_e32 v8, s11
; GFX9-NEXT: s_bfe_i32 s16, s1, 0x40018		; GFX9-NEXT: s_bfe_i32 s15, s1, 0x40018
; GFX9-NEXT: s_bfe_i32 s13, s0, 0x40014		; GFX9-NEXT: s_bfe_i32 s12, s0, 0x40014
; GFX9-NEXT: v_mov_b32_e32 v9, s14		; GFX9-NEXT: v_mov_b32_e32 v9, s13
; GFX9-NEXT: s_bfe_i32 s15, s0, 0x40018		; GFX9-NEXT: s_bfe_i32 s14, s0, 0x40018
; GFX9-NEXT: s_ashr_i32 s1, s1, 28		; GFX9-NEXT: s_ashr_i32 s1, s1, 28
; GFX9-NEXT: v_mov_b32_e32 v10, s16		; GFX9-NEXT: v_mov_b32_e32 v10, s15
; GFX9-NEXT: s_ashr_i32 s0, s0, 28		; GFX9-NEXT: s_ashr_i32 s0, s0, 28
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: v_mad_i32_i24 v2, s5, v6, v2		; GFX9-NEXT: v_mad_i32_i24 v2, s4, v6, v2
; GFX9-NEXT: v_mad_i32_i24 v2, s7, v7, v2		; GFX9-NEXT: v_mad_i32_i24 v2, s6, v7, v2
; GFX9-NEXT: v_add_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:WORD_0		; GFX9-NEXT: v_add_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:WORD_0
; GFX9-NEXT: v_mad_u32_u24 v2, v4, v5, v2		; GFX9-NEXT: v_mad_u32_u24 v2, v4, v5, v2
; GFX9-NEXT: v_mad_i32_i24 v2, s11, v8, v2		; GFX9-NEXT: v_mad_i32_i24 v2, s10, v8, v2
; GFX9-NEXT: v_mad_i32_i24 v2, s13, v9, v2		; GFX9-NEXT: v_mad_i32_i24 v2, s12, v9, v2
; GFX9-NEXT: v_mad_i32_i24 v2, s15, v10, v2		; GFX9-NEXT: v_mad_i32_i24 v2, s14, v10, v2
; GFX9-NEXT: v_mov_b32_e32 v3, s1		; GFX9-NEXT: v_mov_b32_e32 v3, s1
; GFX9-NEXT: v_mad_i32_i24 v2, s0, v3, v2		; GFX9-NEXT: v_mad_i32_i24 v2, s0, v3, v2
; GFX9-NEXT: global_store_short v[0:1], v2, off		; GFX9-NEXT: global_store_short v[0:1], v2, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX9-DL-LABEL: idot8_acc16:		; GFX9-DL-LABEL: idot8_acc16:
; GFX9-DL: ; %bb.0: ; %entry		; GFX9-DL: ; %bb.0: ; %entry
; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: global_load_ushort v2, v[0:1], off		; GFX9-DL-NEXT: global_load_ushort v2, v[0:1], off
; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX9-DL-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_bfe_i32 s5, s0, 0x40000		; GFX9-DL-NEXT: s_bfe_i32 s4, s0, 0x40000
; GFX9-DL-NEXT: s_bfe_i32 s6, s1, 0x40000		; GFX9-DL-NEXT: s_bfe_i32 s5, s1, 0x40000
; GFX9-DL-NEXT: s_bfe_i32 s8, s1, 0x40004		; GFX9-DL-NEXT: s_bfe_i32 s7, s1, 0x40004
; GFX9-DL-NEXT: s_bfe_i32 s10, s1, 0x40008		; GFX9-DL-NEXT: s_bfe_i32 s9, s1, 0x40008
; GFX9-DL-NEXT: v_mov_b32_e32 v6, s6		; GFX9-DL-NEXT: v_mov_b32_e32 v6, s5
; GFX9-DL-NEXT: s_lshr_b32 s2, s0, 12		; GFX9-DL-NEXT: s_lshr_b32 s2, s0, 12
; GFX9-DL-NEXT: s_lshr_b32 s4, s1, 12		; GFX9-DL-NEXT: s_lshr_b32 s3, s1, 12
; GFX9-DL-NEXT: s_bfe_i32 s7, s0, 0x40004		; GFX9-DL-NEXT: s_bfe_i32 s6, s0, 0x40004
; GFX9-DL-NEXT: s_bfe_i32 s9, s0, 0x40008		; GFX9-DL-NEXT: s_bfe_i32 s8, s0, 0x40008
; GFX9-DL-NEXT: v_mov_b32_e32 v3, s10		; GFX9-DL-NEXT: v_mov_b32_e32 v3, s9
; GFX9-DL-NEXT: v_mov_b32_e32 v7, s8		; GFX9-DL-NEXT: v_mov_b32_e32 v7, s7
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v4, 12, s2		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v4, 12, s2
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v5, 12, s4		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v5, 12, s3
; GFX9-DL-NEXT: v_mul_i32_i24_e32 v3, s9, v3		; GFX9-DL-NEXT: v_mul_i32_i24_e32 v3, s8, v3
; GFX9-DL-NEXT: s_bfe_i32 s12, s1, 0x40010		; GFX9-DL-NEXT: s_bfe_i32 s11, s1, 0x40010
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v4, 12, v4		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v4, 12, v4
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v5, 12, v5		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v5, 12, v5
; GFX9-DL-NEXT: s_bfe_i32 s14, s1, 0x40014		; GFX9-DL-NEXT: s_bfe_i32 s13, s1, 0x40014
; GFX9-DL-NEXT: s_bfe_i32 s11, s0, 0x40010		; GFX9-DL-NEXT: s_bfe_i32 s10, s0, 0x40010
; GFX9-DL-NEXT: v_mov_b32_e32 v8, s12		; GFX9-DL-NEXT: v_mov_b32_e32 v8, s11
; GFX9-DL-NEXT: s_bfe_i32 s16, s1, 0x40018		; GFX9-DL-NEXT: s_bfe_i32 s15, s1, 0x40018
; GFX9-DL-NEXT: s_bfe_i32 s13, s0, 0x40014		; GFX9-DL-NEXT: s_bfe_i32 s12, s0, 0x40014
; GFX9-DL-NEXT: v_mov_b32_e32 v9, s14		; GFX9-DL-NEXT: v_mov_b32_e32 v9, s13
; GFX9-DL-NEXT: s_bfe_i32 s15, s0, 0x40018		; GFX9-DL-NEXT: s_bfe_i32 s14, s0, 0x40018
; GFX9-DL-NEXT: s_ashr_i32 s1, s1, 28		; GFX9-DL-NEXT: s_ashr_i32 s1, s1, 28
; GFX9-DL-NEXT: v_mov_b32_e32 v10, s16		; GFX9-DL-NEXT: v_mov_b32_e32 v10, s15
; GFX9-DL-NEXT: s_ashr_i32 s0, s0, 28		; GFX9-DL-NEXT: s_ashr_i32 s0, s0, 28
; GFX9-DL-NEXT: s_waitcnt vmcnt(0)		; GFX9-DL-NEXT: s_waitcnt vmcnt(0)
; GFX9-DL-NEXT: v_mad_i32_i24 v2, s5, v6, v2		; GFX9-DL-NEXT: v_mad_i32_i24 v2, s4, v6, v2
; GFX9-DL-NEXT: v_mad_i32_i24 v2, s7, v7, v2		; GFX9-DL-NEXT: v_mad_i32_i24 v2, s6, v7, v2
; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:WORD_0		; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:WORD_0
; GFX9-DL-NEXT: v_mad_u32_u24 v2, v4, v5, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, v4, v5, v2
; GFX9-DL-NEXT: v_mad_i32_i24 v2, s11, v8, v2		; GFX9-DL-NEXT: v_mad_i32_i24 v2, s10, v8, v2
; GFX9-DL-NEXT: v_mad_i32_i24 v2, s13, v9, v2		; GFX9-DL-NEXT: v_mad_i32_i24 v2, s12, v9, v2
; GFX9-DL-NEXT: v_mad_i32_i24 v2, s15, v10, v2		; GFX9-DL-NEXT: v_mad_i32_i24 v2, s14, v10, v2
; GFX9-DL-NEXT: v_mov_b32_e32 v3, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v3, s1
; GFX9-DL-NEXT: v_mad_i32_i24 v2, s0, v3, v2		; GFX9-DL-NEXT: v_mad_i32_i24 v2, s0, v3, v2
; GFX9-DL-NEXT: global_store_short v[0:1], v2, off		; GFX9-DL-NEXT: global_store_short v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: idot8_acc16:		; GFX10-DL-LABEL: idot8_acc16:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s4		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s2
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s5		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s3
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX10-DL-NEXT: global_load_ushort v2, v[0:1], off		; GFX10-DL-NEXT: global_load_ushort v2, v[0:1], off
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_lshr_b32 s2, s0, 12		; GFX10-DL-NEXT: s_lshr_b32 s2, s0, 12
; GFX10-DL-NEXT: s_lshr_b32 s4, s1, 12		; GFX10-DL-NEXT: s_lshr_b32 s3, s1, 12
; GFX10-DL-NEXT: s_bfe_i32 s5, s0, 0x40000		; GFX10-DL-NEXT: s_bfe_i32 s4, s0, 0x40000
; GFX10-DL-NEXT: s_bfe_i32 s6, s1, 0x40000		; GFX10-DL-NEXT: s_bfe_i32 s5, s1, 0x40000
; GFX10-DL-NEXT: s_bfe_i32 s7, s0, 0x40004		; GFX10-DL-NEXT: s_bfe_i32 s6, s0, 0x40004
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v3, 12, s2		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v3, 12, s2
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v4, 12, s4		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v4, 12, s3
; GFX10-DL-NEXT: s_bfe_i32 s8, s0, 0x40008		; GFX10-DL-NEXT: s_bfe_i32 s7, s0, 0x40008
; GFX10-DL-NEXT: s_bfe_i32 s9, s1, 0x40008		; GFX10-DL-NEXT: s_bfe_i32 s8, s1, 0x40008
; GFX10-DL-NEXT: s_bfe_i32 s2, s1, 0x40004		; GFX10-DL-NEXT: s_bfe_i32 s2, s1, 0x40004
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v3, 12, v3		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v3, 12, v3
; GFX10-DL-NEXT: s_mov_b32 s4, 0xffff		; GFX10-DL-NEXT: s_mov_b32 s3, 0xffff
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v4, 12, v4		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v4, 12, v4
; GFX10-DL-NEXT: v_mul_i32_i24_e64 v5, s8, s9		; GFX10-DL-NEXT: v_mul_i32_i24_e64 v5, s7, s8
; GFX10-DL-NEXT: v_and_b32_e32 v3, s4, v3		; GFX10-DL-NEXT: v_and_b32_e32 v3, s3, v3
; GFX10-DL-NEXT: v_and_b32_e32 v4, s4, v4		; GFX10-DL-NEXT: v_and_b32_e32 v4, s3, v4
; GFX10-DL-NEXT: s_bfe_i32 s4, s1, 0x40010		; GFX10-DL-NEXT: s_bfe_i32 s3, s1, 0x40010
; GFX10-DL-NEXT: s_waitcnt vmcnt(0)		; GFX10-DL-NEXT: s_waitcnt vmcnt(0)
; GFX10-DL-NEXT: v_mad_i32_i24 v2, s5, s6, v2		; GFX10-DL-NEXT: v_mad_i32_i24 v2, s4, s5, v2
; GFX10-DL-NEXT: s_bfe_i32 s5, s0, 0x40014		; GFX10-DL-NEXT: s_bfe_i32 s4, s0, 0x40014
; GFX10-DL-NEXT: s_bfe_i32 s6, s1, 0x40014		; GFX10-DL-NEXT: s_bfe_i32 s5, s1, 0x40014
; GFX10-DL-NEXT: v_mad_i32_i24 v2, s7, s2, v2		; GFX10-DL-NEXT: v_mad_i32_i24 v2, s6, s2, v2
; GFX10-DL-NEXT: s_bfe_i32 s2, s0, 0x40010		; GFX10-DL-NEXT: s_bfe_i32 s2, s0, 0x40010
; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:WORD_0		; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:WORD_0
; GFX10-DL-NEXT: v_mad_u32_u24 v2, v3, v4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, v3, v4, v2
; GFX10-DL-NEXT: v_mad_i32_i24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_i32_i24 v2, s2, s3, v2
; GFX10-DL-NEXT: s_bfe_i32 s2, s0, 0x40018		; GFX10-DL-NEXT: s_bfe_i32 s2, s0, 0x40018
; GFX10-DL-NEXT: s_bfe_i32 s4, s1, 0x40018		; GFX10-DL-NEXT: s_bfe_i32 s3, s1, 0x40018
; GFX10-DL-NEXT: s_ashr_i32 s0, s0, 28		; GFX10-DL-NEXT: s_ashr_i32 s0, s0, 28
; GFX10-DL-NEXT: s_ashr_i32 s1, s1, 28		; GFX10-DL-NEXT: s_ashr_i32 s1, s1, 28
; GFX10-DL-NEXT: v_mad_i32_i24 v2, s5, s6, v2		; GFX10-DL-NEXT: v_mad_i32_i24 v2, s4, s5, v2
; GFX10-DL-NEXT: v_mad_i32_i24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_i32_i24 v2, s2, s3, v2
; GFX10-DL-NEXT: v_mad_i32_i24 v2, s0, s1, v2		; GFX10-DL-NEXT: v_mad_i32_i24 v2, s0, s1, v2
; GFX10-DL-NEXT: global_store_short v[0:1], v2, off		; GFX10-DL-NEXT: global_store_short v[0:1], v2, off
; GFX10-DL-NEXT: s_endpgm		; GFX10-DL-NEXT: s_endpgm
<8 x i4> addrspace(1)* %src2,		<8 x i4> addrspace(1)* %src2,
i16 addrspace(1)* nocapture %dst) {		i16 addrspace(1)* nocapture %dst) {
entry:		entry:
%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1		%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1
%vec2 = load <8 x i4>, <8 x i4> addrspace(1)* %src2		%vec2 = load <8 x i4>, <8 x i4> addrspace(1)* %src2
(59 unchanged lines not shown)
store i16 %add8, i16 addrspace(1)* %dst, align 4		store i16 %add8, i16 addrspace(1)* %dst, align 4
ret void		ret void
}		}

; TODO: Support this pattern.		; TODO: Support this pattern.
define amdgpu_kernel void @idot8_acc8(<8 x i4> addrspace(1)* %src1,		define amdgpu_kernel void @idot8_acc8(<8 x i4> addrspace(1)* %src1,
; GFX7-LABEL: idot8_acc8:		; GFX7-LABEL: idot8_acc8:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s7, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s6, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_movk_i32 s0, 0xff		; GFX7-NEXT: s_movk_i32 s8, 0xff
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: buffer_load_ubyte v0, off, s[4:7], 0		; GFX7-NEXT: buffer_load_ubyte v0, off, s[0:3], 0
; GFX7-NEXT: s_load_dword s1, s[8:9], 0x0		; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX7-NEXT: s_load_dword s2, s[10:11], 0x0		; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_bfe_i32 s8, s1, 0x40000		; GFX7-NEXT: s_bfe_i32 s6, s4, 0x40000
; GFX7-NEXT: s_bfe_i32 s9, s2, 0x40000		; GFX7-NEXT: s_bfe_i32 s7, s5, 0x40000
; GFX7-NEXT: s_bfe_i32 s11, s2, 0x40004		; GFX7-NEXT: s_bfe_i32 s10, s5, 0x40004
; GFX7-NEXT: s_and_b32 s9, s9, s0		; GFX7-NEXT: s_and_b32 s7, s7, s8
; GFX7-NEXT: s_bfe_i32 s10, s1, 0x40004		; GFX7-NEXT: s_bfe_i32 s9, s4, 0x40004
; GFX7-NEXT: s_bfe_i32 s13, s2, 0x40008		; GFX7-NEXT: s_bfe_i32 s12, s5, 0x40008
; GFX7-NEXT: s_and_b32 s11, s11, s0		; GFX7-NEXT: s_and_b32 s10, s10, s8
; GFX7-NEXT: s_and_b32 s8, s8, s0		; GFX7-NEXT: s_and_b32 s6, s6, s8
; GFX7-NEXT: v_mov_b32_e32 v1, s9		; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: s_bfe_i32 s12, s1, 0x40008		; GFX7-NEXT: s_bfe_i32 s11, s4, 0x40008
; GFX7-NEXT: s_bfe_i32 s15, s2, 0x4000c		; GFX7-NEXT: s_bfe_i32 s14, s5, 0x4000c
; GFX7-NEXT: s_and_b32 s13, s13, s0		; GFX7-NEXT: s_and_b32 s12, s12, s8
; GFX7-NEXT: s_and_b32 s10, s10, s0		; GFX7-NEXT: s_and_b32 s9, s9, s8
; GFX7-NEXT: v_mov_b32_e32 v2, s11		; GFX7-NEXT: v_mov_b32_e32 v2, s10
; GFX7-NEXT: s_bfe_i32 s14, s1, 0x4000c		; GFX7-NEXT: s_bfe_i32 s13, s4, 0x4000c
; GFX7-NEXT: s_bfe_i32 s17, s2, 0x40010		; GFX7-NEXT: s_bfe_i32 s16, s5, 0x40010
; GFX7-NEXT: s_and_b32 s15, s15, s0		; GFX7-NEXT: s_and_b32 s14, s14, s8
; GFX7-NEXT: s_and_b32 s12, s12, s0		; GFX7-NEXT: s_and_b32 s11, s11, s8
; GFX7-NEXT: v_mov_b32_e32 v3, s13		; GFX7-NEXT: v_mov_b32_e32 v3, s12
; GFX7-NEXT: s_bfe_i32 s16, s1, 0x40010		; GFX7-NEXT: s_bfe_i32 s15, s4, 0x40010
; GFX7-NEXT: s_bfe_i32 s19, s2, 0x40014		; GFX7-NEXT: s_bfe_i32 s18, s5, 0x40014
; GFX7-NEXT: s_and_b32 s17, s17, s0		; GFX7-NEXT: s_and_b32 s16, s16, s8
; GFX7-NEXT: s_and_b32 s14, s14, s0		; GFX7-NEXT: s_and_b32 s13, s13, s8
; GFX7-NEXT: v_mov_b32_e32 v4, s15		; GFX7-NEXT: v_mov_b32_e32 v4, s14
; GFX7-NEXT: s_bfe_i32 s21, s2, 0x40018		; GFX7-NEXT: s_bfe_i32 s20, s5, 0x40018
; GFX7-NEXT: s_bfe_i32 s18, s1, 0x40014		; GFX7-NEXT: s_bfe_i32 s17, s4, 0x40014
; GFX7-NEXT: s_and_b32 s19, s19, s0		; GFX7-NEXT: s_and_b32 s18, s18, s8
; GFX7-NEXT: s_and_b32 s16, s16, s0		; GFX7-NEXT: s_and_b32 s15, s15, s8
; GFX7-NEXT: v_mov_b32_e32 v5, s17		; GFX7-NEXT: v_mov_b32_e32 v5, s16
; GFX7-NEXT: s_bfe_i32 s20, s1, 0x40018		; GFX7-NEXT: s_bfe_i32 s19, s4, 0x40018
; GFX7-NEXT: s_ashr_i32 s2, s2, 28		; GFX7-NEXT: s_ashr_i32 s5, s5, 28
; GFX7-NEXT: s_and_b32 s21, s21, s0		; GFX7-NEXT: s_and_b32 s20, s20, s8
; GFX7-NEXT: s_and_b32 s18, s18, s0		; GFX7-NEXT: s_and_b32 s17, s17, s8
; GFX7-NEXT: v_mov_b32_e32 v6, s19		; GFX7-NEXT: v_mov_b32_e32 v6, s18
; GFX7-NEXT: s_ashr_i32 s1, s1, 28		; GFX7-NEXT: s_ashr_i32 s4, s4, 28
; GFX7-NEXT: s_and_b32 s20, s20, s0		; GFX7-NEXT: s_and_b32 s19, s19, s8
; GFX7-NEXT: s_and_b32 s2, s2, s0		; GFX7-NEXT: s_and_b32 s5, s5, s8
; GFX7-NEXT: v_mov_b32_e32 v7, s21		; GFX7-NEXT: v_mov_b32_e32 v7, s20
; GFX7-NEXT: s_and_b32 s0, s1, s0		; GFX7-NEXT: s_and_b32 s4, s4, s8
; GFX7-NEXT: s_waitcnt vmcnt(0)		; GFX7-NEXT: s_waitcnt vmcnt(0)
; GFX7-NEXT: v_mad_u32_u24 v0, s8, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s6, v1, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s10, v2, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s9, v2, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s12, v3, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s11, v3, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s14, v4, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s13, v4, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s16, v5, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s15, v5, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s18, v6, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s17, v6, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s20, v7, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s19, v7, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s2		; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mad_u32_u24 v0, s0, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s4, v1, v0
; GFX7-NEXT: buffer_store_byte v0, off, s[4:7], 0		; GFX7-NEXT: buffer_store_byte v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: idot8_acc8:		; GFX8-LABEL: idot8_acc8:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_movk_i32 s2, 0xff		; GFX8-NEXT: s_movk_i32 s2, 0xff
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_load_dword s6, s[6:7], 0x0		; GFX8-NEXT: s_load_dword s3, s[6:7], 0x0
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_load_ubyte v2, v[0:1]		; GFX8-NEXT: flat_load_ubyte v2, v[0:1]
; GFX8-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX8-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_bfe_i32 s7, s6, 0x40000		; GFX8-NEXT: s_bfe_i32 s6, s3, 0x40000
; GFX8-NEXT: s_lshr_b32 s4, s6, 12		; GFX8-NEXT: s_lshr_b32 s4, s3, 12
; GFX8-NEXT: s_bfe_i32 s9, s6, 0x40004		; GFX8-NEXT: s_bfe_i32 s8, s3, 0x40004
; GFX8-NEXT: s_bfe_i32 s11, s6, 0x40008		; GFX8-NEXT: s_bfe_i32 s10, s3, 0x40008
; GFX8-NEXT: s_lshr_b32 s1, s0, 12		; GFX8-NEXT: s_lshr_b32 s1, s0, 12
; GFX8-NEXT: s_bfe_i32 s5, s0, 0x40000		; GFX8-NEXT: s_bfe_i32 s5, s0, 0x40000
; GFX8-NEXT: v_mov_b32_e32 v6, s7		; GFX8-NEXT: v_mov_b32_e32 v6, s6
; GFX8-NEXT: v_lshlrev_b16_e64 v4, 12, s1		; GFX8-NEXT: v_lshlrev_b16_e64 v4, 12, s1
; GFX8-NEXT: v_lshlrev_b16_e64 v5, 12, s4		; GFX8-NEXT: v_lshlrev_b16_e64 v5, 12, s4
; GFX8-NEXT: s_bfe_i32 s8, s0, 0x40004		; GFX8-NEXT: s_bfe_i32 s7, s0, 0x40004
; GFX8-NEXT: s_bfe_i32 s10, s0, 0x40008		; GFX8-NEXT: s_bfe_i32 s9, s0, 0x40008
; GFX8-NEXT: v_mov_b32_e32 v3, s11		; GFX8-NEXT: v_mov_b32_e32 v3, s10
; GFX8-NEXT: v_mov_b32_e32 v7, s9		; GFX8-NEXT: v_mov_b32_e32 v7, s8
; GFX8-NEXT: v_ashrrev_i16_e32 v4, 12, v4		; GFX8-NEXT: v_ashrrev_i16_e32 v4, 12, v4
; GFX8-NEXT: v_ashrrev_i16_e32 v5, 12, v5		; GFX8-NEXT: v_ashrrev_i16_e32 v5, 12, v5
; GFX8-NEXT: v_mul_i32_i24_e32 v3, s10, v3		; GFX8-NEXT: v_mul_i32_i24_e32 v3, s9, v3
; GFX8-NEXT: s_bfe_i32 s13, s6, 0x40010		; GFX8-NEXT: s_bfe_i32 s12, s3, 0x40010
; GFX8-NEXT: v_and_b32_e32 v4, s2, v4		; GFX8-NEXT: v_and_b32_e32 v4, s2, v4
; GFX8-NEXT: v_and_b32_e32 v5, s2, v5		; GFX8-NEXT: v_and_b32_e32 v5, s2, v5
; GFX8-NEXT: s_bfe_i32 s15, s6, 0x40014		; GFX8-NEXT: s_bfe_i32 s14, s3, 0x40014
; GFX8-NEXT: s_bfe_i32 s12, s0, 0x40010		; GFX8-NEXT: s_bfe_i32 s11, s0, 0x40010
; GFX8-NEXT: v_mov_b32_e32 v8, s13		; GFX8-NEXT: v_mov_b32_e32 v8, s12
; GFX8-NEXT: s_bfe_i32 s17, s6, 0x40018		; GFX8-NEXT: s_bfe_i32 s16, s3, 0x40018
; GFX8-NEXT: s_bfe_i32 s14, s0, 0x40014		; GFX8-NEXT: s_bfe_i32 s13, s0, 0x40014
; GFX8-NEXT: v_mov_b32_e32 v9, s15		; GFX8-NEXT: v_mov_b32_e32 v9, s14
; GFX8-NEXT: s_bfe_i32 s16, s0, 0x40018		; GFX8-NEXT: s_bfe_i32 s15, s0, 0x40018
; GFX8-NEXT: s_ashr_i32 s6, s6, 28		; GFX8-NEXT: s_ashr_i32 s3, s3, 28
; GFX8-NEXT: v_mov_b32_e32 v10, s17		; GFX8-NEXT: v_mov_b32_e32 v10, s16
; GFX8-NEXT: s_ashr_i32 s0, s0, 28		; GFX8-NEXT: s_ashr_i32 s0, s0, 28
; GFX8-NEXT: s_waitcnt vmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0)
; GFX8-NEXT: v_mad_i32_i24 v2, s5, v6, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s5, v6, v2
; GFX8-NEXT: v_mad_i32_i24 v2, s8, v7, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s7, v7, v2
; GFX8-NEXT: v_add_u32_sdwa v2, vcc, v3, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_0		; GFX8-NEXT: v_add_u32_sdwa v2, vcc, v3, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_0
; GFX8-NEXT: v_mad_u32_u24 v2, v4, v5, v2		; GFX8-NEXT: v_mad_u32_u24 v2, v4, v5, v2
; GFX8-NEXT: v_mad_i32_i24 v2, s12, v8, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s11, v8, v2
; GFX8-NEXT: v_mad_i32_i24 v2, s14, v9, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s13, v9, v2
; GFX8-NEXT: v_mad_i32_i24 v2, s16, v10, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s15, v10, v2
; GFX8-NEXT: v_mov_b32_e32 v3, s6		; GFX8-NEXT: v_mov_b32_e32 v3, s3
; GFX8-NEXT: v_mad_i32_i24 v2, s0, v3, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s0, v3, v2
; GFX8-NEXT: flat_store_byte v[0:1], v2		; GFX8-NEXT: flat_store_byte v[0:1], v2
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: idot8_acc8:		; GFX9-LABEL: idot8_acc8:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-NEXT: s_movk_i32 s2, 0xff		; GFX9-NEXT: s_movk_i32 s2, 0xff
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_load_dword s6, s[6:7], 0x0		; GFX9-NEXT: s_load_dword s3, s[6:7], 0x0
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: global_load_ubyte v2, v[0:1], off		; GFX9-NEXT: global_load_ubyte v2, v[0:1], off
; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_bfe_i32 s7, s6, 0x40000		; GFX9-NEXT: s_bfe_i32 s6, s3, 0x40000
; GFX9-NEXT: s_lshr_b32 s4, s6, 12		; GFX9-NEXT: s_lshr_b32 s4, s3, 12
; GFX9-NEXT: s_bfe_i32 s9, s6, 0x40004		; GFX9-NEXT: s_bfe_i32 s8, s3, 0x40004
; GFX9-NEXT: s_bfe_i32 s11, s6, 0x40008		; GFX9-NEXT: s_bfe_i32 s10, s3, 0x40008
; GFX9-NEXT: s_lshr_b32 s1, s0, 12		; GFX9-NEXT: s_lshr_b32 s1, s0, 12
; GFX9-NEXT: s_bfe_i32 s5, s0, 0x40000		; GFX9-NEXT: s_bfe_i32 s5, s0, 0x40000
; GFX9-NEXT: v_mov_b32_e32 v6, s7		; GFX9-NEXT: v_mov_b32_e32 v6, s6
; GFX9-NEXT: v_lshlrev_b16_e64 v4, 12, s1		; GFX9-NEXT: v_lshlrev_b16_e64 v4, 12, s1
; GFX9-NEXT: v_lshlrev_b16_e64 v5, 12, s4		; GFX9-NEXT: v_lshlrev_b16_e64 v5, 12, s4
; GFX9-NEXT: s_bfe_i32 s8, s0, 0x40004		; GFX9-NEXT: s_bfe_i32 s7, s0, 0x40004
; GFX9-NEXT: s_bfe_i32 s10, s0, 0x40008		; GFX9-NEXT: s_bfe_i32 s9, s0, 0x40008
; GFX9-NEXT: v_mov_b32_e32 v3, s11		; GFX9-NEXT: v_mov_b32_e32 v3, s10
; GFX9-NEXT: v_mov_b32_e32 v7, s9		; GFX9-NEXT: v_mov_b32_e32 v7, s8
; GFX9-NEXT: v_ashrrev_i16_e32 v4, 12, v4		; GFX9-NEXT: v_ashrrev_i16_e32 v4, 12, v4
; GFX9-NEXT: v_ashrrev_i16_e32 v5, 12, v5		; GFX9-NEXT: v_ashrrev_i16_e32 v5, 12, v5
; GFX9-NEXT: v_mul_i32_i24_e32 v3, s10, v3		; GFX9-NEXT: v_mul_i32_i24_e32 v3, s9, v3
; GFX9-NEXT: s_bfe_i32 s13, s6, 0x40010		; GFX9-NEXT: s_bfe_i32 s12, s3, 0x40010
; GFX9-NEXT: v_and_b32_e32 v4, s2, v4		; GFX9-NEXT: v_and_b32_e32 v4, s2, v4
; GFX9-NEXT: v_and_b32_e32 v5, s2, v5		; GFX9-NEXT: v_and_b32_e32 v5, s2, v5
; GFX9-NEXT: s_bfe_i32 s15, s6, 0x40014		; GFX9-NEXT: s_bfe_i32 s14, s3, 0x40014
; GFX9-NEXT: s_bfe_i32 s12, s0, 0x40010		; GFX9-NEXT: s_bfe_i32 s11, s0, 0x40010
; GFX9-NEXT: v_mov_b32_e32 v8, s13		; GFX9-NEXT: v_mov_b32_e32 v8, s12
; GFX9-NEXT: s_bfe_i32 s17, s6, 0x40018		; GFX9-NEXT: s_bfe_i32 s16, s3, 0x40018
; GFX9-NEXT: s_bfe_i32 s14, s0, 0x40014		; GFX9-NEXT: s_bfe_i32 s13, s0, 0x40014
; GFX9-NEXT: v_mov_b32_e32 v9, s15		; GFX9-NEXT: v_mov_b32_e32 v9, s14
; GFX9-NEXT: s_bfe_i32 s16, s0, 0x40018		; GFX9-NEXT: s_bfe_i32 s15, s0, 0x40018
; GFX9-NEXT: s_ashr_i32 s6, s6, 28		; GFX9-NEXT: s_ashr_i32 s3, s3, 28
; GFX9-NEXT: v_mov_b32_e32 v10, s17		; GFX9-NEXT: v_mov_b32_e32 v10, s16
; GFX9-NEXT: s_ashr_i32 s0, s0, 28		; GFX9-NEXT: s_ashr_i32 s0, s0, 28
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: v_mad_i32_i24 v2, s5, v6, v2		; GFX9-NEXT: v_mad_i32_i24 v2, s5, v6, v2
; GFX9-NEXT: v_mad_i32_i24 v2, s8, v7, v2		; GFX9-NEXT: v_mad_i32_i24 v2, s7, v7, v2
; GFX9-NEXT: v_add_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_0		; GFX9-NEXT: v_add_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_0
; GFX9-NEXT: v_mad_u32_u24 v2, v4, v5, v2		; GFX9-NEXT: v_mad_u32_u24 v2, v4, v5, v2
; GFX9-NEXT: v_mad_i32_i24 v2, s12, v8, v2		; GFX9-NEXT: v_mad_i32_i24 v2, s11, v8, v2
; GFX9-NEXT: v_mad_i32_i24 v2, s14, v9, v2		; GFX9-NEXT: v_mad_i32_i24 v2, s13, v9, v2
; GFX9-NEXT: v_mad_i32_i24 v2, s16, v10, v2		; GFX9-NEXT: v_mad_i32_i24 v2, s15, v10, v2
; GFX9-NEXT: v_mov_b32_e32 v3, s6		; GFX9-NEXT: v_mov_b32_e32 v3, s3
; GFX9-NEXT: v_mad_i32_i24 v2, s0, v3, v2		; GFX9-NEXT: v_mad_i32_i24 v2, s0, v3, v2
; GFX9-NEXT: global_store_byte v[0:1], v2, off		; GFX9-NEXT: global_store_byte v[0:1], v2, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX9-DL-LABEL: idot8_acc8:		; GFX9-DL-LABEL: idot8_acc8:
; GFX9-DL: ; %bb.0: ; %entry		; GFX9-DL: ; %bb.0: ; %entry
; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-DL-NEXT: s_movk_i32 s2, 0xff		; GFX9-DL-NEXT: s_movk_i32 s2, 0xff
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_load_dword s6, s[6:7], 0x0		; GFX9-DL-NEXT: s_load_dword s3, s[6:7], 0x0
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: global_load_ubyte v2, v[0:1], off		; GFX9-DL-NEXT: global_load_ubyte v2, v[0:1], off
; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_bfe_i32 s7, s6, 0x40000		; GFX9-DL-NEXT: s_bfe_i32 s6, s3, 0x40000
; GFX9-DL-NEXT: s_lshr_b32 s4, s6, 12		; GFX9-DL-NEXT: s_lshr_b32 s4, s3, 12
; GFX9-DL-NEXT: s_bfe_i32 s9, s6, 0x40004		; GFX9-DL-NEXT: s_bfe_i32 s8, s3, 0x40004
; GFX9-DL-NEXT: s_bfe_i32 s11, s6, 0x40008		; GFX9-DL-NEXT: s_bfe_i32 s10, s3, 0x40008
; GFX9-DL-NEXT: s_lshr_b32 s1, s0, 12		; GFX9-DL-NEXT: s_lshr_b32 s1, s0, 12
; GFX9-DL-NEXT: s_bfe_i32 s5, s0, 0x40000		; GFX9-DL-NEXT: s_bfe_i32 s5, s0, 0x40000
; GFX9-DL-NEXT: v_mov_b32_e32 v6, s7		; GFX9-DL-NEXT: v_mov_b32_e32 v6, s6
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v4, 12, s1		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v4, 12, s1
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v5, 12, s4		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v5, 12, s4
; GFX9-DL-NEXT: s_bfe_i32 s8, s0, 0x40004		; GFX9-DL-NEXT: s_bfe_i32 s7, s0, 0x40004
; GFX9-DL-NEXT: s_bfe_i32 s10, s0, 0x40008		; GFX9-DL-NEXT: s_bfe_i32 s9, s0, 0x40008
; GFX9-DL-NEXT: v_mov_b32_e32 v3, s11		; GFX9-DL-NEXT: v_mov_b32_e32 v3, s10
; GFX9-DL-NEXT: v_mov_b32_e32 v7, s9		; GFX9-DL-NEXT: v_mov_b32_e32 v7, s8
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v4, 12, v4		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v4, 12, v4
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v5, 12, v5		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v5, 12, v5
; GFX9-DL-NEXT: v_mul_i32_i24_e32 v3, s10, v3		; GFX9-DL-NEXT: v_mul_i32_i24_e32 v3, s9, v3
; GFX9-DL-NEXT: s_bfe_i32 s13, s6, 0x40010		; GFX9-DL-NEXT: s_bfe_i32 s12, s3, 0x40010
; GFX9-DL-NEXT: v_and_b32_e32 v4, s2, v4		; GFX9-DL-NEXT: v_and_b32_e32 v4, s2, v4
; GFX9-DL-NEXT: v_and_b32_e32 v5, s2, v5		; GFX9-DL-NEXT: v_and_b32_e32 v5, s2, v5
; GFX9-DL-NEXT: s_bfe_i32 s15, s6, 0x40014		; GFX9-DL-NEXT: s_bfe_i32 s14, s3, 0x40014
; GFX9-DL-NEXT: s_bfe_i32 s12, s0, 0x40010		; GFX9-DL-NEXT: s_bfe_i32 s11, s0, 0x40010
; GFX9-DL-NEXT: v_mov_b32_e32 v8, s13		; GFX9-DL-NEXT: v_mov_b32_e32 v8, s12
; GFX9-DL-NEXT: s_bfe_i32 s17, s6, 0x40018		; GFX9-DL-NEXT: s_bfe_i32 s16, s3, 0x40018
; GFX9-DL-NEXT: s_bfe_i32 s14, s0, 0x40014		; GFX9-DL-NEXT: s_bfe_i32 s13, s0, 0x40014
; GFX9-DL-NEXT: v_mov_b32_e32 v9, s15		; GFX9-DL-NEXT: v_mov_b32_e32 v9, s14
; GFX9-DL-NEXT: s_bfe_i32 s16, s0, 0x40018		; GFX9-DL-NEXT: s_bfe_i32 s15, s0, 0x40018
; GFX9-DL-NEXT: s_ashr_i32 s6, s6, 28		; GFX9-DL-NEXT: s_ashr_i32 s3, s3, 28
; GFX9-DL-NEXT: v_mov_b32_e32 v10, s17		; GFX9-DL-NEXT: v_mov_b32_e32 v10, s16
; GFX9-DL-NEXT: s_ashr_i32 s0, s0, 28		; GFX9-DL-NEXT: s_ashr_i32 s0, s0, 28
; GFX9-DL-NEXT: s_waitcnt vmcnt(0)		; GFX9-DL-NEXT: s_waitcnt vmcnt(0)
; GFX9-DL-NEXT: v_mad_i32_i24 v2, s5, v6, v2		; GFX9-DL-NEXT: v_mad_i32_i24 v2, s5, v6, v2
; GFX9-DL-NEXT: v_mad_i32_i24 v2, s8, v7, v2		; GFX9-DL-NEXT: v_mad_i32_i24 v2, s7, v7, v2
; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_0		; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_0
; GFX9-DL-NEXT: v_mad_u32_u24 v2, v4, v5, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, v4, v5, v2
; GFX9-DL-NEXT: v_mad_i32_i24 v2, s12, v8, v2		; GFX9-DL-NEXT: v_mad_i32_i24 v2, s11, v8, v2
; GFX9-DL-NEXT: v_mad_i32_i24 v2, s14, v9, v2		; GFX9-DL-NEXT: v_mad_i32_i24 v2, s13, v9, v2
; GFX9-DL-NEXT: v_mad_i32_i24 v2, s16, v10, v2		; GFX9-DL-NEXT: v_mad_i32_i24 v2, s15, v10, v2
; GFX9-DL-NEXT: v_mov_b32_e32 v3, s6		; GFX9-DL-NEXT: v_mov_b32_e32 v3, s3
; GFX9-DL-NEXT: v_mad_i32_i24 v2, s0, v3, v2		; GFX9-DL-NEXT: v_mad_i32_i24 v2, s0, v3, v2
; GFX9-DL-NEXT: global_store_byte v[0:1], v2, off		; GFX9-DL-NEXT: global_store_byte v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: idot8_acc8:		; GFX10-DL-LABEL: idot8_acc8:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s4		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s2
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s5		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s3
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off		; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_lshr_b32 s2, s0, 12		; GFX10-DL-NEXT: s_lshr_b32 s2, s0, 12
; GFX10-DL-NEXT: s_lshr_b32 s4, s1, 12		; GFX10-DL-NEXT: s_lshr_b32 s3, s1, 12
; GFX10-DL-NEXT: s_bfe_i32 s5, s0, 0x40000		; GFX10-DL-NEXT: s_bfe_i32 s4, s0, 0x40000
; GFX10-DL-NEXT: s_bfe_i32 s6, s1, 0x40000		; GFX10-DL-NEXT: s_bfe_i32 s5, s1, 0x40000
; GFX10-DL-NEXT: s_bfe_i32 s7, s0, 0x40004		; GFX10-DL-NEXT: s_bfe_i32 s6, s0, 0x40004
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v3, 12, s2		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v3, 12, s2
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v4, 12, s4		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v4, 12, s3
; GFX10-DL-NEXT: s_bfe_i32 s8, s0, 0x40008		; GFX10-DL-NEXT: s_bfe_i32 s7, s0, 0x40008
; GFX10-DL-NEXT: s_bfe_i32 s9, s1, 0x40008		; GFX10-DL-NEXT: s_bfe_i32 s8, s1, 0x40008
; GFX10-DL-NEXT: s_bfe_i32 s2, s1, 0x40004		; GFX10-DL-NEXT: s_bfe_i32 s2, s1, 0x40004
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v3, 12, v3		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v3, 12, v3
; GFX10-DL-NEXT: s_movk_i32 s4, 0xff		; GFX10-DL-NEXT: s_movk_i32 s3, 0xff
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v4, 12, v4		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v4, 12, v4
; GFX10-DL-NEXT: v_mul_i32_i24_e64 v5, s8, s9		; GFX10-DL-NEXT: v_mul_i32_i24_e64 v5, s7, s8
; GFX10-DL-NEXT: v_and_b32_e32 v3, s4, v3		; GFX10-DL-NEXT: v_and_b32_e32 v3, s3, v3
; GFX10-DL-NEXT: v_and_b32_e32 v4, s4, v4		; GFX10-DL-NEXT: v_and_b32_e32 v4, s3, v4
; GFX10-DL-NEXT: s_bfe_i32 s4, s1, 0x40010		; GFX10-DL-NEXT: s_bfe_i32 s3, s1, 0x40010
; GFX10-DL-NEXT: s_waitcnt vmcnt(0)		; GFX10-DL-NEXT: s_waitcnt vmcnt(0)
; GFX10-DL-NEXT: v_mad_i32_i24 v2, s5, s6, v2		; GFX10-DL-NEXT: v_mad_i32_i24 v2, s4, s5, v2
; GFX10-DL-NEXT: s_bfe_i32 s5, s0, 0x40014		; GFX10-DL-NEXT: s_bfe_i32 s4, s0, 0x40014
; GFX10-DL-NEXT: s_bfe_i32 s6, s1, 0x40014		; GFX10-DL-NEXT: s_bfe_i32 s5, s1, 0x40014
; GFX10-DL-NEXT: v_mad_i32_i24 v2, s7, s2, v2		; GFX10-DL-NEXT: v_mad_i32_i24 v2, s6, s2, v2
; GFX10-DL-NEXT: s_bfe_i32 s2, s0, 0x40010		; GFX10-DL-NEXT: s_bfe_i32 s2, s0, 0x40010
; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_0		; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_0
; GFX10-DL-NEXT: v_mad_u32_u24 v2, v3, v4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, v3, v4, v2
; GFX10-DL-NEXT: v_mad_i32_i24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_i32_i24 v2, s2, s3, v2
; GFX10-DL-NEXT: s_bfe_i32 s2, s0, 0x40018		; GFX10-DL-NEXT: s_bfe_i32 s2, s0, 0x40018
; GFX10-DL-NEXT: s_bfe_i32 s4, s1, 0x40018		; GFX10-DL-NEXT: s_bfe_i32 s3, s1, 0x40018
; GFX10-DL-NEXT: s_ashr_i32 s0, s0, 28		; GFX10-DL-NEXT: s_ashr_i32 s0, s0, 28
; GFX10-DL-NEXT: s_ashr_i32 s1, s1, 28		; GFX10-DL-NEXT: s_ashr_i32 s1, s1, 28
; GFX10-DL-NEXT: v_mad_i32_i24 v2, s5, s6, v2		; GFX10-DL-NEXT: v_mad_i32_i24 v2, s4, s5, v2
; GFX10-DL-NEXT: v_mad_i32_i24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_i32_i24 v2, s2, s3, v2
; GFX10-DL-NEXT: v_mad_i32_i24 v2, s0, s1, v2		; GFX10-DL-NEXT: v_mad_i32_i24 v2, s0, s1, v2
; GFX10-DL-NEXT: global_store_byte v[0:1], v2, off		; GFX10-DL-NEXT: global_store_byte v[0:1], v2, off
; GFX10-DL-NEXT: s_endpgm		; GFX10-DL-NEXT: s_endpgm
<8 x i4> addrspace(1)* %src2,		<8 x i4> addrspace(1)* %src2,
i8 addrspace(1)* nocapture %dst) {		i8 addrspace(1)* nocapture %dst) {
entry:		entry:
%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1		%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1
%vec2 = load <8 x i4>, <8 x i4> addrspace(1)* %src2		%vec2 = load <8 x i4>, <8 x i4> addrspace(1)* %src2
(60 unchanged lines not shown)
ret void		ret void
}		}
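The scalar sequences in these checks extract each signed 4-bit lane with `s_bfe_i32` and fold the products with `v_mad_i32_i24`. Below is a minimal C sketch of that computation. It assumes the usual S_BFE_I32 operand encoding: field offset in bits [4:0] of the second operand and field width in bits [22:16], so `0x40004` reads as "4 bits starting at bit 4". `s_bfe_i32` and `idot8_ref` are hypothetical helper names, not LLVM APIs.

```c
#include <stdint.h>

/* Hypothetical model of S_BFE_I32: extract `width` bits of src0 starting at
   `offset` and sign-extend them. Assumes offset + width <= 32. */
static int32_t s_bfe_i32(uint32_t src0, uint32_t src1) {
    uint32_t offset = src1 & 0x1fu;
    uint32_t width = (src1 >> 16) & 0x7fu;
    if (width == 0)
        return 0;
    uint64_t mask = (1ull << width) - 1;           /* width <= 32, no overflow */
    int64_t field = (int64_t)((src0 >> offset) & mask);
    int64_t sign = (int64_t)1 << (width - 1);
    return (int32_t)((field ^ sign) - sign);       /* portable sign extension */
}

/* Hypothetical reference for the signed 8 x i4 dot product: lane i lives in
   bits [4i+3:4i]. The checks extract lanes 0..6 with s_bfe_i32 (0x40000,
   0x40004, ..., 0x40018) and the top lane with s_ashr_i32 by 28. This sketch
   models the top lane with the same bfe helper (control 0x4001c). */
static int32_t idot8_ref(uint32_t a, uint32_t b, int32_t acc) {
    for (uint32_t lane = 0; lane < 8; ++lane) {
        uint32_t ctl = (4u << 16) | (4u * lane);
        acc += s_bfe_i32(a, ctl) * s_bfe_i32(b, ctl);
    }
    return acc;
}
```

In the GFX10-DL variant above, the accumulator is an `i8`: it is loaded with `global_load_ubyte` and stored with `global_store_byte`. Only the low byte of such a reference result would therefore be comparable.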

; Make sure the pattern is not recognized if there are multiple uses of the		; Make sure the pattern is not recognized if there are multiple uses of the
; intermediate multiplications.		; intermediate multiplications.
define amdgpu_kernel void @idot8_multiuses_mul1(<8 x i4> addrspace(1)* %src1,		define amdgpu_kernel void @idot8_multiuses_mul1(<8 x i4> addrspace(1)* %src1,
; GFX7-LABEL: idot8_multiuses_mul1:		; GFX7-LABEL: idot8_multiuses_mul1:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s7, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s6, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0		; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX7-NEXT: s_load_dword s1, s[10:11], 0x0		; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0
; GFX7-NEXT: s_load_dword s21, s[4:5], 0x0		; GFX7-NEXT: s_load_dword s20, s[0:1], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_bfe_i32 s2, s0, 0x40000		; GFX7-NEXT: s_bfe_i32 s6, s4, 0x40000
; GFX7-NEXT: s_bfe_i32 s8, s1, 0x40000		; GFX7-NEXT: s_bfe_i32 s7, s5, 0x40000
; GFX7-NEXT: v_mov_b32_e32 v0, s8		; GFX7-NEXT: v_mov_b32_e32 v0, s7
; GFX7-NEXT: v_mov_b32_e32 v1, s21		; GFX7-NEXT: v_mov_b32_e32 v1, s20
; GFX7-NEXT: v_mad_i32_i24 v1, s2, v0, v1		; GFX7-NEXT: v_mad_i32_i24 v1, s6, v0, v1
; GFX7-NEXT: s_bfe_i32 s10, s1, 0x40004		; GFX7-NEXT: s_bfe_i32 s9, s5, 0x40004
; GFX7-NEXT: s_bfe_i32 s9, s0, 0x40004		; GFX7-NEXT: s_bfe_i32 s8, s4, 0x40004
; GFX7-NEXT: s_bfe_i32 s12, s1, 0x40008		; GFX7-NEXT: s_bfe_i32 s11, s5, 0x40008
; GFX7-NEXT: v_mad_i32_i24 v0, s2, v0, v1		; GFX7-NEXT: v_mad_i32_i24 v0, s6, v0, v1
; GFX7-NEXT: v_mov_b32_e32 v2, s10		; GFX7-NEXT: v_mov_b32_e32 v2, s9
; GFX7-NEXT: v_mad_i32_i24 v0, s9, v2, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s8, v2, v0
; GFX7-NEXT: s_bfe_i32 s11, s0, 0x40008		; GFX7-NEXT: s_bfe_i32 s10, s4, 0x40008
; GFX7-NEXT: v_mov_b32_e32 v2, s12		; GFX7-NEXT: v_mov_b32_e32 v2, s11
; GFX7-NEXT: s_bfe_i32 s14, s1, 0x4000c		; GFX7-NEXT: s_bfe_i32 s13, s5, 0x4000c
; GFX7-NEXT: v_mad_i32_i24 v0, s11, v2, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s10, v2, v0
; GFX7-NEXT: s_bfe_i32 s13, s0, 0x4000c		; GFX7-NEXT: s_bfe_i32 s12, s4, 0x4000c
; GFX7-NEXT: v_mov_b32_e32 v2, s14		; GFX7-NEXT: v_mov_b32_e32 v2, s13
; GFX7-NEXT: s_bfe_i32 s16, s1, 0x40010		; GFX7-NEXT: s_bfe_i32 s15, s5, 0x40010
; GFX7-NEXT: v_mad_i32_i24 v0, s13, v2, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s12, v2, v0
; GFX7-NEXT: s_bfe_i32 s15, s0, 0x40010		; GFX7-NEXT: s_bfe_i32 s14, s4, 0x40010
; GFX7-NEXT: v_mov_b32_e32 v2, s16		; GFX7-NEXT: v_mov_b32_e32 v2, s15
; GFX7-NEXT: s_bfe_i32 s18, s1, 0x40014		; GFX7-NEXT: s_bfe_i32 s17, s5, 0x40014
; GFX7-NEXT: s_bfe_i32 s20, s1, 0x40018		; GFX7-NEXT: s_bfe_i32 s19, s5, 0x40018
; GFX7-NEXT: v_mad_i32_i24 v0, s15, v2, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s14, v2, v0
; GFX7-NEXT: s_bfe_i32 s17, s0, 0x40014		; GFX7-NEXT: s_bfe_i32 s16, s4, 0x40014
; GFX7-NEXT: v_mov_b32_e32 v2, s18		; GFX7-NEXT: v_mov_b32_e32 v2, s17
; GFX7-NEXT: s_bfe_i32 s19, s0, 0x40018		; GFX7-NEXT: s_bfe_i32 s18, s4, 0x40018
; GFX7-NEXT: v_mad_i32_i24 v0, s17, v2, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s16, v2, v0
; GFX7-NEXT: v_mov_b32_e32 v2, s20		; GFX7-NEXT: v_mov_b32_e32 v2, s19
; GFX7-NEXT: s_ashr_i32 s1, s1, 28		; GFX7-NEXT: s_ashr_i32 s5, s5, 28
; GFX7-NEXT: v_mad_i32_i24 v0, s19, v2, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s18, v2, v0
; GFX7-NEXT: s_ashr_i32 s0, s0, 28		; GFX7-NEXT: s_ashr_i32 s4, s4, 28
; GFX7-NEXT: v_mov_b32_e32 v2, s1		; GFX7-NEXT: v_mov_b32_e32 v2, s5
; GFX7-NEXT: v_mad_i32_i24 v0, s0, v2, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s4, v2, v0
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v1		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v1
; GFX7-NEXT: buffer_store_dword v0, off, s[4:7], 0		; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: idot8_multiuses_mul1:		; GFX8-LABEL: idot8_multiuses_mul1:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX8-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX8-NEXT: s_load_dword s4, s[6:7], 0x0		; GFX8-NEXT: s_load_dword s3, s[6:7], 0x0
; GFX8-NEXT: s_load_dword s19, s[0:1], 0x0		; GFX8-NEXT: s_load_dword s18, s[0:1], 0x0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_bfe_i32 s5, s2, 0x40000		; GFX8-NEXT: s_bfe_i32 s4, s2, 0x40000
; GFX8-NEXT: s_bfe_i32 s6, s4, 0x40000		; GFX8-NEXT: s_bfe_i32 s5, s3, 0x40000
; GFX8-NEXT: v_mov_b32_e32 v0, s6		; GFX8-NEXT: v_mov_b32_e32 v0, s5
; GFX8-NEXT: v_mov_b32_e32 v1, s19		; GFX8-NEXT: v_mov_b32_e32 v1, s18
; GFX8-NEXT: v_mad_i32_i24 v1, s5, v0, v1		; GFX8-NEXT: v_mad_i32_i24 v1, s4, v0, v1
; GFX8-NEXT: s_bfe_i32 s8, s4, 0x40004		; GFX8-NEXT: s_bfe_i32 s7, s3, 0x40004
; GFX8-NEXT: s_bfe_i32 s7, s2, 0x40004		; GFX8-NEXT: s_bfe_i32 s6, s2, 0x40004
; GFX8-NEXT: s_bfe_i32 s10, s4, 0x40008		; GFX8-NEXT: s_bfe_i32 s9, s3, 0x40008
; GFX8-NEXT: v_mad_i32_i24 v0, s5, v0, v1		; GFX8-NEXT: v_mad_i32_i24 v0, s4, v0, v1
; GFX8-NEXT: v_mov_b32_e32 v2, s8		; GFX8-NEXT: v_mov_b32_e32 v2, s7
; GFX8-NEXT: v_mad_i32_i24 v0, s7, v2, v0		; GFX8-NEXT: v_mad_i32_i24 v0, s6, v2, v0
; GFX8-NEXT: s_bfe_i32 s9, s2, 0x40008		; GFX8-NEXT: s_bfe_i32 s8, s2, 0x40008
; GFX8-NEXT: v_mov_b32_e32 v2, s10		; GFX8-NEXT: v_mov_b32_e32 v2, s9
; GFX8-NEXT: s_bfe_i32 s12, s4, 0x4000c		; GFX8-NEXT: s_bfe_i32 s11, s3, 0x4000c
; GFX8-NEXT: v_mad_i32_i24 v0, s9, v2, v0		; GFX8-NEXT: v_mad_i32_i24 v0, s8, v2, v0
; GFX8-NEXT: s_bfe_i32 s11, s2, 0x4000c		; GFX8-NEXT: s_bfe_i32 s10, s2, 0x4000c
; GFX8-NEXT: v_mov_b32_e32 v2, s12		; GFX8-NEXT: v_mov_b32_e32 v2, s11
; GFX8-NEXT: s_bfe_i32 s14, s4, 0x40010		; GFX8-NEXT: s_bfe_i32 s13, s3, 0x40010
; GFX8-NEXT: v_mad_i32_i24 v0, s11, v2, v0		; GFX8-NEXT: v_mad_i32_i24 v0, s10, v2, v0
; GFX8-NEXT: s_bfe_i32 s13, s2, 0x40010		; GFX8-NEXT: s_bfe_i32 s12, s2, 0x40010
; GFX8-NEXT: v_mov_b32_e32 v2, s14		; GFX8-NEXT: v_mov_b32_e32 v2, s13
; GFX8-NEXT: s_bfe_i32 s16, s4, 0x40014		; GFX8-NEXT: s_bfe_i32 s15, s3, 0x40014
; GFX8-NEXT: s_bfe_i32 s18, s4, 0x40018		; GFX8-NEXT: s_bfe_i32 s17, s3, 0x40018
; GFX8-NEXT: v_mad_i32_i24 v0, s13, v2, v0		; GFX8-NEXT: v_mad_i32_i24 v0, s12, v2, v0
; GFX8-NEXT: s_bfe_i32 s15, s2, 0x40014		; GFX8-NEXT: s_bfe_i32 s14, s2, 0x40014
; GFX8-NEXT: v_mov_b32_e32 v2, s16		; GFX8-NEXT: v_mov_b32_e32 v2, s15
; GFX8-NEXT: s_bfe_i32 s17, s2, 0x40018		; GFX8-NEXT: s_bfe_i32 s16, s2, 0x40018
; GFX8-NEXT: v_mad_i32_i24 v0, s15, v2, v0		; GFX8-NEXT: v_mad_i32_i24 v0, s14, v2, v0
; GFX8-NEXT: v_mov_b32_e32 v2, s18		; GFX8-NEXT: v_mov_b32_e32 v2, s17
; GFX8-NEXT: s_ashr_i32 s4, s4, 28		; GFX8-NEXT: s_ashr_i32 s3, s3, 28
; GFX8-NEXT: v_mad_i32_i24 v0, s17, v2, v0		; GFX8-NEXT: v_mad_i32_i24 v0, s16, v2, v0
; GFX8-NEXT: s_ashr_i32 s2, s2, 28		; GFX8-NEXT: s_ashr_i32 s2, s2, 28
; GFX8-NEXT: v_mov_b32_e32 v2, s4		; GFX8-NEXT: v_mov_b32_e32 v2, s3
; GFX8-NEXT: v_mad_i32_i24 v0, s2, v2, v0		; GFX8-NEXT: v_mad_i32_i24 v0, s2, v2, v0
; GFX8-NEXT: v_add_u32_e32 v2, vcc, v0, v1		; GFX8-NEXT: v_add_u32_e32 v2, vcc, v0, v1
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_store_dword v[0:1], v2		; GFX8-NEXT: flat_store_dword v[0:1], v2
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: idot8_multiuses_mul1:		; GFX9-LABEL: idot8_multiuses_mul1:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX9-NEXT: s_load_dword s4, s[6:7], 0x0		; GFX9-NEXT: s_load_dword s3, s[6:7], 0x0
; GFX9-NEXT: s_load_dword s19, s[0:1], 0x0		; GFX9-NEXT: s_load_dword s18, s[0:1], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_bfe_i32 s5, s2, 0x40000		; GFX9-NEXT: s_bfe_i32 s4, s2, 0x40000
; GFX9-NEXT: s_bfe_i32 s6, s4, 0x40000		; GFX9-NEXT: s_bfe_i32 s5, s3, 0x40000
; GFX9-NEXT: v_mov_b32_e32 v0, s6		; GFX9-NEXT: v_mov_b32_e32 v0, s5
; GFX9-NEXT: v_mov_b32_e32 v1, s19		; GFX9-NEXT: v_mov_b32_e32 v1, s18
; GFX9-NEXT: v_mad_i32_i24 v1, s5, v0, v1		; GFX9-NEXT: v_mad_i32_i24 v1, s4, v0, v1
; GFX9-NEXT: s_bfe_i32 s8, s4, 0x40004		; GFX9-NEXT: s_bfe_i32 s7, s3, 0x40004
; GFX9-NEXT: s_bfe_i32 s7, s2, 0x40004		; GFX9-NEXT: s_bfe_i32 s6, s2, 0x40004
; GFX9-NEXT: s_bfe_i32 s10, s4, 0x40008		; GFX9-NEXT: s_bfe_i32 s9, s3, 0x40008
; GFX9-NEXT: v_mad_i32_i24 v0, s5, v0, v1		; GFX9-NEXT: v_mad_i32_i24 v0, s4, v0, v1
; GFX9-NEXT: v_mov_b32_e32 v2, s8		; GFX9-NEXT: v_mov_b32_e32 v2, s7
; GFX9-NEXT: v_mad_i32_i24 v0, s7, v2, v0		; GFX9-NEXT: v_mad_i32_i24 v0, s6, v2, v0
; GFX9-NEXT: s_bfe_i32 s9, s2, 0x40008		; GFX9-NEXT: s_bfe_i32 s8, s2, 0x40008
; GFX9-NEXT: v_mov_b32_e32 v2, s10		; GFX9-NEXT: v_mov_b32_e32 v2, s9
; GFX9-NEXT: s_bfe_i32 s12, s4, 0x4000c		; GFX9-NEXT: s_bfe_i32 s11, s3, 0x4000c
; GFX9-NEXT: v_mad_i32_i24 v0, s9, v2, v0		; GFX9-NEXT: v_mad_i32_i24 v0, s8, v2, v0
; GFX9-NEXT: s_bfe_i32 s11, s2, 0x4000c		; GFX9-NEXT: s_bfe_i32 s10, s2, 0x4000c
; GFX9-NEXT: v_mov_b32_e32 v2, s12		; GFX9-NEXT: v_mov_b32_e32 v2, s11
; GFX9-NEXT: s_bfe_i32 s14, s4, 0x40010		; GFX9-NEXT: s_bfe_i32 s13, s3, 0x40010
; GFX9-NEXT: v_mad_i32_i24 v0, s11, v2, v0		; GFX9-NEXT: v_mad_i32_i24 v0, s10, v2, v0
; GFX9-NEXT: s_bfe_i32 s13, s2, 0x40010		; GFX9-NEXT: s_bfe_i32 s12, s2, 0x40010
; GFX9-NEXT: v_mov_b32_e32 v2, s14		; GFX9-NEXT: v_mov_b32_e32 v2, s13
; GFX9-NEXT: s_bfe_i32 s16, s4, 0x40014		; GFX9-NEXT: s_bfe_i32 s15, s3, 0x40014
; GFX9-NEXT: s_bfe_i32 s18, s4, 0x40018		; GFX9-NEXT: s_bfe_i32 s17, s3, 0x40018
; GFX9-NEXT: v_mad_i32_i24 v0, s13, v2, v0		; GFX9-NEXT: v_mad_i32_i24 v0, s12, v2, v0
; GFX9-NEXT: s_bfe_i32 s15, s2, 0x40014		; GFX9-NEXT: s_bfe_i32 s14, s2, 0x40014
; GFX9-NEXT: v_mov_b32_e32 v2, s16		; GFX9-NEXT: v_mov_b32_e32 v2, s15
; GFX9-NEXT: s_bfe_i32 s17, s2, 0x40018		; GFX9-NEXT: s_bfe_i32 s16, s2, 0x40018
; GFX9-NEXT: v_mad_i32_i24 v0, s15, v2, v0		; GFX9-NEXT: v_mad_i32_i24 v0, s14, v2, v0
; GFX9-NEXT: v_mov_b32_e32 v2, s18		; GFX9-NEXT: v_mov_b32_e32 v2, s17
; GFX9-NEXT: s_ashr_i32 s4, s4, 28		; GFX9-NEXT: s_ashr_i32 s3, s3, 28
; GFX9-NEXT: v_mad_i32_i24 v0, s17, v2, v0		; GFX9-NEXT: v_mad_i32_i24 v0, s16, v2, v0
; GFX9-NEXT: s_ashr_i32 s2, s2, 28		; GFX9-NEXT: s_ashr_i32 s2, s2, 28
; GFX9-NEXT: v_mov_b32_e32 v2, s4		; GFX9-NEXT: v_mov_b32_e32 v2, s3
; GFX9-NEXT: v_mad_i32_i24 v0, s2, v2, v0		; GFX9-NEXT: v_mad_i32_i24 v0, s2, v2, v0
; GFX9-NEXT: v_add_u32_e32 v2, v1, v0		; GFX9-NEXT: v_add_u32_e32 v2, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: global_store_dword v[0:1], v2, off		; GFX9-NEXT: global_store_dword v[0:1], v2, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX9-DL-LABEL: idot8_multiuses_mul1:		; GFX9-DL-LABEL: idot8_multiuses_mul1:
; GFX9-DL: ; %bb.0: ; %entry		; GFX9-DL: ; %bb.0: ; %entry
; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX9-DL-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX9-DL-NEXT: s_load_dword s4, s[6:7], 0x0		; GFX9-DL-NEXT: s_load_dword s3, s[6:7], 0x0
; GFX9-DL-NEXT: s_load_dword s19, s[0:1], 0x0		; GFX9-DL-NEXT: s_load_dword s18, s[0:1], 0x0
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_bfe_i32 s5, s2, 0x40000		; GFX9-DL-NEXT: s_bfe_i32 s4, s2, 0x40000
; GFX9-DL-NEXT: s_bfe_i32 s6, s4, 0x40000		; GFX9-DL-NEXT: s_bfe_i32 s5, s3, 0x40000
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s6		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s5
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s19		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s18
; GFX9-DL-NEXT: v_mad_i32_i24 v1, s5, v0, v1		; GFX9-DL-NEXT: v_mad_i32_i24 v1, s4, v0, v1
; GFX9-DL-NEXT: s_bfe_i32 s8, s4, 0x40004		; GFX9-DL-NEXT: s_bfe_i32 s7, s3, 0x40004
; GFX9-DL-NEXT: s_bfe_i32 s7, s2, 0x40004		; GFX9-DL-NEXT: s_bfe_i32 s6, s2, 0x40004
; GFX9-DL-NEXT: s_bfe_i32 s10, s4, 0x40008		; GFX9-DL-NEXT: s_bfe_i32 s9, s3, 0x40008
; GFX9-DL-NEXT: v_mad_i32_i24 v0, s5, v0, v1		; GFX9-DL-NEXT: v_mad_i32_i24 v0, s4, v0, v1
; GFX9-DL-NEXT: v_mov_b32_e32 v2, s8		; GFX9-DL-NEXT: v_mov_b32_e32 v2, s7
; GFX9-DL-NEXT: v_mad_i32_i24 v0, s7, v2, v0		; GFX9-DL-NEXT: v_mad_i32_i24 v0, s6, v2, v0
; GFX9-DL-NEXT: s_bfe_i32 s9, s2, 0x40008		; GFX9-DL-NEXT: s_bfe_i32 s8, s2, 0x40008
; GFX9-DL-NEXT: v_mov_b32_e32 v2, s10		; GFX9-DL-NEXT: v_mov_b32_e32 v2, s9
; GFX9-DL-NEXT: s_bfe_i32 s12, s4, 0x4000c		; GFX9-DL-NEXT: s_bfe_i32 s11, s3, 0x4000c
; GFX9-DL-NEXT: v_mad_i32_i24 v0, s9, v2, v0		; GFX9-DL-NEXT: v_mad_i32_i24 v0, s8, v2, v0
; GFX9-DL-NEXT: s_bfe_i32 s11, s2, 0x4000c		; GFX9-DL-NEXT: s_bfe_i32 s10, s2, 0x4000c
; GFX9-DL-NEXT: v_mov_b32_e32 v2, s12		; GFX9-DL-NEXT: v_mov_b32_e32 v2, s11
; GFX9-DL-NEXT: s_bfe_i32 s14, s4, 0x40010		; GFX9-DL-NEXT: s_bfe_i32 s13, s3, 0x40010
; GFX9-DL-NEXT: v_mad_i32_i24 v0, s11, v2, v0		; GFX9-DL-NEXT: v_mad_i32_i24 v0, s10, v2, v0
; GFX9-DL-NEXT: s_bfe_i32 s13, s2, 0x40010		; GFX9-DL-NEXT: s_bfe_i32 s12, s2, 0x40010
; GFX9-DL-NEXT: v_mov_b32_e32 v2, s14		; GFX9-DL-NEXT: v_mov_b32_e32 v2, s13
; GFX9-DL-NEXT: s_bfe_i32 s16, s4, 0x40014		; GFX9-DL-NEXT: s_bfe_i32 s15, s3, 0x40014
; GFX9-DL-NEXT: s_bfe_i32 s18, s4, 0x40018		; GFX9-DL-NEXT: s_bfe_i32 s17, s3, 0x40018
; GFX9-DL-NEXT: v_mad_i32_i24 v0, s13, v2, v0		; GFX9-DL-NEXT: v_mad_i32_i24 v0, s12, v2, v0
; GFX9-DL-NEXT: s_bfe_i32 s15, s2, 0x40014		; GFX9-DL-NEXT: s_bfe_i32 s14, s2, 0x40014
; GFX9-DL-NEXT: v_mov_b32_e32 v2, s16		; GFX9-DL-NEXT: v_mov_b32_e32 v2, s15
; GFX9-DL-NEXT: s_bfe_i32 s17, s2, 0x40018		; GFX9-DL-NEXT: s_bfe_i32 s16, s2, 0x40018
; GFX9-DL-NEXT: v_mad_i32_i24 v0, s15, v2, v0		; GFX9-DL-NEXT: v_mad_i32_i24 v0, s14, v2, v0
; GFX9-DL-NEXT: v_mov_b32_e32 v2, s18		; GFX9-DL-NEXT: v_mov_b32_e32 v2, s17
; GFX9-DL-NEXT: s_ashr_i32 s4, s4, 28		; GFX9-DL-NEXT: s_ashr_i32 s3, s3, 28
; GFX9-DL-NEXT: v_mad_i32_i24 v0, s17, v2, v0		; GFX9-DL-NEXT: v_mad_i32_i24 v0, s16, v2, v0
; GFX9-DL-NEXT: s_ashr_i32 s2, s2, 28		; GFX9-DL-NEXT: s_ashr_i32 s2, s2, 28
; GFX9-DL-NEXT: v_mov_b32_e32 v2, s4		; GFX9-DL-NEXT: v_mov_b32_e32 v2, s3
; GFX9-DL-NEXT: v_mad_i32_i24 v0, s2, v2, v0		; GFX9-DL-NEXT: v_mad_i32_i24 v0, s2, v2, v0
; GFX9-DL-NEXT: v_add_u32_e32 v2, v1, v0		; GFX9-DL-NEXT: v_add_u32_e32 v2, v1, v0
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off		; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: idot8_multiuses_mul1:		; GFX10-DL-LABEL: idot8_multiuses_mul1:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX10-DL-NEXT: s_load_dword s4, s[6:7], 0x0		; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0
; GFX10-DL-NEXT: s_load_dword s5, s[0:1], 0x0		; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_bfe_i32 s6, s2, 0x40000		; GFX10-DL-NEXT: s_bfe_i32 s5, s2, 0x40000
; GFX10-DL-NEXT: s_bfe_i32 s7, s4, 0x40000		; GFX10-DL-NEXT: s_bfe_i32 s6, s3, 0x40000
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s5		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s4
; GFX10-DL-NEXT: s_bfe_i32 s5, s2, 0x40004		; GFX10-DL-NEXT: s_bfe_i32 s4, s2, 0x40004
; GFX10-DL-NEXT: s_bfe_i32 s8, s4, 0x40004		; GFX10-DL-NEXT: s_bfe_i32 s7, s3, 0x40004
; GFX10-DL-NEXT: v_mad_i32_i24 v0, s6, s7, v0		; GFX10-DL-NEXT: v_mad_i32_i24 v0, s5, s6, v0
; GFX10-DL-NEXT: v_mad_i32_i24 v1, s6, s7, v0		; GFX10-DL-NEXT: v_mad_i32_i24 v1, s5, s6, v0
; GFX10-DL-NEXT: s_bfe_i32 s6, s2, 0x40008		; GFX10-DL-NEXT: s_bfe_i32 s5, s2, 0x40008
; GFX10-DL-NEXT: s_bfe_i32 s7, s4, 0x40008		; GFX10-DL-NEXT: s_bfe_i32 s6, s3, 0x40008
; GFX10-DL-NEXT: v_mad_i32_i24 v1, s5, s8, v1		; GFX10-DL-NEXT: v_mad_i32_i24 v1, s4, s7, v1
; GFX10-DL-NEXT: s_bfe_i32 s5, s2, 0x4000c		; GFX10-DL-NEXT: s_bfe_i32 s4, s2, 0x4000c
; GFX10-DL-NEXT: s_bfe_i32 s8, s4, 0x4000c		; GFX10-DL-NEXT: s_bfe_i32 s7, s3, 0x4000c
; GFX10-DL-NEXT: v_mad_i32_i24 v1, s6, s7, v1		; GFX10-DL-NEXT: v_mad_i32_i24 v1, s5, s6, v1
; GFX10-DL-NEXT: s_bfe_i32 s6, s2, 0x40010		; GFX10-DL-NEXT: s_bfe_i32 s5, s2, 0x40010
; GFX10-DL-NEXT: s_bfe_i32 s7, s4, 0x40010		; GFX10-DL-NEXT: s_bfe_i32 s6, s3, 0x40010
; GFX10-DL-NEXT: v_mad_i32_i24 v1, s5, s8, v1		; GFX10-DL-NEXT: v_mad_i32_i24 v1, s4, s7, v1
; GFX10-DL-NEXT: s_bfe_i32 s5, s2, 0x40014		; GFX10-DL-NEXT: s_bfe_i32 s4, s2, 0x40014
; GFX10-DL-NEXT: s_bfe_i32 s8, s4, 0x40014		; GFX10-DL-NEXT: s_bfe_i32 s7, s3, 0x40014
; GFX10-DL-NEXT: v_mad_i32_i24 v1, s6, s7, v1		; GFX10-DL-NEXT: v_mad_i32_i24 v1, s5, s6, v1
; GFX10-DL-NEXT: s_bfe_i32 s6, s2, 0x40018		; GFX10-DL-NEXT: s_bfe_i32 s5, s2, 0x40018
; GFX10-DL-NEXT: s_bfe_i32 s7, s4, 0x40018		; GFX10-DL-NEXT: s_bfe_i32 s6, s3, 0x40018
; GFX10-DL-NEXT: s_ashr_i32 s2, s2, 28		; GFX10-DL-NEXT: s_ashr_i32 s2, s2, 28
; GFX10-DL-NEXT: s_ashr_i32 s4, s4, 28		; GFX10-DL-NEXT: s_ashr_i32 s3, s3, 28
; GFX10-DL-NEXT: v_mad_i32_i24 v1, s5, s8, v1		; GFX10-DL-NEXT: v_mad_i32_i24 v1, s4, s7, v1
; GFX10-DL-NEXT: v_mad_i32_i24 v1, s6, s7, v1		; GFX10-DL-NEXT: v_mad_i32_i24 v1, s5, s6, v1
; GFX10-DL-NEXT: v_mad_i32_i24 v1, s2, s4, v1		; GFX10-DL-NEXT: v_mad_i32_i24 v1, s2, s3, v1
; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v0, v1		; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v0, v1
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX10-DL-NEXT: global_store_dword v[0:1], v2, off		; GFX10-DL-NEXT: global_store_dword v[0:1], v2, off
; GFX10-DL-NEXT: s_endpgm		; GFX10-DL-NEXT: s_endpgm
<8 x i4> addrspace(1)* %src2,		<8 x i4> addrspace(1)* %src2,
i32 addrspace(1)* nocapture %dst) {		i32 addrspace(1)* nocapture %dst) {
entry:		entry:
(63 unchanged lines not shown)
store i32 %res, i32 addrspace(1)* %dst, align 4		store i32 %res, i32 addrspace(1)* %dst, align 4
ret void		ret void
}		}
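The GFX7, GFX8 and GFX9 checks for `idot8_acc32_vecMul` that follow sign-extend each nibble with a different idiom. First, `s_lshl_b32` moves the wanted lane into bits [31:28] of the high dword of a register pair. Then `s_ashr_i64 ..., 60` keeps only those four bits, sign-extended, and the low dword of the pair is discarded. Below is a minimal C sketch of that idiom. `lane_via_lshl_ashr64` and `idot8_via_ashr64` are hypothetical names. The shift amounts are read off the checks: 28 selects lane 0, and the top lane uses the unshifted word.

```c
#include <stdint.h>

/* Hypothetical model of `s_lshl_b32 hi, word, lshl` followed by
   `s_ashr_i64 pair, pair, 60`: returns the sign-extended nibble that ends up
   in bits [63:60] of the 64-bit pair. `low_dword` stands for whatever the low
   half of the pair holds; the shift by 60 discards it. */
static int32_t lane_via_lshl_ashr64(uint32_t word, unsigned lshl, uint32_t low_dword) {
    uint32_t hi = word << lshl;                      /* lshl in [0, 28] */
    uint64_t pair = ((uint64_t)hi << 32) | low_dword;
    int32_t top = (int32_t)(pair >> 60);             /* unsigned 0..15 */
    return (top ^ 8) - 8;                            /* sign-extend 4 bits */
}

/* Dot product over the eight lanes; lane i needs a left shift of 28 - 4*i. */
static int32_t idot8_via_ashr64(uint32_t a, uint32_t b, int32_t acc) {
    for (unsigned lane = 0; lane < 8; ++lane) {
        unsigned lshl = 28u - 4u * lane;
        acc += lane_via_lshl_ashr64(a, lshl, 0) * lane_via_lshl_ashr64(b, lshl, 0);
    }
    return acc;
}
```

Only the high dword matters. This is consistent with the new-column checks, which rewrite just `s9` before reusing `s[8:9]` as the shift source.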

; TODO: Support this pattern.		; TODO: Support this pattern.
define amdgpu_kernel void @idot8_acc32_vecMul(<8 x i4> addrspace(1)* %src1,		define amdgpu_kernel void @idot8_acc32_vecMul(<8 x i4> addrspace(1)* %src1,
; GFX7-LABEL: idot8_acc32_vecMul:		; GFX7-LABEL: idot8_acc32_vecMul:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s7, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s6, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_load_dword s1, s[8:9], 0x0		; GFX7-NEXT: s_load_dword s5, s[4:5], 0x0
; GFX7-NEXT: s_load_dword s9, s[10:11], 0x0		; GFX7-NEXT: s_load_dword s7, s[6:7], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_ashr_i64 s[10:11], s[0:1], 60		; GFX7-NEXT: s_ashr_i64 s[8:9], s[4:5], 60
; GFX7-NEXT: s_lshl_b32 s11, s1, 4		; GFX7-NEXT: s_lshl_b32 s9, s5, 4
; GFX7-NEXT: s_ashr_i64 s[16:17], s[10:11], 60		; GFX7-NEXT: s_ashr_i64 s[14:15], s[8:9], 60
; GFX7-NEXT: s_lshl_b32 s11, s1, 16		; GFX7-NEXT: s_lshl_b32 s9, s5, 16
; GFX7-NEXT: s_ashr_i64 s[18:19], s[10:11], 60		; GFX7-NEXT: s_ashr_i64 s[16:17], s[8:9], 60
; GFX7-NEXT: s_lshl_b32 s11, s1, 20		; GFX7-NEXT: s_lshl_b32 s9, s5, 20
; GFX7-NEXT: s_lshl_b32 s13, s1, 8		; GFX7-NEXT: s_lshl_b32 s11, s5, 8
; GFX7-NEXT: s_lshl_b32 s15, s1, 12		; GFX7-NEXT: s_lshl_b32 s13, s5, 12
; GFX7-NEXT: s_ashr_i64 s[20:21], s[10:11], 60		; GFX7-NEXT: s_ashr_i64 s[18:19], s[8:9], 60
; GFX7-NEXT: s_lshl_b32 s11, s1, 24		; GFX7-NEXT: s_lshl_b32 s9, s5, 24
; GFX7-NEXT: s_lshl_b32 s1, s1, 28		; GFX7-NEXT: s_lshl_b32 s5, s5, 28
; GFX7-NEXT: s_ashr_i64 s[0:1], s[0:1], 60		; GFX7-NEXT: s_ashr_i64 s[4:5], s[4:5], 60
; GFX7-NEXT: s_lshl_b32 s1, s9, 4		; GFX7-NEXT: s_lshl_b32 s5, s7, 4
; GFX7-NEXT: s_ashr_i64 s[26:27], s[0:1], 60		; GFX7-NEXT: s_ashr_i64 s[24:25], s[4:5], 60
; GFX7-NEXT: s_lshl_b32 s1, s9, 8		; GFX7-NEXT: s_lshl_b32 s5, s7, 8
; GFX7-NEXT: s_ashr_i64 s[28:29], s[0:1], 60		; GFX7-NEXT: s_ashr_i64 s[26:27], s[4:5], 60
; GFX7-NEXT: s_lshl_b32 s1, s9, 12		; GFX7-NEXT: s_lshl_b32 s5, s7, 12
; GFX7-NEXT: s_ashr_i64 s[30:31], s[0:1], 60		; GFX7-NEXT: s_ashr_i64 s[28:29], s[4:5], 60
; GFX7-NEXT: s_lshl_b32 s1, s9, 16		; GFX7-NEXT: s_lshl_b32 s5, s7, 16
; GFX7-NEXT: s_ashr_i64 s[32:33], s[0:1], 60		; GFX7-NEXT: s_ashr_i64 s[30:31], s[4:5], 60
; GFX7-NEXT: s_lshl_b32 s1, s9, 20		; GFX7-NEXT: s_lshl_b32 s5, s7, 20
; GFX7-NEXT: s_ashr_i64 s[34:35], s[0:1], 60		; GFX7-NEXT: s_ashr_i64 s[32:33], s[4:5], 60
; GFX7-NEXT: s_lshl_b32 s1, s9, 24		; GFX7-NEXT: s_lshl_b32 s5, s7, 24
; GFX7-NEXT: s_ashr_i64 s[36:37], s[0:1], 60		; GFX7-NEXT: s_ashr_i64 s[34:35], s[4:5], 60
; GFX7-NEXT: s_lshl_b32 s1, s9, 28		; GFX7-NEXT: s_lshl_b32 s5, s7, 28
; GFX7-NEXT: s_ashr_i64 s[24:25], s[8:9], 60		; GFX7-NEXT: s_ashr_i64 s[22:23], s[6:7], 60
; GFX7-NEXT: s_ashr_i64 s[8:9], s[0:1], 60		; GFX7-NEXT: s_ashr_i64 s[6:7], s[4:5], 60
; GFX7-NEXT: s_load_dword s1, s[4:5], 0x0		; GFX7-NEXT: s_load_dword s5, s[0:1], 0x0
; GFX7-NEXT: v_mov_b32_e32 v0, s8		; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: s_ashr_i64 s[22:23], s[10:11], 60		; GFX7-NEXT: s_ashr_i64 s[20:21], s[8:9], 60
; GFX7-NEXT: s_ashr_i64 s[14:15], s[14:15], 60
; GFX7-NEXT: s_ashr_i64 s[12:13], s[12:13], 60		; GFX7-NEXT: s_ashr_i64 s[12:13], s[12:13], 60
		; GFX7-NEXT: s_ashr_i64 s[10:11], s[10:11], 60
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v1, s1		; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mad_i32_i24 v0, s0, v0, v1		; GFX7-NEXT: v_mad_i32_i24 v0, s4, v0, v1
; GFX7-NEXT: v_mov_b32_e32 v1, s36
; GFX7-NEXT: v_mad_i32_i24 v0, s22, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s34		; GFX7-NEXT: v_mov_b32_e32 v1, s34
; GFX7-NEXT: v_mad_i32_i24 v0, s20, v1, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s20, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s32		; GFX7-NEXT: v_mov_b32_e32 v1, s32
; GFX7-NEXT: v_mad_i32_i24 v0, s18, v1, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s18, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s30		; GFX7-NEXT: v_mov_b32_e32 v1, s30
; GFX7-NEXT: v_mad_i32_i24 v0, s14, v1, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s16, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s28		; GFX7-NEXT: v_mov_b32_e32 v1, s28
; GFX7-NEXT: v_mad_i32_i24 v0, s12, v1, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s12, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s26		; GFX7-NEXT: v_mov_b32_e32 v1, s26
; GFX7-NEXT: v_mad_i32_i24 v0, s16, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s24
; GFX7-NEXT: v_mad_i32_i24 v0, s10, v1, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s10, v1, v0
; GFX7-NEXT: buffer_store_dword v0, off, s[4:7], 0		; GFX7-NEXT: v_mov_b32_e32 v1, s24
		; GFX7-NEXT: v_mad_i32_i24 v0, s14, v1, v0
		; GFX7-NEXT: v_mov_b32_e32 v1, s22
		; GFX7-NEXT: v_mad_i32_i24 v0, s8, v1, v0
		; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: idot8_acc32_vecMul:		; GFX8-LABEL: idot8_acc32_vecMul:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_load_dword s5, s[4:5], 0x0		; GFX8-NEXT: s_load_dword s3, s[4:5], 0x0
; GFX8-NEXT: s_load_dword s7, s[6:7], 0x0		; GFX8-NEXT: s_load_dword s5, s[6:7], 0x0
; GFX8-NEXT: s_load_dword s2, s[0:1], 0x0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_ashr_i64 s[8:9], s[4:5], 60		; GFX8-NEXT: s_ashr_i64 s[6:7], s[2:3], 60
; GFX8-NEXT: s_lshl_b32 s9, s5, 4		; GFX8-NEXT: s_lshl_b32 s7, s3, 4
; GFX8-NEXT: s_ashr_i64 s[16:17], s[8:9], 60		; GFX8-NEXT: s_ashr_i64 s[14:15], s[6:7], 60
; GFX8-NEXT: s_lshl_b32 s9, s5, 20		; GFX8-NEXT: s_lshl_b32 s7, s3, 20
; GFX8-NEXT: s_lshl_b32 s11, s5, 8		; GFX8-NEXT: s_lshl_b32 s9, s3, 8
; GFX8-NEXT: s_lshl_b32 s13, s5, 12		; GFX8-NEXT: s_lshl_b32 s11, s3, 12
; GFX8-NEXT: s_lshl_b32 s15, s5, 16		; GFX8-NEXT: s_lshl_b32 s13, s3, 16
; GFX8-NEXT: s_ashr_i64 s[18:19], s[8:9], 60		; GFX8-NEXT: s_ashr_i64 s[16:17], s[6:7], 60
; GFX8-NEXT: s_lshl_b32 s9, s5, 24		; GFX8-NEXT: s_lshl_b32 s7, s3, 24
; GFX8-NEXT: s_lshl_b32 s5, s5, 28		; GFX8-NEXT: s_lshl_b32 s3, s3, 28
; GFX8-NEXT: s_ashr_i64 s[4:5], s[4:5], 60		; GFX8-NEXT: s_ashr_i64 s[2:3], s[2:3], 60
; GFX8-NEXT: s_lshl_b32 s5, s7, 4		; GFX8-NEXT: s_lshl_b32 s3, s5, 4
; GFX8-NEXT: s_ashr_i64 s[24:25], s[4:5], 60		; GFX8-NEXT: s_ashr_i64 s[22:23], s[2:3], 60
; GFX8-NEXT: s_lshl_b32 s5, s7, 8		; GFX8-NEXT: s_lshl_b32 s3, s5, 8
; GFX8-NEXT: s_ashr_i64 s[26:27], s[4:5], 60		; GFX8-NEXT: s_ashr_i64 s[24:25], s[2:3], 60
; GFX8-NEXT: s_lshl_b32 s5, s7, 12		; GFX8-NEXT: s_lshl_b32 s3, s5, 12
; GFX8-NEXT: s_ashr_i64 s[28:29], s[4:5], 60		; GFX8-NEXT: s_ashr_i64 s[26:27], s[2:3], 60
; GFX8-NEXT: s_lshl_b32 s5, s7, 16		; GFX8-NEXT: s_lshl_b32 s3, s5, 16
; GFX8-NEXT: s_ashr_i64 s[30:31], s[4:5], 60		; GFX8-NEXT: s_ashr_i64 s[28:29], s[2:3], 60
; GFX8-NEXT: s_lshl_b32 s5, s7, 20		; GFX8-NEXT: s_lshl_b32 s3, s5, 20
; GFX8-NEXT: s_ashr_i64 s[32:33], s[4:5], 60		; GFX8-NEXT: s_ashr_i64 s[30:31], s[2:3], 60
; GFX8-NEXT: s_lshl_b32 s5, s7, 24		; GFX8-NEXT: s_lshl_b32 s3, s5, 24
; GFX8-NEXT: s_ashr_i64 s[34:35], s[4:5], 60		; GFX8-NEXT: s_ashr_i64 s[32:33], s[2:3], 60
; GFX8-NEXT: s_lshl_b32 s5, s7, 28		; GFX8-NEXT: s_lshl_b32 s3, s5, 28
; GFX8-NEXT: s_ashr_i64 s[22:23], s[6:7], 60		; GFX8-NEXT: s_ashr_i64 s[20:21], s[4:5], 60
; GFX8-NEXT: s_ashr_i64 s[6:7], s[4:5], 60		; GFX8-NEXT: s_ashr_i64 s[4:5], s[2:3], 60
; GFX8-NEXT: v_mov_b32_e32 v0, s6		; GFX8-NEXT: s_load_dword s3, s[0:1], 0x0
; GFX8-NEXT: v_mov_b32_e32 v1, s2		; GFX8-NEXT: v_mov_b32_e32 v0, s4
; GFX8-NEXT: v_mad_i32_i24 v0, s4, v0, v1		; GFX8-NEXT: s_ashr_i64 s[18:19], s[6:7], 60
; GFX8-NEXT: s_ashr_i64 s[20:21], s[8:9], 60		; GFX8-NEXT: s_ashr_i64 s[12:13], s[12:13], 60
; GFX8-NEXT: v_mov_b32_e32 v1, s34		; GFX8-NEXT: s_ashr_i64 s[10:11], s[10:11], 60
; GFX8-NEXT: v_mad_i32_i24 v0, s20, v1, v0		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
		; GFX8-NEXT: v_mov_b32_e32 v1, s3
		; GFX8-NEXT: v_mad_i32_i24 v0, s2, v0, v1
; GFX8-NEXT: v_mov_b32_e32 v1, s32		; GFX8-NEXT: v_mov_b32_e32 v1, s32
; GFX8-NEXT: v_mad_i32_i24 v0, s18, v1, v0		; GFX8-NEXT: v_mad_i32_i24 v0, s18, v1, v0
; GFX8-NEXT: s_ashr_i64 s[14:15], s[14:15], 60
; GFX8-NEXT: v_mov_b32_e32 v1, s30		; GFX8-NEXT: v_mov_b32_e32 v1, s30
; GFX8-NEXT: v_mad_i32_i24 v0, s14, v1, v0		; GFX8-NEXT: v_mad_i32_i24 v0, s16, v1, v0
; GFX8-NEXT: s_ashr_i64 s[12:13], s[12:13], 60
; GFX8-NEXT: v_mov_b32_e32 v1, s28		; GFX8-NEXT: v_mov_b32_e32 v1, s28
; GFX8-NEXT: v_mad_i32_i24 v0, s12, v1, v0		; GFX8-NEXT: v_mad_i32_i24 v0, s12, v1, v0
; GFX8-NEXT: s_ashr_i64 s[10:11], s[10:11], 60
; GFX8-NEXT: v_mov_b32_e32 v1, s26		; GFX8-NEXT: v_mov_b32_e32 v1, s26
; GFX8-NEXT: v_mad_i32_i24 v0, s10, v1, v0		; GFX8-NEXT: v_mad_i32_i24 v0, s10, v1, v0
		; GFX8-NEXT: s_ashr_i64 s[8:9], s[8:9], 60
; GFX8-NEXT: v_mov_b32_e32 v1, s24		; GFX8-NEXT: v_mov_b32_e32 v1, s24
; GFX8-NEXT: v_mad_i32_i24 v0, s16, v1, v0		; GFX8-NEXT: v_mad_i32_i24 v0, s8, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v1, s22		; GFX8-NEXT: v_mov_b32_e32 v1, s22
; GFX8-NEXT: v_mad_i32_i24 v2, s8, v1, v0		; GFX8-NEXT: v_mad_i32_i24 v0, s14, v1, v0
		; GFX8-NEXT: v_mov_b32_e32 v1, s20
		; GFX8-NEXT: v_mad_i32_i24 v2, s6, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_store_dword v[0:1], v2		; GFX8-NEXT: flat_store_dword v[0:1], v2
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: idot8_acc32_vecMul:		; GFX9-LABEL: idot8_acc32_vecMul:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_load_dword s5, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s3, s[4:5], 0x0
; GFX9-NEXT: s_load_dword s7, s[6:7], 0x0		; GFX9-NEXT: s_load_dword s5, s[6:7], 0x0
; GFX9-NEXT: s_load_dword s2, s[0:1], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_ashr_i64 s[8:9], s[4:5], 60		; GFX9-NEXT: s_ashr_i64 s[6:7], s[2:3], 60
; GFX9-NEXT: s_lshl_b32 s9, s5, 4		; GFX9-NEXT: s_lshl_b32 s7, s3, 4
; GFX9-NEXT: s_ashr_i64 s[16:17], s[8:9], 60		; GFX9-NEXT: s_ashr_i64 s[14:15], s[6:7], 60
; GFX9-NEXT: s_lshl_b32 s9, s5, 20		; GFX9-NEXT: s_lshl_b32 s7, s3, 20
; GFX9-NEXT: s_lshl_b32 s11, s5, 8		; GFX9-NEXT: s_lshl_b32 s9, s3, 8
; GFX9-NEXT: s_lshl_b32 s13, s5, 12		; GFX9-NEXT: s_lshl_b32 s11, s3, 12
; GFX9-NEXT: s_lshl_b32 s15, s5, 16		; GFX9-NEXT: s_lshl_b32 s13, s3, 16
; GFX9-NEXT: s_ashr_i64 s[18:19], s[8:9], 60		; GFX9-NEXT: s_ashr_i64 s[16:17], s[6:7], 60
; GFX9-NEXT: s_lshl_b32 s9, s5, 24		; GFX9-NEXT: s_lshl_b32 s7, s3, 24
; GFX9-NEXT: s_lshl_b32 s5, s5, 28		; GFX9-NEXT: s_lshl_b32 s3, s3, 28
; GFX9-NEXT: s_ashr_i64 s[4:5], s[4:5], 60		; GFX9-NEXT: s_ashr_i64 s[2:3], s[2:3], 60
; GFX9-NEXT: s_lshl_b32 s5, s7, 4		; GFX9-NEXT: s_lshl_b32 s3, s5, 4
; GFX9-NEXT: s_ashr_i64 s[24:25], s[4:5], 60		; GFX9-NEXT: s_ashr_i64 s[22:23], s[2:3], 60
; GFX9-NEXT: s_lshl_b32 s5, s7, 8		; GFX9-NEXT: s_lshl_b32 s3, s5, 8
; GFX9-NEXT: s_ashr_i64 s[26:27], s[4:5], 60		; GFX9-NEXT: s_ashr_i64 s[24:25], s[2:3], 60
; GFX9-NEXT: s_lshl_b32 s5, s7, 12		; GFX9-NEXT: s_lshl_b32 s3, s5, 12
; GFX9-NEXT: s_ashr_i64 s[28:29], s[4:5], 60		; GFX9-NEXT: s_ashr_i64 s[26:27], s[2:3], 60
; GFX9-NEXT: s_lshl_b32 s5, s7, 16		; GFX9-NEXT: s_lshl_b32 s3, s5, 16
; GFX9-NEXT: s_ashr_i64 s[30:31], s[4:5], 60		; GFX9-NEXT: s_ashr_i64 s[28:29], s[2:3], 60
; GFX9-NEXT: s_lshl_b32 s5, s7, 20		; GFX9-NEXT: s_lshl_b32 s3, s5, 20
; GFX9-NEXT: s_ashr_i64 s[32:33], s[4:5], 60		; GFX9-NEXT: s_ashr_i64 s[30:31], s[2:3], 60
; GFX9-NEXT: s_lshl_b32 s5, s7, 24		; GFX9-NEXT: s_lshl_b32 s3, s5, 24
; GFX9-NEXT: s_ashr_i64 s[34:35], s[4:5], 60		; GFX9-NEXT: s_ashr_i64 s[32:33], s[2:3], 60
; GFX9-NEXT: s_lshl_b32 s5, s7, 28		; GFX9-NEXT: s_lshl_b32 s3, s5, 28
; GFX9-NEXT: s_ashr_i64 s[22:23], s[6:7], 60		; GFX9-NEXT: s_ashr_i64 s[20:21], s[4:5], 60
; GFX9-NEXT: s_ashr_i64 s[6:7], s[4:5], 60		; GFX9-NEXT: s_ashr_i64 s[4:5], s[2:3], 60
; GFX9-NEXT: v_mov_b32_e32 v0, s6		; GFX9-NEXT: s_load_dword s3, s[0:1], 0x0
; GFX9-NEXT: v_mov_b32_e32 v1, s2		; GFX9-NEXT: v_mov_b32_e32 v0, s4
; GFX9-NEXT: v_mad_i32_i24 v0, s4, v0, v1		; GFX9-NEXT: s_ashr_i64 s[18:19], s[6:7], 60
; GFX9-NEXT: s_ashr_i64 s[20:21], s[8:9], 60		; GFX9-NEXT: s_ashr_i64 s[12:13], s[12:13], 60
; GFX9-NEXT: v_mov_b32_e32 v1, s34		; GFX9-NEXT: s_ashr_i64 s[10:11], s[10:11], 60
; GFX9-NEXT: v_mad_i32_i24 v0, s20, v1, v0		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
		; GFX9-NEXT: v_mov_b32_e32 v1, s3
		; GFX9-NEXT: v_mad_i32_i24 v0, s2, v0, v1
; GFX9-NEXT: v_mov_b32_e32 v1, s32		; GFX9-NEXT: v_mov_b32_e32 v1, s32
; GFX9-NEXT: v_mad_i32_i24 v0, s18, v1, v0		; GFX9-NEXT: v_mad_i32_i24 v0, s18, v1, v0
; GFX9-NEXT: s_ashr_i64 s[14:15], s[14:15], 60
; GFX9-NEXT: v_mov_b32_e32 v1, s30		; GFX9-NEXT: v_mov_b32_e32 v1, s30
; GFX9-NEXT: v_mad_i32_i24 v0, s14, v1, v0		; GFX9-NEXT: v_mad_i32_i24 v0, s16, v1, v0
; GFX9-NEXT: s_ashr_i64 s[12:13], s[12:13], 60
; GFX9-NEXT: v_mov_b32_e32 v1, s28		; GFX9-NEXT: v_mov_b32_e32 v1, s28
; GFX9-NEXT: v_mad_i32_i24 v0, s12, v1, v0		; GFX9-NEXT: v_mad_i32_i24 v0, s12, v1, v0
; GFX9-NEXT: s_ashr_i64 s[10:11], s[10:11], 60
; GFX9-NEXT: v_mov_b32_e32 v1, s26		; GFX9-NEXT: v_mov_b32_e32 v1, s26
; GFX9-NEXT: v_mad_i32_i24 v0, s10, v1, v0		; GFX9-NEXT: v_mad_i32_i24 v0, s10, v1, v0
		; GFX9-NEXT: s_ashr_i64 s[8:9], s[8:9], 60
; GFX9-NEXT: v_mov_b32_e32 v1, s24		; GFX9-NEXT: v_mov_b32_e32 v1, s24
; GFX9-NEXT: v_mad_i32_i24 v0, s16, v1, v0		; GFX9-NEXT: v_mad_i32_i24 v0, s8, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v1, s22		; GFX9-NEXT: v_mov_b32_e32 v1, s22
; GFX9-NEXT: v_mad_i32_i24 v2, s8, v1, v0		; GFX9-NEXT: v_mad_i32_i24 v0, s14, v1, v0
		; GFX9-NEXT: v_mov_b32_e32 v1, s20
		; GFX9-NEXT: v_mad_i32_i24 v2, s6, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: global_store_dword v[0:1], v2, off		; GFX9-NEXT: global_store_dword v[0:1], v2, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX9-DL-LABEL: idot8_acc32_vecMul:		; GFX9-DL-LABEL: idot8_acc32_vecMul:
; GFX9-DL: ; %bb.0: ; %entry		; GFX9-DL: ; %bb.0: ; %entry
; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_load_dword s5, s[4:5], 0x0		; GFX9-DL-NEXT: s_load_dword s3, s[4:5], 0x0
; GFX9-DL-NEXT: s_load_dword s7, s[6:7], 0x0		; GFX9-DL-NEXT: s_load_dword s5, s[6:7], 0x0
; GFX9-DL-NEXT: s_load_dword s2, s[0:1], 0x0
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_ashr_i64 s[8:9], s[4:5], 60		; GFX9-DL-NEXT: s_ashr_i64 s[6:7], s[2:3], 60
; GFX9-DL-NEXT: s_lshl_b32 s9, s5, 4		; GFX9-DL-NEXT: s_lshl_b32 s7, s3, 4
; GFX9-DL-NEXT: s_ashr_i64 s[16:17], s[8:9], 60		; GFX9-DL-NEXT: s_ashr_i64 s[14:15], s[6:7], 60
; GFX9-DL-NEXT: s_lshl_b32 s9, s5, 20		; GFX9-DL-NEXT: s_lshl_b32 s7, s3, 20
; GFX9-DL-NEXT: s_lshl_b32 s11, s5, 8		; GFX9-DL-NEXT: s_lshl_b32 s9, s3, 8
; GFX9-DL-NEXT: s_lshl_b32 s13, s5, 12		; GFX9-DL-NEXT: s_lshl_b32 s11, s3, 12
; GFX9-DL-NEXT: s_lshl_b32 s15, s5, 16		; GFX9-DL-NEXT: s_lshl_b32 s13, s3, 16
; GFX9-DL-NEXT: s_ashr_i64 s[18:19], s[8:9], 60		; GFX9-DL-NEXT: s_ashr_i64 s[16:17], s[6:7], 60
; GFX9-DL-NEXT: s_lshl_b32 s9, s5, 24		; GFX9-DL-NEXT: s_lshl_b32 s7, s3, 24
; GFX9-DL-NEXT: s_lshl_b32 s5, s5, 28		; GFX9-DL-NEXT: s_lshl_b32 s3, s3, 28
; GFX9-DL-NEXT: s_ashr_i64 s[4:5], s[4:5], 60		; GFX9-DL-NEXT: s_ashr_i64 s[2:3], s[2:3], 60
; GFX9-DL-NEXT: s_lshl_b32 s5, s7, 4		; GFX9-DL-NEXT: s_lshl_b32 s3, s5, 4
; GFX9-DL-NEXT: s_ashr_i64 s[24:25], s[4:5], 60		; GFX9-DL-NEXT: s_ashr_i64 s[22:23], s[2:3], 60
; GFX9-DL-NEXT: s_lshl_b32 s5, s7, 8		; GFX9-DL-NEXT: s_lshl_b32 s3, s5, 8
; GFX9-DL-NEXT: s_ashr_i64 s[26:27], s[4:5], 60		; GFX9-DL-NEXT: s_ashr_i64 s[24:25], s[2:3], 60
; GFX9-DL-NEXT: s_lshl_b32 s5, s7, 12		; GFX9-DL-NEXT: s_lshl_b32 s3, s5, 12
; GFX9-DL-NEXT: s_ashr_i64 s[28:29], s[4:5], 60		; GFX9-DL-NEXT: s_ashr_i64 s[26:27], s[2:3], 60
; GFX9-DL-NEXT: s_lshl_b32 s5, s7, 16		; GFX9-DL-NEXT: s_lshl_b32 s3, s5, 16
; GFX9-DL-NEXT: s_ashr_i64 s[30:31], s[4:5], 60		; GFX9-DL-NEXT: s_ashr_i64 s[28:29], s[2:3], 60
; GFX9-DL-NEXT: s_lshl_b32 s5, s7, 20		; GFX9-DL-NEXT: s_lshl_b32 s3, s5, 20
; GFX9-DL-NEXT: s_ashr_i64 s[32:33], s[4:5], 60		; GFX9-DL-NEXT: s_ashr_i64 s[30:31], s[2:3], 60
; GFX9-DL-NEXT: s_lshl_b32 s5, s7, 24		; GFX9-DL-NEXT: s_lshl_b32 s3, s5, 24
; GFX9-DL-NEXT: s_ashr_i64 s[34:35], s[4:5], 60		; GFX9-DL-NEXT: s_ashr_i64 s[32:33], s[2:3], 60
; GFX9-DL-NEXT: s_lshl_b32 s5, s7, 28		; GFX9-DL-NEXT: s_lshl_b32 s3, s5, 28
; GFX9-DL-NEXT: s_ashr_i64 s[22:23], s[6:7], 60		; GFX9-DL-NEXT: s_ashr_i64 s[20:21], s[4:5], 60
; GFX9-DL-NEXT: s_ashr_i64 s[6:7], s[4:5], 60		; GFX9-DL-NEXT: s_ashr_i64 s[4:5], s[2:3], 60
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s6		; GFX9-DL-NEXT: s_load_dword s3, s[0:1], 0x0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s2		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s4
; GFX9-DL-NEXT: v_mad_i32_i24 v0, s4, v0, v1		; GFX9-DL-NEXT: s_ashr_i64 s[18:19], s[6:7], 60
; GFX9-DL-NEXT: s_ashr_i64 s[20:21], s[8:9], 60		; GFX9-DL-NEXT: s_ashr_i64 s[12:13], s[12:13], 60
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s34		; GFX9-DL-NEXT: s_ashr_i64 s[10:11], s[10:11], 60
; GFX9-DL-NEXT: v_mad_i32_i24 v0, s20, v1, v0		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s3
		; GFX9-DL-NEXT: v_mad_i32_i24 v0, s2, v0, v1
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s32		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s32
; GFX9-DL-NEXT: v_mad_i32_i24 v0, s18, v1, v0		; GFX9-DL-NEXT: v_mad_i32_i24 v0, s18, v1, v0
; GFX9-DL-NEXT: s_ashr_i64 s[14:15], s[14:15], 60
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s30		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s30
; GFX9-DL-NEXT: v_mad_i32_i24 v0, s14, v1, v0		; GFX9-DL-NEXT: v_mad_i32_i24 v0, s16, v1, v0
; GFX9-DL-NEXT: s_ashr_i64 s[12:13], s[12:13], 60
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s28		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s28
; GFX9-DL-NEXT: v_mad_i32_i24 v0, s12, v1, v0		; GFX9-DL-NEXT: v_mad_i32_i24 v0, s12, v1, v0
; GFX9-DL-NEXT: s_ashr_i64 s[10:11], s[10:11], 60
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s26		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s26
; GFX9-DL-NEXT: v_mad_i32_i24 v0, s10, v1, v0		; GFX9-DL-NEXT: v_mad_i32_i24 v0, s10, v1, v0
		; GFX9-DL-NEXT: s_ashr_i64 s[8:9], s[8:9], 60
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s24		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s24
; GFX9-DL-NEXT: v_mad_i32_i24 v0, s16, v1, v0		; GFX9-DL-NEXT: v_mad_i32_i24 v0, s8, v1, v0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s22		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s22
; GFX9-DL-NEXT: v_mad_i32_i24 v2, s8, v1, v0		; GFX9-DL-NEXT: v_mad_i32_i24 v0, s14, v1, v0
		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s20
		; GFX9-DL-NEXT: v_mad_i32_i24 v2, s6, v1, v0
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off		; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: idot8_acc32_vecMul:		; GFX10-DL-LABEL: idot8_acc32_vecMul:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_load_dword s5, s[4:5], 0x0		; GFX10-DL-NEXT: s_load_dword s3, s[4:5], 0x0
; GFX10-DL-NEXT: s_load_dword s7, s[6:7], 0x0		; GFX10-DL-NEXT: s_load_dword s5, s[6:7], 0x0
; GFX10-DL-NEXT: s_load_dword s2, s[0:1], 0x0		; GFX10-DL-NEXT: s_load_dword s2, s[0:1], 0x0
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
		; GFX10-DL-NEXT: s_lshl_b32 s7, s3, 28
; GFX10-DL-NEXT: s_lshl_b32 s9, s5, 28		; GFX10-DL-NEXT: s_lshl_b32 s9, s5, 28
; GFX10-DL-NEXT: s_lshl_b32 s11, s7, 28		; GFX10-DL-NEXT: s_lshl_b32 s11, s3, 24
; GFX10-DL-NEXT: s_lshl_b32 s13, s5, 24		; GFX10-DL-NEXT: s_lshl_b32 s13, s5, 24
; GFX10-DL-NEXT: s_lshl_b32 s15, s7, 24
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s2		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s2
		; GFX10-DL-NEXT: s_ashr_i64 s[6:7], s[6:7], 60
; GFX10-DL-NEXT: s_ashr_i64 s[8:9], s[8:9], 60		; GFX10-DL-NEXT: s_ashr_i64 s[8:9], s[8:9], 60
; GFX10-DL-NEXT: s_ashr_i64 s[10:11], s[10:11], 60		; GFX10-DL-NEXT: s_ashr_i64 s[10:11], s[10:11], 60
; GFX10-DL-NEXT: s_ashr_i64 s[12:13], s[12:13], 60		; GFX10-DL-NEXT: s_ashr_i64 s[12:13], s[12:13], 60
; GFX10-DL-NEXT: s_ashr_i64 s[14:15], s[14:15], 60		; GFX10-DL-NEXT: s_lshl_b32 s7, s3, 20
; GFX10-DL-NEXT: s_lshl_b32 s9, s5, 20		; GFX10-DL-NEXT: s_lshl_b32 s9, s5, 20
; GFX10-DL-NEXT: s_lshl_b32 s11, s7, 20		; GFX10-DL-NEXT: v_mad_i32_i24 v0, s6, s8, v0
; GFX10-DL-NEXT: v_mad_i32_i24 v0, s8, s10, v0		; GFX10-DL-NEXT: s_lshl_b32 s11, s3, 16
; GFX10-DL-NEXT: s_lshl_b32 s13, s5, 16		; GFX10-DL-NEXT: s_lshl_b32 s13, s5, 16
; GFX10-DL-NEXT: s_lshl_b32 s15, s7, 16		; GFX10-DL-NEXT: s_ashr_i64 s[6:7], s[6:7], 60
; GFX10-DL-NEXT: s_ashr_i64 s[8:9], s[8:9], 60		; GFX10-DL-NEXT: s_ashr_i64 s[8:9], s[8:9], 60
		; GFX10-DL-NEXT: v_mad_i32_i24 v0, s10, s12, v0
; GFX10-DL-NEXT: s_ashr_i64 s[10:11], s[10:11], 60		; GFX10-DL-NEXT: s_ashr_i64 s[10:11], s[10:11], 60
; GFX10-DL-NEXT: v_mad_i32_i24 v0, s12, s14, v0
; GFX10-DL-NEXT: s_ashr_i64 s[12:13], s[12:13], 60		; GFX10-DL-NEXT: s_ashr_i64 s[12:13], s[12:13], 60
; GFX10-DL-NEXT: s_ashr_i64 s[14:15], s[14:15], 60		; GFX10-DL-NEXT: s_lshl_b32 s7, s3, 12
; GFX10-DL-NEXT: s_lshl_b32 s9, s5, 12		; GFX10-DL-NEXT: s_lshl_b32 s9, s5, 12
; GFX10-DL-NEXT: s_lshl_b32 s11, s7, 12		; GFX10-DL-NEXT: v_mad_i32_i24 v0, s6, s8, v0
; GFX10-DL-NEXT: v_mad_i32_i24 v0, s8, s10, v0		; GFX10-DL-NEXT: s_lshl_b32 s11, s3, 8
; GFX10-DL-NEXT: s_lshl_b32 s13, s5, 8		; GFX10-DL-NEXT: s_lshl_b32 s13, s5, 8
; GFX10-DL-NEXT: s_lshl_b32 s15, s7, 8		; GFX10-DL-NEXT: s_ashr_i64 s[6:7], s[6:7], 60
; GFX10-DL-NEXT: s_ashr_i64 s[8:9], s[8:9], 60		; GFX10-DL-NEXT: s_ashr_i64 s[8:9], s[8:9], 60
; GFX10-DL-NEXT: s_ashr_i64 s[10:11], s[10:11], 60		; GFX10-DL-NEXT: v_mad_i32_i24 v0, s10, s12, v0
; GFX10-DL-NEXT: v_mad_i32_i24 v0, s12, s14, v0		; GFX10-DL-NEXT: s_lshl_b32 s7, s3, 4
; GFX10-DL-NEXT: s_lshl_b32 s9, s5, 4		; GFX10-DL-NEXT: s_lshl_b32 s9, s5, 4
; GFX10-DL-NEXT: s_lshl_b32 s11, s7, 4		; GFX10-DL-NEXT: s_ashr_i64 s[10:11], s[10:11], 60
; GFX10-DL-NEXT: s_ashr_i64 s[12:13], s[12:13], 60		; GFX10-DL-NEXT: s_ashr_i64 s[12:13], s[12:13], 60
; GFX10-DL-NEXT: s_ashr_i64 s[14:15], s[14:15], 60		; GFX10-DL-NEXT: v_mad_i32_i24 v0, s6, s8, v0
; GFX10-DL-NEXT: v_mad_i32_i24 v0, s8, s10, v0		; GFX10-DL-NEXT: s_ashr_i64 s[6:7], s[6:7], 60
; GFX10-DL-NEXT: s_ashr_i64 s[8:9], s[8:9], 60		; GFX10-DL-NEXT: s_ashr_i64 s[8:9], s[8:9], 60
; GFX10-DL-NEXT: s_ashr_i64 s[10:11], s[10:11], 60		; GFX10-DL-NEXT: s_ashr_i64 s[2:3], s[2:3], 60
; GFX10-DL-NEXT: s_ashr_i64 s[4:5], s[4:5], 60		; GFX10-DL-NEXT: s_ashr_i64 s[4:5], s[4:5], 60
; GFX10-DL-NEXT: s_ashr_i64 s[6:7], s[6:7], 60		; GFX10-DL-NEXT: v_mad_i32_i24 v0, s10, s12, v0
; GFX10-DL-NEXT: v_mad_i32_i24 v0, s12, s14, v0		; GFX10-DL-NEXT: v_mad_i32_i24 v0, s6, s8, v0
; GFX10-DL-NEXT: v_mad_i32_i24 v0, s8, s10, v0		; GFX10-DL-NEXT: v_mad_i32_i24 v2, s2, s4, v0
; GFX10-DL-NEXT: v_mad_i32_i24 v2, s4, s6, v0
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX10-DL-NEXT: global_store_dword v[0:1], v2, off		; GFX10-DL-NEXT: global_store_dword v[0:1], v2, off
; GFX10-DL-NEXT: s_endpgm		; GFX10-DL-NEXT: s_endpgm
<8 x i4> addrspace(1)* %src2,		<8 x i4> addrspace(1)* %src2,
i32 addrspace(1)* nocapture %dst) {		i32 addrspace(1)* nocapture %dst) {
entry:		entry:
%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1		%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1
store i32 %add8, i32 addrspace(1)* %dst, align 4		store i32 %add8, i32 addrspace(1)* %dst, align 4
ret void		ret void
}		}
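The `idot8_acc32_vecMul` checks above extract each signed 4-bit lane with a two-instruction idiom. First, `s_lshl_b32 hi, word, 28 - 4*lane` moves the wanted nibble into bits 28..31 of the high dword of a register pair. Lane 7 is already there, so it skips the shift. Then `s_ashr_i64 pair, pair, 60` keeps only those top four bits, sign-extended. The low dword of the pair is irrelevant because a 60-bit shift discards it.

The Python sketch below models that idiom and compares it with a plain field extract. It also models what the kernel appears to compute: a signed 8-lane `i4` dot product added to the `i32` at `%dst`. The IR body is collapsed in this view, so the semantics are inferred from the function name and the check lines. All helper names are hypothetical and are not LLVM APIs.

```python
# Hypothetical model (not LLVM code) of the nibble sign-extension idiom used in
# the idot8_acc32_vecMul checks: s_lshl_b32 followed by s_ashr_i64 by 60.

MASK32 = 0xFFFFFFFF


def to_signed(value: int, bits: int) -> int:
    """Reinterpret the low `bits` bits of `value` as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def nibble_via_shifts(word: int, lane: int, low_dword: int = 0) -> int:
    """Sign-extended 4-bit lane `lane` (0..7) of `word`, computed as the checks do."""
    hi = (word << (28 - 4 * lane)) & MASK32  # s_lshl_b32: lane moves to bits 28..31
    pair = to_signed((hi << 32) | (low_dword & MASK32), 64)  # 64-bit register pair
    return pair >> 60  # s_ashr_i64 ..., 60 (Python >> is arithmetic on ints)


def nibble_reference(word: int, lane: int) -> int:
    """Plain field extract plus sign extension, for comparison."""
    return to_signed(word >> (4 * lane), 4)


def idot8_acc32(a: int, b: int, acc: int) -> int:
    """acc + sum(sext(a[i]) * sext(b[i])) over 8 lanes, wrapped to i32
    (assumed semantics of idot8_acc32_vecMul)."""
    products = (nibble_reference(a, i) * nibble_reference(b, i) for i in range(8))
    return to_signed(acc + sum(products), 32)
```

For example, `idot8_acc32(0x11111111, 0xFFFFFFFF, 0)` multiplies eight lanes of `1` by eight lanes of `-1` and yields `-8`. The `low_dword` parameter exists only to show that garbage in the low half of the pair does not change the result.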

; TODO: Support this pattern.		; TODO: Support this pattern.
define amdgpu_kernel void @idot8_acc16_vecMul(<8 x i4> addrspace(1)* %src1,		define amdgpu_kernel void @idot8_acc16_vecMul(<8 x i4> addrspace(1)* %src1,
; GFX7-LABEL: idot8_acc16_vecMul:		; GFX7-LABEL: idot8_acc16_vecMul:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s7, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s6, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_mov_b32 s2, 0xffff		; GFX7-NEXT: s_mov_b32 s8, 0xffff
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_load_dword s0, s[10:11], 0x0		; GFX7-NEXT: s_load_dword s6, s[6:7], 0x0
; GFX7-NEXT: buffer_load_ushort v0, off, s[4:7], 0		; GFX7-NEXT: buffer_load_ushort v0, off, s[0:3], 0
; GFX7-NEXT: s_load_dword s1, s[8:9], 0x0		; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_bfe_i32 s16, s0, 0x40018		; GFX7-NEXT: s_bfe_i32 s15, s6, 0x40018
; GFX7-NEXT: s_bfe_i32 s17, s0, 0x40014		; GFX7-NEXT: s_bfe_i32 s16, s6, 0x40014
; GFX7-NEXT: s_bfe_i32 s18, s0, 0x40010		; GFX7-NEXT: s_bfe_i32 s17, s6, 0x40010
; GFX7-NEXT: s_bfe_i32 s19, s0, 0x40000		; GFX7-NEXT: s_bfe_i32 s18, s6, 0x40000
; GFX7-NEXT: s_bfe_i32 s20, s0, 0x40004		; GFX7-NEXT: s_bfe_i32 s19, s6, 0x40004
; GFX7-NEXT: s_bfe_i32 s21, s0, 0x40008		; GFX7-NEXT: s_bfe_i32 s20, s6, 0x40008
; GFX7-NEXT: s_ashr_i32 s15, s0, 28		; GFX7-NEXT: s_ashr_i32 s14, s6, 28
; GFX7-NEXT: s_bfe_i32 s0, s0, 0x4000c		; GFX7-NEXT: s_bfe_i32 s6, s6, 0x4000c
; GFX7-NEXT: s_ashr_i32 s8, s1, 28		; GFX7-NEXT: s_ashr_i32 s5, s4, 28
; GFX7-NEXT: s_bfe_i32 s9, s1, 0x40018		; GFX7-NEXT: s_bfe_i32 s7, s4, 0x40018
; GFX7-NEXT: s_bfe_i32 s10, s1, 0x40014		; GFX7-NEXT: s_bfe_i32 s9, s4, 0x40014
; GFX7-NEXT: s_bfe_i32 s11, s1, 0x40010		; GFX7-NEXT: s_bfe_i32 s10, s4, 0x40010
; GFX7-NEXT: s_bfe_i32 s12, s1, 0x40000		; GFX7-NEXT: s_bfe_i32 s11, s4, 0x40000
; GFX7-NEXT: v_mov_b32_e32 v4, s19		; GFX7-NEXT: v_mov_b32_e32 v4, s18
; GFX7-NEXT: s_bfe_i32 s13, s1, 0x40004		; GFX7-NEXT: s_bfe_i32 s12, s4, 0x40004
; GFX7-NEXT: v_mov_b32_e32 v3, s20		; GFX7-NEXT: v_mov_b32_e32 v3, s19
; GFX7-NEXT: s_bfe_i32 s14, s1, 0x40008		; GFX7-NEXT: s_bfe_i32 s13, s4, 0x40008
; GFX7-NEXT: v_mov_b32_e32 v2, s21		; GFX7-NEXT: v_mov_b32_e32 v2, s20
; GFX7-NEXT: s_bfe_i32 s1, s1, 0x4000c		; GFX7-NEXT: s_bfe_i32 s4, s4, 0x4000c
; GFX7-NEXT: v_mov_b32_e32 v1, s0		; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mul_i32_i24_e32 v1, s1, v1		; GFX7-NEXT: v_mul_i32_i24_e32 v1, s4, v1
; GFX7-NEXT: v_mul_i32_i24_e32 v2, s14, v2		; GFX7-NEXT: v_mul_i32_i24_e32 v2, s13, v2
; GFX7-NEXT: v_mul_i32_i24_e32 v3, s13, v3		; GFX7-NEXT: v_mul_i32_i24_e32 v3, s12, v3
; GFX7-NEXT: v_mul_i32_i24_e32 v4, s12, v4		; GFX7-NEXT: v_mul_i32_i24_e32 v4, s11, v4
; GFX7-NEXT: v_lshlrev_b32_e32 v1, 16, v1		; GFX7-NEXT: v_lshlrev_b32_e32 v1, 16, v1
; GFX7-NEXT: v_and_b32_e32 v2, s2, v2		; GFX7-NEXT: v_and_b32_e32 v2, s8, v2
; GFX7-NEXT: v_lshlrev_b32_e32 v3, 16, v3		; GFX7-NEXT: v_lshlrev_b32_e32 v3, 16, v3
; GFX7-NEXT: v_and_b32_e32 v4, s2, v4		; GFX7-NEXT: v_and_b32_e32 v4, s8, v4
; GFX7-NEXT: v_or_b32_e32 v1, v2, v1		; GFX7-NEXT: v_or_b32_e32 v1, v2, v1
; GFX7-NEXT: v_or_b32_e32 v2, v4, v3		; GFX7-NEXT: v_or_b32_e32 v2, v4, v3
; GFX7-NEXT: v_alignbit_b32 v3, v1, v2, 16		; GFX7-NEXT: v_alignbit_b32 v3, v1, v2, 16
; GFX7-NEXT: v_lshrrev_b32_e32 v4, 16, v1		; GFX7-NEXT: v_lshrrev_b32_e32 v4, 16, v1
; GFX7-NEXT: v_mov_b32_e32 v5, s18		; GFX7-NEXT: v_mov_b32_e32 v5, s17
; GFX7-NEXT: v_mov_b32_e32 v6, s17		; GFX7-NEXT: v_mov_b32_e32 v6, s16
; GFX7-NEXT: v_mov_b32_e32 v7, s16		; GFX7-NEXT: v_mov_b32_e32 v7, s15
; GFX7-NEXT: s_waitcnt vmcnt(0)		; GFX7-NEXT: s_waitcnt vmcnt(0)
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v2		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v2
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v3, v0		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v3, v0
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v1		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v1
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v4, v0		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v4, v0
; GFX7-NEXT: v_mad_i32_i24 v0, s11, v5, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s10, v5, v0
; GFX7-NEXT: v_mad_i32_i24 v0, s10, v6, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s9, v6, v0
; GFX7-NEXT: v_mad_i32_i24 v0, s9, v7, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s7, v7, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s15		; GFX7-NEXT: v_mov_b32_e32 v1, s14
; GFX7-NEXT: v_mad_i32_i24 v0, s8, v1, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s5, v1, v0
; GFX7-NEXT: buffer_store_short v0, off, s[4:7], 0		; GFX7-NEXT: buffer_store_short v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: idot8_acc16_vecMul:		; GFX8-LABEL: idot8_acc16_vecMul:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_load_dword s7, s[6:7], 0x0		; GFX8-NEXT: s_load_dword s3, s[6:7], 0x0
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_load_ushort v2, v[0:1]		; GFX8-NEXT: flat_load_ushort v2, v[0:1]
; GFX8-NEXT: s_load_dword s1, s[4:5], 0x0		; GFX8-NEXT: s_load_dword s1, s[4:5], 0x0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_lshl_b32 s29, s7, 28		; GFX8-NEXT: s_lshl_b32 s27, s3, 28
; GFX8-NEXT: s_ashr_i64 s[18:19], s[6:7], 60		; GFX8-NEXT: s_ashr_i64 s[16:17], s[2:3], 60
; GFX8-NEXT: s_lshl_b32 s21, s7, 8		; GFX8-NEXT: s_lshl_b32 s19, s3, 8
; GFX8-NEXT: s_lshl_b32 s23, s7, 12		; GFX8-NEXT: s_lshl_b32 s21, s3, 12
; GFX8-NEXT: s_lshl_b32 s17, s1, 28		; GFX8-NEXT: s_lshl_b32 s15, s1, 28
; GFX8-NEXT: s_lshl_b32 s25, s7, 16		; GFX8-NEXT: s_lshl_b32 s23, s3, 16
; GFX8-NEXT: s_lshl_b32 s27, s7, 24		; GFX8-NEXT: s_lshl_b32 s25, s3, 24
; GFX8-NEXT: s_lshl_b32 s19, s7, 4		; GFX8-NEXT: s_lshl_b32 s17, s3, 4
; GFX8-NEXT: s_lshl_b32 s7, s7, 20		; GFX8-NEXT: s_lshl_b32 s3, s3, 20
; GFX8-NEXT: s_ashr_i64 s[4:5], s[0:1], 60		; GFX8-NEXT: s_ashr_i64 s[4:5], s[0:1], 60
; GFX8-NEXT: s_ashr_i64 s[28:29], s[28:29], 60		; GFX8-NEXT: s_ashr_i64 s[26:27], s[26:27], 60
; GFX8-NEXT: s_lshl_b32 s9, s1, 8		; GFX8-NEXT: s_lshl_b32 s7, s1, 8
; GFX8-NEXT: s_lshl_b32 s11, s1, 12		; GFX8-NEXT: s_lshl_b32 s9, s1, 12
; GFX8-NEXT: s_lshl_b32 s13, s1, 16		; GFX8-NEXT: s_lshl_b32 s11, s1, 16
; GFX8-NEXT: s_lshl_b32 s15, s1, 24		; GFX8-NEXT: s_lshl_b32 s13, s1, 24
; GFX8-NEXT: s_lshl_b32 s5, s1, 4		; GFX8-NEXT: s_lshl_b32 s5, s1, 4
; GFX8-NEXT: s_lshl_b32 s1, s1, 20		; GFX8-NEXT: s_lshl_b32 s1, s1, 20
; GFX8-NEXT: s_ashr_i64 s[26:27], s[26:27], 60		; GFX8-NEXT: s_ashr_i64 s[24:25], s[24:25], 60
; GFX8-NEXT: s_ashr_i64 s[6:7], s[6:7], 60		; GFX8-NEXT: s_ashr_i64 s[2:3], s[2:3], 60
; GFX8-NEXT: s_ashr_i64 s[16:17], s[16:17], 60
; GFX8-NEXT: v_mov_b32_e32 v4, s28
; GFX8-NEXT: s_ashr_i64 s[14:15], s[14:15], 60		; GFX8-NEXT: s_ashr_i64 s[14:15], s[14:15], 60
		; GFX8-NEXT: v_mov_b32_e32 v4, s26
		; GFX8-NEXT: s_ashr_i64 s[12:13], s[12:13], 60
; GFX8-NEXT: s_ashr_i64 s[0:1], s[0:1], 60		; GFX8-NEXT: s_ashr_i64 s[0:1], s[0:1], 60
; GFX8-NEXT: v_mov_b32_e32 v3, s6		; GFX8-NEXT: v_mov_b32_e32 v3, s2
; GFX8-NEXT: v_mov_b32_e32 v5, s26		; GFX8-NEXT: v_mov_b32_e32 v5, s24
; GFX8-NEXT: s_ashr_i64 s[24:25], s[24:25], 60
; GFX8-NEXT: v_mul_i32_i24_e32 v3, s0, v3
; GFX8-NEXT: s_ashr_i64 s[22:23], s[22:23], 60		; GFX8-NEXT: s_ashr_i64 s[22:23], s[22:23], 60
; GFX8-NEXT: s_ashr_i64 s[12:13], s[12:13], 60		; GFX8-NEXT: v_mul_i32_i24_e32 v3, s0, v3
; GFX8-NEXT: v_mov_b32_e32 v6, s24
; GFX8-NEXT: s_ashr_i64 s[20:21], s[20:21], 60		; GFX8-NEXT: s_ashr_i64 s[20:21], s[20:21], 60
; GFX8-NEXT: s_ashr_i64 s[10:11], s[10:11], 60		; GFX8-NEXT: s_ashr_i64 s[10:11], s[10:11], 60
; GFX8-NEXT: v_mov_b32_e32 v7, s22		; GFX8-NEXT: v_mov_b32_e32 v6, s22
; GFX8-NEXT: s_ashr_i64 s[32:33], s[18:19], 60		; GFX8-NEXT: s_ashr_i64 s[18:19], s[18:19], 60
; GFX8-NEXT: s_ashr_i64 s[8:9], s[8:9], 60		; GFX8-NEXT: s_ashr_i64 s[8:9], s[8:9], 60
; GFX8-NEXT: v_mov_b32_e32 v8, s20		; GFX8-NEXT: v_mov_b32_e32 v7, s20
; GFX8-NEXT: s_ashr_i64 s[30:31], s[4:5], 60		; GFX8-NEXT: s_ashr_i64 s[30:31], s[16:17], 60
; GFX8-NEXT: v_mov_b32_e32 v9, s32		; GFX8-NEXT: s_ashr_i64 s[6:7], s[6:7], 60
		; GFX8-NEXT: v_mov_b32_e32 v8, s18
		; GFX8-NEXT: s_ashr_i64 s[28:29], s[4:5], 60
		; GFX8-NEXT: v_mov_b32_e32 v9, s30
; GFX8-NEXT: s_waitcnt vmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0)
; GFX8-NEXT: v_mad_i32_i24 v2, s16, v4, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s14, v4, v2
; GFX8-NEXT: v_mad_i32_i24 v2, s14, v5, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s12, v5, v2
; GFX8-NEXT: v_add_u32_sdwa v2, vcc, v3, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:WORD_0		; GFX8-NEXT: v_add_u32_sdwa v2, vcc, v3, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:WORD_0
; GFX8-NEXT: v_mad_i32_i24 v2, s12, v6, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s10, v6, v2
; GFX8-NEXT: v_mad_i32_i24 v2, s10, v7, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s8, v7, v2
; GFX8-NEXT: v_mad_i32_i24 v2, s8, v8, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s6, v8, v2
; GFX8-NEXT: v_mad_i32_i24 v2, s30, v9, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s28, v9, v2
; GFX8-NEXT: v_mov_b32_e32 v3, s18		; GFX8-NEXT: v_mov_b32_e32 v3, s16
; GFX8-NEXT: v_mad_i32_i24 v2, s4, v3, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s4, v3, v2
; GFX8-NEXT: flat_store_short v[0:1], v2		; GFX8-NEXT: flat_store_short v[0:1], v2
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: idot8_acc16_vecMul:		; GFX9-LABEL: idot8_acc16_vecMul:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX9-NEXT: s_load_dword s6, s[6:7], 0x0		; GFX9-NEXT: s_load_dword s6, s[6:7], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_bfe_u32 s4, s2, 0x40018		; GFX9-NEXT: s_bfe_u32 s3, s2, 0x40018
; GFX9-NEXT: s_lshr_b32 s5, s2, 28		; GFX9-NEXT: s_lshr_b32 s4, s2, 28
; GFX9-NEXT: s_bfe_u32 s8, s2, 0x40010		; GFX9-NEXT: s_bfe_u32 s5, s2, 0x40010
; GFX9-NEXT: s_bfe_u32 s9, s2, 0x40014		; GFX9-NEXT: s_bfe_u32 s8, s2, 0x40014
; GFX9-NEXT: s_bfe_u32 s10, s2, 0x40008		; GFX9-NEXT: s_bfe_u32 s9, s2, 0x40008
; GFX9-NEXT: s_bfe_u32 s11, s2, 0x4000c		; GFX9-NEXT: s_bfe_u32 s10, s2, 0x4000c
; GFX9-NEXT: s_and_b32 s12, s2, 15		; GFX9-NEXT: s_and_b32 s11, s2, 15
; GFX9-NEXT: s_bfe_u32 s2, s2, 0x40004		; GFX9-NEXT: s_bfe_u32 s2, s2, 0x40004
; GFX9-NEXT: s_pack_ll_b32_b16 s2, s12, s2		; GFX9-NEXT: s_pack_ll_b32_b16 s2, s11, s2
; GFX9-NEXT: v_pk_lshlrev_b16 v0, 12, s2 op_sel_hi:[0,1]		; GFX9-NEXT: v_pk_lshlrev_b16 v0, 12, s2 op_sel_hi:[0,1]
; GFX9-NEXT: s_pack_ll_b32_b16 s2, s10, s11		; GFX9-NEXT: s_pack_ll_b32_b16 s2, s9, s10
; GFX9-NEXT: v_pk_lshlrev_b16 v1, 12, s2 op_sel_hi:[0,1]		; GFX9-NEXT: v_pk_lshlrev_b16 v1, 12, s2 op_sel_hi:[0,1]
; GFX9-NEXT: s_pack_ll_b32_b16 s2, s8, s9		; GFX9-NEXT: s_pack_ll_b32_b16 s2, s5, s8
; GFX9-NEXT: v_pk_lshlrev_b16 v2, 12, s2 op_sel_hi:[0,1]		; GFX9-NEXT: v_pk_lshlrev_b16 v2, 12, s2 op_sel_hi:[0,1]
; GFX9-NEXT: s_pack_ll_b32_b16 s2, s4, s5		; GFX9-NEXT: s_pack_ll_b32_b16 s2, s3, s4
; GFX9-NEXT: s_bfe_u32 s7, s6, 0x40018		; GFX9-NEXT: s_bfe_u32 s7, s6, 0x40018
; GFX9-NEXT: s_lshr_b32 s13, s6, 28		; GFX9-NEXT: s_lshr_b32 s12, s6, 28
; GFX9-NEXT: s_bfe_u32 s14, s6, 0x40010		; GFX9-NEXT: s_bfe_u32 s13, s6, 0x40010
; GFX9-NEXT: s_bfe_u32 s15, s6, 0x40014		; GFX9-NEXT: s_bfe_u32 s14, s6, 0x40014
; GFX9-NEXT: s_bfe_u32 s16, s6, 0x40008		; GFX9-NEXT: s_bfe_u32 s15, s6, 0x40008
; GFX9-NEXT: s_bfe_u32 s17, s6, 0x4000c		; GFX9-NEXT: s_bfe_u32 s16, s6, 0x4000c
; GFX9-NEXT: s_and_b32 s18, s6, 15		; GFX9-NEXT: s_and_b32 s17, s6, 15
; GFX9-NEXT: s_bfe_u32 s6, s6, 0x40004		; GFX9-NEXT: s_bfe_u32 s6, s6, 0x40004
; GFX9-NEXT: v_pk_lshlrev_b16 v3, 12, s2 op_sel_hi:[0,1]		; GFX9-NEXT: v_pk_lshlrev_b16 v3, 12, s2 op_sel_hi:[0,1]
; GFX9-NEXT: s_pack_ll_b32_b16 s2, s18, s6		; GFX9-NEXT: s_pack_ll_b32_b16 s2, s17, s6
; GFX9-NEXT: v_pk_lshlrev_b16 v4, 12, s2 op_sel_hi:[0,1]		; GFX9-NEXT: v_pk_lshlrev_b16 v4, 12, s2 op_sel_hi:[0,1]
; GFX9-NEXT: s_pack_ll_b32_b16 s2, s16, s17		; GFX9-NEXT: s_pack_ll_b32_b16 s2, s15, s16
; GFX9-NEXT: v_pk_lshlrev_b16 v5, 12, s2 op_sel_hi:[0,1]		; GFX9-NEXT: v_pk_lshlrev_b16 v5, 12, s2 op_sel_hi:[0,1]
; GFX9-NEXT: s_pack_ll_b32_b16 s2, s14, s15		; GFX9-NEXT: s_pack_ll_b32_b16 s2, s13, s14
; GFX9-NEXT: v_pk_ashrrev_i16 v0, 12, v0 op_sel_hi:[0,1]		; GFX9-NEXT: v_pk_ashrrev_i16 v0, 12, v0 op_sel_hi:[0,1]
; GFX9-NEXT: v_pk_ashrrev_i16 v4, 12, v4 op_sel_hi:[0,1]		; GFX9-NEXT: v_pk_ashrrev_i16 v4, 12, v4 op_sel_hi:[0,1]
; GFX9-NEXT: v_pk_ashrrev_i16 v1, 12, v1 op_sel_hi:[0,1]		; GFX9-NEXT: v_pk_ashrrev_i16 v1, 12, v1 op_sel_hi:[0,1]
; GFX9-NEXT: v_pk_ashrrev_i16 v5, 12, v5 op_sel_hi:[0,1]		; GFX9-NEXT: v_pk_ashrrev_i16 v5, 12, v5 op_sel_hi:[0,1]
; GFX9-NEXT: v_pk_lshlrev_b16 v6, 12, s2 op_sel_hi:[0,1]		; GFX9-NEXT: v_pk_lshlrev_b16 v6, 12, s2 op_sel_hi:[0,1]
; GFX9-NEXT: v_pk_mul_lo_u16 v5, v1, v5		; GFX9-NEXT: v_pk_mul_lo_u16 v5, v1, v5
; GFX9-NEXT: v_pk_mul_lo_u16 v4, v0, v4		; GFX9-NEXT: v_pk_mul_lo_u16 v4, v0, v4
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_pk_ashrrev_i16 v2, 12, v2 op_sel_hi:[0,1]		; GFX9-NEXT: v_pk_ashrrev_i16 v2, 12, v2 op_sel_hi:[0,1]
; GFX9-NEXT: v_pk_ashrrev_i16 v6, 12, v6 op_sel_hi:[0,1]		; GFX9-NEXT: v_pk_ashrrev_i16 v6, 12, v6 op_sel_hi:[0,1]
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: v_pk_mul_lo_u16 v2, v2, v6		; GFX9-NEXT: v_pk_mul_lo_u16 v2, v2, v6
; GFX9-NEXT: global_load_ushort v6, v[0:1], off		; GFX9-NEXT: global_load_ushort v6, v[0:1], off
; GFX9-NEXT: s_pack_ll_b32_b16 s2, s7, s13		; GFX9-NEXT: s_pack_ll_b32_b16 s2, s7, s12
; GFX9-NEXT: v_pk_lshlrev_b16 v7, 12, s2 op_sel_hi:[0,1]		; GFX9-NEXT: v_pk_lshlrev_b16 v7, 12, s2 op_sel_hi:[0,1]
; GFX9-NEXT: v_pk_ashrrev_i16 v3, 12, v3 op_sel_hi:[0,1]		; GFX9-NEXT: v_pk_ashrrev_i16 v3, 12, v3 op_sel_hi:[0,1]
; GFX9-NEXT: v_pk_ashrrev_i16 v7, 12, v7 op_sel_hi:[0,1]		; GFX9-NEXT: v_pk_ashrrev_i16 v7, 12, v7 op_sel_hi:[0,1]
; GFX9-NEXT: v_pk_mul_lo_u16 v3, v3, v7		; GFX9-NEXT: v_pk_mul_lo_u16 v3, v3, v7
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: v_add_u32_e32 v6, v4, v6		; GFX9-NEXT: v_add_u32_e32 v6, v4, v6
; GFX9-NEXT: v_add_u32_sdwa v4, v6, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX9-NEXT: v_add_u32_sdwa v4, v6, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX9-NEXT: v_add_u32_sdwa v4, v4, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:WORD_0		; GFX9-NEXT: v_add_u32_sdwa v4, v4, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:WORD_0
; GFX9-NEXT: v_add_u32_sdwa v4, v4, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX9-NEXT: v_add_u32_sdwa v4, v4, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX9-NEXT: v_add_u32_e32 v4, v4, v2		; GFX9-NEXT: v_add_u32_e32 v4, v4, v2
; GFX9-NEXT: v_add_u32_sdwa v2, v4, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX9-NEXT: v_add_u32_sdwa v2, v4, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX9-NEXT: v_add_u32_e32 v2, v2, v3		; GFX9-NEXT: v_add_u32_e32 v2, v2, v3
; GFX9-NEXT: v_add_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX9-NEXT: v_add_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX9-NEXT: global_store_short v[0:1], v2, off		; GFX9-NEXT: global_store_short v[0:1], v2, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
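The `idot8_acc16_vecMul` checks show two different ways of sign-extending a 4-bit lane:

- **GFX7** uses `s_bfe_i32` with immediates such as `0x40018`. As I read the GCN ISA description of S_BFE, the operand holds the bit offset in bits [4:0] and the width in bits [22:16], so `0x40018` means "4 bits starting at bit 24".
- **GFX9** extracts lanes unsigned with `s_bfe_u32`/`s_and_b32`, packs two lanes per register with `s_pack_ll_b32_b16`, and then sign-extends each 16-bit half with `v_pk_lshlrev_b16` by 12 followed by `v_pk_ashrrev_i16` by 12.

Every target ends in a 16-bit store (`buffer_store_short`, `flat_store_short`, `global_store_short`), so the accumulation wraps to `i16`.

The sketch below models these steps under those assumptions. The helper names are hypothetical, and the `s_bfe_i32` model assumes a nonzero width.

```python
# Hypothetical model (not LLVM code) of lane extraction and accumulation in
# idot8_acc16_vecMul, based on a reading of the GCN S_BFE operand encoding.

def to_signed(value: int, bits: int) -> int:
    """Reinterpret the low `bits` bits of `value` as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def s_bfe_i32(src: int, imm: int) -> int:
    """Signed bitfield extract: offset in imm[4:0], width in imm[22:16] (width > 0 assumed)."""
    offset = imm & 0x1F
    width = (imm >> 16) & 0x7F
    return to_signed(src >> offset, width)


def sext4_in_i16(nibble: int) -> int:
    """One 16-bit half of v_pk_lshlrev_b16 by 12 then v_pk_ashrrev_i16 by 12."""
    shifted = (nibble << 12) & 0xFFFF  # logical shift left, truncated to 16 bits
    return to_signed(shifted, 16) >> 12  # arithmetic shift right of the i16 value


def idot8_acc16(a: int, b: int, acc: int) -> int:
    """acc + sum of signed lane products, wrapped to i16 by the final short store."""
    total = acc
    for i in range(8):
        imm = (4 << 16) | (4 * i)  # width 4, offset 4*i; e.g. i == 6 gives 0x40018
        total += s_bfe_i32(a, imm) * s_bfe_i32(b, imm)
    return to_signed(total, 16)
```

For instance, `s_bfe_i32(0x0F000000, 0x40018)` reads the all-ones nibble at bits 24..27 and returns `-1`. Starting an accumulation of eight `7 * 7` products from `0x7FFF` wraps past the `i16` maximum.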
;		;
; GFX9-DL-LABEL: idot8_acc16_vecMul:		; GFX9-DL-LABEL: idot8_acc16_vecMul:
; GFX9-DL: ; %bb.0: ; %entry		; GFX9-DL: ; %bb.0: ; %entry
; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX9-DL-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX9-DL-NEXT: s_load_dword s6, s[6:7], 0x0		; GFX9-DL-NEXT: s_load_dword s6, s[6:7], 0x0
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_bfe_u32 s4, s2, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s3, s2, 0x40018
; GFX9-DL-NEXT: s_lshr_b32 s5, s2, 28		; GFX9-DL-NEXT: s_lshr_b32 s4, s2, 28
; GFX9-DL-NEXT: s_bfe_u32 s8, s2, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s5, s2, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s9, s2, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s8, s2, 0x40014
; GFX9-DL-NEXT: s_bfe_u32 s10, s2, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s9, s2, 0x40008
; GFX9-DL-NEXT: s_bfe_u32 s11, s2, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s10, s2, 0x4000c
; GFX9-DL-NEXT: s_and_b32 s12, s2, 15		; GFX9-DL-NEXT: s_and_b32 s11, s2, 15
; GFX9-DL-NEXT: s_bfe_u32 s2, s2, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s2, s2, 0x40004
; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s12, s2		; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s11, s2
; GFX9-DL-NEXT: v_pk_lshlrev_b16 v0, 12, s2 op_sel_hi:[0,1]		; GFX9-DL-NEXT: v_pk_lshlrev_b16 v0, 12, s2 op_sel_hi:[0,1]
; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s10, s11		; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s9, s10
; GFX9-DL-NEXT: v_pk_lshlrev_b16 v1, 12, s2 op_sel_hi:[0,1]		; GFX9-DL-NEXT: v_pk_lshlrev_b16 v1, 12, s2 op_sel_hi:[0,1]
; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s8, s9		; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s5, s8
; GFX9-DL-NEXT: v_pk_lshlrev_b16 v2, 12, s2 op_sel_hi:[0,1]		; GFX9-DL-NEXT: v_pk_lshlrev_b16 v2, 12, s2 op_sel_hi:[0,1]
; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s4, s5		; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s3, s4
; GFX9-DL-NEXT: s_bfe_u32 s7, s6, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s7, s6, 0x40018
; GFX9-DL-NEXT: s_lshr_b32 s13, s6, 28		; GFX9-DL-NEXT: s_lshr_b32 s12, s6, 28
; GFX9-DL-NEXT: s_bfe_u32 s14, s6, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s13, s6, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s15, s6, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s14, s6, 0x40014
; GFX9-DL-NEXT: s_bfe_u32 s16, s6, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s15, s6, 0x40008
; GFX9-DL-NEXT: s_bfe_u32 s17, s6, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s16, s6, 0x4000c
; GFX9-DL-NEXT: s_and_b32 s18, s6, 15		; GFX9-DL-NEXT: s_and_b32 s17, s6, 15
; GFX9-DL-NEXT: s_bfe_u32 s6, s6, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s6, s6, 0x40004
; GFX9-DL-NEXT: v_pk_lshlrev_b16 v3, 12, s2 op_sel_hi:[0,1]		; GFX9-DL-NEXT: v_pk_lshlrev_b16 v3, 12, s2 op_sel_hi:[0,1]
; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s18, s6		; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s17, s6
; GFX9-DL-NEXT: v_pk_lshlrev_b16 v4, 12, s2 op_sel_hi:[0,1]		; GFX9-DL-NEXT: v_pk_lshlrev_b16 v4, 12, s2 op_sel_hi:[0,1]
; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s16, s17		; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s15, s16
; GFX9-DL-NEXT: v_pk_lshlrev_b16 v5, 12, s2 op_sel_hi:[0,1]		; GFX9-DL-NEXT: v_pk_lshlrev_b16 v5, 12, s2 op_sel_hi:[0,1]
; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s14, s15		; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s13, s14
; GFX9-DL-NEXT: v_pk_ashrrev_i16 v0, 12, v0 op_sel_hi:[0,1]		; GFX9-DL-NEXT: v_pk_ashrrev_i16 v0, 12, v0 op_sel_hi:[0,1]
; GFX9-DL-NEXT: v_pk_ashrrev_i16 v4, 12, v4 op_sel_hi:[0,1]		; GFX9-DL-NEXT: v_pk_ashrrev_i16 v4, 12, v4 op_sel_hi:[0,1]
; GFX9-DL-NEXT: v_pk_ashrrev_i16 v1, 12, v1 op_sel_hi:[0,1]		; GFX9-DL-NEXT: v_pk_ashrrev_i16 v1, 12, v1 op_sel_hi:[0,1]
; GFX9-DL-NEXT: v_pk_ashrrev_i16 v5, 12, v5 op_sel_hi:[0,1]		; GFX9-DL-NEXT: v_pk_ashrrev_i16 v5, 12, v5 op_sel_hi:[0,1]
; GFX9-DL-NEXT: v_pk_lshlrev_b16 v6, 12, s2 op_sel_hi:[0,1]		; GFX9-DL-NEXT: v_pk_lshlrev_b16 v6, 12, s2 op_sel_hi:[0,1]
; GFX9-DL-NEXT: v_pk_mul_lo_u16 v5, v1, v5		; GFX9-DL-NEXT: v_pk_mul_lo_u16 v5, v1, v5
; GFX9-DL-NEXT: v_pk_mul_lo_u16 v4, v0, v4		; GFX9-DL-NEXT: v_pk_mul_lo_u16 v4, v0, v4
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_pk_ashrrev_i16 v2, 12, v2 op_sel_hi:[0,1]		; GFX9-DL-NEXT: v_pk_ashrrev_i16 v2, 12, v2 op_sel_hi:[0,1]
; GFX9-DL-NEXT: v_pk_ashrrev_i16 v6, 12, v6 op_sel_hi:[0,1]		; GFX9-DL-NEXT: v_pk_ashrrev_i16 v6, 12, v6 op_sel_hi:[0,1]
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: v_pk_mul_lo_u16 v2, v2, v6		; GFX9-DL-NEXT: v_pk_mul_lo_u16 v2, v2, v6
; GFX9-DL-NEXT: global_load_ushort v6, v[0:1], off		; GFX9-DL-NEXT: global_load_ushort v6, v[0:1], off
; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s7, s13		; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s7, s12
; GFX9-DL-NEXT: v_pk_lshlrev_b16 v7, 12, s2 op_sel_hi:[0,1]		; GFX9-DL-NEXT: v_pk_lshlrev_b16 v7, 12, s2 op_sel_hi:[0,1]
; GFX9-DL-NEXT: v_pk_ashrrev_i16 v3, 12, v3 op_sel_hi:[0,1]		; GFX9-DL-NEXT: v_pk_ashrrev_i16 v3, 12, v3 op_sel_hi:[0,1]
; GFX9-DL-NEXT: v_pk_ashrrev_i16 v7, 12, v7 op_sel_hi:[0,1]		; GFX9-DL-NEXT: v_pk_ashrrev_i16 v7, 12, v7 op_sel_hi:[0,1]
; GFX9-DL-NEXT: v_pk_mul_lo_u16 v3, v3, v7		; GFX9-DL-NEXT: v_pk_mul_lo_u16 v3, v3, v7
; GFX9-DL-NEXT: s_waitcnt vmcnt(0)		; GFX9-DL-NEXT: s_waitcnt vmcnt(0)
; GFX9-DL-NEXT: v_add_u32_e32 v6, v4, v6		; GFX9-DL-NEXT: v_add_u32_e32 v6, v4, v6
; GFX9-DL-NEXT: v_add_u32_sdwa v4, v6, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX9-DL-NEXT: v_add_u32_sdwa v4, v6, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX9-DL-NEXT: v_add_u32_sdwa v4, v4, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:WORD_0		; GFX9-DL-NEXT: v_add_u32_sdwa v4, v4, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:WORD_0
; GFX9-DL-NEXT: v_add_u32_sdwa v4, v4, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX9-DL-NEXT: v_add_u32_sdwa v4, v4, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX9-DL-NEXT: v_add_u32_e32 v4, v4, v2		; GFX9-DL-NEXT: v_add_u32_e32 v4, v4, v2
; GFX9-DL-NEXT: v_add_u32_sdwa v2, v4, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX9-DL-NEXT: v_add_u32_sdwa v2, v4, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX9-DL-NEXT: v_add_u32_e32 v2, v2, v3		; GFX9-DL-NEXT: v_add_u32_e32 v2, v2, v3
; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX9-DL-NEXT: global_store_short v[0:1], v2, off		; GFX9-DL-NEXT: global_store_short v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: idot8_acc16_vecMul:		; GFX10-DL-LABEL: idot8_acc16_vecMul:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s4		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s2
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s5		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s3
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX10-DL-NEXT: global_load_ushort v2, v[0:1], off		; GFX10-DL-NEXT: global_load_ushort v2, v[0:1], off
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_and_b32 s5, s0, 15		; GFX10-DL-NEXT: s_and_b32 s4, s0, 15
; GFX10-DL-NEXT: s_bfe_u32 s6, s0, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40004
; GFX10-DL-NEXT: s_and_b32 s7, s1, 15		; GFX10-DL-NEXT: s_and_b32 s6, s1, 15
; GFX10-DL-NEXT: s_bfe_u32 s8, s1, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s7, s1, 0x40004
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40018
; GFX10-DL-NEXT: s_lshr_b32 s4, s0, 28		; GFX10-DL-NEXT: s_lshr_b32 s3, s0, 28
; GFX10-DL-NEXT: s_pack_ll_b32_b16 s5, s5, s6		; GFX10-DL-NEXT: s_pack_ll_b32_b16 s4, s4, s5
; GFX10-DL-NEXT: s_bfe_u32 s9, s0, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s8, s0, 0x40010
; GFX10-DL-NEXT: s_pack_ll_b32_b16 s7, s7, s8		; GFX10-DL-NEXT: s_pack_ll_b32_b16 s6, s6, s7
; GFX10-DL-NEXT: s_bfe_u32 s10, s0, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s9, s0, 0x40014
; GFX10-DL-NEXT: s_bfe_u32 s6, s0, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40008
; GFX10-DL-NEXT: v_pk_lshlrev_b16 v3, 12, s5 op_sel_hi:[0,1]		; GFX10-DL-NEXT: v_pk_lshlrev_b16 v3, 12, s4 op_sel_hi:[0,1]
; GFX10-DL-NEXT: s_bfe_u32 s0, s0, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s0, s0, 0x4000c
; GFX10-DL-NEXT: v_pk_lshlrev_b16 v4, 12, s7 op_sel_hi:[0,1]		; GFX10-DL-NEXT: v_pk_lshlrev_b16 v4, 12, s6 op_sel_hi:[0,1]
; GFX10-DL-NEXT: s_bfe_u32 s8, s1, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s7, s1, 0x40008
; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x4000c
; GFX10-DL-NEXT: v_pk_ashrrev_i16 v3, 12, v3 op_sel_hi:[0,1]		; GFX10-DL-NEXT: v_pk_ashrrev_i16 v3, 12, v3 op_sel_hi:[0,1]
; GFX10-DL-NEXT: s_pack_ll_b32_b16 s0, s6, s0		; GFX10-DL-NEXT: s_pack_ll_b32_b16 s0, s5, s0
; GFX10-DL-NEXT: v_pk_ashrrev_i16 v4, 12, v4 op_sel_hi:[0,1]		; GFX10-DL-NEXT: v_pk_ashrrev_i16 v4, 12, v4 op_sel_hi:[0,1]
; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x40010
; GFX10-DL-NEXT: s_pack_ll_b32_b16 s5, s8, s5		; GFX10-DL-NEXT: s_pack_ll_b32_b16 s4, s7, s4
; GFX10-DL-NEXT: s_bfe_u32 s7, s1, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40018
; GFX10-DL-NEXT: v_pk_lshlrev_b16 v5, 12, s0 op_sel_hi:[0,1]		; GFX10-DL-NEXT: v_pk_lshlrev_b16 v5, 12, s0 op_sel_hi:[0,1]
; GFX10-DL-NEXT: s_bfe_u32 s0, s1, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s0, s1, 0x40014
; GFX10-DL-NEXT: v_pk_mul_lo_u16 v3, v3, v4		; GFX10-DL-NEXT: v_pk_mul_lo_u16 v3, v3, v4
; GFX10-DL-NEXT: v_pk_lshlrev_b16 v6, 12, s5 op_sel_hi:[0,1]		; GFX10-DL-NEXT: v_pk_lshlrev_b16 v6, 12, s4 op_sel_hi:[0,1]
; GFX10-DL-NEXT: s_pack_ll_b32_b16 s5, s9, s10		; GFX10-DL-NEXT: s_pack_ll_b32_b16 s4, s8, s9
; GFX10-DL-NEXT: v_pk_ashrrev_i16 v4, 12, v5 op_sel_hi:[0,1]		; GFX10-DL-NEXT: v_pk_ashrrev_i16 v4, 12, v5 op_sel_hi:[0,1]
; GFX10-DL-NEXT: s_pack_ll_b32_b16 s0, s6, s0		; GFX10-DL-NEXT: s_pack_ll_b32_b16 s0, s5, s0
; GFX10-DL-NEXT: s_lshr_b32 s1, s1, 28		; GFX10-DL-NEXT: s_lshr_b32 s1, s1, 28
; GFX10-DL-NEXT: v_pk_ashrrev_i16 v5, 12, v6 op_sel_hi:[0,1]		; GFX10-DL-NEXT: v_pk_ashrrev_i16 v5, 12, v6 op_sel_hi:[0,1]
; GFX10-DL-NEXT: v_pk_lshlrev_b16 v6, 12, s5 op_sel_hi:[0,1]		; GFX10-DL-NEXT: v_pk_lshlrev_b16 v6, 12, s4 op_sel_hi:[0,1]
; GFX10-DL-NEXT: v_pk_lshlrev_b16 v7, 12, s0 op_sel_hi:[0,1]		; GFX10-DL-NEXT: v_pk_lshlrev_b16 v7, 12, s0 op_sel_hi:[0,1]
; GFX10-DL-NEXT: s_pack_ll_b32_b16 s0, s2, s4		; GFX10-DL-NEXT: s_pack_ll_b32_b16 s0, s2, s3
; GFX10-DL-NEXT: v_pk_mul_lo_u16 v4, v4, v5		; GFX10-DL-NEXT: v_pk_mul_lo_u16 v4, v4, v5
; GFX10-DL-NEXT: s_pack_ll_b32_b16 s1, s7, s1		; GFX10-DL-NEXT: s_pack_ll_b32_b16 s1, s6, s1
; GFX10-DL-NEXT: v_pk_ashrrev_i16 v5, 12, v7 op_sel_hi:[0,1]		; GFX10-DL-NEXT: v_pk_ashrrev_i16 v5, 12, v7 op_sel_hi:[0,1]
; GFX10-DL-NEXT: v_pk_lshlrev_b16 v7, 12, s1 op_sel_hi:[0,1]		; GFX10-DL-NEXT: v_pk_lshlrev_b16 v7, 12, s1 op_sel_hi:[0,1]
; GFX10-DL-NEXT: s_waitcnt vmcnt(0)		; GFX10-DL-NEXT: s_waitcnt vmcnt(0)
; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v3, v2		; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v3, v2
; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX10-DL-NEXT: v_pk_ashrrev_i16 v3, 12, v6 op_sel_hi:[0,1]		; GFX10-DL-NEXT: v_pk_ashrrev_i16 v3, 12, v6 op_sel_hi:[0,1]
; GFX10-DL-NEXT: v_pk_lshlrev_b16 v6, 12, s0 op_sel_hi:[0,1]		; GFX10-DL-NEXT: v_pk_lshlrev_b16 v6, 12, s0 op_sel_hi:[0,1]
; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:WORD_0		; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:WORD_0
		entry:
store i16 %add8, i16 addrspace(1)* %dst, align 4		store i16 %add8, i16 addrspace(1)* %dst, align 4
ret void		ret void
}		}
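The body of idot8_acc16_vecMul is collapsed in this view, so its exact IR is not shown. Judging from the check lines, it appears to compute a signed dot product of two vectors of eight 4-bit elements and add the result to the i16 already stored at the destination. The steps are:

- Each nibble is isolated with s_and_b32 or s_bfe_u32.
- Each nibble is sign-extended inside a 16-bit lane by a shift left of 12 followed by an arithmetic shift right of 12 (v_pk_lshlrev_b16, v_pk_ashrrev_i16).
- The lanes are multiplied pairwise with v_pk_mul_lo_u16.

The following is a minimal C reference model under that reading. It assumes element i occupies bits [4i+3:4i] of each dword. The function names are hypothetical.

```c
#include <stdint.h>

/* Sign-extend the 4-bit field at bit offset `shift`. The GPU does this with
   shl 12 / ashr 12 inside a 16-bit lane; (x ^ 8) - 8 is the portable C form. */
static int sext4(uint32_t word, unsigned shift) {
    int x = (int)((word >> shift) & 0xFu);
    return (x ^ 8) - 8;
}

/* Hypothetical reference model: dst + sum of sext4(a_i) * sext4(b_i) for
   i = 0..7, with the additions wrapping modulo 2^16 like an i16 add. */
static uint16_t idot8_acc16_ref(uint32_t a, uint32_t b, uint16_t dst) {
    uint16_t acc = dst;
    for (unsigned i = 0; i < 8; ++i) {
        int prod = sext4(a, 4 * i) * sext4(b, 4 * i);
        acc = (uint16_t)(acc + (uint16_t)prod);  /* i16 add wraps */
    }
    return acc;
}
```

As an example, take every nibble of a equal to 0xF (-1), every nibble of b equal to 0x2, and 100 at the destination. The model then returns 84, which is 100 + 8 * (-2).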

; TODO: Support this pattern.		; TODO: Support this pattern.
define amdgpu_kernel void @idot8_acc8_vecMul(<8 x i4> addrspace(1)* %src1,		define amdgpu_kernel void @idot8_acc8_vecMul(<8 x i4> addrspace(1)* %src1,
; GFX7-LABEL: idot8_acc8_vecMul:		; GFX7-LABEL: idot8_acc8_vecMul:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s7, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s6, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_movk_i32 s0, 0xff		; GFX7-NEXT: s_movk_i32 s8, 0xff
; GFX7-NEXT: s_mov_b32 s1, 0xffff		; GFX7-NEXT: s_mov_b32 s9, 0xffff
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: buffer_load_ubyte v0, off, s[4:7], 0		; GFX7-NEXT: buffer_load_ubyte v0, off, s[0:3], 0
; GFX7-NEXT: s_load_dword s2, s[8:9], 0x0		; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX7-NEXT: s_load_dword s8, s[10:11], 0x0		; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_bfe_i32 s9, s2, 0x40000		; GFX7-NEXT: s_bfe_i32 s6, s4, 0x40000
; GFX7-NEXT: s_bfe_i32 s16, s8, 0x40000		; GFX7-NEXT: s_bfe_i32 s15, s5, 0x40000
; GFX7-NEXT: s_bfe_i32 s17, s8, 0x40004		; GFX7-NEXT: s_bfe_i32 s16, s5, 0x40004
; GFX7-NEXT: s_bfe_i32 s18, s8, 0x40008		; GFX7-NEXT: s_bfe_i32 s17, s5, 0x40008
; GFX7-NEXT: s_bfe_i32 s19, s8, 0x4000c		; GFX7-NEXT: s_bfe_i32 s18, s5, 0x4000c
; GFX7-NEXT: s_bfe_i32 s20, s8, 0x40010		; GFX7-NEXT: s_bfe_i32 s19, s5, 0x40010
; GFX7-NEXT: s_bfe_i32 s21, s8, 0x40014		; GFX7-NEXT: s_bfe_i32 s20, s5, 0x40014
; GFX7-NEXT: s_bfe_i32 s22, s8, 0x40018		; GFX7-NEXT: s_bfe_i32 s21, s5, 0x40018
; GFX7-NEXT: s_ashr_i32 s8, s8, 28		; GFX7-NEXT: s_ashr_i32 s5, s5, 28
; GFX7-NEXT: v_mov_b32_e32 v8, s16		; GFX7-NEXT: v_mov_b32_e32 v8, s15
; GFX7-NEXT: s_bfe_i32 s10, s2, 0x40004		; GFX7-NEXT: s_bfe_i32 s7, s4, 0x40004
; GFX7-NEXT: v_mov_b32_e32 v7, s17		; GFX7-NEXT: v_mov_b32_e32 v7, s16
; GFX7-NEXT: s_bfe_i32 s11, s2, 0x40008		; GFX7-NEXT: s_bfe_i32 s10, s4, 0x40008
; GFX7-NEXT: v_mov_b32_e32 v6, s18		; GFX7-NEXT: v_mov_b32_e32 v6, s17
; GFX7-NEXT: s_bfe_i32 s12, s2, 0x4000c		; GFX7-NEXT: s_bfe_i32 s11, s4, 0x4000c
; GFX7-NEXT: v_mov_b32_e32 v5, s19		; GFX7-NEXT: v_mov_b32_e32 v5, s18
; GFX7-NEXT: s_bfe_i32 s13, s2, 0x40010		; GFX7-NEXT: s_bfe_i32 s12, s4, 0x40010
; GFX7-NEXT: v_mov_b32_e32 v4, s20		; GFX7-NEXT: v_mov_b32_e32 v4, s19
; GFX7-NEXT: s_bfe_i32 s14, s2, 0x40014		; GFX7-NEXT: s_bfe_i32 s13, s4, 0x40014
; GFX7-NEXT: v_mov_b32_e32 v3, s21		; GFX7-NEXT: v_mov_b32_e32 v3, s20
; GFX7-NEXT: s_bfe_i32 s15, s2, 0x40018		; GFX7-NEXT: s_bfe_i32 s14, s4, 0x40018
; GFX7-NEXT: v_mov_b32_e32 v2, s22		; GFX7-NEXT: v_mov_b32_e32 v2, s21
; GFX7-NEXT: s_ashr_i32 s2, s2, 28		; GFX7-NEXT: s_ashr_i32 s4, s4, 28
; GFX7-NEXT: v_mov_b32_e32 v1, s8		; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mul_i32_i24_e32 v1, s2, v1		; GFX7-NEXT: v_mul_i32_i24_e32 v1, s4, v1
; GFX7-NEXT: v_mul_i32_i24_e32 v2, s15, v2		; GFX7-NEXT: v_mul_i32_i24_e32 v2, s14, v2
; GFX7-NEXT: v_mul_i32_i24_e32 v3, s14, v3		; GFX7-NEXT: v_mul_i32_i24_e32 v3, s13, v3
; GFX7-NEXT: v_mul_i32_i24_e32 v9, s13, v4		; GFX7-NEXT: v_mul_i32_i24_e32 v9, s12, v4
; GFX7-NEXT: v_mul_i32_i24_e32 v5, s12, v5		; GFX7-NEXT: v_mul_i32_i24_e32 v5, s11, v5
; GFX7-NEXT: v_mul_i32_i24_e32 v6, s11, v6		; GFX7-NEXT: v_mul_i32_i24_e32 v6, s10, v6
; GFX7-NEXT: v_mul_i32_i24_e32 v7, s10, v7		; GFX7-NEXT: v_mul_i32_i24_e32 v7, s7, v7
; GFX7-NEXT: v_mul_i32_i24_e32 v8, s9, v8		; GFX7-NEXT: v_mul_i32_i24_e32 v8, s6, v8
; GFX7-NEXT: v_lshlrev_b32_e32 v1, 8, v1		; GFX7-NEXT: v_lshlrev_b32_e32 v1, 8, v1
; GFX7-NEXT: v_and_b32_e32 v2, s0, v2		; GFX7-NEXT: v_and_b32_e32 v2, s8, v2
; GFX7-NEXT: v_lshlrev_b32_e32 v3, 8, v3		; GFX7-NEXT: v_lshlrev_b32_e32 v3, 8, v3
; GFX7-NEXT: v_and_b32_e32 v9, s0, v9		; GFX7-NEXT: v_and_b32_e32 v9, s8, v9
; GFX7-NEXT: v_lshlrev_b32_e32 v5, 8, v5		; GFX7-NEXT: v_lshlrev_b32_e32 v5, 8, v5
; GFX7-NEXT: v_and_b32_e32 v6, s0, v6		; GFX7-NEXT: v_and_b32_e32 v6, s8, v6
; GFX7-NEXT: v_lshlrev_b32_e32 v7, 8, v7		; GFX7-NEXT: v_lshlrev_b32_e32 v7, 8, v7
; GFX7-NEXT: v_and_b32_e32 v8, s0, v8		; GFX7-NEXT: v_and_b32_e32 v8, s8, v8
; GFX7-NEXT: v_or_b32_e32 v1, v2, v1		; GFX7-NEXT: v_or_b32_e32 v1, v2, v1
; GFX7-NEXT: v_or_b32_e32 v2, v9, v3		; GFX7-NEXT: v_or_b32_e32 v2, v9, v3
; GFX7-NEXT: v_or_b32_e32 v3, v6, v5		; GFX7-NEXT: v_or_b32_e32 v3, v6, v5
; GFX7-NEXT: v_or_b32_e32 v5, v8, v7		; GFX7-NEXT: v_or_b32_e32 v5, v8, v7
; GFX7-NEXT: v_lshlrev_b32_e32 v1, 16, v1		; GFX7-NEXT: v_lshlrev_b32_e32 v1, 16, v1
; GFX7-NEXT: v_and_b32_e32 v2, s1, v2		; GFX7-NEXT: v_and_b32_e32 v2, s9, v2
; GFX7-NEXT: v_lshlrev_b32_e32 v3, 16, v3		; GFX7-NEXT: v_lshlrev_b32_e32 v3, 16, v3
; GFX7-NEXT: v_and_b32_e32 v5, s1, v5		; GFX7-NEXT: v_and_b32_e32 v5, s9, v5
; GFX7-NEXT: v_or_b32_e32 v1, v2, v1		; GFX7-NEXT: v_or_b32_e32 v1, v2, v1
; GFX7-NEXT: v_or_b32_e32 v2, v5, v3		; GFX7-NEXT: v_or_b32_e32 v2, v5, v3
; GFX7-NEXT: v_alignbit_b32 v3, v1, v2, 8		; GFX7-NEXT: v_alignbit_b32 v3, v1, v2, 8
; GFX7-NEXT: v_alignbit_b32 v5, v1, v2, 16		; GFX7-NEXT: v_alignbit_b32 v5, v1, v2, 16
; GFX7-NEXT: v_lshrrev_b32_e32 v6, 24, v2		; GFX7-NEXT: v_lshrrev_b32_e32 v6, 24, v2
; GFX7-NEXT: v_lshrrev_b32_e32 v7, 8, v1		; GFX7-NEXT: v_lshrrev_b32_e32 v7, 8, v1
; GFX7-NEXT: v_lshrrev_b32_e32 v8, 16, v1		; GFX7-NEXT: v_lshrrev_b32_e32 v8, 16, v1
; GFX7-NEXT: v_lshrrev_b32_e32 v1, 24, v1		; GFX7-NEXT: v_lshrrev_b32_e32 v1, 24, v1
; GFX7-NEXT: s_waitcnt vmcnt(0)		; GFX7-NEXT: s_waitcnt vmcnt(0)
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v2		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v2
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v3, v0		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v3, v0
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v5, v0		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v5, v0
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v6, v0		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v6, v0
; GFX7-NEXT: v_mad_i32_i24 v0, s13, v4, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s12, v4, v0
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v7		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v7
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v8		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v8
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v1		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v1
; GFX7-NEXT: buffer_store_byte v0, off, s[4:7], 0		; GFX7-NEXT: buffer_store_byte v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
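The GFX7 lines extract each signed nibble with s_bfe_i32. Its second operand packs the field description into one immediate: bits [4:0] give the bit offset and bits [22:16] give the field width. For example, 0x40004 selects 4 bits starting at bit 4, and 0x40018 selects 4 bits starting at bit 24.

The following C sketch models s_bfe_u32 and s_bfe_i32. It is restricted to the case width >= 1 and offset + width <= 32, which holds for every immediate here. The helper names are hypothetical and are not an LLVM API.

```c
#include <stdint.h>

/* Decode the packed S_BFE operand: bits [4:0] = offset, bits [22:16] = width. */
static unsigned bfe_offset(uint32_t imm) { return imm & 0x1Fu; }
static unsigned bfe_width(uint32_t imm)  { return (imm >> 16) & 0x7Fu; }

/* s_bfe_u32 model: zero-extended field.
   Assumes 1 <= width and offset + width <= 32. */
static uint32_t s_bfe_u32(uint32_t src, uint32_t imm) {
    unsigned off = bfe_offset(imm), w = bfe_width(imm);
    uint32_t mask = (w == 32) ? 0xFFFFFFFFu : ((1u << w) - 1u);
    return (src >> off) & mask;
}

/* s_bfe_i32 model: the same field, sign-extended from bit (width - 1). */
static int32_t s_bfe_i32(uint32_t src, uint32_t imm) {
    unsigned w = bfe_width(imm);
    int64_t field = (int64_t)s_bfe_u32(src, imm);
    if ((field >> (w - 1)) & 1)      /* sign bit of the field is set */
        field -= (int64_t)1 << w;    /* two's-complement reinterpretation */
    return (int32_t)field;
}
```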
; GFX8-LABEL: idot8_acc8_vecMul:		; GFX8-LABEL: idot8_acc8_vecMul:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_mov_b32 s2, 0xffff		; GFX8-NEXT: s_mov_b32 s32, 0xffff
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_load_ubyte v2, v[0:1]		; GFX8-NEXT: flat_load_ubyte v2, v[0:1]
; GFX8-NEXT: s_load_dword s1, s[4:5], 0x0		; GFX8-NEXT: s_load_dword s1, s[4:5], 0x0
; GFX8-NEXT: s_load_dword s5, s[6:7], 0x0		; GFX8-NEXT: s_load_dword s3, s[6:7], 0x0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_lshl_b32 s13, s1, 24		; GFX8-NEXT: s_lshl_b32 s11, s1, 24
; GFX8-NEXT: s_lshl_b32 s17, s1, 16		; GFX8-NEXT: s_lshl_b32 s15, s1, 16
; GFX8-NEXT: s_ashr_i64 s[22:23], s[4:5], 60		; GFX8-NEXT: s_ashr_i64 s[20:21], s[2:3], 60
; GFX8-NEXT: s_lshl_b32 s25, s5, 24		; GFX8-NEXT: s_lshl_b32 s23, s3, 24
; GFX8-NEXT: s_lshl_b32 s27, s5, 28		; GFX8-NEXT: s_lshl_b32 s25, s3, 28
; GFX8-NEXT: s_lshl_b32 s29, s5, 16		; GFX8-NEXT: s_lshl_b32 s27, s3, 16
; GFX8-NEXT: s_ashr_i64 s[10:11], s[0:1], 60		; GFX8-NEXT: s_ashr_i64 s[8:9], s[0:1], 60
; GFX8-NEXT: s_lshl_b32 s15, s1, 28		; GFX8-NEXT: s_lshl_b32 s13, s1, 28
; GFX8-NEXT: s_lshl_b32 s19, s5, 8		; GFX8-NEXT: s_lshl_b32 s17, s3, 8
; GFX8-NEXT: s_lshl_b32 s21, s5, 12		; GFX8-NEXT: s_lshl_b32 s19, s3, 12
; GFX8-NEXT: s_lshl_b32 s23, s5, 4		; GFX8-NEXT: s_lshl_b32 s21, s3, 4
; GFX8-NEXT: s_lshl_b32 s5, s5, 20		; GFX8-NEXT: s_lshl_b32 s3, s3, 20
; GFX8-NEXT: s_ashr_i64 s[12:13], s[12:13], 60		; GFX8-NEXT: s_ashr_i64 s[10:11], s[10:11], 60
; GFX8-NEXT: s_ashr_i64 s[16:17], s[16:17], 60		; GFX8-NEXT: s_ashr_i64 s[14:15], s[14:15], 60
		; GFX8-NEXT: s_ashr_i64 s[22:23], s[22:23], 60
; GFX8-NEXT: s_ashr_i64 s[24:25], s[24:25], 60		; GFX8-NEXT: s_ashr_i64 s[24:25], s[24:25], 60
; GFX8-NEXT: s_ashr_i64 s[26:27], s[26:27], 60		; GFX8-NEXT: s_ashr_i64 s[26:27], s[26:27], 60
; GFX8-NEXT: s_ashr_i64 s[28:29], s[28:29], 60		; GFX8-NEXT: s_lshl_b32 s5, s1, 8
; GFX8-NEXT: s_lshl_b32 s7, s1, 8		; GFX8-NEXT: s_lshl_b32 s7, s1, 12
; GFX8-NEXT: s_lshl_b32 s9, s1, 12		; GFX8-NEXT: s_lshl_b32 s9, s1, 4
; GFX8-NEXT: s_lshl_b32 s11, s1, 4
; GFX8-NEXT: s_lshl_b32 s1, s1, 20		; GFX8-NEXT: s_lshl_b32 s1, s1, 20
; GFX8-NEXT: s_ashr_i64 s[4:5], s[4:5], 60		; GFX8-NEXT: s_ashr_i64 s[2:3], s[2:3], 60
; GFX8-NEXT: s_ashr_i64 s[14:15], s[14:15], 60		; GFX8-NEXT: s_ashr_i64 s[12:13], s[12:13], 60
; GFX8-NEXT: v_mov_b32_e32 v6, s28		; GFX8-NEXT: v_mov_b32_e32 v6, s26
; GFX8-NEXT: v_mov_b32_e32 v7, s16		; GFX8-NEXT: v_mov_b32_e32 v7, s14
; GFX8-NEXT: v_mov_b32_e32 v8, s26		; GFX8-NEXT: v_mov_b32_e32 v8, s24
; GFX8-NEXT: v_mov_b32_e32 v9, s24		; GFX8-NEXT: v_mov_b32_e32 v9, s22
; GFX8-NEXT: v_mov_b32_e32 v10, s12		; GFX8-NEXT: v_mov_b32_e32 v10, s10
; GFX8-NEXT: v_mul_i32_i24_sdwa v6, v7, v6 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX8-NEXT: v_mul_i32_i24_sdwa v6, v7, v6 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX8-NEXT: v_mul_i32_i24_e32 v7, s14, v8		; GFX8-NEXT: v_mul_i32_i24_e32 v7, s12, v8
; GFX8-NEXT: v_mul_i32_i24_sdwa v8, v10, v9 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX8-NEXT: v_mul_i32_i24_sdwa v8, v10, v9 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX8-NEXT: s_ashr_i64 s[0:1], s[0:1], 60		; GFX8-NEXT: s_ashr_i64 s[0:1], s[0:1], 60
; GFX8-NEXT: v_mov_b32_e32 v5, s4		; GFX8-NEXT: v_mov_b32_e32 v5, s2
; GFX8-NEXT: v_mul_i32_i24_e32 v5, s0, v5		; GFX8-NEXT: v_mul_i32_i24_e32 v5, s0, v5
; GFX8-NEXT: v_or_b32_sdwa v7, v7, v8 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD		; GFX8-NEXT: v_or_b32_sdwa v7, v7, v8 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
; GFX8-NEXT: s_ashr_i64 s[6:7], s[6:7], 60		; GFX8-NEXT: s_ashr_i64 s[4:5], s[4:5], 60
; GFX8-NEXT: s_ashr_i64 s[18:19], s[18:19], 60		; GFX8-NEXT: s_ashr_i64 s[16:17], s[16:17], 60
; GFX8-NEXT: v_or_b32_sdwa v5, v5, v6 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD		; GFX8-NEXT: v_or_b32_sdwa v5, v5, v6 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
; GFX8-NEXT: v_and_b32_e32 v6, s2, v7		; GFX8-NEXT: v_and_b32_e32 v6, s32, v7
; GFX8-NEXT: s_ashr_i64 s[20:21], s[20:21], 60		; GFX8-NEXT: s_ashr_i64 s[18:19], s[18:19], 60
; GFX8-NEXT: v_mov_b32_e32 v3, s22		; GFX8-NEXT: v_mov_b32_e32 v3, s20
; GFX8-NEXT: v_mov_b32_e32 v4, s10		; GFX8-NEXT: v_mov_b32_e32 v4, s8
; GFX8-NEXT: s_ashr_i64 s[32:33], s[22:23], 60		; GFX8-NEXT: s_ashr_i64 s[30:31], s[20:21], 60
; GFX8-NEXT: v_mul_i32_i24_sdwa v3, v4, v3 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX8-NEXT: v_mul_i32_i24_sdwa v3, v4, v3 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX8-NEXT: v_or_b32_e32 v5, v6, v5		; GFX8-NEXT: v_or_b32_e32 v5, v6, v5
; GFX8-NEXT: s_ashr_i64 s[8:9], s[8:9], 60		; GFX8-NEXT: s_ashr_i64 s[6:7], s[6:7], 60
; GFX8-NEXT: v_mov_b32_e32 v4, s20		; GFX8-NEXT: v_mov_b32_e32 v4, s18
; GFX8-NEXT: v_mov_b32_e32 v12, s18		; GFX8-NEXT: v_mov_b32_e32 v12, s16
; GFX8-NEXT: v_mov_b32_e32 v13, s6		; GFX8-NEXT: v_mov_b32_e32 v13, s4
; GFX8-NEXT: s_ashr_i64 s[30:31], s[10:11], 60		; GFX8-NEXT: s_ashr_i64 s[28:29], s[8:9], 60
; GFX8-NEXT: v_mov_b32_e32 v11, s32		; GFX8-NEXT: v_mov_b32_e32 v11, s30
; GFX8-NEXT: v_mul_i32_i24_e32 v4, s8, v4		; GFX8-NEXT: v_mul_i32_i24_e32 v4, s6, v4
; GFX8-NEXT: v_mul_i32_i24_sdwa v10, v13, v12 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX8-NEXT: v_mul_i32_i24_sdwa v10, v13, v12 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX8-NEXT: v_lshrrev_b32_e32 v7, 8, v5		; GFX8-NEXT: v_lshrrev_b32_e32 v7, 8, v5
; GFX8-NEXT: v_or_b32_sdwa v4, v4, v10 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD		; GFX8-NEXT: v_or_b32_sdwa v4, v4, v10 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
; GFX8-NEXT: v_mul_i32_i24_e32 v9, s30, v11		; GFX8-NEXT: v_mul_i32_i24_e32 v9, s28, v11
; GFX8-NEXT: v_or_b32_sdwa v3, v9, v3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD		; GFX8-NEXT: v_or_b32_sdwa v3, v9, v3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
; GFX8-NEXT: v_and_b32_e32 v4, s2, v4		; GFX8-NEXT: v_and_b32_e32 v4, s32, v4
; GFX8-NEXT: v_or_b32_e32 v3, v4, v3		; GFX8-NEXT: v_or_b32_e32 v3, v4, v3
; GFX8-NEXT: v_lshrrev_b32_e32 v8, 8, v3		; GFX8-NEXT: v_lshrrev_b32_e32 v8, 8, v3
; GFX8-NEXT: s_waitcnt vmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0)
; GFX8-NEXT: v_add_u32_e32 v2, vcc, v2, v6		; GFX8-NEXT: v_add_u32_e32 v2, vcc, v2, v6
; GFX8-NEXT: v_add_u32_e32 v2, vcc, v7, v2		; GFX8-NEXT: v_add_u32_e32 v2, vcc, v7, v2
; GFX8-NEXT: v_add_u32_sdwa v2, vcc, v5, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_2 src1_sel:BYTE_0		; GFX8-NEXT: v_add_u32_sdwa v2, vcc, v5, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_2 src1_sel:BYTE_0
; GFX8-NEXT: v_add_u32_sdwa v2, vcc, v5, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_3 src1_sel:DWORD		; GFX8-NEXT: v_add_u32_sdwa v2, vcc, v5, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_3 src1_sel:DWORD
; GFX8-NEXT: v_add_u32_e32 v2, vcc, v2, v4		; GFX8-NEXT: v_add_u32_e32 v2, vcc, v2, v4
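GFX8 reaches the same signed nibbles by a different route. First, s_lshl_b32 moves the wanted nibble into bits [31:28] of the odd register of a 64-bit pair. Then s_ashr_i64 by 60 keeps only the top four bits of the pair, sign-extended. Whatever the even (low) register holds is shifted out. For instance, s_lshl_b32 s13, s1, 24 followed by s_ashr_i64 s[12:13], s[12:13], 60 leaves the sign-extended bits [7:4] of s1 in s12.

The following is a hypothetical C model of that two-instruction idiom, where k = 28 - 4n selects nibble n.

```c
#include <stdint.h>

/* Models `s_lshl_b32 hi, src, k` followed by
   `s_ashr_i64 s[lo:hi], s[lo:hi], 60`. Only the top 4 bits of the 64-bit
   pair survive a shift by 60, so `lo_garbage` never affects the result. */
static int32_t nibble_via_ashr64(uint32_t src, unsigned k, uint32_t lo_garbage) {
    uint64_t pair = ((uint64_t)(uint32_t)(src << k) << 32) | lo_garbage;
    int top = (int)(pair >> 60);  /* top nibble, 0..15 */
    return (top ^ 8) - 8;         /* sign-extend it, as the arithmetic shift does */
}
```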
; GFX9-NEXT: s_mov_b32 s2, 0xffff		; GFX9-NEXT: s_mov_b32 s2, 0xffff
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: global_load_ubyte v2, v[0:1], off		; GFX9-NEXT: global_load_ubyte v2, v[0:1], off
; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX9-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_lshr_b32 s8, s0, 4		; GFX9-NEXT: s_lshr_b32 s7, s0, 4
; GFX9-NEXT: s_lshr_b32 s15, s1, 4		; GFX9-NEXT: s_lshr_b32 s14, s1, 4
; GFX9-NEXT: v_lshlrev_b16_e64 v3, 12, s0		; GFX9-NEXT: v_lshlrev_b16_e64 v3, 12, s0
; GFX9-NEXT: v_lshlrev_b16_e64 v4, 12, s1		; GFX9-NEXT: v_lshlrev_b16_e64 v4, 12, s1
; GFX9-NEXT: v_lshlrev_b16_e64 v7, 12, s8		; GFX9-NEXT: v_lshlrev_b16_e64 v7, 12, s7
; GFX9-NEXT: v_lshlrev_b16_e64 v14, 12, s15		; GFX9-NEXT: v_lshlrev_b16_e64 v14, 12, s14
; GFX9-NEXT: s_lshr_b32 s9, s0, 12		; GFX9-NEXT: s_lshr_b32 s8, s0, 12
; GFX9-NEXT: s_lshr_b32 s10, s0, 8		; GFX9-NEXT: s_lshr_b32 s9, s0, 8
; GFX9-NEXT: s_lshr_b32 s16, s1, 12		; GFX9-NEXT: s_lshr_b32 s15, s1, 12
; GFX9-NEXT: s_lshr_b32 s17, s1, 8		; GFX9-NEXT: s_lshr_b32 s16, s1, 8
; GFX9-NEXT: v_lshlrev_b16_e64 v5, 12, s10		; GFX9-NEXT: v_lshlrev_b16_e64 v5, 12, s9
; GFX9-NEXT: v_lshlrev_b16_e64 v6, 12, s9		; GFX9-NEXT: v_lshlrev_b16_e64 v6, 12, s8
; GFX9-NEXT: v_lshlrev_b16_e64 v12, 12, s17		; GFX9-NEXT: v_lshlrev_b16_e64 v12, 12, s16
; GFX9-NEXT: v_lshlrev_b16_e64 v13, 12, s16		; GFX9-NEXT: v_lshlrev_b16_e64 v13, 12, s15
; GFX9-NEXT: v_ashrrev_i16_e32 v3, 12, v3		; GFX9-NEXT: v_ashrrev_i16_e32 v3, 12, v3
; GFX9-NEXT: v_ashrrev_i16_e32 v4, 12, v4		; GFX9-NEXT: v_ashrrev_i16_e32 v4, 12, v4
; GFX9-NEXT: v_ashrrev_i16_e32 v7, 12, v7		; GFX9-NEXT: v_ashrrev_i16_e32 v7, 12, v7
; GFX9-NEXT: v_ashrrev_i16_e32 v14, 12, v14		; GFX9-NEXT: v_ashrrev_i16_e32 v14, 12, v14
; GFX9-NEXT: v_ashrrev_i16_e32 v5, 12, v5		; GFX9-NEXT: v_ashrrev_i16_e32 v5, 12, v5
; GFX9-NEXT: v_ashrrev_i16_e32 v12, 12, v12		; GFX9-NEXT: v_ashrrev_i16_e32 v12, 12, v12
; GFX9-NEXT: v_ashrrev_i16_e32 v6, 12, v6		; GFX9-NEXT: v_ashrrev_i16_e32 v6, 12, v6
; GFX9-NEXT: v_ashrrev_i16_e32 v13, 12, v13		; GFX9-NEXT: v_ashrrev_i16_e32 v13, 12, v13
; GFX9-NEXT: v_mul_lo_u16_e32 v3, v3, v4		; GFX9-NEXT: v_mul_lo_u16_e32 v3, v3, v4
; GFX9-NEXT: v_mul_lo_u16_sdwa v7, v7, v14 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX9-NEXT: v_mul_lo_u16_sdwa v7, v7, v14 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX9-NEXT: v_or_b32_sdwa v3, v3, v7 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD		; GFX9-NEXT: v_or_b32_sdwa v3, v3, v7 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
; GFX9-NEXT: s_lshr_b32 s4, s0, 20		; GFX9-NEXT: s_lshr_b32 s3, s0, 20
; GFX9-NEXT: s_lshr_b32 s5, s0, 16		; GFX9-NEXT: s_lshr_b32 s4, s0, 16
; GFX9-NEXT: s_lshr_b32 s11, s1, 20		; GFX9-NEXT: s_lshr_b32 s10, s1, 20
; GFX9-NEXT: s_lshr_b32 s12, s1, 16		; GFX9-NEXT: s_lshr_b32 s11, s1, 16
; GFX9-NEXT: v_mul_lo_u16_sdwa v6, v6, v13 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX9-NEXT: v_mul_lo_u16_sdwa v6, v6, v13 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX9-NEXT: v_mul_lo_u16_e32 v5, v5, v12		; GFX9-NEXT: v_mul_lo_u16_e32 v5, v5, v12
; GFX9-NEXT: v_lshlrev_b16_e64 v10, 12, s5		; GFX9-NEXT: v_lshlrev_b16_e64 v10, 12, s4
; GFX9-NEXT: v_lshlrev_b16_e64 v11, 12, s4		; GFX9-NEXT: v_lshlrev_b16_e64 v11, 12, s3
; GFX9-NEXT: v_lshlrev_b16_e64 v17, 12, s12		; GFX9-NEXT: v_lshlrev_b16_e64 v17, 12, s11
; GFX9-NEXT: v_lshlrev_b16_e64 v18, 12, s11		; GFX9-NEXT: v_lshlrev_b16_e64 v18, 12, s10
; GFX9-NEXT: s_lshr_b32 s6, s0, 28		; GFX9-NEXT: s_lshr_b32 s5, s0, 28
; GFX9-NEXT: s_lshr_b32 s7, s0, 24		; GFX9-NEXT: s_lshr_b32 s6, s0, 24
; GFX9-NEXT: s_lshr_b32 s13, s1, 28		; GFX9-NEXT: s_lshr_b32 s12, s1, 28
; GFX9-NEXT: s_lshr_b32 s14, s1, 24		; GFX9-NEXT: s_lshr_b32 s13, s1, 24
; GFX9-NEXT: v_and_b32_e32 v3, s2, v3		; GFX9-NEXT: v_and_b32_e32 v3, s2, v3
; GFX9-NEXT: v_or_b32_sdwa v5, v5, v6 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD		; GFX9-NEXT: v_or_b32_sdwa v5, v5, v6 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
; GFX9-NEXT: v_lshlrev_b16_e64 v8, 12, s7		; GFX9-NEXT: v_lshlrev_b16_e64 v8, 12, s6
; GFX9-NEXT: v_lshlrev_b16_e64 v9, 12, s6		; GFX9-NEXT: v_lshlrev_b16_e64 v9, 12, s5
; GFX9-NEXT: v_lshlrev_b16_e64 v15, 12, s14		; GFX9-NEXT: v_lshlrev_b16_e64 v15, 12, s13
; GFX9-NEXT: v_lshlrev_b16_e64 v16, 12, s13		; GFX9-NEXT: v_lshlrev_b16_e64 v16, 12, s12
; GFX9-NEXT: v_or_b32_e32 v5, v3, v5		; GFX9-NEXT: v_or_b32_e32 v5, v3, v5
; GFX9-NEXT: v_ashrrev_i16_e32 v10, 12, v10		; GFX9-NEXT: v_ashrrev_i16_e32 v10, 12, v10
; GFX9-NEXT: v_ashrrev_i16_e32 v17, 12, v17		; GFX9-NEXT: v_ashrrev_i16_e32 v17, 12, v17
; GFX9-NEXT: v_ashrrev_i16_e32 v11, 12, v11		; GFX9-NEXT: v_ashrrev_i16_e32 v11, 12, v11
; GFX9-NEXT: v_ashrrev_i16_e32 v18, 12, v18		; GFX9-NEXT: v_ashrrev_i16_e32 v18, 12, v18
; GFX9-NEXT: v_ashrrev_i16_e32 v8, 12, v8		; GFX9-NEXT: v_ashrrev_i16_e32 v8, 12, v8
; GFX9-NEXT: v_ashrrev_i16_e32 v15, 12, v15		; GFX9-NEXT: v_ashrrev_i16_e32 v15, 12, v15
; GFX9-NEXT: v_ashrrev_i16_e32 v9, 12, v9		; GFX9-NEXT: v_ashrrev_i16_e32 v9, 12, v9
; GFX9-DL-NEXT: s_mov_b32 s2, 0xffff		; GFX9-DL-NEXT: s_mov_b32 s2, 0xffff
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: global_load_ubyte v2, v[0:1], off		; GFX9-DL-NEXT: global_load_ubyte v2, v[0:1], off
; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX9-DL-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_lshr_b32 s8, s0, 4		; GFX9-DL-NEXT: s_lshr_b32 s7, s0, 4
; GFX9-DL-NEXT: s_lshr_b32 s15, s1, 4		; GFX9-DL-NEXT: s_lshr_b32 s14, s1, 4
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v3, 12, s0		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v3, 12, s0
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v4, 12, s1		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v4, 12, s1
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v7, 12, s8		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v7, 12, s7
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v14, 12, s15		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v14, 12, s14
; GFX9-DL-NEXT: s_lshr_b32 s9, s0, 12		; GFX9-DL-NEXT: s_lshr_b32 s8, s0, 12
; GFX9-DL-NEXT: s_lshr_b32 s10, s0, 8		; GFX9-DL-NEXT: s_lshr_b32 s9, s0, 8
; GFX9-DL-NEXT: s_lshr_b32 s16, s1, 12		; GFX9-DL-NEXT: s_lshr_b32 s15, s1, 12
; GFX9-DL-NEXT: s_lshr_b32 s17, s1, 8		; GFX9-DL-NEXT: s_lshr_b32 s16, s1, 8
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v5, 12, s10		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v5, 12, s9
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v6, 12, s9		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v6, 12, s8
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v12, 12, s17		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v12, 12, s16
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v13, 12, s16		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v13, 12, s15
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v3, 12, v3		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v3, 12, v3
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v4, 12, v4		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v4, 12, v4
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v7, 12, v7		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v7, 12, v7
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v14, 12, v14		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v14, 12, v14
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v5, 12, v5		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v5, 12, v5
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v12, 12, v12		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v12, 12, v12
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v6, 12, v6		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v6, 12, v6
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v13, 12, v13		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v13, 12, v13
; GFX9-DL-NEXT: v_mul_lo_u16_e32 v3, v3, v4		; GFX9-DL-NEXT: v_mul_lo_u16_e32 v3, v3, v4
; GFX9-DL-NEXT: v_mul_lo_u16_sdwa v7, v7, v14 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX9-DL-NEXT: v_mul_lo_u16_sdwa v7, v7, v14 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX9-DL-NEXT: v_or_b32_sdwa v3, v3, v7 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD		; GFX9-DL-NEXT: v_or_b32_sdwa v3, v3, v7 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
; GFX9-DL-NEXT: s_lshr_b32 s4, s0, 20		; GFX9-DL-NEXT: s_lshr_b32 s3, s0, 20
; GFX9-DL-NEXT: s_lshr_b32 s5, s0, 16		; GFX9-DL-NEXT: s_lshr_b32 s4, s0, 16
; GFX9-DL-NEXT: s_lshr_b32 s11, s1, 20		; GFX9-DL-NEXT: s_lshr_b32 s10, s1, 20
; GFX9-DL-NEXT: s_lshr_b32 s12, s1, 16		; GFX9-DL-NEXT: s_lshr_b32 s11, s1, 16
; GFX9-DL-NEXT: v_mul_lo_u16_sdwa v6, v6, v13 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX9-DL-NEXT: v_mul_lo_u16_sdwa v6, v6, v13 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX9-DL-NEXT: v_mul_lo_u16_e32 v5, v5, v12		; GFX9-DL-NEXT: v_mul_lo_u16_e32 v5, v5, v12
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v10, 12, s5		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v10, 12, s4
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v11, 12, s4		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v11, 12, s3
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v17, 12, s12		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v17, 12, s11
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v18, 12, s11		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v18, 12, s10
; GFX9-DL-NEXT: s_lshr_b32 s6, s0, 28		; GFX9-DL-NEXT: s_lshr_b32 s5, s0, 28
; GFX9-DL-NEXT: s_lshr_b32 s7, s0, 24		; GFX9-DL-NEXT: s_lshr_b32 s6, s0, 24
; GFX9-DL-NEXT: s_lshr_b32 s13, s1, 28		; GFX9-DL-NEXT: s_lshr_b32 s12, s1, 28
; GFX9-DL-NEXT: s_lshr_b32 s14, s1, 24		; GFX9-DL-NEXT: s_lshr_b32 s13, s1, 24
; GFX9-DL-NEXT: v_and_b32_e32 v3, s2, v3		; GFX9-DL-NEXT: v_and_b32_e32 v3, s2, v3
; GFX9-DL-NEXT: v_or_b32_sdwa v5, v5, v6 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD		; GFX9-DL-NEXT: v_or_b32_sdwa v5, v5, v6 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v8, 12, s7		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v8, 12, s6
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v9, 12, s6		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v9, 12, s5
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v15, 12, s14		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v15, 12, s13
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v16, 12, s13		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v16, 12, s12
; GFX9-DL-NEXT: v_or_b32_e32 v5, v3, v5		; GFX9-DL-NEXT: v_or_b32_e32 v5, v3, v5
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v10, 12, v10		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v10, 12, v10
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v17, 12, v17		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v17, 12, v17
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v11, 12, v11		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v11, 12, v11
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v18, 12, v18		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v18, 12, v18
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v8, 12, v8		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v8, 12, v8
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v15, 12, v15		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v15, 12, v15
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v9, 12, v9		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v9, 12, v9
; GFX9-DL-NEXT: v_add_u32_e32 v2, v2, v3		; GFX9-DL-NEXT: v_add_u32_e32 v2, v2, v3
; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v6 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v6 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v6 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_3		; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v6 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_3
; GFX9-DL-NEXT: global_store_byte v[0:1], v2, off		; GFX9-DL-NEXT: global_store_byte v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: idot8_acc8_vecMul:		; GFX10-DL-LABEL: idot8_acc8_vecMul:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
; GFX10-DL-NEXT: s_mov_b32 s2, 0xffff
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s4		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s2
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s5		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s3
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off		; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
		; GFX10-DL-NEXT: s_mov_b32 s2, 0xffff
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_lshr_b32 s8, s0, 4		; GFX10-DL-NEXT: s_lshr_b32 s7, s0, 4
; GFX10-DL-NEXT: s_lshr_b32 s15, s1, 4		; GFX10-DL-NEXT: s_lshr_b32 s14, s1, 4
; GFX10-DL-NEXT: s_lshr_b32 s9, s0, 12		; GFX10-DL-NEXT: s_lshr_b32 s8, s0, 12
; GFX10-DL-NEXT: s_lshr_b32 s16, s1, 12		; GFX10-DL-NEXT: s_lshr_b32 s15, s1, 12
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v3, 12, s0		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v3, 12, s0
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v7, 12, s8		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v7, 12, s7
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v12, 12, s15		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v12, 12, s14
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v4, 12, s1		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v4, 12, s1
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v14, 12, s16		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v14, 12, s15
; GFX10-DL-NEXT: s_lshr_b32 s10, s0, 8		; GFX10-DL-NEXT: s_lshr_b32 s9, s0, 8
; GFX10-DL-NEXT: s_lshr_b32 s17, s1, 8		; GFX10-DL-NEXT: s_lshr_b32 s16, s1, 8
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v6, 12, s9		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v6, 12, s8
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v7, 12, v7		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v7, 12, v7
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v12, 12, v12		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v12, 12, v12
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v5, 12, s10		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v5, 12, s9
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v3, 12, v3		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v3, 12, v3
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v4, 12, v4		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v4, 12, v4
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v13, 12, s17		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v13, 12, s16
; GFX10-DL-NEXT: v_mul_lo_u16_e64 v7, v7, v12		; GFX10-DL-NEXT: v_mul_lo_u16_e64 v7, v7, v12
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v19, 12, v6		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v19, 12, v6
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v14, 12, v14		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v14, 12, v14
; GFX10-DL-NEXT: s_lshr_b32 s4, s0, 20		; GFX10-DL-NEXT: s_lshr_b32 s3, s0, 20
; GFX10-DL-NEXT: s_lshr_b32 s5, s0, 16		; GFX10-DL-NEXT: s_lshr_b32 s4, s0, 16
; GFX10-DL-NEXT: s_lshr_b32 s6, s0, 28		; GFX10-DL-NEXT: s_lshr_b32 s5, s0, 28
; GFX10-DL-NEXT: s_lshr_b32 s7, s0, 24		; GFX10-DL-NEXT: s_lshr_b32 s6, s0, 24
; GFX10-DL-NEXT: v_mul_lo_u16_e64 v3, v3, v4		; GFX10-DL-NEXT: v_mul_lo_u16_e64 v3, v3, v4
; GFX10-DL-NEXT: v_mul_lo_u16_e64 v4, v19, v14		; GFX10-DL-NEXT: v_mul_lo_u16_e64 v4, v19, v14
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v6, 8, v7		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v6, 8, v7
; GFX10-DL-NEXT: s_lshr_b32 s11, s1, 20		; GFX10-DL-NEXT: s_lshr_b32 s10, s1, 20
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v12, 12, v13		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v12, 12, v13
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v5, 12, v5		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v5, 12, v5
; GFX10-DL-NEXT: s_lshr_b32 s12, s1, 16		; GFX10-DL-NEXT: s_lshr_b32 s11, s1, 16
; GFX10-DL-NEXT: v_or_b32_sdwa v3, v3, v6 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD		; GFX10-DL-NEXT: v_or_b32_sdwa v3, v3, v6 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
; GFX10-DL-NEXT: s_lshr_b32 s13, s1, 28		; GFX10-DL-NEXT: s_lshr_b32 s12, s1, 28
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v8, 12, s7		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v8, 12, s6
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v9, 12, s6		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v9, 12, s5
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v10, 12, s5		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v10, 12, s4
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v11, 12, s4		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v11, 12, s3
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v13, 12, s11		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v13, 12, s10
; GFX10-DL-NEXT: v_mul_lo_u16_e64 v5, v5, v12		; GFX10-DL-NEXT: v_mul_lo_u16_e64 v5, v5, v12
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v4, 8, v4		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v4, 8, v4
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v7, 12, s12		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v7, 12, s11
; GFX10-DL-NEXT: s_lshr_b32 s14, s1, 24		; GFX10-DL-NEXT: s_lshr_b32 s13, s1, 24
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v6, 12, v8		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v6, 12, v8
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v8, 12, v9		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v8, 12, v9
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v9, 12, v10		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v9, 12, v10
; GFX10-DL-NEXT: v_or_b32_sdwa v4, v5, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD		; GFX10-DL-NEXT: v_or_b32_sdwa v4, v5, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
; GFX10-DL-NEXT: v_and_b32_e32 v3, s2, v3		; GFX10-DL-NEXT: v_and_b32_e32 v3, s2, v3
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v16, 12, s13		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v16, 12, s12
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v5, 12, v11		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v5, 12, v11
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v10, 12, v13		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v10, 12, v13
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v15, 12, s14		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v15, 12, s13
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v7, 12, v7		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v7, 12, v7
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v11, 12, v16		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v11, 12, v16
; GFX10-DL-NEXT: v_or_b32_e32 v4, v3, v4		; GFX10-DL-NEXT: v_or_b32_e32 v4, v3, v4
; GFX10-DL-NEXT: v_mul_lo_u16_e64 v5, v5, v10		; GFX10-DL-NEXT: v_mul_lo_u16_e64 v5, v5, v10
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v12, 12, v15		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v12, 12, v15
; GFX10-DL-NEXT: v_mul_lo_u16_e64 v10, v9, v7		; GFX10-DL-NEXT: v_mul_lo_u16_e64 v10, v9, v7
; GFX10-DL-NEXT: v_mul_lo_u16_e64 v8, v8, v11		; GFX10-DL-NEXT: v_mul_lo_u16_e64 v8, v8, v11
; GFX10-DL-NEXT: v_lshrrev_b32_e32 v9, 8, v4		; GFX10-DL-NEXT: v_lshrrev_b32_e32 v9, 8, v4

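In the signed `idot8_acc8_vecMul` checks, each 4-bit lane is first moved down with `s_lshr_b32` (by 4, 8, ..., 28 bits). It is then sign-extended with a `v_lshlrev_b16 ..., 12, ...` / `v_ashrrev_i16 ..., 12, ...` pair. The left shift places the nibble in bits 15:12, and the arithmetic right shift copies its sign bit back down through bit 3.

The Python sketch below models that idiom. It assumes the two VALU instructions act as plain 16-bit shifts on the low half of their source. The helper names are hypothetical and are not LLVM or ISA APIs.

```python
MASK16 = 0xFFFF


def lshlrev_b16(shift, src):
    """Model of v_lshlrev_b16 (shift amount first): 16-bit logical left shift."""
    return ((src & MASK16) << shift) & MASK16


def ashrrev_i16(shift, src):
    """Model of v_ashrrev_i16 (shift amount first): 16-bit arithmetic right shift."""
    val = src & MASK16
    if val & 0x8000:  # reinterpret the 16-bit pattern as signed
        val -= 0x10000
    return (val >> shift) & MASK16  # Python's >> on negative ints is arithmetic


def to_signed16(x):
    return x - 0x10000 if x & 0x8000 else x


def nibble_via_shifts(word, k):
    """Sign-extend nibble k of a 32-bit word using the sequence seen in the checks:
    s_lshr_b32 by 4*k, then v_lshlrev_b16 by 12, then v_ashrrev_i16 by 12."""
    shifted = (word & 0xFFFFFFFF) >> (4 * k)  # s_lshr_b32
    return to_signed16(ashrrev_i16(12, lshlrev_b16(12, shifted)))


def nibble_reference(word, k):
    """Direct two's-complement interpretation of nibble k, for comparison."""
    n = (word >> (4 * k)) & 0xF
    return n - 16 if n & 0x8 else n


if __name__ == "__main__":
    print([nibble_via_shifts(0x8F7A1234, k) for k in range(8)])
    # → [4, 3, 2, 1, -6, 7, -1, -8]
```

Shifting by 12 works because only the low 4 bits of the pre-shifted value survive the 16-bit left shift. No separate `and` mask is needed before the sign extension.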
llvm/test/CodeGen/AMDGPU/idot8u.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=amdgcn -mcpu=gfx700 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX7 %s		; RUN: llc -mtriple=amdgcn -mcpu=gfx700 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX7 %s
; RUN: llc -mtriple=amdgcn -mcpu=gfx803 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX8 %s		; RUN: llc -mtriple=amdgcn -mcpu=gfx803 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX8 %s
; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX9 %s		; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX9 %s
; RUN: llc -mtriple=amdgcn -mcpu=gfx906 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX9-DL %s		; RUN: llc -mtriple=amdgcn -mcpu=gfx906 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX9-DL %s
; RUN: llc -mtriple=amdgcn -mcpu=gfx1011 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10-DL %s		; RUN: llc -mtriple=amdgcn -mcpu=gfx1011 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10-DL %s
; RUN: llc -mtriple=amdgcn -mcpu=gfx1012 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10-DL %s		; RUN: llc -mtriple=amdgcn -mcpu=gfx1012 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10-DL %s

define amdgpu_kernel void @udot8_acc32(<8 x i4> addrspace(1)* %src1,		define amdgpu_kernel void @udot8_acc32(<8 x i4> addrspace(1)* %src1,
; GFX7-LABEL: udot8_acc32:		; GFX7-LABEL: udot8_acc32:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s7, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s6, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_load_dword s10, s[10:11], 0x0		; GFX7-NEXT: s_load_dword s6, s[6:7], 0x0
; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0		; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX7-NEXT: s_load_dword s21, s[4:5], 0x0		; GFX7-NEXT: s_load_dword s20, s[0:1], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_lshr_b32 s11, s10, 28		; GFX7-NEXT: s_lshr_b32 s7, s6, 28
; GFX7-NEXT: s_bfe_u32 s15, s10, 0x40018		; GFX7-NEXT: s_bfe_u32 s14, s6, 0x40018
; GFX7-NEXT: s_bfe_u32 s16, s10, 0x40014		; GFX7-NEXT: s_bfe_u32 s15, s6, 0x40014
; GFX7-NEXT: s_bfe_u32 s17, s10, 0x40010		; GFX7-NEXT: s_bfe_u32 s16, s6, 0x40010
; GFX7-NEXT: s_bfe_u32 s18, s10, 0x4000c		; GFX7-NEXT: s_bfe_u32 s17, s6, 0x4000c
; GFX7-NEXT: s_bfe_u32 s19, s10, 0x40008		; GFX7-NEXT: s_bfe_u32 s18, s6, 0x40008
; GFX7-NEXT: s_bfe_u32 s20, s10, 0x40004		; GFX7-NEXT: s_bfe_u32 s19, s6, 0x40004
; GFX7-NEXT: s_and_b32 s10, s10, 15		; GFX7-NEXT: s_and_b32 s6, s6, 15
; GFX7-NEXT: s_lshr_b32 s1, s0, 28		; GFX7-NEXT: s_lshr_b32 s5, s4, 28
; GFX7-NEXT: s_bfe_u32 s2, s0, 0x40018		; GFX7-NEXT: s_bfe_u32 s8, s4, 0x40018
; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40014		; GFX7-NEXT: s_bfe_u32 s9, s4, 0x40014
; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40010		; GFX7-NEXT: s_bfe_u32 s10, s4, 0x40010
; GFX7-NEXT: s_bfe_u32 s12, s0, 0x4000c		; GFX7-NEXT: s_bfe_u32 s11, s4, 0x4000c
; GFX7-NEXT: s_bfe_u32 s13, s0, 0x40008		; GFX7-NEXT: s_bfe_u32 s12, s4, 0x40008
; GFX7-NEXT: s_bfe_u32 s14, s0, 0x40004		; GFX7-NEXT: s_bfe_u32 s13, s4, 0x40004
; GFX7-NEXT: s_and_b32 s0, s0, 15		; GFX7-NEXT: s_and_b32 s4, s4, 15
; GFX7-NEXT: v_mov_b32_e32 v0, s10		; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s21
; GFX7-NEXT: v_mad_u32_u24 v0, s0, v0, v1
; GFX7-NEXT: v_mov_b32_e32 v1, s20		; GFX7-NEXT: v_mov_b32_e32 v1, s20
; GFX7-NEXT: v_mad_u32_u24 v0, s14, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s4, v0, v1
; GFX7-NEXT: v_mov_b32_e32 v1, s19		; GFX7-NEXT: v_mov_b32_e32 v1, s19
; GFX7-NEXT: v_mad_u32_u24 v0, s13, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s13, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s18		; GFX7-NEXT: v_mov_b32_e32 v1, s18
; GFX7-NEXT: v_mad_u32_u24 v0, s12, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s12, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s17		; GFX7-NEXT: v_mov_b32_e32 v1, s17
; GFX7-NEXT: v_mad_u32_u24 v0, s9, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s11, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s16		; GFX7-NEXT: v_mov_b32_e32 v1, s16
; GFX7-NEXT: v_mad_u32_u24 v0, s8, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s10, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s15		; GFX7-NEXT: v_mov_b32_e32 v1, s15
; GFX7-NEXT: v_mad_u32_u24 v0, s2, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s9, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s11		; GFX7-NEXT: v_mov_b32_e32 v1, s14
; GFX7-NEXT: v_mad_u32_u24 v0, s1, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s8, v1, v0
; GFX7-NEXT: buffer_store_dword v0, off, s[4:7], 0		; GFX7-NEXT: v_mov_b32_e32 v1, s7
		; GFX7-NEXT: v_mad_u32_u24 v0, s5, v1, v0
		; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: udot8_acc32:		; GFX8-LABEL: udot8_acc32:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_load_dword s6, s[6:7], 0x0		; GFX8-NEXT: s_load_dword s6, s[6:7], 0x0
; GFX8-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX8-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX8-NEXT: s_load_dword s19, s[0:1], 0x0		; GFX8-NEXT: s_load_dword s18, s[0:1], 0x0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_lshr_b32 s7, s6, 28		; GFX8-NEXT: s_lshr_b32 s7, s6, 28
; GFX8-NEXT: s_bfe_u32 s13, s6, 0x40018		; GFX8-NEXT: s_bfe_u32 s12, s6, 0x40018
; GFX8-NEXT: s_bfe_u32 s14, s6, 0x40014		; GFX8-NEXT: s_bfe_u32 s13, s6, 0x40014
; GFX8-NEXT: s_bfe_u32 s15, s6, 0x40010		; GFX8-NEXT: s_bfe_u32 s14, s6, 0x40010
; GFX8-NEXT: s_bfe_u32 s16, s6, 0x4000c		; GFX8-NEXT: s_bfe_u32 s15, s6, 0x4000c
; GFX8-NEXT: s_bfe_u32 s17, s6, 0x40008		; GFX8-NEXT: s_bfe_u32 s16, s6, 0x40008
; GFX8-NEXT: s_bfe_u32 s18, s6, 0x40004		; GFX8-NEXT: s_bfe_u32 s17, s6, 0x40004
; GFX8-NEXT: s_and_b32 s6, s6, 15		; GFX8-NEXT: s_and_b32 s6, s6, 15
; GFX8-NEXT: s_lshr_b32 s4, s2, 28		; GFX8-NEXT: s_lshr_b32 s3, s2, 28
; GFX8-NEXT: s_bfe_u32 s5, s2, 0x40018		; GFX8-NEXT: s_bfe_u32 s4, s2, 0x40018
; GFX8-NEXT: s_bfe_u32 s8, s2, 0x40014		; GFX8-NEXT: s_bfe_u32 s5, s2, 0x40014
; GFX8-NEXT: s_bfe_u32 s9, s2, 0x40010		; GFX8-NEXT: s_bfe_u32 s8, s2, 0x40010
; GFX8-NEXT: s_bfe_u32 s10, s2, 0x4000c		; GFX8-NEXT: s_bfe_u32 s9, s2, 0x4000c
; GFX8-NEXT: s_bfe_u32 s11, s2, 0x40008		; GFX8-NEXT: s_bfe_u32 s10, s2, 0x40008
; GFX8-NEXT: s_bfe_u32 s12, s2, 0x40004		; GFX8-NEXT: s_bfe_u32 s11, s2, 0x40004
; GFX8-NEXT: s_and_b32 s2, s2, 15		; GFX8-NEXT: s_and_b32 s2, s2, 15
; GFX8-NEXT: v_mov_b32_e32 v0, s6		; GFX8-NEXT: v_mov_b32_e32 v0, s6
; GFX8-NEXT: v_mov_b32_e32 v1, s19
; GFX8-NEXT: v_mad_u32_u24 v0, s2, v0, v1
; GFX8-NEXT: v_mov_b32_e32 v1, s18		; GFX8-NEXT: v_mov_b32_e32 v1, s18
; GFX8-NEXT: v_mad_u32_u24 v0, s12, v1, v0		; GFX8-NEXT: v_mad_u32_u24 v0, s2, v0, v1
; GFX8-NEXT: v_mov_b32_e32 v1, s17		; GFX8-NEXT: v_mov_b32_e32 v1, s17
; GFX8-NEXT: v_mad_u32_u24 v0, s11, v1, v0		; GFX8-NEXT: v_mad_u32_u24 v0, s11, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v1, s16		; GFX8-NEXT: v_mov_b32_e32 v1, s16
; GFX8-NEXT: v_mad_u32_u24 v0, s10, v1, v0		; GFX8-NEXT: v_mad_u32_u24 v0, s10, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v1, s15		; GFX8-NEXT: v_mov_b32_e32 v1, s15
; GFX8-NEXT: v_mad_u32_u24 v0, s9, v1, v0		; GFX8-NEXT: v_mad_u32_u24 v0, s9, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v1, s14		; GFX8-NEXT: v_mov_b32_e32 v1, s14
; GFX8-NEXT: v_mad_u32_u24 v0, s8, v1, v0		; GFX8-NEXT: v_mad_u32_u24 v0, s8, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v1, s13		; GFX8-NEXT: v_mov_b32_e32 v1, s13
; GFX8-NEXT: v_mad_u32_u24 v0, s5, v1, v0		; GFX8-NEXT: v_mad_u32_u24 v0, s5, v1, v0
		; GFX8-NEXT: v_mov_b32_e32 v1, s12
		; GFX8-NEXT: v_mad_u32_u24 v0, s4, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v1, s7		; GFX8-NEXT: v_mov_b32_e32 v1, s7
; GFX8-NEXT: v_mad_u32_u24 v2, s4, v1, v0		; GFX8-NEXT: v_mad_u32_u24 v2, s3, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_store_dword v[0:1], v2		; GFX8-NEXT: flat_store_dword v[0:1], v2
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: udot8_acc32:		; GFX9-LABEL: udot8_acc32:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_load_dword s6, s[6:7], 0x0		; GFX9-NEXT: s_load_dword s6, s[6:7], 0x0
; GFX9-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX9-NEXT: s_load_dword s19, s[0:1], 0x0		; GFX9-NEXT: s_load_dword s18, s[0:1], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_lshr_b32 s7, s6, 28		; GFX9-NEXT: s_lshr_b32 s7, s6, 28
; GFX9-NEXT: s_bfe_u32 s13, s6, 0x40018		; GFX9-NEXT: s_bfe_u32 s12, s6, 0x40018
; GFX9-NEXT: s_bfe_u32 s14, s6, 0x40014		; GFX9-NEXT: s_bfe_u32 s13, s6, 0x40014
; GFX9-NEXT: s_bfe_u32 s15, s6, 0x40010		; GFX9-NEXT: s_bfe_u32 s14, s6, 0x40010
; GFX9-NEXT: s_bfe_u32 s16, s6, 0x4000c		; GFX9-NEXT: s_bfe_u32 s15, s6, 0x4000c
; GFX9-NEXT: s_bfe_u32 s17, s6, 0x40008		; GFX9-NEXT: s_bfe_u32 s16, s6, 0x40008
; GFX9-NEXT: s_bfe_u32 s18, s6, 0x40004		; GFX9-NEXT: s_bfe_u32 s17, s6, 0x40004
; GFX9-NEXT: s_and_b32 s6, s6, 15		; GFX9-NEXT: s_and_b32 s6, s6, 15
; GFX9-NEXT: s_lshr_b32 s4, s2, 28		; GFX9-NEXT: s_lshr_b32 s3, s2, 28
; GFX9-NEXT: s_bfe_u32 s5, s2, 0x40018		; GFX9-NEXT: s_bfe_u32 s4, s2, 0x40018
; GFX9-NEXT: s_bfe_u32 s8, s2, 0x40014		; GFX9-NEXT: s_bfe_u32 s5, s2, 0x40014
; GFX9-NEXT: s_bfe_u32 s9, s2, 0x40010		; GFX9-NEXT: s_bfe_u32 s8, s2, 0x40010
; GFX9-NEXT: s_bfe_u32 s10, s2, 0x4000c		; GFX9-NEXT: s_bfe_u32 s9, s2, 0x4000c
; GFX9-NEXT: s_bfe_u32 s11, s2, 0x40008		; GFX9-NEXT: s_bfe_u32 s10, s2, 0x40008
; GFX9-NEXT: s_bfe_u32 s12, s2, 0x40004		; GFX9-NEXT: s_bfe_u32 s11, s2, 0x40004
; GFX9-NEXT: s_and_b32 s2, s2, 15		; GFX9-NEXT: s_and_b32 s2, s2, 15
; GFX9-NEXT: v_mov_b32_e32 v0, s6		; GFX9-NEXT: v_mov_b32_e32 v0, s6
; GFX9-NEXT: v_mov_b32_e32 v1, s19
; GFX9-NEXT: v_mad_u32_u24 v0, s2, v0, v1
; GFX9-NEXT: v_mov_b32_e32 v1, s18		; GFX9-NEXT: v_mov_b32_e32 v1, s18
; GFX9-NEXT: v_mad_u32_u24 v0, s12, v1, v0		; GFX9-NEXT: v_mad_u32_u24 v0, s2, v0, v1
; GFX9-NEXT: v_mov_b32_e32 v1, s17		; GFX9-NEXT: v_mov_b32_e32 v1, s17
; GFX9-NEXT: v_mad_u32_u24 v0, s11, v1, v0		; GFX9-NEXT: v_mad_u32_u24 v0, s11, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v1, s16		; GFX9-NEXT: v_mov_b32_e32 v1, s16
; GFX9-NEXT: v_mad_u32_u24 v0, s10, v1, v0		; GFX9-NEXT: v_mad_u32_u24 v0, s10, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v1, s15		; GFX9-NEXT: v_mov_b32_e32 v1, s15
; GFX9-NEXT: v_mad_u32_u24 v0, s9, v1, v0		; GFX9-NEXT: v_mad_u32_u24 v0, s9, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v1, s14		; GFX9-NEXT: v_mov_b32_e32 v1, s14
; GFX9-NEXT: v_mad_u32_u24 v0, s8, v1, v0		; GFX9-NEXT: v_mad_u32_u24 v0, s8, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v1, s13		; GFX9-NEXT: v_mov_b32_e32 v1, s13
; GFX9-NEXT: v_mad_u32_u24 v0, s5, v1, v0		; GFX9-NEXT: v_mad_u32_u24 v0, s5, v1, v0
		; GFX9-NEXT: v_mov_b32_e32 v1, s12
		; GFX9-NEXT: v_mad_u32_u24 v0, s4, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v1, s7		; GFX9-NEXT: v_mov_b32_e32 v1, s7
; GFX9-NEXT: v_mad_u32_u24 v2, s4, v1, v0		; GFX9-NEXT: v_mad_u32_u24 v2, s3, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: global_store_dword v[0:1], v2, off		; GFX9-NEXT: global_store_dword v[0:1], v2, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX9-DL-LABEL: udot8_acc32:		; GFX9-DL-LABEL: udot8_acc32:
; GFX9-DL: ; %bb.0: ; %entry		; GFX9-DL: ; %bb.0: ; %entry
; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_load_dword s2, s[6:7], 0x0		; GFX9-DL-NEXT: s_load_dword s2, s[6:7], 0x0
; GFX9-DL-NEXT: s_load_dword s6, s[0:1], 0x0		; GFX9-DL-NEXT: s_load_dword s3, s[0:1], 0x0
; GFX9-DL-NEXT: s_load_dword s4, s[4:5], 0x0		; GFX9-DL-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s2		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s2
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s6		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s3
; GFX9-DL-NEXT: v_dot8_u32_u4 v2, s4, v0, v1		; GFX9-DL-NEXT: v_dot8_u32_u4 v2, s4, v0, v1
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off		; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: udot8_acc32:		; GFX10-DL-LABEL: udot8_acc32:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_load_dword s0, s[8:9], 0x0		; GFX10-DL-NEXT: s_load_dword s6, s[4:5], 0x0
; GFX10-DL-NEXT: s_load_dword s1, s[4:5], 0x0		; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
; GFX10-DL-NEXT: s_load_dword s2, s[6:7], 0x0		; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s6
; GFX10-DL-NEXT: v_dot8_u32_u4 v2, s1, s2, v0		; GFX10-DL-NEXT: v_dot8_u32_u4 v2, s0, s1, v0
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s8		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s4
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s9		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s5
; GFX10-DL-NEXT: global_store_dword v[0:1], v2, off		; GFX10-DL-NEXT: global_store_dword v[0:1], v2, off
; GFX10-DL-NEXT: s_endpgm		; GFX10-DL-NEXT: s_endpgm
<8 x i4> addrspace(1)* %src2,		<8 x i4> addrspace(1)* %src2,
i32 addrspace(1)* nocapture %dst) {		i32 addrspace(1)* nocapture %dst) {
entry:		entry:
%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1		%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1
%vec2 = load <8 x i4>, <8 x i4> addrspace(1)* %src2		%vec2 = load <8 x i4>, <8 x i4> addrspace(1)* %src2

ret void		ret void
}		}
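GFX7, GFX8 and GFX9 have no packed dot-product instruction, so the `udot8_acc32` checks for those targets unpack both operands into eight unsigned nibbles. They use `s_and_b32 ..., 15`, six `s_bfe_u32`, and `s_lshr_b32 ..., 28`, then accumulate through a chain of eight `v_mad_u32_u24`. GFX9-DL and GFX10-DL instead emit a single `v_dot8_u32_u4`.

The `s_bfe_u32` field operand packs the bit offset in bits 4:0 and the width in bits 22:16. For example, `0x40018` means "4 bits starting at bit 24".

The Python sketch below is a behavioral model written from those ISA descriptions, not LLVM code, and its helper names are hypothetical. It checks that the expanded sequence and the single dot instruction agree, including 32-bit wraparound of the accumulator.

```python
MASK32 = 0xFFFFFFFF


def s_bfe_u32(src, field):
    """Model of s_bfe_u32: field[4:0] is the bit offset, field[22:16] the width."""
    offset = field & 0x1F
    width = (field >> 16) & 0x7F
    return ((src & MASK32) >> offset) & ((1 << width) - 1) if width else 0


def v_mad_u32_u24(a, b, c):
    """Model of v_mad_u32_u24: low 24 bits of a and b multiplied, plus c, mod 2**32."""
    return ((a & 0xFFFFFF) * (b & 0xFFFFFF) + c) & MASK32


def nibbles_like_gfx7(word):
    """Unpack eight unsigned nibbles the way the GFX7/8/9 checks do."""
    word &= MASK32
    return ([word & 15]                                                  # s_and_b32 ..., 15
            + [s_bfe_u32(word, (4 << 16) | (4 * k)) for k in range(1, 7)]  # 0x40004 .. 0x40018
            + [word >> 28])                                              # s_lshr_b32 ..., 28


def udot8_mad_chain(a, b, acc):
    """Expanded lowering: eight chained v_mad_u32_u24, lowest nibble first."""
    result = acc & MASK32
    for x, y in zip(nibbles_like_gfx7(a), nibbles_like_gfx7(b)):
        result = v_mad_u32_u24(x, y, result)
    return result


def v_dot8_u32_u4(a, b, c):
    """Model of v_dot8_u32_u4: sum of eight unsigned 4-bit products plus c, mod 2**32."""
    total = c
    for k in range(8):
        total += ((a >> (4 * k)) & 0xF) * ((b >> (4 * k)) & 0xF)
    return total & MASK32


if __name__ == "__main__":
    print(udot8_mad_chain(0x12345678, 0x87654321, 5), v_dot8_u32_u4(0x12345678, 0x87654321, 5))
    # → 125 125
```

Every product of two nibbles is at most 225, so the 24-bit operand truncation in `v_mad_u32_u24` never matters here. Addition modulo 2^32 is also associative, so the MAD chain gives the same result as the single dot instruction regardless of evaluation order.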

; TODO: Remove the unnecessary instruction (the one that zero-extends the		; TODO: Remove the unnecessary instruction (the one that zero-extends the
; 2nd MAD) so that the pattern recognizer kicks in.		; 2nd MAD) so that the pattern recognizer kicks in.
define amdgpu_kernel void @udot8_acc16(<8 x i4> addrspace(1)* %src1,		define amdgpu_kernel void @udot8_acc16(<8 x i4> addrspace(1)* %src1,
; GFX7-LABEL: udot8_acc16:		; GFX7-LABEL: udot8_acc16:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s7, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s6, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: buffer_load_ushort v0, off, s[4:7], 0		; GFX7-NEXT: buffer_load_ushort v0, off, s[0:3], 0
; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0		; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX7-NEXT: s_load_dword s1, s[10:11], 0x0		; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_lshr_b32 s2, s0, 28		; GFX7-NEXT: s_lshr_b32 s6, s4, 28
; GFX7-NEXT: s_bfe_u32 s15, s1, 0x40018		; GFX7-NEXT: s_bfe_u32 s14, s5, 0x40018
; GFX7-NEXT: s_bfe_u32 s16, s1, 0x40014		; GFX7-NEXT: s_bfe_u32 s15, s5, 0x40014
; GFX7-NEXT: s_bfe_u32 s17, s1, 0x40010		; GFX7-NEXT: s_bfe_u32 s16, s5, 0x40010
; GFX7-NEXT: s_bfe_u32 s18, s1, 0x4000c		; GFX7-NEXT: s_bfe_u32 s17, s5, 0x4000c
; GFX7-NEXT: s_bfe_u32 s19, s1, 0x40008		; GFX7-NEXT: s_bfe_u32 s18, s5, 0x40008
; GFX7-NEXT: s_bfe_u32 s20, s1, 0x40004		; GFX7-NEXT: s_bfe_u32 s19, s5, 0x40004
; GFX7-NEXT: s_lshr_b32 s14, s1, 28		; GFX7-NEXT: s_lshr_b32 s13, s5, 28
; GFX7-NEXT: s_and_b32 s1, s1, 15		; GFX7-NEXT: s_and_b32 s5, s5, 15
; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40018		; GFX7-NEXT: s_bfe_u32 s7, s4, 0x40018
; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40014		; GFX7-NEXT: s_bfe_u32 s8, s4, 0x40014
; GFX7-NEXT: s_bfe_u32 s10, s0, 0x40010		; GFX7-NEXT: s_bfe_u32 s9, s4, 0x40010
; GFX7-NEXT: s_bfe_u32 s11, s0, 0x4000c		; GFX7-NEXT: s_bfe_u32 s10, s4, 0x4000c
; GFX7-NEXT: s_bfe_u32 s12, s0, 0x40008		; GFX7-NEXT: s_bfe_u32 s11, s4, 0x40008
; GFX7-NEXT: s_bfe_u32 s13, s0, 0x40004		; GFX7-NEXT: s_bfe_u32 s12, s4, 0x40004
; GFX7-NEXT: s_and_b32 s0, s0, 15		; GFX7-NEXT: s_and_b32 s4, s4, 15
; GFX7-NEXT: v_mov_b32_e32 v1, s1		; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s20		; GFX7-NEXT: v_mov_b32_e32 v2, s19
; GFX7-NEXT: v_mov_b32_e32 v3, s19		; GFX7-NEXT: v_mov_b32_e32 v3, s18
; GFX7-NEXT: v_mov_b32_e32 v4, s18		; GFX7-NEXT: v_mov_b32_e32 v4, s17
; GFX7-NEXT: v_mov_b32_e32 v5, s17		; GFX7-NEXT: v_mov_b32_e32 v5, s16
; GFX7-NEXT: v_mov_b32_e32 v6, s16		; GFX7-NEXT: v_mov_b32_e32 v6, s15
; GFX7-NEXT: v_mov_b32_e32 v7, s15		; GFX7-NEXT: v_mov_b32_e32 v7, s14
; GFX7-NEXT: s_waitcnt vmcnt(0)		; GFX7-NEXT: s_waitcnt vmcnt(0)
; GFX7-NEXT: v_mad_u32_u24 v0, s0, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s4, v1, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s13, v2, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s12, v2, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s12, v3, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s11, v3, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s11, v4, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s10, v4, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s10, v5, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s9, v5, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s9, v6, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s8, v6, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s8, v7, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s7, v7, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s14		; GFX7-NEXT: v_mov_b32_e32 v1, s13
; GFX7-NEXT: v_mad_u32_u24 v0, s2, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s6, v1, v0
; GFX7-NEXT: buffer_store_short v0, off, s[4:7], 0		; GFX7-NEXT: buffer_store_short v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: udot8_acc16:		; GFX8-LABEL: udot8_acc16:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_load_ushort v2, v[0:1]		; GFX8-NEXT: flat_load_ushort v2, v[0:1]
; GFX8-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX8-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX8-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX8-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_lshr_b32 s2, s0, 28		; GFX8-NEXT: s_lshr_b32 s2, s0, 28
; GFX8-NEXT: s_bfe_u32 s11, s1, 0x40018		; GFX8-NEXT: s_bfe_u32 s10, s1, 0x40018
; GFX8-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX8-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX8-NEXT: s_bfe_u32 s13, s1, 0x40010		; GFX8-NEXT: s_bfe_u32 s12, s1, 0x40010
; GFX8-NEXT: s_bfe_u32 s14, s1, 0x4000c		; GFX8-NEXT: s_bfe_u32 s13, s1, 0x4000c
; GFX8-NEXT: s_bfe_u32 s15, s1, 0x40008		; GFX8-NEXT: s_bfe_u32 s14, s1, 0x40008
; GFX8-NEXT: s_bfe_u32 s16, s1, 0x40004		; GFX8-NEXT: s_bfe_u32 s15, s1, 0x40004
; GFX8-NEXT: s_lshr_b32 s10, s1, 28		; GFX8-NEXT: s_lshr_b32 s9, s1, 28
; GFX8-NEXT: s_and_b32 s1, s1, 15		; GFX8-NEXT: s_and_b32 s1, s1, 15
; GFX8-NEXT: s_bfe_u32 s4, s0, 0x40018		; GFX8-NEXT: s_bfe_u32 s3, s0, 0x40018
; GFX8-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX8-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX8-NEXT: s_bfe_u32 s6, s0, 0x40010		; GFX8-NEXT: s_bfe_u32 s5, s0, 0x40010
; GFX8-NEXT: s_bfe_u32 s7, s0, 0x4000c		; GFX8-NEXT: s_bfe_u32 s6, s0, 0x4000c
; GFX8-NEXT: s_bfe_u32 s8, s0, 0x40008		; GFX8-NEXT: s_bfe_u32 s7, s0, 0x40008
; GFX8-NEXT: s_bfe_u32 s9, s0, 0x40004		; GFX8-NEXT: s_bfe_u32 s8, s0, 0x40004
; GFX8-NEXT: s_and_b32 s0, s0, 15		; GFX8-NEXT: s_and_b32 s0, s0, 15
; GFX8-NEXT: v_mov_b32_e32 v3, s1		; GFX8-NEXT: v_mov_b32_e32 v3, s1
; GFX8-NEXT: v_mov_b32_e32 v4, s16		; GFX8-NEXT: v_mov_b32_e32 v4, s15
; GFX8-NEXT: v_mov_b32_e32 v5, s15		; GFX8-NEXT: v_mov_b32_e32 v5, s14
; GFX8-NEXT: v_mov_b32_e32 v6, s14		; GFX8-NEXT: v_mov_b32_e32 v6, s13
; GFX8-NEXT: v_mov_b32_e32 v7, s13		; GFX8-NEXT: v_mov_b32_e32 v7, s12
; GFX8-NEXT: v_mov_b32_e32 v8, s12		; GFX8-NEXT: v_mov_b32_e32 v8, s11
; GFX8-NEXT: v_mov_b32_e32 v9, s11		; GFX8-NEXT: v_mov_b32_e32 v9, s10
; GFX8-NEXT: s_waitcnt vmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0)
; GFX8-NEXT: v_mad_u32_u24 v2, s0, v3, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s0, v3, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s9, v4, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s8, v4, v2
; GFX8-NEXT: v_and_b32_e32 v2, 0xffff, v2		; GFX8-NEXT: v_and_b32_e32 v2, 0xffff, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s8, v5, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s7, v5, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s7, v6, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s6, v6, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s6, v7, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s5, v7, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s5, v8, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s4, v8, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s4, v9, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s3, v9, v2
; GFX8-NEXT: v_mov_b32_e32 v3, s10		; GFX8-NEXT: v_mov_b32_e32 v3, s9
; GFX8-NEXT: v_mad_u32_u24 v2, s2, v3, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s2, v3, v2
; GFX8-NEXT: flat_store_short v[0:1], v2		; GFX8-NEXT: flat_store_short v[0:1], v2
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: udot8_acc16:		; GFX9-LABEL: udot8_acc16:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: global_load_ushort v2, v[0:1], off		; GFX9-NEXT: global_load_ushort v2, v[0:1], off
; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX9-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_lshr_b32 s2, s0, 28		; GFX9-NEXT: s_lshr_b32 s2, s0, 28
; GFX9-NEXT: s_bfe_u32 s11, s1, 0x40018		; GFX9-NEXT: s_bfe_u32 s10, s1, 0x40018
; GFX9-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX9-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX9-NEXT: s_bfe_u32 s13, s1, 0x40010		; GFX9-NEXT: s_bfe_u32 s12, s1, 0x40010
; GFX9-NEXT: s_bfe_u32 s14, s1, 0x4000c		; GFX9-NEXT: s_bfe_u32 s13, s1, 0x4000c
; GFX9-NEXT: s_bfe_u32 s15, s1, 0x40008		; GFX9-NEXT: s_bfe_u32 s14, s1, 0x40008
; GFX9-NEXT: s_bfe_u32 s16, s1, 0x40004		; GFX9-NEXT: s_bfe_u32 s15, s1, 0x40004
; GFX9-NEXT: s_lshr_b32 s10, s1, 28		; GFX9-NEXT: s_lshr_b32 s9, s1, 28
; GFX9-NEXT: s_and_b32 s1, s1, 15		; GFX9-NEXT: s_and_b32 s1, s1, 15
; GFX9-NEXT: s_bfe_u32 s4, s0, 0x40018		; GFX9-NEXT: s_bfe_u32 s3, s0, 0x40018
; GFX9-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX9-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX9-NEXT: s_bfe_u32 s6, s0, 0x40010		; GFX9-NEXT: s_bfe_u32 s5, s0, 0x40010
; GFX9-NEXT: s_bfe_u32 s7, s0, 0x4000c		; GFX9-NEXT: s_bfe_u32 s6, s0, 0x4000c
; GFX9-NEXT: s_bfe_u32 s8, s0, 0x40008		; GFX9-NEXT: s_bfe_u32 s7, s0, 0x40008
; GFX9-NEXT: s_bfe_u32 s9, s0, 0x40004		; GFX9-NEXT: s_bfe_u32 s8, s0, 0x40004
; GFX9-NEXT: s_and_b32 s0, s0, 15		; GFX9-NEXT: s_and_b32 s0, s0, 15
; GFX9-NEXT: v_mov_b32_e32 v3, s1		; GFX9-NEXT: v_mov_b32_e32 v3, s1
; GFX9-NEXT: v_mov_b32_e32 v4, s16		; GFX9-NEXT: v_mov_b32_e32 v4, s15
; GFX9-NEXT: v_mov_b32_e32 v5, s15		; GFX9-NEXT: v_mov_b32_e32 v5, s14
; GFX9-NEXT: v_mov_b32_e32 v6, s14		; GFX9-NEXT: v_mov_b32_e32 v6, s13
; GFX9-NEXT: v_mov_b32_e32 v7, s13		; GFX9-NEXT: v_mov_b32_e32 v7, s12
; GFX9-NEXT: v_mov_b32_e32 v8, s12		; GFX9-NEXT: v_mov_b32_e32 v8, s11
; GFX9-NEXT: v_mov_b32_e32 v9, s11		; GFX9-NEXT: v_mov_b32_e32 v9, s10
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: v_mad_u32_u24 v2, s0, v3, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s0, v3, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s9, v4, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s8, v4, v2
; GFX9-NEXT: v_and_b32_e32 v2, 0xffff, v2		; GFX9-NEXT: v_and_b32_e32 v2, 0xffff, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s8, v5, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s7, v5, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s7, v6, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s6, v6, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s6, v7, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s5, v7, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s5, v8, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s4, v8, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s4, v9, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s3, v9, v2
; GFX9-NEXT: v_mov_b32_e32 v3, s10		; GFX9-NEXT: v_mov_b32_e32 v3, s9
; GFX9-NEXT: v_mad_u32_u24 v2, s2, v3, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s2, v3, v2
; GFX9-NEXT: global_store_short v[0:1], v2, off		; GFX9-NEXT: global_store_short v[0:1], v2, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX9-DL-LABEL: udot8_acc16:		; GFX9-DL-LABEL: udot8_acc16:
; GFX9-DL: ; %bb.0: ; %entry		; GFX9-DL: ; %bb.0: ; %entry
; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: global_load_ushort v2, v[0:1], off		; GFX9-DL-NEXT: global_load_ushort v2, v[0:1], off
; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX9-DL-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_lshr_b32 s2, s0, 28		; GFX9-DL-NEXT: s_lshr_b32 s2, s0, 28
; GFX9-DL-NEXT: s_bfe_u32 s11, s1, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s10, s1, 0x40018
; GFX9-DL-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX9-DL-NEXT: s_bfe_u32 s13, s1, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s12, s1, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s14, s1, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s13, s1, 0x4000c
; GFX9-DL-NEXT: s_bfe_u32 s15, s1, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s14, s1, 0x40008
; GFX9-DL-NEXT: s_bfe_u32 s16, s1, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s15, s1, 0x40004
; GFX9-DL-NEXT: s_lshr_b32 s10, s1, 28		; GFX9-DL-NEXT: s_lshr_b32 s9, s1, 28
; GFX9-DL-NEXT: s_and_b32 s1, s1, 15		; GFX9-DL-NEXT: s_and_b32 s1, s1, 15
; GFX9-DL-NEXT: s_bfe_u32 s4, s0, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s3, s0, 0x40018
; GFX9-DL-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX9-DL-NEXT: s_bfe_u32 s6, s0, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s5, s0, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s7, s0, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s6, s0, 0x4000c
; GFX9-DL-NEXT: s_bfe_u32 s8, s0, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s7, s0, 0x40008
; GFX9-DL-NEXT: s_bfe_u32 s9, s0, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s8, s0, 0x40004
; GFX9-DL-NEXT: s_and_b32 s0, s0, 15		; GFX9-DL-NEXT: s_and_b32 s0, s0, 15
; GFX9-DL-NEXT: v_mov_b32_e32 v3, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v3, s1
; GFX9-DL-NEXT: v_mov_b32_e32 v4, s16		; GFX9-DL-NEXT: v_mov_b32_e32 v4, s15
; GFX9-DL-NEXT: v_mov_b32_e32 v5, s15		; GFX9-DL-NEXT: v_mov_b32_e32 v5, s14
; GFX9-DL-NEXT: v_mov_b32_e32 v6, s14		; GFX9-DL-NEXT: v_mov_b32_e32 v6, s13
; GFX9-DL-NEXT: v_mov_b32_e32 v7, s13		; GFX9-DL-NEXT: v_mov_b32_e32 v7, s12
; GFX9-DL-NEXT: v_mov_b32_e32 v8, s12		; GFX9-DL-NEXT: v_mov_b32_e32 v8, s11
; GFX9-DL-NEXT: v_mov_b32_e32 v9, s11		; GFX9-DL-NEXT: v_mov_b32_e32 v9, s10
; GFX9-DL-NEXT: s_waitcnt vmcnt(0)		; GFX9-DL-NEXT: s_waitcnt vmcnt(0)
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s0, v3, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s0, v3, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s9, v4, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s8, v4, v2
; GFX9-DL-NEXT: v_and_b32_e32 v2, 0xffff, v2		; GFX9-DL-NEXT: v_and_b32_e32 v2, 0xffff, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s8, v5, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s7, v5, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s7, v6, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s6, v6, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s6, v7, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s5, v7, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s5, v8, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s4, v8, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s4, v9, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s3, v9, v2
; GFX9-DL-NEXT: v_mov_b32_e32 v3, s10		; GFX9-DL-NEXT: v_mov_b32_e32 v3, s9
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s2, v3, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s2, v3, v2
; GFX9-DL-NEXT: global_store_short v[0:1], v2, off		; GFX9-DL-NEXT: global_store_short v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: udot8_acc16:		; GFX10-DL-LABEL: udot8_acc16:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s4		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s2
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s5		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s3
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX10-DL-NEXT: global_load_ushort v2, v[0:1], off		; GFX10-DL-NEXT: global_load_ushort v2, v[0:1], off
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_and_b32 s2, s0, 15		; GFX10-DL-NEXT: s_and_b32 s2, s0, 15
; GFX10-DL-NEXT: s_and_b32 s4, s1, 15		; GFX10-DL-NEXT: s_and_b32 s3, s1, 15
; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x40004
; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x40004
; GFX10-DL-NEXT: s_waitcnt vmcnt(0)		; GFX10-DL-NEXT: s_waitcnt vmcnt(0)
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40008
; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s3, s1, 0x40008
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s5, s6, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s4, s5, v2
; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x4000c
; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x4000c
; GFX10-DL-NEXT: v_and_b32_e32 v2, 0xffff, v2		; GFX10-DL-NEXT: v_and_b32_e32 v2, 0xffff, v2
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40010
; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s3, s1, 0x40010
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s5, s6, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s4, s5, v2
; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x40014
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40018
; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s3, s1, 0x40018
; GFX10-DL-NEXT: s_lshr_b32 s0, s0, 28		; GFX10-DL-NEXT: s_lshr_b32 s0, s0, 28
; GFX10-DL-NEXT: s_lshr_b32 s1, s1, 28		; GFX10-DL-NEXT: s_lshr_b32 s1, s1, 28
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s5, s6, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s4, s5, v2
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s0, s1, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s0, s1, v2
; GFX10-DL-NEXT: global_store_short v[0:1], v2, off		; GFX10-DL-NEXT: global_store_short v[0:1], v2, off
; GFX10-DL-NEXT: s_endpgm		; GFX10-DL-NEXT: s_endpgm
<8 x i4> addrspace(1)* %src2,		<8 x i4> addrspace(1)* %src2,
i16 addrspace(1)* nocapture %dst) {		i16 addrspace(1)* nocapture %dst) {
entry:		entry:
%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1		%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1
%vec2 = load <8 x i4>, <8 x i4> addrspace(1)* %src2		%vec2 = load <8 x i4>, <8 x i4> addrspace(1)* %src2
… 60 unchanged lines not shown …		… 60 unchanged lines not shown …
ret void		ret void
}		}
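The udot8 CHECK lines unpack each 32-bit `<8 x i4>` word into eight 4-bit lanes and then chain `v_mad_u32_u24`. Lane 0 uses `s_and_b32 ..., 15`, lane 7 uses `s_lshr_b32 ..., 28`, and lanes 1 to 6 use `s_bfe_u32` with immediates `0x40004` through `0x40018`. Below is a minimal Python reference model for reading that assembly. It assumes the ISA-documented semantics of `S_BFE_U32` (bit offset in operand bits [4:0], field width in bits [22:16], so `0x40018` means "4 bits starting at bit 24") and of `V_MAD_U32_U24`. It also assumes that the accumulator wraps at 16 bits for `udot8_acc16` and at 8 bits for `udot8_acc8`, which is inferred from the store widths and the `v_and_b32` masks, since the IR body is elided here. All helper names are hypothetical; this is not part of the test or of LLVM.

```python
# Hypothetical reference model of the udot8 CHECK sequences (not LLVM code).


def s_bfe_u32(src, imm):
    """Model of S_BFE_U32: offset in imm[4:0], width in imm[22:16]."""
    offset = imm & 0x1F
    width = (imm >> 16) & 0x7F
    return (src >> offset) & ((1 << width) - 1)


def v_mad_u32_u24(a, b, c):
    """Multiply the low 24 bits of a and b, add c, keep 32 bits."""
    return ((a & 0xFFFFFF) * (b & 0xFFFFFF) + c) & 0xFFFFFFFF


# Field selectors the generated code uses for lanes 1..6.
BFE_IMMS = [0x40004, 0x40008, 0x4000C, 0x40010, 0x40014, 0x40018]


def nibbles(word):
    """Split a packed <8 x i4> word into its eight lanes, lane 0 first."""
    word &= 0xFFFFFFFF
    lanes = [word & 15]                               # s_and_b32 ..., 15
    lanes += [s_bfe_u32(word, imm) for imm in BFE_IMMS]
    lanes.append((word >> 28) & 15)                   # s_lshr_b32 ..., 28
    return lanes


def udot8(src1, src2, acc, acc_bits):
    """Unsigned 8-lane dot product added to acc, wrapped to acc_bits (16 or 8)."""
    total = acc
    for a, b in zip(nibbles(src1), nibbles(src2)):
        total = v_mad_u32_u24(a, b, total)
    # Truncating once at the end gives the same low bits as the intermediate
    # v_and_b32 masks, because truncation commutes with addition mod 2**n.
    return total & ((1 << acc_bits) - 1)
```

For example, when every lane is 15, the eight products sum to 1800. That value fits in the 16-bit accumulator, but wraps to 8 in the 8-bit variant.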

; TODO: Remove the unnecessary instruction (the one zero-extending the		; TODO: Remove the unnecessary instruction (the one zero-extending the
; 2nd MAD) so that the pattern recognizer can kick in.		; 2nd MAD) so that the pattern recognizer can kick in.
define amdgpu_kernel void @udot8_acc8(<8 x i4> addrspace(1)* %src1,		define amdgpu_kernel void @udot8_acc8(<8 x i4> addrspace(1)* %src1,
; GFX7-LABEL: udot8_acc8:		; GFX7-LABEL: udot8_acc8:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s7, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s6, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: buffer_load_ubyte v0, off, s[4:7], 0		; GFX7-NEXT: buffer_load_ubyte v0, off, s[0:3], 0
; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0		; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX7-NEXT: s_load_dword s1, s[10:11], 0x0		; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_lshr_b32 s2, s0, 28		; GFX7-NEXT: s_lshr_b32 s6, s4, 28
; GFX7-NEXT: s_bfe_u32 s15, s1, 0x40018		; GFX7-NEXT: s_bfe_u32 s14, s5, 0x40018
; GFX7-NEXT: s_bfe_u32 s16, s1, 0x40014		; GFX7-NEXT: s_bfe_u32 s15, s5, 0x40014
; GFX7-NEXT: s_bfe_u32 s17, s1, 0x40010		; GFX7-NEXT: s_bfe_u32 s16, s5, 0x40010
; GFX7-NEXT: s_bfe_u32 s18, s1, 0x4000c		; GFX7-NEXT: s_bfe_u32 s17, s5, 0x4000c
; GFX7-NEXT: s_bfe_u32 s19, s1, 0x40008		; GFX7-NEXT: s_bfe_u32 s18, s5, 0x40008
; GFX7-NEXT: s_bfe_u32 s20, s1, 0x40004		; GFX7-NEXT: s_bfe_u32 s19, s5, 0x40004
; GFX7-NEXT: s_lshr_b32 s14, s1, 28		; GFX7-NEXT: s_lshr_b32 s13, s5, 28
; GFX7-NEXT: s_and_b32 s1, s1, 15		; GFX7-NEXT: s_and_b32 s5, s5, 15
; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40018		; GFX7-NEXT: s_bfe_u32 s7, s4, 0x40018
; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40014		; GFX7-NEXT: s_bfe_u32 s8, s4, 0x40014
; GFX7-NEXT: s_bfe_u32 s10, s0, 0x40010		; GFX7-NEXT: s_bfe_u32 s9, s4, 0x40010
; GFX7-NEXT: s_bfe_u32 s11, s0, 0x4000c		; GFX7-NEXT: s_bfe_u32 s10, s4, 0x4000c
; GFX7-NEXT: s_bfe_u32 s12, s0, 0x40008		; GFX7-NEXT: s_bfe_u32 s11, s4, 0x40008
; GFX7-NEXT: s_bfe_u32 s13, s0, 0x40004		; GFX7-NEXT: s_bfe_u32 s12, s4, 0x40004
; GFX7-NEXT: s_and_b32 s0, s0, 15		; GFX7-NEXT: s_and_b32 s4, s4, 15
; GFX7-NEXT: v_mov_b32_e32 v1, s1		; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s20		; GFX7-NEXT: v_mov_b32_e32 v2, s19
; GFX7-NEXT: v_mov_b32_e32 v3, s19		; GFX7-NEXT: v_mov_b32_e32 v3, s18
; GFX7-NEXT: v_mov_b32_e32 v4, s18		; GFX7-NEXT: v_mov_b32_e32 v4, s17
; GFX7-NEXT: v_mov_b32_e32 v5, s17		; GFX7-NEXT: v_mov_b32_e32 v5, s16
; GFX7-NEXT: v_mov_b32_e32 v6, s16		; GFX7-NEXT: v_mov_b32_e32 v6, s15
; GFX7-NEXT: v_mov_b32_e32 v7, s15		; GFX7-NEXT: v_mov_b32_e32 v7, s14
; GFX7-NEXT: s_waitcnt vmcnt(0)		; GFX7-NEXT: s_waitcnt vmcnt(0)
; GFX7-NEXT: v_mad_u32_u24 v0, s0, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s4, v1, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s13, v2, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s12, v2, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s12, v3, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s11, v3, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s11, v4, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s10, v4, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s10, v5, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s9, v5, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s9, v6, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s8, v6, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s8, v7, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s7, v7, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s14		; GFX7-NEXT: v_mov_b32_e32 v1, s13
; GFX7-NEXT: v_mad_u32_u24 v0, s2, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s6, v1, v0
; GFX7-NEXT: buffer_store_byte v0, off, s[4:7], 0		; GFX7-NEXT: buffer_store_byte v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: udot8_acc8:		; GFX8-LABEL: udot8_acc8:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_load_ubyte v2, v[0:1]		; GFX8-NEXT: flat_load_ubyte v2, v[0:1]
; GFX8-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX8-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX8-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX8-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_lshr_b32 s2, s0, 28		; GFX8-NEXT: s_lshr_b32 s2, s0, 28
; GFX8-NEXT: s_bfe_u32 s11, s1, 0x40018		; GFX8-NEXT: s_bfe_u32 s10, s1, 0x40018
; GFX8-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX8-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX8-NEXT: s_bfe_u32 s13, s1, 0x40010		; GFX8-NEXT: s_bfe_u32 s12, s1, 0x40010
; GFX8-NEXT: s_bfe_u32 s14, s1, 0x4000c		; GFX8-NEXT: s_bfe_u32 s13, s1, 0x4000c
; GFX8-NEXT: s_bfe_u32 s15, s1, 0x40008		; GFX8-NEXT: s_bfe_u32 s14, s1, 0x40008
; GFX8-NEXT: s_bfe_u32 s16, s1, 0x40004		; GFX8-NEXT: s_bfe_u32 s15, s1, 0x40004
; GFX8-NEXT: s_lshr_b32 s10, s1, 28		; GFX8-NEXT: s_lshr_b32 s9, s1, 28
; GFX8-NEXT: s_and_b32 s1, s1, 15		; GFX8-NEXT: s_and_b32 s1, s1, 15
; GFX8-NEXT: s_bfe_u32 s4, s0, 0x40018		; GFX8-NEXT: s_bfe_u32 s3, s0, 0x40018
; GFX8-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX8-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX8-NEXT: s_bfe_u32 s6, s0, 0x40010		; GFX8-NEXT: s_bfe_u32 s5, s0, 0x40010
; GFX8-NEXT: s_bfe_u32 s7, s0, 0x4000c		; GFX8-NEXT: s_bfe_u32 s6, s0, 0x4000c
; GFX8-NEXT: s_bfe_u32 s8, s0, 0x40008		; GFX8-NEXT: s_bfe_u32 s7, s0, 0x40008
; GFX8-NEXT: s_bfe_u32 s9, s0, 0x40004		; GFX8-NEXT: s_bfe_u32 s8, s0, 0x40004
; GFX8-NEXT: s_and_b32 s0, s0, 15		; GFX8-NEXT: s_and_b32 s0, s0, 15
; GFX8-NEXT: v_mov_b32_e32 v3, s1		; GFX8-NEXT: v_mov_b32_e32 v3, s1
; GFX8-NEXT: v_mov_b32_e32 v4, s16		; GFX8-NEXT: v_mov_b32_e32 v4, s15
; GFX8-NEXT: v_mov_b32_e32 v5, s15		; GFX8-NEXT: v_mov_b32_e32 v5, s14
; GFX8-NEXT: v_mov_b32_e32 v6, s14		; GFX8-NEXT: v_mov_b32_e32 v6, s13
; GFX8-NEXT: v_mov_b32_e32 v7, s13		; GFX8-NEXT: v_mov_b32_e32 v7, s12
; GFX8-NEXT: v_mov_b32_e32 v8, s12		; GFX8-NEXT: v_mov_b32_e32 v8, s11
; GFX8-NEXT: v_mov_b32_e32 v9, s11		; GFX8-NEXT: v_mov_b32_e32 v9, s10
; GFX8-NEXT: s_waitcnt vmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0)
; GFX8-NEXT: v_mad_u32_u24 v2, s0, v3, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s0, v3, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s9, v4, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s8, v4, v2
; GFX8-NEXT: v_and_b32_e32 v2, 0xff, v2		; GFX8-NEXT: v_and_b32_e32 v2, 0xff, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s8, v5, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s7, v5, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s7, v6, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s6, v6, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s6, v7, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s5, v7, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s5, v8, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s4, v8, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s4, v9, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s3, v9, v2
; GFX8-NEXT: v_mov_b32_e32 v3, s10		; GFX8-NEXT: v_mov_b32_e32 v3, s9
; GFX8-NEXT: v_mad_u32_u24 v2, s2, v3, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s2, v3, v2
; GFX8-NEXT: flat_store_byte v[0:1], v2		; GFX8-NEXT: flat_store_byte v[0:1], v2
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: udot8_acc8:		; GFX9-LABEL: udot8_acc8:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: global_load_ubyte v2, v[0:1], off		; GFX9-NEXT: global_load_ubyte v2, v[0:1], off
; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX9-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_lshr_b32 s2, s0, 28		; GFX9-NEXT: s_lshr_b32 s2, s0, 28
; GFX9-NEXT: s_bfe_u32 s11, s1, 0x40018		; GFX9-NEXT: s_bfe_u32 s10, s1, 0x40018
; GFX9-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX9-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX9-NEXT: s_bfe_u32 s13, s1, 0x40010		; GFX9-NEXT: s_bfe_u32 s12, s1, 0x40010
; GFX9-NEXT: s_bfe_u32 s14, s1, 0x4000c		; GFX9-NEXT: s_bfe_u32 s13, s1, 0x4000c
; GFX9-NEXT: s_bfe_u32 s15, s1, 0x40008		; GFX9-NEXT: s_bfe_u32 s14, s1, 0x40008
; GFX9-NEXT: s_bfe_u32 s16, s1, 0x40004		; GFX9-NEXT: s_bfe_u32 s15, s1, 0x40004
; GFX9-NEXT: s_lshr_b32 s10, s1, 28		; GFX9-NEXT: s_lshr_b32 s9, s1, 28
; GFX9-NEXT: s_and_b32 s1, s1, 15		; GFX9-NEXT: s_and_b32 s1, s1, 15
; GFX9-NEXT: s_bfe_u32 s4, s0, 0x40018		; GFX9-NEXT: s_bfe_u32 s3, s0, 0x40018
; GFX9-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX9-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX9-NEXT: s_bfe_u32 s6, s0, 0x40010		; GFX9-NEXT: s_bfe_u32 s5, s0, 0x40010
; GFX9-NEXT: s_bfe_u32 s7, s0, 0x4000c		; GFX9-NEXT: s_bfe_u32 s6, s0, 0x4000c
; GFX9-NEXT: s_bfe_u32 s8, s0, 0x40008		; GFX9-NEXT: s_bfe_u32 s7, s0, 0x40008
; GFX9-NEXT: s_bfe_u32 s9, s0, 0x40004		; GFX9-NEXT: s_bfe_u32 s8, s0, 0x40004
; GFX9-NEXT: s_and_b32 s0, s0, 15		; GFX9-NEXT: s_and_b32 s0, s0, 15
; GFX9-NEXT: v_mov_b32_e32 v3, s1		; GFX9-NEXT: v_mov_b32_e32 v3, s1
; GFX9-NEXT: v_mov_b32_e32 v4, s16		; GFX9-NEXT: v_mov_b32_e32 v4, s15
; GFX9-NEXT: v_mov_b32_e32 v5, s15		; GFX9-NEXT: v_mov_b32_e32 v5, s14
; GFX9-NEXT: v_mov_b32_e32 v6, s14		; GFX9-NEXT: v_mov_b32_e32 v6, s13
; GFX9-NEXT: v_mov_b32_e32 v7, s13		; GFX9-NEXT: v_mov_b32_e32 v7, s12
; GFX9-NEXT: v_mov_b32_e32 v8, s12		; GFX9-NEXT: v_mov_b32_e32 v8, s11
; GFX9-NEXT: v_mov_b32_e32 v9, s11		; GFX9-NEXT: v_mov_b32_e32 v9, s10
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: v_mad_u32_u24 v2, s0, v3, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s0, v3, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s9, v4, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s8, v4, v2
; GFX9-NEXT: v_and_b32_e32 v2, 0xff, v2		; GFX9-NEXT: v_and_b32_e32 v2, 0xff, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s8, v5, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s7, v5, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s7, v6, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s6, v6, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s6, v7, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s5, v7, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s5, v8, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s4, v8, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s4, v9, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s3, v9, v2
; GFX9-NEXT: v_mov_b32_e32 v3, s10		; GFX9-NEXT: v_mov_b32_e32 v3, s9
; GFX9-NEXT: v_mad_u32_u24 v2, s2, v3, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s2, v3, v2
; GFX9-NEXT: global_store_byte v[0:1], v2, off		; GFX9-NEXT: global_store_byte v[0:1], v2, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX9-DL-LABEL: udot8_acc8:		; GFX9-DL-LABEL: udot8_acc8:
; GFX9-DL: ; %bb.0: ; %entry		; GFX9-DL: ; %bb.0: ; %entry
; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: global_load_ubyte v2, v[0:1], off		; GFX9-DL-NEXT: global_load_ubyte v2, v[0:1], off
; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX9-DL-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_lshr_b32 s2, s0, 28		; GFX9-DL-NEXT: s_lshr_b32 s2, s0, 28
; GFX9-DL-NEXT: s_bfe_u32 s11, s1, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s10, s1, 0x40018
; GFX9-DL-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX9-DL-NEXT: s_bfe_u32 s13, s1, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s12, s1, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s14, s1, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s13, s1, 0x4000c
; GFX9-DL-NEXT: s_bfe_u32 s15, s1, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s14, s1, 0x40008
; GFX9-DL-NEXT: s_bfe_u32 s16, s1, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s15, s1, 0x40004
; GFX9-DL-NEXT: s_lshr_b32 s10, s1, 28		; GFX9-DL-NEXT: s_lshr_b32 s9, s1, 28
; GFX9-DL-NEXT: s_and_b32 s1, s1, 15		; GFX9-DL-NEXT: s_and_b32 s1, s1, 15
; GFX9-DL-NEXT: s_bfe_u32 s4, s0, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s3, s0, 0x40018
; GFX9-DL-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX9-DL-NEXT: s_bfe_u32 s6, s0, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s5, s0, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s7, s0, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s6, s0, 0x4000c
; GFX9-DL-NEXT: s_bfe_u32 s8, s0, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s7, s0, 0x40008
; GFX9-DL-NEXT: s_bfe_u32 s9, s0, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s8, s0, 0x40004
; GFX9-DL-NEXT: s_and_b32 s0, s0, 15		; GFX9-DL-NEXT: s_and_b32 s0, s0, 15
; GFX9-DL-NEXT: v_mov_b32_e32 v3, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v3, s1
; GFX9-DL-NEXT: v_mov_b32_e32 v4, s16		; GFX9-DL-NEXT: v_mov_b32_e32 v4, s15
; GFX9-DL-NEXT: v_mov_b32_e32 v5, s15		; GFX9-DL-NEXT: v_mov_b32_e32 v5, s14
; GFX9-DL-NEXT: v_mov_b32_e32 v6, s14		; GFX9-DL-NEXT: v_mov_b32_e32 v6, s13
; GFX9-DL-NEXT: v_mov_b32_e32 v7, s13		; GFX9-DL-NEXT: v_mov_b32_e32 v7, s12
; GFX9-DL-NEXT: v_mov_b32_e32 v8, s12		; GFX9-DL-NEXT: v_mov_b32_e32 v8, s11
; GFX9-DL-NEXT: v_mov_b32_e32 v9, s11		; GFX9-DL-NEXT: v_mov_b32_e32 v9, s10
; GFX9-DL-NEXT: s_waitcnt vmcnt(0)		; GFX9-DL-NEXT: s_waitcnt vmcnt(0)
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s0, v3, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s0, v3, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s9, v4, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s8, v4, v2
; GFX9-DL-NEXT: v_and_b32_e32 v2, 0xff, v2		; GFX9-DL-NEXT: v_and_b32_e32 v2, 0xff, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s8, v5, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s7, v5, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s7, v6, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s6, v6, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s6, v7, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s5, v7, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s5, v8, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s4, v8, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s4, v9, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s3, v9, v2
; GFX9-DL-NEXT: v_mov_b32_e32 v3, s10		; GFX9-DL-NEXT: v_mov_b32_e32 v3, s9
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s2, v3, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s2, v3, v2
; GFX9-DL-NEXT: global_store_byte v[0:1], v2, off		; GFX9-DL-NEXT: global_store_byte v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: udot8_acc8:		; GFX10-DL-LABEL: udot8_acc8:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s4		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s2
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s5		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s3
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off		; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_and_b32 s2, s0, 15		; GFX10-DL-NEXT: s_and_b32 s2, s0, 15
; GFX10-DL-NEXT: s_and_b32 s4, s1, 15		; GFX10-DL-NEXT: s_and_b32 s3, s1, 15
; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x40004
; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x40004
; GFX10-DL-NEXT: s_waitcnt vmcnt(0)		; GFX10-DL-NEXT: s_waitcnt vmcnt(0)
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40008
; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s3, s1, 0x40008
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s5, s6, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s4, s5, v2
; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x4000c
; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x4000c
; GFX10-DL-NEXT: v_and_b32_e32 v2, 0xff, v2		; GFX10-DL-NEXT: v_and_b32_e32 v2, 0xff, v2
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40010
; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s3, s1, 0x40010
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s5, s6, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s4, s5, v2
; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x40014
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40018
; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s3, s1, 0x40018
; GFX10-DL-NEXT: s_lshr_b32 s0, s0, 28		; GFX10-DL-NEXT: s_lshr_b32 s0, s0, 28
; GFX10-DL-NEXT: s_lshr_b32 s1, s1, 28		; GFX10-DL-NEXT: s_lshr_b32 s1, s1, 28
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s5, s6, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s4, s5, v2
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s0, s1, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s0, s1, v2
; GFX10-DL-NEXT: global_store_byte v[0:1], v2, off		; GFX10-DL-NEXT: global_store_byte v[0:1], v2, off
; GFX10-DL-NEXT: s_endpgm		; GFX10-DL-NEXT: s_endpgm
<8 x i4> addrspace(1)* %src2,		<8 x i4> addrspace(1)* %src2,
i8 addrspace(1)* nocapture %dst) {		i8 addrspace(1)* nocapture %dst) {
entry:		entry:
%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1		%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1
%vec2 = load <8 x i4>, <8 x i4> addrspace(1)* %src2		%vec2 = load <8 x i4>, <8 x i4> addrspace(1)* %src2
(60 unchanged lines not shown)
ret void		ret void
}		}
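The IR body of `udot8_acc8` is collapsed in this view, so the following is a minimal C sketch of what the CHECK lines above appear to compute. It is inferred from the assembly, not taken from the test source. `s_bfe_u32` approximately models the S_BFE_U32 operand encoding (bit offset in bits [4:0] of the second operand, field width in bits [22:16]), which is why `0x40004` extracts the 4-bit element at offset 4 and `0x4000c` the one at offset 12. `udot8_acc8_ref` is a hypothetical reference: element 0 of the packed `<8 x i4>` sits in the low nibble (`s_and_b32 ..., 15`), element 7 in the high nibble (`s_lshr_b32 ..., 28`), each pair is zero-extended and fed to one `v_mad_u32_u24`, and `global_store_byte` keeps the low 8 bits.

```c
#include <assert.h> /* used by the usage checks */
#include <stdint.h>

/* Approximate model of S_BFE_U32: offset = ctrl[4:0], width = ctrl[22:16]. */
static uint32_t s_bfe_u32(uint32_t src, uint32_t ctrl) {
    uint32_t offset = ctrl & 0x1fu;
    uint32_t width = (ctrl >> 16) & 0x7fu;
    uint32_t shifted = src >> offset;
    if (width == 0)
        return 0;
    if (width >= 32)
        return shifted;
    return shifted & ((1u << width) - 1u);
}

/* Hypothetical scalar reference for udot8_acc8, inferred from the CHECK lines. */
static uint8_t udot8_acc8_ref(uint32_t a, uint32_t b, uint8_t acc) {
    uint32_t sum = acc; /* global_load_ubyte of the accumulator */
    for (int i = 0; i < 8; ++i) {
        /* Control word 0x400NN: width 4, offset 4*i. */
        uint32_t ctrl = 0x40000u | (4u * (uint32_t)i);
        sum += s_bfe_u32(a, ctrl) * s_bfe_u32(b, ctrl); /* one v_mad_u32_u24 */
    }
    return (uint8_t)sum; /* global_store_byte keeps the low 8 bits */
}
```

Because only the low byte is stored, the intermediate `v_and_b32_e32 v2, 0xff, v2` in the GFX10 sequence does not change the result; the sum is the same modulo 256 either way.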

; TODO: Remove the two unnecessary instructions (and+add after 2nd MAD)		; TODO: Remove the two unnecessary instructions (and+add after 2nd MAD)
; so that the pattern recognizer can kick in.		; so that the pattern recognizer can kick in.
define amdgpu_kernel void @udot8_acc4(<8 x i4> addrspace(1)* %src1,		define amdgpu_kernel void @udot8_acc4(<8 x i4> addrspace(1)* %src1,
; GFX7-LABEL: udot8_acc4:		; GFX7-LABEL: udot8_acc4:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s7, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s6, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: buffer_load_ubyte v0, off, s[4:7], 0		; GFX7-NEXT: buffer_load_ubyte v0, off, s[0:3], 0
; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0		; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX7-NEXT: s_load_dword s1, s[10:11], 0x0		; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_lshr_b32 s2, s0, 28		; GFX7-NEXT: s_lshr_b32 s6, s4, 28
; GFX7-NEXT: s_bfe_u32 s15, s1, 0x40018		; GFX7-NEXT: s_bfe_u32 s14, s5, 0x40018
; GFX7-NEXT: s_bfe_u32 s16, s1, 0x40014		; GFX7-NEXT: s_bfe_u32 s15, s5, 0x40014
; GFX7-NEXT: s_bfe_u32 s17, s1, 0x40010		; GFX7-NEXT: s_bfe_u32 s16, s5, 0x40010
; GFX7-NEXT: s_bfe_u32 s18, s1, 0x4000c		; GFX7-NEXT: s_bfe_u32 s17, s5, 0x4000c
; GFX7-NEXT: s_bfe_u32 s19, s1, 0x40008		; GFX7-NEXT: s_bfe_u32 s18, s5, 0x40008
; GFX7-NEXT: s_bfe_u32 s20, s1, 0x40004		; GFX7-NEXT: s_bfe_u32 s19, s5, 0x40004
; GFX7-NEXT: s_lshr_b32 s14, s1, 28		; GFX7-NEXT: s_lshr_b32 s13, s5, 28
; GFX7-NEXT: s_and_b32 s1, s1, 15		; GFX7-NEXT: s_and_b32 s5, s5, 15
; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40018		; GFX7-NEXT: s_bfe_u32 s7, s4, 0x40018
; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40014		; GFX7-NEXT: s_bfe_u32 s8, s4, 0x40014
; GFX7-NEXT: s_bfe_u32 s10, s0, 0x40010		; GFX7-NEXT: s_bfe_u32 s9, s4, 0x40010
; GFX7-NEXT: s_bfe_u32 s11, s0, 0x4000c		; GFX7-NEXT: s_bfe_u32 s10, s4, 0x4000c
; GFX7-NEXT: s_bfe_u32 s12, s0, 0x40008		; GFX7-NEXT: s_bfe_u32 s11, s4, 0x40008
; GFX7-NEXT: s_bfe_u32 s13, s0, 0x40004		; GFX7-NEXT: s_bfe_u32 s12, s4, 0x40004
; GFX7-NEXT: s_and_b32 s0, s0, 15		; GFX7-NEXT: s_and_b32 s4, s4, 15
; GFX7-NEXT: v_mov_b32_e32 v1, s1		; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s20		; GFX7-NEXT: v_mov_b32_e32 v2, s19
; GFX7-NEXT: v_mov_b32_e32 v3, s19		; GFX7-NEXT: v_mov_b32_e32 v3, s18
; GFX7-NEXT: v_mov_b32_e32 v4, s18		; GFX7-NEXT: v_mov_b32_e32 v4, s17
; GFX7-NEXT: v_mov_b32_e32 v5, s17		; GFX7-NEXT: v_mov_b32_e32 v5, s16
; GFX7-NEXT: v_mov_b32_e32 v6, s16		; GFX7-NEXT: v_mov_b32_e32 v6, s15
; GFX7-NEXT: v_mov_b32_e32 v7, s15		; GFX7-NEXT: v_mov_b32_e32 v7, s14
; GFX7-NEXT: s_waitcnt vmcnt(0)		; GFX7-NEXT: s_waitcnt vmcnt(0)
; GFX7-NEXT: v_mad_u32_u24 v0, s0, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s4, v1, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s13, v2, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s12, v2, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s12, v3, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s11, v3, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s11, v4, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s10, v4, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s10, v5, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s9, v5, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s9, v6, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s8, v6, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s8, v7, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s7, v7, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s14		; GFX7-NEXT: v_mov_b32_e32 v1, s13
; GFX7-NEXT: v_mad_u32_u24 v0, s2, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s6, v1, v0
; GFX7-NEXT: v_and_b32_e32 v0, 15, v0		; GFX7-NEXT: v_and_b32_e32 v0, 15, v0
; GFX7-NEXT: buffer_store_byte v0, off, s[4:7], 0		; GFX7-NEXT: buffer_store_byte v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: udot8_acc4:		; GFX8-LABEL: udot8_acc4:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_load_ubyte v2, v[0:1]		; GFX8-NEXT: flat_load_ubyte v2, v[0:1]
; GFX8-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX8-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX8-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX8-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_and_b32 s9, s0, 15		; GFX8-NEXT: s_and_b32 s8, s0, 15
; GFX8-NEXT: s_and_b32 s16, s1, 15		; GFX8-NEXT: s_and_b32 s15, s1, 15
; GFX8-NEXT: s_bfe_u32 s15, s1, 0x40004		; GFX8-NEXT: s_bfe_u32 s14, s1, 0x40004
; GFX8-NEXT: v_mov_b32_e32 v4, s16		; GFX8-NEXT: v_mov_b32_e32 v4, s15
; GFX8-NEXT: s_bfe_u32 s11, s1, 0x40018		; GFX8-NEXT: s_bfe_u32 s10, s1, 0x40018
; GFX8-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX8-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX8-NEXT: s_bfe_u32 s13, s1, 0x40010		; GFX8-NEXT: s_bfe_u32 s12, s1, 0x40010
; GFX8-NEXT: s_bfe_u32 s14, s1, 0x40008		; GFX8-NEXT: s_bfe_u32 s13, s1, 0x40008
; GFX8-NEXT: s_lshr_b32 s10, s1, 28		; GFX8-NEXT: s_lshr_b32 s9, s1, 28
; GFX8-NEXT: s_bfe_u32 s1, s1, 0x4000c		; GFX8-NEXT: s_bfe_u32 s1, s1, 0x4000c
; GFX8-NEXT: s_bfe_u32 s8, s0, 0x40004		; GFX8-NEXT: s_bfe_u32 s7, s0, 0x40004
; GFX8-NEXT: v_mov_b32_e32 v5, s15		; GFX8-NEXT: v_mov_b32_e32 v5, s14
; GFX8-NEXT: s_lshr_b32 s2, s0, 28		; GFX8-NEXT: s_lshr_b32 s2, s0, 28
; GFX8-NEXT: s_bfe_u32 s4, s0, 0x40018		; GFX8-NEXT: s_bfe_u32 s3, s0, 0x40018
; GFX8-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX8-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX8-NEXT: s_bfe_u32 s6, s0, 0x40010		; GFX8-NEXT: s_bfe_u32 s5, s0, 0x40010
; GFX8-NEXT: s_bfe_u32 s7, s0, 0x40008		; GFX8-NEXT: s_bfe_u32 s6, s0, 0x40008
; GFX8-NEXT: s_bfe_u32 s0, s0, 0x4000c		; GFX8-NEXT: s_bfe_u32 s0, s0, 0x4000c
; GFX8-NEXT: v_mov_b32_e32 v3, s1		; GFX8-NEXT: v_mov_b32_e32 v3, s1
; GFX8-NEXT: v_mov_b32_e32 v6, s14		; GFX8-NEXT: v_mov_b32_e32 v6, s13
; GFX8-NEXT: v_mul_u32_u24_e32 v3, s0, v3		; GFX8-NEXT: v_mul_u32_u24_e32 v3, s0, v3
; GFX8-NEXT: v_and_b32_e32 v3, 15, v3		; GFX8-NEXT: v_and_b32_e32 v3, 15, v3
; GFX8-NEXT: v_mov_b32_e32 v7, s13		; GFX8-NEXT: v_mov_b32_e32 v7, s12
; GFX8-NEXT: v_mov_b32_e32 v8, s12		; GFX8-NEXT: v_mov_b32_e32 v8, s11
; GFX8-NEXT: v_mov_b32_e32 v9, s11		; GFX8-NEXT: v_mov_b32_e32 v9, s10
; GFX8-NEXT: s_waitcnt vmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0)
; GFX8-NEXT: v_mad_u32_u24 v2, s9, v4, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s8, v4, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s8, v5, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s7, v5, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s7, v6, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s6, v6, v2
; GFX8-NEXT: v_and_b32_e32 v2, 15, v2		; GFX8-NEXT: v_and_b32_e32 v2, 15, v2
; GFX8-NEXT: v_add_u32_e32 v2, vcc, v3, v2		; GFX8-NEXT: v_add_u32_e32 v2, vcc, v3, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s6, v7, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s5, v7, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s5, v8, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s4, v8, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s4, v9, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s3, v9, v2
; GFX8-NEXT: v_mov_b32_e32 v3, s10		; GFX8-NEXT: v_mov_b32_e32 v3, s9
; GFX8-NEXT: v_mad_u32_u24 v2, s2, v3, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s2, v3, v2
; GFX8-NEXT: v_and_b32_e32 v2, 15, v2		; GFX8-NEXT: v_and_b32_e32 v2, 15, v2
; GFX8-NEXT: flat_store_byte v[0:1], v2		; GFX8-NEXT: flat_store_byte v[0:1], v2
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: udot8_acc4:		; GFX9-LABEL: udot8_acc4:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: global_load_ubyte v2, v[0:1], off		; GFX9-NEXT: global_load_ubyte v2, v[0:1], off
; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX9-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_and_b32 s9, s0, 15		; GFX9-NEXT: s_and_b32 s8, s0, 15
; GFX9-NEXT: s_and_b32 s16, s1, 15		; GFX9-NEXT: s_and_b32 s15, s1, 15
; GFX9-NEXT: s_bfe_u32 s15, s1, 0x40004		; GFX9-NEXT: s_bfe_u32 s14, s1, 0x40004
; GFX9-NEXT: v_mov_b32_e32 v4, s16		; GFX9-NEXT: v_mov_b32_e32 v4, s15
; GFX9-NEXT: s_bfe_u32 s11, s1, 0x40018		; GFX9-NEXT: s_bfe_u32 s10, s1, 0x40018
; GFX9-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX9-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX9-NEXT: s_bfe_u32 s13, s1, 0x40010		; GFX9-NEXT: s_bfe_u32 s12, s1, 0x40010
; GFX9-NEXT: s_bfe_u32 s14, s1, 0x40008		; GFX9-NEXT: s_bfe_u32 s13, s1, 0x40008
; GFX9-NEXT: s_lshr_b32 s10, s1, 28		; GFX9-NEXT: s_lshr_b32 s9, s1, 28
; GFX9-NEXT: s_bfe_u32 s1, s1, 0x4000c		; GFX9-NEXT: s_bfe_u32 s1, s1, 0x4000c
; GFX9-NEXT: s_bfe_u32 s8, s0, 0x40004		; GFX9-NEXT: s_bfe_u32 s7, s0, 0x40004
; GFX9-NEXT: v_mov_b32_e32 v5, s15		; GFX9-NEXT: v_mov_b32_e32 v5, s14
; GFX9-NEXT: s_lshr_b32 s2, s0, 28		; GFX9-NEXT: s_lshr_b32 s2, s0, 28
; GFX9-NEXT: s_bfe_u32 s4, s0, 0x40018		; GFX9-NEXT: s_bfe_u32 s3, s0, 0x40018
; GFX9-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX9-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX9-NEXT: s_bfe_u32 s6, s0, 0x40010		; GFX9-NEXT: s_bfe_u32 s5, s0, 0x40010
; GFX9-NEXT: s_bfe_u32 s7, s0, 0x40008		; GFX9-NEXT: s_bfe_u32 s6, s0, 0x40008
; GFX9-NEXT: s_bfe_u32 s0, s0, 0x4000c		; GFX9-NEXT: s_bfe_u32 s0, s0, 0x4000c
; GFX9-NEXT: v_mov_b32_e32 v3, s1		; GFX9-NEXT: v_mov_b32_e32 v3, s1
; GFX9-NEXT: v_mov_b32_e32 v6, s14		; GFX9-NEXT: v_mov_b32_e32 v6, s13
; GFX9-NEXT: v_mul_u32_u24_e32 v3, s0, v3		; GFX9-NEXT: v_mul_u32_u24_e32 v3, s0, v3
; GFX9-NEXT: v_and_b32_e32 v3, 15, v3		; GFX9-NEXT: v_and_b32_e32 v3, 15, v3
; GFX9-NEXT: v_mov_b32_e32 v7, s13		; GFX9-NEXT: v_mov_b32_e32 v7, s12
; GFX9-NEXT: v_mov_b32_e32 v8, s12		; GFX9-NEXT: v_mov_b32_e32 v8, s11
; GFX9-NEXT: v_mov_b32_e32 v9, s11		; GFX9-NEXT: v_mov_b32_e32 v9, s10
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: v_mad_u32_u24 v2, s9, v4, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s8, v4, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s8, v5, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s7, v5, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s7, v6, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s6, v6, v2
; GFX9-NEXT: v_and_b32_e32 v2, 15, v2		; GFX9-NEXT: v_and_b32_e32 v2, 15, v2
; GFX9-NEXT: v_add_u32_e32 v2, v2, v3		; GFX9-NEXT: v_add_u32_e32 v2, v2, v3
; GFX9-NEXT: v_mad_u32_u24 v2, s6, v7, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s5, v7, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s5, v8, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s4, v8, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s4, v9, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s3, v9, v2
; GFX9-NEXT: v_mov_b32_e32 v3, s10		; GFX9-NEXT: v_mov_b32_e32 v3, s9
; GFX9-NEXT: v_mad_u32_u24 v2, s2, v3, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s2, v3, v2
; GFX9-NEXT: v_and_b32_e32 v2, 15, v2		; GFX9-NEXT: v_and_b32_e32 v2, 15, v2
; GFX9-NEXT: global_store_byte v[0:1], v2, off		; GFX9-NEXT: global_store_byte v[0:1], v2, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX9-DL-LABEL: udot8_acc4:		; GFX9-DL-LABEL: udot8_acc4:
; GFX9-DL: ; %bb.0: ; %entry		; GFX9-DL: ; %bb.0: ; %entry
; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: global_load_ubyte v2, v[0:1], off		; GFX9-DL-NEXT: global_load_ubyte v2, v[0:1], off
; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX9-DL-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_and_b32 s9, s0, 15		; GFX9-DL-NEXT: s_and_b32 s8, s0, 15
; GFX9-DL-NEXT: s_and_b32 s16, s1, 15		; GFX9-DL-NEXT: s_and_b32 s15, s1, 15
; GFX9-DL-NEXT: s_bfe_u32 s15, s1, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s14, s1, 0x40004
; GFX9-DL-NEXT: v_mov_b32_e32 v4, s16		; GFX9-DL-NEXT: v_mov_b32_e32 v4, s15
; GFX9-DL-NEXT: s_bfe_u32 s11, s1, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s10, s1, 0x40018
; GFX9-DL-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX9-DL-NEXT: s_bfe_u32 s13, s1, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s12, s1, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s14, s1, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s13, s1, 0x40008
; GFX9-DL-NEXT: s_lshr_b32 s10, s1, 28		; GFX9-DL-NEXT: s_lshr_b32 s9, s1, 28
; GFX9-DL-NEXT: s_bfe_u32 s1, s1, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s1, s1, 0x4000c
; GFX9-DL-NEXT: s_bfe_u32 s8, s0, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s7, s0, 0x40004
; GFX9-DL-NEXT: v_mov_b32_e32 v5, s15		; GFX9-DL-NEXT: v_mov_b32_e32 v5, s14
; GFX9-DL-NEXT: s_lshr_b32 s2, s0, 28		; GFX9-DL-NEXT: s_lshr_b32 s2, s0, 28
; GFX9-DL-NEXT: s_bfe_u32 s4, s0, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s3, s0, 0x40018
; GFX9-DL-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX9-DL-NEXT: s_bfe_u32 s6, s0, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s5, s0, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s7, s0, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s6, s0, 0x40008
; GFX9-DL-NEXT: s_bfe_u32 s0, s0, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s0, s0, 0x4000c
; GFX9-DL-NEXT: v_mov_b32_e32 v3, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v3, s1
; GFX9-DL-NEXT: v_mov_b32_e32 v6, s14		; GFX9-DL-NEXT: v_mov_b32_e32 v6, s13
; GFX9-DL-NEXT: v_mul_u32_u24_e32 v3, s0, v3		; GFX9-DL-NEXT: v_mul_u32_u24_e32 v3, s0, v3
; GFX9-DL-NEXT: v_and_b32_e32 v3, 15, v3		; GFX9-DL-NEXT: v_and_b32_e32 v3, 15, v3
; GFX9-DL-NEXT: v_mov_b32_e32 v7, s13		; GFX9-DL-NEXT: v_mov_b32_e32 v7, s12
; GFX9-DL-NEXT: v_mov_b32_e32 v8, s12		; GFX9-DL-NEXT: v_mov_b32_e32 v8, s11
; GFX9-DL-NEXT: v_mov_b32_e32 v9, s11		; GFX9-DL-NEXT: v_mov_b32_e32 v9, s10
; GFX9-DL-NEXT: s_waitcnt vmcnt(0)		; GFX9-DL-NEXT: s_waitcnt vmcnt(0)
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s9, v4, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s8, v4, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s8, v5, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s7, v5, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s7, v6, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s6, v6, v2
; GFX9-DL-NEXT: v_and_b32_e32 v2, 15, v2		; GFX9-DL-NEXT: v_and_b32_e32 v2, 15, v2
; GFX9-DL-NEXT: v_add_u32_e32 v2, v2, v3		; GFX9-DL-NEXT: v_add_u32_e32 v2, v2, v3
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s6, v7, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s5, v7, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s5, v8, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s4, v8, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s4, v9, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s3, v9, v2
; GFX9-DL-NEXT: v_mov_b32_e32 v3, s10		; GFX9-DL-NEXT: v_mov_b32_e32 v3, s9
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s2, v3, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s2, v3, v2
; GFX9-DL-NEXT: v_and_b32_e32 v2, 15, v2		; GFX9-DL-NEXT: v_and_b32_e32 v2, 15, v2
; GFX9-DL-NEXT: global_store_byte v[0:1], v2, off		; GFX9-DL-NEXT: global_store_byte v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: udot8_acc4:		; GFX10-DL-LABEL: udot8_acc4:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s4		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s2
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s5		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s3
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off		; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_and_b32 s2, s0, 15		; GFX10-DL-NEXT: s_and_b32 s2, s0, 15
; GFX10-DL-NEXT: s_and_b32 s4, s1, 15		; GFX10-DL-NEXT: s_and_b32 s3, s1, 15
; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x40004
; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x40004
; GFX10-DL-NEXT: s_bfe_u32 s7, s1, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40008
; GFX10-DL-NEXT: s_waitcnt vmcnt(0)		; GFX10-DL-NEXT: s_waitcnt vmcnt(0)
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40008
; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s3, s0, 0x4000c
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s5, s6, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s4, s5, v2
; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x4000c
; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x40014
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s7, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s6, v2
; GFX10-DL-NEXT: v_mul_u32_u24_e64 v3, s4, s5		; GFX10-DL-NEXT: v_mul_u32_u24_e64 v3, s3, s4
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40010
; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s3, s1, 0x40010
; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX10-DL-NEXT: v_and_b32_e32 v2, 15, v2		; GFX10-DL-NEXT: v_and_b32_e32 v2, 15, v2
; GFX10-DL-NEXT: v_and_b32_e32 v3, 15, v3		; GFX10-DL-NEXT: v_and_b32_e32 v3, 15, v3
; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v2, v3		; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v2, v3
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40018
; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s3, s1, 0x40018
; GFX10-DL-NEXT: s_lshr_b32 s0, s0, 28		; GFX10-DL-NEXT: s_lshr_b32 s0, s0, 28
; GFX10-DL-NEXT: s_lshr_b32 s1, s1, 28		; GFX10-DL-NEXT: s_lshr_b32 s1, s1, 28
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s5, s6, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s4, s5, v2
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s0, s1, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s0, s1, v2
; GFX10-DL-NEXT: v_and_b32_e32 v2, 15, v2		; GFX10-DL-NEXT: v_and_b32_e32 v2, 15, v2
; GFX10-DL-NEXT: global_store_byte v[0:1], v2, off		; GFX10-DL-NEXT: global_store_byte v[0:1], v2, off
; GFX10-DL-NEXT: s_endpgm		; GFX10-DL-NEXT: s_endpgm
<8 x i4> addrspace(1)* %src2,		<8 x i4> addrspace(1)* %src2,
i4 addrspace(1)* nocapture %dst) {		i4 addrspace(1)* nocapture %dst) {
entry:		entry:
%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1		%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1
(45 unchanged lines not shown)
ret void		ret void
}		}
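The TODO above `udot8_acc4` refers to the extra `v_and_b32 15` plus add that breaks up the MAD chain. The C sketch below, an assumption-based model inferred from the GFX8/GFX9 CHECK lines rather than from the collapsed IR, compares a straight-line reference (`udot8_acc4_ref`, hypothetical) with the shape the current code emits (`udot8_acc4_gfx9_shape`, hypothetical): the product of element 3 is computed separately with `v_mul_u32_u24` and masked, the running sum is masked after the third `v_mad_u32_u24`, and the two are added before the remaining MADs. Since only the low 4 bits are stored, both agree modulo 16, which is why those two instructions are redundant.

```c
#include <assert.h> /* used by the usage checks */
#include <stdint.h>

/* Element i of an <8 x i4> packed into a dword occupies bits [4i+3:4i]. */
static uint32_t nibble(uint32_t v, int i) { return (v >> (4 * i)) & 0xfu; }

/* Straight-line reference: 8 MADs, then a single mask before the store. */
static uint8_t udot8_acc4_ref(uint32_t a, uint32_t b, uint8_t acc) {
    uint32_t sum = acc;
    for (int i = 0; i < 8; ++i)
        sum += nibble(a, i) * nibble(b, i);
    return (uint8_t)(sum & 0xfu);
}

/* Shape of the GFX8/GFX9 sequence in the CHECK lines above. */
static uint8_t udot8_acc4_gfx9_shape(uint32_t a, uint32_t b, uint8_t acc) {
    uint32_t sum = acc;
    uint32_t p3 = (nibble(a, 3) * nibble(b, 3)) & 0xfu; /* v_mul_u32_u24 + v_and_b32 15 */
    for (int i = 0; i < 3; ++i)
        sum += nibble(a, i) * nibble(b, i);              /* first three MADs */
    sum = (sum & 0xfu) + p3;                             /* v_and_b32 15 + v_add_u32 */
    for (int i = 4; i < 8; ++i)
        sum += nibble(a, i) * nibble(b, i);              /* remaining MADs */
    return (uint8_t)(sum & 0xfu);                        /* final v_and_b32 15 */
}
```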

; TODO: Currently, permutation of udot8 is turned off due to a huge increase		; TODO: Currently, permutation of udot8 is turned off due to a huge increase
; in the compile time.		; in the compile time.
define amdgpu_kernel void @udot8_CommutationInsideMAD(<8 x i4> addrspace(1)* %src1,		define amdgpu_kernel void @udot8_CommutationInsideMAD(<8 x i4> addrspace(1)* %src1,
; GFX7-LABEL: udot8_CommutationInsideMAD:		; GFX7-LABEL: udot8_CommutationInsideMAD:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s7, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s6, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: buffer_load_ubyte v0, off, s[4:7], 0		; GFX7-NEXT: buffer_load_ubyte v0, off, s[0:3], 0
; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0		; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX7-NEXT: s_load_dword s1, s[10:11], 0x0		; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_lshr_b32 s2, s0, 28		; GFX7-NEXT: s_lshr_b32 s6, s4, 28
; GFX7-NEXT: s_bfe_u32 s15, s1, 0x40018		; GFX7-NEXT: s_bfe_u32 s14, s5, 0x40018
; GFX7-NEXT: s_bfe_u32 s16, s1, 0x40014		; GFX7-NEXT: s_bfe_u32 s15, s5, 0x40014
; GFX7-NEXT: s_bfe_u32 s17, s1, 0x40010		; GFX7-NEXT: s_bfe_u32 s16, s5, 0x40010
; GFX7-NEXT: s_bfe_u32 s18, s1, 0x4000c		; GFX7-NEXT: s_bfe_u32 s17, s5, 0x4000c
; GFX7-NEXT: s_bfe_u32 s19, s1, 0x40008		; GFX7-NEXT: s_bfe_u32 s18, s5, 0x40008
; GFX7-NEXT: s_bfe_u32 s20, s1, 0x40004		; GFX7-NEXT: s_bfe_u32 s19, s5, 0x40004
; GFX7-NEXT: s_lshr_b32 s14, s1, 28		; GFX7-NEXT: s_lshr_b32 s13, s5, 28
; GFX7-NEXT: s_and_b32 s1, s1, 15		; GFX7-NEXT: s_and_b32 s5, s5, 15
; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40018		; GFX7-NEXT: s_bfe_u32 s7, s4, 0x40018
; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40014		; GFX7-NEXT: s_bfe_u32 s8, s4, 0x40014
; GFX7-NEXT: s_bfe_u32 s10, s0, 0x40010		; GFX7-NEXT: s_bfe_u32 s9, s4, 0x40010
; GFX7-NEXT: s_bfe_u32 s11, s0, 0x4000c		; GFX7-NEXT: s_bfe_u32 s10, s4, 0x4000c
; GFX7-NEXT: s_bfe_u32 s12, s0, 0x40008		; GFX7-NEXT: s_bfe_u32 s11, s4, 0x40008
; GFX7-NEXT: s_bfe_u32 s13, s0, 0x40004		; GFX7-NEXT: s_bfe_u32 s12, s4, 0x40004
; GFX7-NEXT: s_and_b32 s0, s0, 15		; GFX7-NEXT: s_and_b32 s4, s4, 15
; GFX7-NEXT: v_mov_b32_e32 v1, s1		; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s20		; GFX7-NEXT: v_mov_b32_e32 v2, s19
; GFX7-NEXT: v_mov_b32_e32 v3, s19		; GFX7-NEXT: v_mov_b32_e32 v3, s18
; GFX7-NEXT: v_mov_b32_e32 v4, s18		; GFX7-NEXT: v_mov_b32_e32 v4, s17
; GFX7-NEXT: v_mov_b32_e32 v5, s17		; GFX7-NEXT: v_mov_b32_e32 v5, s16
; GFX7-NEXT: v_mov_b32_e32 v6, s16		; GFX7-NEXT: v_mov_b32_e32 v6, s15
; GFX7-NEXT: v_mov_b32_e32 v7, s15		; GFX7-NEXT: v_mov_b32_e32 v7, s14
; GFX7-NEXT: s_waitcnt vmcnt(0)		; GFX7-NEXT: s_waitcnt vmcnt(0)
; GFX7-NEXT: v_mad_u32_u24 v0, s0, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s4, v1, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s13, v2, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s12, v2, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s12, v3, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s11, v3, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s11, v4, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s10, v4, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s10, v5, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s9, v5, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s9, v6, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s8, v6, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s8, v7, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s7, v7, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s14		; GFX7-NEXT: v_mov_b32_e32 v1, s13
; GFX7-NEXT: v_mad_u32_u24 v0, s2, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s6, v1, v0
; GFX7-NEXT: v_and_b32_e32 v0, 15, v0		; GFX7-NEXT: v_and_b32_e32 v0, 15, v0
; GFX7-NEXT: buffer_store_byte v0, off, s[4:7], 0		; GFX7-NEXT: buffer_store_byte v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: udot8_CommutationInsideMAD:		; GFX8-LABEL: udot8_CommutationInsideMAD:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_load_ubyte v2, v[0:1]		; GFX8-NEXT: flat_load_ubyte v2, v[0:1]
; GFX8-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX8-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX8-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX8-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_and_b32 s9, s0, 15		; GFX8-NEXT: s_and_b32 s8, s0, 15
; GFX8-NEXT: s_and_b32 s16, s1, 15		; GFX8-NEXT: s_and_b32 s15, s1, 15
; GFX8-NEXT: s_bfe_u32 s15, s1, 0x40004		; GFX8-NEXT: s_bfe_u32 s14, s1, 0x40004
; GFX8-NEXT: v_mov_b32_e32 v4, s16		; GFX8-NEXT: v_mov_b32_e32 v4, s15
; GFX8-NEXT: s_bfe_u32 s11, s1, 0x40018		; GFX8-NEXT: s_bfe_u32 s10, s1, 0x40018
; GFX8-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX8-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX8-NEXT: s_bfe_u32 s13, s1, 0x40010		; GFX8-NEXT: s_bfe_u32 s12, s1, 0x40010
; GFX8-NEXT: s_bfe_u32 s14, s1, 0x40008		; GFX8-NEXT: s_bfe_u32 s13, s1, 0x40008
; GFX8-NEXT: s_lshr_b32 s10, s1, 28		; GFX8-NEXT: s_lshr_b32 s9, s1, 28
; GFX8-NEXT: s_bfe_u32 s1, s1, 0x4000c		; GFX8-NEXT: s_bfe_u32 s1, s1, 0x4000c
; GFX8-NEXT: s_bfe_u32 s8, s0, 0x40004		; GFX8-NEXT: s_bfe_u32 s7, s0, 0x40004
; GFX8-NEXT: v_mov_b32_e32 v5, s15		; GFX8-NEXT: v_mov_b32_e32 v5, s14
; GFX8-NEXT: s_lshr_b32 s2, s0, 28		; GFX8-NEXT: s_lshr_b32 s2, s0, 28
; GFX8-NEXT: s_bfe_u32 s4, s0, 0x40018		; GFX8-NEXT: s_bfe_u32 s3, s0, 0x40018
; GFX8-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX8-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX8-NEXT: s_bfe_u32 s6, s0, 0x40010		; GFX8-NEXT: s_bfe_u32 s5, s0, 0x40010
; GFX8-NEXT: s_bfe_u32 s7, s0, 0x40008		; GFX8-NEXT: s_bfe_u32 s6, s0, 0x40008
; GFX8-NEXT: s_bfe_u32 s0, s0, 0x4000c		; GFX8-NEXT: s_bfe_u32 s0, s0, 0x4000c
; GFX8-NEXT: v_mov_b32_e32 v3, s1		; GFX8-NEXT: v_mov_b32_e32 v3, s1
; GFX8-NEXT: v_mov_b32_e32 v6, s14		; GFX8-NEXT: v_mov_b32_e32 v6, s13
; GFX8-NEXT: v_mul_u32_u24_e32 v3, s0, v3		; GFX8-NEXT: v_mul_u32_u24_e32 v3, s0, v3
; GFX8-NEXT: v_and_b32_e32 v3, 15, v3		; GFX8-NEXT: v_and_b32_e32 v3, 15, v3
; GFX8-NEXT: v_mov_b32_e32 v7, s13		; GFX8-NEXT: v_mov_b32_e32 v7, s12
; GFX8-NEXT: v_mov_b32_e32 v8, s12		; GFX8-NEXT: v_mov_b32_e32 v8, s11
; GFX8-NEXT: v_mov_b32_e32 v9, s11		; GFX8-NEXT: v_mov_b32_e32 v9, s10
; GFX8-NEXT: s_waitcnt vmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0)
; GFX8-NEXT: v_mad_u32_u24 v2, s9, v4, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s8, v4, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s8, v5, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s7, v5, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s7, v6, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s6, v6, v2
; GFX8-NEXT: v_and_b32_e32 v2, 15, v2		; GFX8-NEXT: v_and_b32_e32 v2, 15, v2
; GFX8-NEXT: v_add_u32_e32 v2, vcc, v2, v3		; GFX8-NEXT: v_add_u32_e32 v2, vcc, v2, v3
; GFX8-NEXT: v_mad_u32_u24 v2, s6, v7, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s5, v7, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s5, v8, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s4, v8, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s4, v9, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s3, v9, v2
; GFX8-NEXT: v_mov_b32_e32 v3, s10		; GFX8-NEXT: v_mov_b32_e32 v3, s9
; GFX8-NEXT: v_mad_u32_u24 v2, s2, v3, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s2, v3, v2
; GFX8-NEXT: v_and_b32_e32 v2, 15, v2		; GFX8-NEXT: v_and_b32_e32 v2, 15, v2
; GFX8-NEXT: flat_store_byte v[0:1], v2		; GFX8-NEXT: flat_store_byte v[0:1], v2
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: udot8_CommutationInsideMAD:		; GFX9-LABEL: udot8_CommutationInsideMAD:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: global_load_ubyte v2, v[0:1], off		; GFX9-NEXT: global_load_ubyte v2, v[0:1], off
; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX9-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_and_b32 s9, s0, 15		; GFX9-NEXT: s_and_b32 s8, s0, 15
; GFX9-NEXT: s_and_b32 s16, s1, 15		; GFX9-NEXT: s_and_b32 s15, s1, 15
; GFX9-NEXT: s_bfe_u32 s15, s1, 0x40004		; GFX9-NEXT: s_bfe_u32 s14, s1, 0x40004
; GFX9-NEXT: v_mov_b32_e32 v4, s16		; GFX9-NEXT: v_mov_b32_e32 v4, s15
; GFX9-NEXT: s_bfe_u32 s11, s1, 0x40018		; GFX9-NEXT: s_bfe_u32 s10, s1, 0x40018
; GFX9-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX9-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX9-NEXT: s_bfe_u32 s13, s1, 0x40010		; GFX9-NEXT: s_bfe_u32 s12, s1, 0x40010
; GFX9-NEXT: s_bfe_u32 s14, s1, 0x40008		; GFX9-NEXT: s_bfe_u32 s13, s1, 0x40008
; GFX9-NEXT: s_lshr_b32 s10, s1, 28		; GFX9-NEXT: s_lshr_b32 s9, s1, 28
; GFX9-NEXT: s_bfe_u32 s1, s1, 0x4000c		; GFX9-NEXT: s_bfe_u32 s1, s1, 0x4000c
; GFX9-NEXT: s_bfe_u32 s8, s0, 0x40004		; GFX9-NEXT: s_bfe_u32 s7, s0, 0x40004
; GFX9-NEXT: v_mov_b32_e32 v5, s15		; GFX9-NEXT: v_mov_b32_e32 v5, s14
; GFX9-NEXT: s_lshr_b32 s2, s0, 28		; GFX9-NEXT: s_lshr_b32 s2, s0, 28
; GFX9-NEXT: s_bfe_u32 s4, s0, 0x40018		; GFX9-NEXT: s_bfe_u32 s3, s0, 0x40018
; GFX9-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX9-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX9-NEXT: s_bfe_u32 s6, s0, 0x40010		; GFX9-NEXT: s_bfe_u32 s5, s0, 0x40010
; GFX9-NEXT: s_bfe_u32 s7, s0, 0x40008		; GFX9-NEXT: s_bfe_u32 s6, s0, 0x40008
; GFX9-NEXT: s_bfe_u32 s0, s0, 0x4000c		; GFX9-NEXT: s_bfe_u32 s0, s0, 0x4000c
; GFX9-NEXT: v_mov_b32_e32 v3, s1		; GFX9-NEXT: v_mov_b32_e32 v3, s1
; GFX9-NEXT: v_mov_b32_e32 v6, s14		; GFX9-NEXT: v_mov_b32_e32 v6, s13
; GFX9-NEXT: v_mul_u32_u24_e32 v3, s0, v3		; GFX9-NEXT: v_mul_u32_u24_e32 v3, s0, v3
; GFX9-NEXT: v_and_b32_e32 v3, 15, v3		; GFX9-NEXT: v_and_b32_e32 v3, 15, v3
; GFX9-NEXT: v_mov_b32_e32 v7, s13		; GFX9-NEXT: v_mov_b32_e32 v7, s12
; GFX9-NEXT: v_mov_b32_e32 v8, s12		; GFX9-NEXT: v_mov_b32_e32 v8, s11
; GFX9-NEXT: v_mov_b32_e32 v9, s11		; GFX9-NEXT: v_mov_b32_e32 v9, s10
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: v_mad_u32_u24 v2, s9, v4, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s8, v4, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s8, v5, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s7, v5, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s7, v6, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s6, v6, v2
; GFX9-NEXT: v_and_b32_e32 v2, 15, v2		; GFX9-NEXT: v_and_b32_e32 v2, 15, v2
; GFX9-NEXT: v_add_u32_e32 v2, v3, v2		; GFX9-NEXT: v_add_u32_e32 v2, v3, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s6, v7, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s5, v7, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s5, v8, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s4, v8, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s4, v9, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s3, v9, v2
; GFX9-NEXT: v_mov_b32_e32 v3, s10		; GFX9-NEXT: v_mov_b32_e32 v3, s9
; GFX9-NEXT: v_mad_u32_u24 v2, s2, v3, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s2, v3, v2
; GFX9-NEXT: v_and_b32_e32 v2, 15, v2		; GFX9-NEXT: v_and_b32_e32 v2, 15, v2
; GFX9-NEXT: global_store_byte v[0:1], v2, off		; GFX9-NEXT: global_store_byte v[0:1], v2, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX9-DL-LABEL: udot8_CommutationInsideMAD:		; GFX9-DL-LABEL: udot8_CommutationInsideMAD:
; GFX9-DL: ; %bb.0: ; %entry		; GFX9-DL: ; %bb.0: ; %entry
; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: global_load_ubyte v2, v[0:1], off		; GFX9-DL-NEXT: global_load_ubyte v2, v[0:1], off
; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX9-DL-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_and_b32 s9, s0, 15		; GFX9-DL-NEXT: s_and_b32 s8, s0, 15
; GFX9-DL-NEXT: s_and_b32 s16, s1, 15		; GFX9-DL-NEXT: s_and_b32 s15, s1, 15
; GFX9-DL-NEXT: s_bfe_u32 s15, s1, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s14, s1, 0x40004
; GFX9-DL-NEXT: v_mov_b32_e32 v4, s16		; GFX9-DL-NEXT: v_mov_b32_e32 v4, s15
; GFX9-DL-NEXT: s_bfe_u32 s11, s1, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s10, s1, 0x40018
; GFX9-DL-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX9-DL-NEXT: s_bfe_u32 s13, s1, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s12, s1, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s14, s1, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s13, s1, 0x40008
; GFX9-DL-NEXT: s_lshr_b32 s10, s1, 28		; GFX9-DL-NEXT: s_lshr_b32 s9, s1, 28
; GFX9-DL-NEXT: s_bfe_u32 s1, s1, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s1, s1, 0x4000c
; GFX9-DL-NEXT: s_bfe_u32 s8, s0, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s7, s0, 0x40004
; GFX9-DL-NEXT: v_mov_b32_e32 v5, s15		; GFX9-DL-NEXT: v_mov_b32_e32 v5, s14
; GFX9-DL-NEXT: s_lshr_b32 s2, s0, 28		; GFX9-DL-NEXT: s_lshr_b32 s2, s0, 28
; GFX9-DL-NEXT: s_bfe_u32 s4, s0, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s3, s0, 0x40018
; GFX9-DL-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX9-DL-NEXT: s_bfe_u32 s6, s0, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s5, s0, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s7, s0, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s6, s0, 0x40008
; GFX9-DL-NEXT: s_bfe_u32 s0, s0, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s0, s0, 0x4000c
; GFX9-DL-NEXT: v_mov_b32_e32 v3, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v3, s1
; GFX9-DL-NEXT: v_mov_b32_e32 v6, s14		; GFX9-DL-NEXT: v_mov_b32_e32 v6, s13
; GFX9-DL-NEXT: v_mul_u32_u24_e32 v3, s0, v3		; GFX9-DL-NEXT: v_mul_u32_u24_e32 v3, s0, v3
; GFX9-DL-NEXT: v_and_b32_e32 v3, 15, v3		; GFX9-DL-NEXT: v_and_b32_e32 v3, 15, v3
; GFX9-DL-NEXT: v_mov_b32_e32 v7, s13		; GFX9-DL-NEXT: v_mov_b32_e32 v7, s12
; GFX9-DL-NEXT: v_mov_b32_e32 v8, s12		; GFX9-DL-NEXT: v_mov_b32_e32 v8, s11
; GFX9-DL-NEXT: v_mov_b32_e32 v9, s11		; GFX9-DL-NEXT: v_mov_b32_e32 v9, s10
; GFX9-DL-NEXT: s_waitcnt vmcnt(0)		; GFX9-DL-NEXT: s_waitcnt vmcnt(0)
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s9, v4, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s8, v4, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s8, v5, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s7, v5, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s7, v6, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s6, v6, v2
; GFX9-DL-NEXT: v_and_b32_e32 v2, 15, v2		; GFX9-DL-NEXT: v_and_b32_e32 v2, 15, v2
; GFX9-DL-NEXT: v_add_u32_e32 v2, v3, v2		; GFX9-DL-NEXT: v_add_u32_e32 v2, v3, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s6, v7, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s5, v7, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s5, v8, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s4, v8, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s4, v9, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s3, v9, v2
; GFX9-DL-NEXT: v_mov_b32_e32 v3, s10		; GFX9-DL-NEXT: v_mov_b32_e32 v3, s9
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s2, v3, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s2, v3, v2
; GFX9-DL-NEXT: v_and_b32_e32 v2, 15, v2		; GFX9-DL-NEXT: v_and_b32_e32 v2, 15, v2
; GFX9-DL-NEXT: global_store_byte v[0:1], v2, off		; GFX9-DL-NEXT: global_store_byte v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: udot8_CommutationInsideMAD:		; GFX10-DL-LABEL: udot8_CommutationInsideMAD:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s4		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s2
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s5		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s3
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off		; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_and_b32 s2, s0, 15		; GFX10-DL-NEXT: s_and_b32 s2, s0, 15
; GFX10-DL-NEXT: s_and_b32 s4, s1, 15		; GFX10-DL-NEXT: s_and_b32 s3, s1, 15
; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x40004
; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x40004
; GFX10-DL-NEXT: s_bfe_u32 s7, s1, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40008
; GFX10-DL-NEXT: s_bfe_u32 s8, s1, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s7, s1, 0x4000c
; GFX10-DL-NEXT: s_waitcnt vmcnt(0)		; GFX10-DL-NEXT: s_waitcnt vmcnt(0)
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s3, s0, 0x4000c
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40008
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s5, s6, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s4, s5, v2
; GFX10-DL-NEXT: v_mul_u32_u24_e64 v3, s4, s8		; GFX10-DL-NEXT: v_mul_u32_u24_e64 v3, s3, s7
; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s3, s1, 0x40010
; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x40014
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s7, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s6, v2
; GFX10-DL-NEXT: v_and_b32_e32 v3, 15, v3		; GFX10-DL-NEXT: v_and_b32_e32 v3, 15, v3
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40010
; GFX10-DL-NEXT: v_and_b32_e32 v2, 15, v2		; GFX10-DL-NEXT: v_and_b32_e32 v2, 15, v2
; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v3, v2		; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v3, v2
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40018
; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s3, s1, 0x40018
; GFX10-DL-NEXT: s_lshr_b32 s0, s0, 28		; GFX10-DL-NEXT: s_lshr_b32 s0, s0, 28
; GFX10-DL-NEXT: s_lshr_b32 s1, s1, 28		; GFX10-DL-NEXT: s_lshr_b32 s1, s1, 28
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s5, s6, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s4, s5, v2
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s0, s1, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s0, s1, v2
; GFX10-DL-NEXT: v_and_b32_e32 v2, 15, v2		; GFX10-DL-NEXT: v_and_b32_e32 v2, 15, v2
; GFX10-DL-NEXT: global_store_byte v[0:1], v2, off		; GFX10-DL-NEXT: global_store_byte v[0:1], v2, off
; GFX10-DL-NEXT: s_endpgm		; GFX10-DL-NEXT: s_endpgm
<8 x i4> addrspace(1)* %src2,		<8 x i4> addrspace(1)* %src2,
i4 addrspace(1)* nocapture %dst) {		i4 addrspace(1)* nocapture %dst) {
entry:		entry:
%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1		%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1

store i4 %add8, i4 addrspace(1)* %dst, align 4		store i4 %add8, i4 addrspace(1)* %dst, align 4
ret void		ret void
}		}
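The generated CHECK lines above follow one pattern on every target. Each 32-bit source word is split into eight unsigned 4-bit lanes: `s_and_b32 ..., 15` yields lane 0, `s_bfe_u32` with control `0x4000N` yields the middle lanes, and `s_lshr_b32 ..., 28` yields lane 7. The lane products are then chained through `v_mad_u32_u24`. As a reading aid, the sketch below models that sequence in Python. It assumes the GCN encoding of the `s_bfe_u32` control operand (offset in bits [4:0], width in bits [22:16]). The helper names are hypothetical and not part of the test. `udot8_CommutationInsideMAD` stores only an `i4`, which corresponds to masking the final accumulator with 15.

```python
def s_bfe_u32(src: int, ctrl: int) -> int:
    """Model of the scalar unsigned bitfield extract (assumed GCN encoding)."""
    offset = ctrl & 0x1F            # ctrl[4:0]: bit offset
    width = (ctrl >> 16) & 0x7F     # ctrl[22:16]: field width
    return (src >> offset) & ((1 << width) - 1)


def v_mad_u32_u24(a: int, b: int, c: int) -> int:
    """Model of the 24-bit unsigned multiply-add: a[23:0] * b[23:0] + c, mod 2^32."""
    return ((a & 0xFFFFFF) * (b & 0xFFFFFF) + c) & 0xFFFFFFFF


def nibbles(word: int) -> list[int]:
    """Split a 32-bit word into eight 4-bit lanes the way the checked code does."""
    word &= 0xFFFFFFFF
    lanes = [word & 15]                              # lane 0: s_and_b32 ..., 15
    for i in range(1, 7):                            # lanes 1..6: s_bfe_u32 ..., 0x4000N
        lanes.append(s_bfe_u32(word, 0x40000 | (4 * i)))
    lanes.append(word >> 28)                         # lane 7: s_lshr_b32 ..., 28
    return lanes


def udot8(a_word: int, b_word: int, acc: int) -> int:
    """Accumulate the eight lane products with a chain of v_mad_u32_u24."""
    for x, y in zip(nibbles(a_word), nibbles(b_word)):
        acc = v_mad_u32_u24(x, y, acc)
    return acc


# The i4-result kernels keep only the low 4 bits of the accumulator.
print(udot8(0x87654321, 0x11111111, 0), udot8(0x87654321, 0x11111111, 0) & 15)
```

With lanes 1 through 8 in `0x87654321` and every lane equal to 1 in `0x11111111`, the 32-bit result is 36. The 4-bit result is 36 & 15 = 4.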

define amdgpu_kernel void @udot8_multiuses_mul1(<8 x i4> addrspace(1)* %src1,		define amdgpu_kernel void @udot8_multiuses_mul1(<8 x i4> addrspace(1)* %src1,
; GFX7-LABEL: udot8_multiuses_mul1:		; GFX7-LABEL: udot8_multiuses_mul1:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s7, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s6, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_load_dword s10, s[10:11], 0x0		; GFX7-NEXT: s_load_dword s6, s[6:7], 0x0
; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0		; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX7-NEXT: s_load_dword s21, s[4:5], 0x0		; GFX7-NEXT: s_load_dword s20, s[0:1], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_bfe_u32 s20, s10, 0x40004		; GFX7-NEXT: s_bfe_u32 s19, s6, 0x40004
; GFX7-NEXT: s_lshr_b32 s11, s10, 28		; GFX7-NEXT: s_lshr_b32 s7, s6, 28
; GFX7-NEXT: s_bfe_u32 s15, s10, 0x40018		; GFX7-NEXT: s_bfe_u32 s14, s6, 0x40018
; GFX7-NEXT: s_bfe_u32 s16, s10, 0x40014		; GFX7-NEXT: s_bfe_u32 s15, s6, 0x40014
; GFX7-NEXT: s_bfe_u32 s17, s10, 0x40010		; GFX7-NEXT: s_bfe_u32 s16, s6, 0x40010
; GFX7-NEXT: s_bfe_u32 s18, s10, 0x4000c		; GFX7-NEXT: s_bfe_u32 s17, s6, 0x4000c
; GFX7-NEXT: s_bfe_u32 s19, s10, 0x40008		; GFX7-NEXT: s_bfe_u32 s18, s6, 0x40008
; GFX7-NEXT: s_and_b32 s10, s10, 15		; GFX7-NEXT: s_and_b32 s6, s6, 15
; GFX7-NEXT: s_lshr_b32 s1, s0, 28		; GFX7-NEXT: s_lshr_b32 s5, s4, 28
; GFX7-NEXT: s_bfe_u32 s2, s0, 0x40018		; GFX7-NEXT: s_bfe_u32 s8, s4, 0x40018
; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40014		; GFX7-NEXT: s_bfe_u32 s9, s4, 0x40014
; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40010		; GFX7-NEXT: s_bfe_u32 s10, s4, 0x40010
; GFX7-NEXT: s_bfe_u32 s12, s0, 0x4000c		; GFX7-NEXT: s_bfe_u32 s11, s4, 0x4000c
; GFX7-NEXT: s_bfe_u32 s13, s0, 0x40008		; GFX7-NEXT: s_bfe_u32 s12, s4, 0x40008
; GFX7-NEXT: s_bfe_u32 s14, s0, 0x40004		; GFX7-NEXT: s_bfe_u32 s13, s4, 0x40004
; GFX7-NEXT: s_and_b32 s0, s0, 15		; GFX7-NEXT: s_and_b32 s4, s4, 15
; GFX7-NEXT: v_mov_b32_e32 v0, s10		; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s21		; GFX7-NEXT: v_mov_b32_e32 v1, s20
; GFX7-NEXT: v_mad_u32_u24 v1, s0, v0, v1		; GFX7-NEXT: v_mad_u32_u24 v1, s4, v0, v1
; GFX7-NEXT: v_mov_b32_e32 v2, s20
; GFX7-NEXT: v_mad_u32_u24 v0, s0, v0, v1
; GFX7-NEXT: v_mad_u32_u24 v1, s14, v2, v1
; GFX7-NEXT: v_mov_b32_e32 v2, s19		; GFX7-NEXT: v_mov_b32_e32 v2, s19
		; GFX7-NEXT: v_mad_u32_u24 v0, s4, v0, v1
; GFX7-NEXT: v_mad_u32_u24 v1, s13, v2, v1		; GFX7-NEXT: v_mad_u32_u24 v1, s13, v2, v1
; GFX7-NEXT: v_mov_b32_e32 v2, s18		; GFX7-NEXT: v_mov_b32_e32 v2, s18
; GFX7-NEXT: v_mad_u32_u24 v1, s12, v2, v1		; GFX7-NEXT: v_mad_u32_u24 v1, s12, v2, v1
; GFX7-NEXT: v_mov_b32_e32 v2, s17		; GFX7-NEXT: v_mov_b32_e32 v2, s17
; GFX7-NEXT: v_mad_u32_u24 v1, s9, v2, v1		; GFX7-NEXT: v_mad_u32_u24 v1, s11, v2, v1
; GFX7-NEXT: v_mov_b32_e32 v2, s16		; GFX7-NEXT: v_mov_b32_e32 v2, s16
; GFX7-NEXT: v_mad_u32_u24 v1, s8, v2, v1		; GFX7-NEXT: v_mad_u32_u24 v1, s10, v2, v1
; GFX7-NEXT: v_mov_b32_e32 v2, s15		; GFX7-NEXT: v_mov_b32_e32 v2, s15
; GFX7-NEXT: v_mad_u32_u24 v1, s2, v2, v1		; GFX7-NEXT: v_mad_u32_u24 v1, s9, v2, v1
; GFX7-NEXT: v_mov_b32_e32 v2, s11		; GFX7-NEXT: v_mov_b32_e32 v2, s14
; GFX7-NEXT: v_mad_u32_u24 v1, s1, v2, v1		; GFX7-NEXT: v_mad_u32_u24 v1, s8, v2, v1
		; GFX7-NEXT: v_mov_b32_e32 v2, s7
		; GFX7-NEXT: v_mad_u32_u24 v1, s5, v2, v1
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v1, v0		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v1, v0
; GFX7-NEXT: buffer_store_dword v0, off, s[4:7], 0		; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: udot8_multiuses_mul1:		; GFX8-LABEL: udot8_multiuses_mul1:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_load_dword s6, s[6:7], 0x0		; GFX8-NEXT: s_load_dword s6, s[6:7], 0x0
; GFX8-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX8-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX8-NEXT: s_load_dword s19, s[0:1], 0x0		; GFX8-NEXT: s_load_dword s18, s[0:1], 0x0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_bfe_u32 s18, s6, 0x40004		; GFX8-NEXT: s_bfe_u32 s17, s6, 0x40004
; GFX8-NEXT: s_lshr_b32 s7, s6, 28		; GFX8-NEXT: s_lshr_b32 s7, s6, 28
; GFX8-NEXT: s_bfe_u32 s13, s6, 0x40018		; GFX8-NEXT: s_bfe_u32 s12, s6, 0x40018
; GFX8-NEXT: s_bfe_u32 s14, s6, 0x40014		; GFX8-NEXT: s_bfe_u32 s13, s6, 0x40014
; GFX8-NEXT: s_bfe_u32 s15, s6, 0x40010		; GFX8-NEXT: s_bfe_u32 s14, s6, 0x40010
; GFX8-NEXT: s_bfe_u32 s16, s6, 0x4000c		; GFX8-NEXT: s_bfe_u32 s15, s6, 0x4000c
; GFX8-NEXT: s_bfe_u32 s17, s6, 0x40008		; GFX8-NEXT: s_bfe_u32 s16, s6, 0x40008
; GFX8-NEXT: s_and_b32 s6, s6, 15		; GFX8-NEXT: s_and_b32 s6, s6, 15
; GFX8-NEXT: s_lshr_b32 s4, s2, 28		; GFX8-NEXT: s_lshr_b32 s3, s2, 28
; GFX8-NEXT: s_bfe_u32 s5, s2, 0x40018		; GFX8-NEXT: s_bfe_u32 s4, s2, 0x40018
; GFX8-NEXT: s_bfe_u32 s8, s2, 0x40014		; GFX8-NEXT: s_bfe_u32 s5, s2, 0x40014
; GFX8-NEXT: s_bfe_u32 s9, s2, 0x40010		; GFX8-NEXT: s_bfe_u32 s8, s2, 0x40010
; GFX8-NEXT: s_bfe_u32 s10, s2, 0x4000c		; GFX8-NEXT: s_bfe_u32 s9, s2, 0x4000c
; GFX8-NEXT: s_bfe_u32 s11, s2, 0x40008		; GFX8-NEXT: s_bfe_u32 s10, s2, 0x40008
; GFX8-NEXT: s_bfe_u32 s12, s2, 0x40004		; GFX8-NEXT: s_bfe_u32 s11, s2, 0x40004
; GFX8-NEXT: s_and_b32 s2, s2, 15		; GFX8-NEXT: s_and_b32 s2, s2, 15
; GFX8-NEXT: v_mov_b32_e32 v0, s6		; GFX8-NEXT: v_mov_b32_e32 v0, s6
; GFX8-NEXT: v_mov_b32_e32 v1, s19		; GFX8-NEXT: v_mov_b32_e32 v1, s18
; GFX8-NEXT: v_mad_u32_u24 v1, s2, v0, v1		; GFX8-NEXT: v_mad_u32_u24 v1, s2, v0, v1
; GFX8-NEXT: v_mov_b32_e32 v2, s18
; GFX8-NEXT: v_mad_u32_u24 v0, s2, v0, v1
; GFX8-NEXT: v_mad_u32_u24 v1, s12, v2, v1
; GFX8-NEXT: v_mov_b32_e32 v2, s17		; GFX8-NEXT: v_mov_b32_e32 v2, s17
		; GFX8-NEXT: v_mad_u32_u24 v0, s2, v0, v1
; GFX8-NEXT: v_mad_u32_u24 v1, s11, v2, v1		; GFX8-NEXT: v_mad_u32_u24 v1, s11, v2, v1
; GFX8-NEXT: v_mov_b32_e32 v2, s16		; GFX8-NEXT: v_mov_b32_e32 v2, s16
; GFX8-NEXT: v_mad_u32_u24 v1, s10, v2, v1		; GFX8-NEXT: v_mad_u32_u24 v1, s10, v2, v1
; GFX8-NEXT: v_mov_b32_e32 v2, s15		; GFX8-NEXT: v_mov_b32_e32 v2, s15
; GFX8-NEXT: v_mad_u32_u24 v1, s9, v2, v1		; GFX8-NEXT: v_mad_u32_u24 v1, s9, v2, v1
; GFX8-NEXT: v_mov_b32_e32 v2, s14		; GFX8-NEXT: v_mov_b32_e32 v2, s14
; GFX8-NEXT: v_mad_u32_u24 v1, s8, v2, v1		; GFX8-NEXT: v_mad_u32_u24 v1, s8, v2, v1
; GFX8-NEXT: v_mov_b32_e32 v2, s13		; GFX8-NEXT: v_mov_b32_e32 v2, s13
; GFX8-NEXT: v_mad_u32_u24 v1, s5, v2, v1		; GFX8-NEXT: v_mad_u32_u24 v1, s5, v2, v1
; GFX8-NEXT: v_mov_b32_e32 v2, s7		; GFX8-NEXT: v_mov_b32_e32 v2, s12
; GFX8-NEXT: v_mad_u32_u24 v1, s4, v2, v1		; GFX8-NEXT: v_mad_u32_u24 v1, s4, v2, v1
		; GFX8-NEXT: v_mov_b32_e32 v2, s7
		; GFX8-NEXT: v_mad_u32_u24 v1, s3, v2, v1
; GFX8-NEXT: v_add_u32_e32 v2, vcc, v1, v0		; GFX8-NEXT: v_add_u32_e32 v2, vcc, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_store_dword v[0:1], v2		; GFX8-NEXT: flat_store_dword v[0:1], v2
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: udot8_multiuses_mul1:		; GFX9-LABEL: udot8_multiuses_mul1:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_load_dword s6, s[6:7], 0x0		; GFX9-NEXT: s_load_dword s6, s[6:7], 0x0
; GFX9-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX9-NEXT: s_load_dword s19, s[0:1], 0x0		; GFX9-NEXT: s_load_dword s18, s[0:1], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_bfe_u32 s18, s6, 0x40004		; GFX9-NEXT: s_bfe_u32 s17, s6, 0x40004
; GFX9-NEXT: s_lshr_b32 s7, s6, 28		; GFX9-NEXT: s_lshr_b32 s7, s6, 28
; GFX9-NEXT: s_bfe_u32 s13, s6, 0x40018		; GFX9-NEXT: s_bfe_u32 s12, s6, 0x40018
; GFX9-NEXT: s_bfe_u32 s14, s6, 0x40014		; GFX9-NEXT: s_bfe_u32 s13, s6, 0x40014
; GFX9-NEXT: s_bfe_u32 s15, s6, 0x40010		; GFX9-NEXT: s_bfe_u32 s14, s6, 0x40010
; GFX9-NEXT: s_bfe_u32 s16, s6, 0x4000c		; GFX9-NEXT: s_bfe_u32 s15, s6, 0x4000c
; GFX9-NEXT: s_bfe_u32 s17, s6, 0x40008		; GFX9-NEXT: s_bfe_u32 s16, s6, 0x40008
; GFX9-NEXT: s_and_b32 s6, s6, 15		; GFX9-NEXT: s_and_b32 s6, s6, 15
; GFX9-NEXT: s_lshr_b32 s4, s2, 28		; GFX9-NEXT: s_lshr_b32 s3, s2, 28
; GFX9-NEXT: s_bfe_u32 s5, s2, 0x40018		; GFX9-NEXT: s_bfe_u32 s4, s2, 0x40018
; GFX9-NEXT: s_bfe_u32 s8, s2, 0x40014		; GFX9-NEXT: s_bfe_u32 s5, s2, 0x40014
; GFX9-NEXT: s_bfe_u32 s9, s2, 0x40010		; GFX9-NEXT: s_bfe_u32 s8, s2, 0x40010
; GFX9-NEXT: s_bfe_u32 s10, s2, 0x4000c		; GFX9-NEXT: s_bfe_u32 s9, s2, 0x4000c
; GFX9-NEXT: s_bfe_u32 s11, s2, 0x40008		; GFX9-NEXT: s_bfe_u32 s10, s2, 0x40008
; GFX9-NEXT: s_bfe_u32 s12, s2, 0x40004		; GFX9-NEXT: s_bfe_u32 s11, s2, 0x40004
; GFX9-NEXT: s_and_b32 s2, s2, 15		; GFX9-NEXT: s_and_b32 s2, s2, 15
; GFX9-NEXT: v_mov_b32_e32 v0, s6		; GFX9-NEXT: v_mov_b32_e32 v0, s6
; GFX9-NEXT: v_mov_b32_e32 v1, s19		; GFX9-NEXT: v_mov_b32_e32 v1, s18
; GFX9-NEXT: v_mad_u32_u24 v1, s2, v0, v1		; GFX9-NEXT: v_mad_u32_u24 v1, s2, v0, v1
; GFX9-NEXT: v_mov_b32_e32 v2, s18
; GFX9-NEXT: v_mad_u32_u24 v0, s2, v0, v1
; GFX9-NEXT: v_mad_u32_u24 v1, s12, v2, v1
; GFX9-NEXT: v_mov_b32_e32 v2, s17		; GFX9-NEXT: v_mov_b32_e32 v2, s17
		; GFX9-NEXT: v_mad_u32_u24 v0, s2, v0, v1
; GFX9-NEXT: v_mad_u32_u24 v1, s11, v2, v1		; GFX9-NEXT: v_mad_u32_u24 v1, s11, v2, v1
; GFX9-NEXT: v_mov_b32_e32 v2, s16		; GFX9-NEXT: v_mov_b32_e32 v2, s16
; GFX9-NEXT: v_mad_u32_u24 v1, s10, v2, v1		; GFX9-NEXT: v_mad_u32_u24 v1, s10, v2, v1
; GFX9-NEXT: v_mov_b32_e32 v2, s15		; GFX9-NEXT: v_mov_b32_e32 v2, s15
; GFX9-NEXT: v_mad_u32_u24 v1, s9, v2, v1		; GFX9-NEXT: v_mad_u32_u24 v1, s9, v2, v1
; GFX9-NEXT: v_mov_b32_e32 v2, s14		; GFX9-NEXT: v_mov_b32_e32 v2, s14
; GFX9-NEXT: v_mad_u32_u24 v1, s8, v2, v1		; GFX9-NEXT: v_mad_u32_u24 v1, s8, v2, v1
; GFX9-NEXT: v_mov_b32_e32 v2, s13		; GFX9-NEXT: v_mov_b32_e32 v2, s13
; GFX9-NEXT: v_mad_u32_u24 v1, s5, v2, v1		; GFX9-NEXT: v_mad_u32_u24 v1, s5, v2, v1
; GFX9-NEXT: v_mov_b32_e32 v2, s7		; GFX9-NEXT: v_mov_b32_e32 v2, s12
; GFX9-NEXT: v_mad_u32_u24 v1, s4, v2, v1		; GFX9-NEXT: v_mad_u32_u24 v1, s4, v2, v1
		; GFX9-NEXT: v_mov_b32_e32 v2, s7
		; GFX9-NEXT: v_mad_u32_u24 v1, s3, v2, v1
; GFX9-NEXT: v_add_u32_e32 v2, v0, v1		; GFX9-NEXT: v_add_u32_e32 v2, v0, v1
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: global_store_dword v[0:1], v2, off		; GFX9-NEXT: global_store_dword v[0:1], v2, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX9-DL-LABEL: udot8_multiuses_mul1:		; GFX9-DL-LABEL: udot8_multiuses_mul1:
; GFX9-DL: ; %bb.0: ; %entry		; GFX9-DL: ; %bb.0: ; %entry
; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_load_dword s6, s[6:7], 0x0		; GFX9-DL-NEXT: s_load_dword s6, s[6:7], 0x0
; GFX9-DL-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX9-DL-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX9-DL-NEXT: s_load_dword s19, s[0:1], 0x0		; GFX9-DL-NEXT: s_load_dword s18, s[0:1], 0x0
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_bfe_u32 s18, s6, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s17, s6, 0x40004
; GFX9-DL-NEXT: s_lshr_b32 s7, s6, 28		; GFX9-DL-NEXT: s_lshr_b32 s7, s6, 28
; GFX9-DL-NEXT: s_bfe_u32 s13, s6, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s12, s6, 0x40018
; GFX9-DL-NEXT: s_bfe_u32 s14, s6, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s13, s6, 0x40014
; GFX9-DL-NEXT: s_bfe_u32 s15, s6, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s14, s6, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s16, s6, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s15, s6, 0x4000c
; GFX9-DL-NEXT: s_bfe_u32 s17, s6, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s16, s6, 0x40008
; GFX9-DL-NEXT: s_and_b32 s6, s6, 15		; GFX9-DL-NEXT: s_and_b32 s6, s6, 15
; GFX9-DL-NEXT: s_lshr_b32 s4, s2, 28		; GFX9-DL-NEXT: s_lshr_b32 s3, s2, 28
; GFX9-DL-NEXT: s_bfe_u32 s5, s2, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s4, s2, 0x40018
; GFX9-DL-NEXT: s_bfe_u32 s8, s2, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s5, s2, 0x40014
; GFX9-DL-NEXT: s_bfe_u32 s9, s2, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s8, s2, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s10, s2, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s9, s2, 0x4000c
; GFX9-DL-NEXT: s_bfe_u32 s11, s2, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s10, s2, 0x40008
; GFX9-DL-NEXT: s_bfe_u32 s12, s2, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s11, s2, 0x40004
; GFX9-DL-NEXT: s_and_b32 s2, s2, 15		; GFX9-DL-NEXT: s_and_b32 s2, s2, 15
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s6		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s6
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s19		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s18
; GFX9-DL-NEXT: v_mad_u32_u24 v1, s2, v0, v1		; GFX9-DL-NEXT: v_mad_u32_u24 v1, s2, v0, v1
; GFX9-DL-NEXT: v_mov_b32_e32 v2, s18
; GFX9-DL-NEXT: v_mad_u32_u24 v0, s2, v0, v1
; GFX9-DL-NEXT: v_mad_u32_u24 v1, s12, v2, v1
; GFX9-DL-NEXT: v_mov_b32_e32 v2, s17		; GFX9-DL-NEXT: v_mov_b32_e32 v2, s17
		; GFX9-DL-NEXT: v_mad_u32_u24 v0, s2, v0, v1
; GFX9-DL-NEXT: v_mad_u32_u24 v1, s11, v2, v1		; GFX9-DL-NEXT: v_mad_u32_u24 v1, s11, v2, v1
; GFX9-DL-NEXT: v_mov_b32_e32 v2, s16		; GFX9-DL-NEXT: v_mov_b32_e32 v2, s16
; GFX9-DL-NEXT: v_mad_u32_u24 v1, s10, v2, v1		; GFX9-DL-NEXT: v_mad_u32_u24 v1, s10, v2, v1
; GFX9-DL-NEXT: v_mov_b32_e32 v2, s15		; GFX9-DL-NEXT: v_mov_b32_e32 v2, s15
; GFX9-DL-NEXT: v_mad_u32_u24 v1, s9, v2, v1		; GFX9-DL-NEXT: v_mad_u32_u24 v1, s9, v2, v1
; GFX9-DL-NEXT: v_mov_b32_e32 v2, s14		; GFX9-DL-NEXT: v_mov_b32_e32 v2, s14
; GFX9-DL-NEXT: v_mad_u32_u24 v1, s8, v2, v1		; GFX9-DL-NEXT: v_mad_u32_u24 v1, s8, v2, v1
; GFX9-DL-NEXT: v_mov_b32_e32 v2, s13		; GFX9-DL-NEXT: v_mov_b32_e32 v2, s13
; GFX9-DL-NEXT: v_mad_u32_u24 v1, s5, v2, v1		; GFX9-DL-NEXT: v_mad_u32_u24 v1, s5, v2, v1
; GFX9-DL-NEXT: v_mov_b32_e32 v2, s7		; GFX9-DL-NEXT: v_mov_b32_e32 v2, s12
; GFX9-DL-NEXT: v_mad_u32_u24 v1, s4, v2, v1		; GFX9-DL-NEXT: v_mad_u32_u24 v1, s4, v2, v1
		; GFX9-DL-NEXT: v_mov_b32_e32 v2, s7
		; GFX9-DL-NEXT: v_mad_u32_u24 v1, s3, v2, v1
; GFX9-DL-NEXT: v_add_u32_e32 v2, v0, v1		; GFX9-DL-NEXT: v_add_u32_e32 v2, v0, v1
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off		; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: udot8_multiuses_mul1:		; GFX10-DL-LABEL: udot8_multiuses_mul1:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX10-DL-NEXT: s_load_dword s4, s[6:7], 0x0		; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0
; GFX10-DL-NEXT: s_load_dword s5, s[0:1], 0x0		; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_and_b32 s6, s2, 15		; GFX10-DL-NEXT: s_and_b32 s5, s2, 15
; GFX10-DL-NEXT: s_and_b32 s7, s4, 15		; GFX10-DL-NEXT: s_and_b32 s6, s3, 15
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s5		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s4
; GFX10-DL-NEXT: s_bfe_u32 s5, s2, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s4, s2, 0x40004
; GFX10-DL-NEXT: s_bfe_u32 s8, s4, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s7, s3, 0x40004
; GFX10-DL-NEXT: s_bfe_u32 s9, s2, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s8, s2, 0x40008
; GFX10-DL-NEXT: s_bfe_u32 s10, s4, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s9, s3, 0x40008
; GFX10-DL-NEXT: v_mad_u32_u24 v0, s6, s7, v0		; GFX10-DL-NEXT: v_mad_u32_u24 v0, s5, s6, v0
; GFX10-DL-NEXT: v_mad_u32_u24 v1, s5, s8, v0		; GFX10-DL-NEXT: v_mad_u32_u24 v1, s4, s7, v0
; GFX10-DL-NEXT: s_bfe_u32 s5, s2, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s4, s2, 0x4000c
; GFX10-DL-NEXT: s_bfe_u32 s8, s4, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s7, s3, 0x4000c
; GFX10-DL-NEXT: v_mad_u32_u24 v0, s6, s7, v0		; GFX10-DL-NEXT: v_mad_u32_u24 v0, s5, s6, v0
; GFX10-DL-NEXT: v_mad_u32_u24 v1, s9, s10, v1		; GFX10-DL-NEXT: v_mad_u32_u24 v1, s8, s9, v1
; GFX10-DL-NEXT: s_bfe_u32 s9, s2, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s8, s2, 0x40010
; GFX10-DL-NEXT: s_bfe_u32 s10, s4, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s9, s3, 0x40010
; GFX10-DL-NEXT: v_mad_u32_u24 v1, s5, s8, v1		; GFX10-DL-NEXT: v_mad_u32_u24 v1, s4, s7, v1
; GFX10-DL-NEXT: s_bfe_u32 s5, s2, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s4, s2, 0x40014
; GFX10-DL-NEXT: s_bfe_u32 s8, s4, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s7, s3, 0x40014
; GFX10-DL-NEXT: v_mad_u32_u24 v1, s9, s10, v1		; GFX10-DL-NEXT: v_mad_u32_u24 v1, s8, s9, v1
; GFX10-DL-NEXT: s_bfe_u32 s9, s2, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s8, s2, 0x40018
; GFX10-DL-NEXT: s_bfe_u32 s10, s4, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s9, s3, 0x40018
; GFX10-DL-NEXT: s_lshr_b32 s2, s2, 28		; GFX10-DL-NEXT: s_lshr_b32 s2, s2, 28
; GFX10-DL-NEXT: s_lshr_b32 s4, s4, 28		; GFX10-DL-NEXT: s_lshr_b32 s3, s3, 28
; GFX10-DL-NEXT: v_mad_u32_u24 v1, s5, s8, v1		; GFX10-DL-NEXT: v_mad_u32_u24 v1, s4, s7, v1
; GFX10-DL-NEXT: v_mad_u32_u24 v1, s9, s10, v1		; GFX10-DL-NEXT: v_mad_u32_u24 v1, s8, s9, v1
; GFX10-DL-NEXT: v_mad_u32_u24 v1, s2, s4, v1		; GFX10-DL-NEXT: v_mad_u32_u24 v1, s2, s3, v1
; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v0, v1		; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v0, v1
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX10-DL-NEXT: global_store_dword v[0:1], v2, off		; GFX10-DL-NEXT: global_store_dword v[0:1], v2, off
; GFX10-DL-NEXT: s_endpgm		; GFX10-DL-NEXT: s_endpgm
<8 x i4> addrspace(1)* %src2,		<8 x i4> addrspace(1)* %src2,
i32 addrspace(1)* nocapture %dst) {		i32 addrspace(1)* nocapture %dst) {
entry:		entry:
%res = add i32 %add, %add8		%res = add i32 %add, %add8
store i32 %res, i32 addrspace(1)* %dst, align 4		store i32 %res, i32 addrspace(1)* %dst, align 4
ret void		ret void
}		}

define amdgpu_kernel void @udot8_acc32_vecMul(<8 x i4> addrspace(1)* %src1,		define amdgpu_kernel void @udot8_acc32_vecMul(<8 x i4> addrspace(1)* %src1,
; GFX7-LABEL: udot8_acc32_vecMul:		; GFX7-LABEL: udot8_acc32_vecMul:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s7, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s6, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_load_dword s10, s[10:11], 0x0		; GFX7-NEXT: s_load_dword s6, s[6:7], 0x0
; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0		; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX7-NEXT: s_load_dword s21, s[4:5], 0x0		; GFX7-NEXT: s_load_dword s20, s[0:1], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_lshr_b32 s11, s10, 28		; GFX7-NEXT: s_lshr_b32 s7, s6, 28
; GFX7-NEXT: s_bfe_u32 s15, s10, 0x40018		; GFX7-NEXT: s_bfe_u32 s14, s6, 0x40018
; GFX7-NEXT: s_bfe_u32 s16, s10, 0x40014		; GFX7-NEXT: s_bfe_u32 s15, s6, 0x40014
; GFX7-NEXT: s_bfe_u32 s17, s10, 0x40010		; GFX7-NEXT: s_bfe_u32 s16, s6, 0x40010
; GFX7-NEXT: s_bfe_u32 s18, s10, 0x4000c		; GFX7-NEXT: s_bfe_u32 s17, s6, 0x4000c
; GFX7-NEXT: s_bfe_u32 s19, s10, 0x40008		; GFX7-NEXT: s_bfe_u32 s18, s6, 0x40008
; GFX7-NEXT: s_bfe_u32 s20, s10, 0x40004		; GFX7-NEXT: s_bfe_u32 s19, s6, 0x40004
; GFX7-NEXT: s_and_b32 s10, s10, 15		; GFX7-NEXT: s_and_b32 s6, s6, 15
; GFX7-NEXT: s_lshr_b32 s1, s0, 28		; GFX7-NEXT: s_lshr_b32 s5, s4, 28
; GFX7-NEXT: s_bfe_u32 s2, s0, 0x40018		; GFX7-NEXT: s_bfe_u32 s8, s4, 0x40018
; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40014		; GFX7-NEXT: s_bfe_u32 s9, s4, 0x40014
; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40010		; GFX7-NEXT: s_bfe_u32 s10, s4, 0x40010
; GFX7-NEXT: s_bfe_u32 s12, s0, 0x4000c		; GFX7-NEXT: s_bfe_u32 s11, s4, 0x4000c
; GFX7-NEXT: s_bfe_u32 s13, s0, 0x40008		; GFX7-NEXT: s_bfe_u32 s12, s4, 0x40008
; GFX7-NEXT: s_bfe_u32 s14, s0, 0x40004		; GFX7-NEXT: s_bfe_u32 s13, s4, 0x40004
; GFX7-NEXT: s_and_b32 s0, s0, 15		; GFX7-NEXT: s_and_b32 s4, s4, 15
; GFX7-NEXT: v_mov_b32_e32 v0, s10		; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s21
; GFX7-NEXT: v_mad_u32_u24 v0, s0, v0, v1
; GFX7-NEXT: v_mov_b32_e32 v1, s20		; GFX7-NEXT: v_mov_b32_e32 v1, s20
; GFX7-NEXT: v_mad_u32_u24 v0, s14, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s4, v0, v1
; GFX7-NEXT: v_mov_b32_e32 v1, s19		; GFX7-NEXT: v_mov_b32_e32 v1, s19
; GFX7-NEXT: v_mad_u32_u24 v0, s13, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s13, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s18		; GFX7-NEXT: v_mov_b32_e32 v1, s18
; GFX7-NEXT: v_mad_u32_u24 v0, s12, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s12, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s17		; GFX7-NEXT: v_mov_b32_e32 v1, s17
; GFX7-NEXT: v_mad_u32_u24 v0, s9, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s11, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s16		; GFX7-NEXT: v_mov_b32_e32 v1, s16
; GFX7-NEXT: v_mad_u32_u24 v0, s8, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s10, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s15		; GFX7-NEXT: v_mov_b32_e32 v1, s15
; GFX7-NEXT: v_mad_u32_u24 v0, s2, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s9, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s11		; GFX7-NEXT: v_mov_b32_e32 v1, s14
; GFX7-NEXT: v_mad_u32_u24 v0, s1, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s8, v1, v0
; GFX7-NEXT: buffer_store_dword v0, off, s[4:7], 0		; GFX7-NEXT: v_mov_b32_e32 v1, s7
		; GFX7-NEXT: v_mad_u32_u24 v0, s5, v1, v0
		; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: udot8_acc32_vecMul:		; GFX8-LABEL: udot8_acc32_vecMul:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_load_dword s6, s[6:7], 0x0		; GFX8-NEXT: s_load_dword s6, s[6:7], 0x0
; GFX8-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX8-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX8-NEXT: s_load_dword s19, s[0:1], 0x0		; GFX8-NEXT: s_load_dword s18, s[0:1], 0x0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_lshr_b32 s7, s6, 28		; GFX8-NEXT: s_lshr_b32 s7, s6, 28
; GFX8-NEXT: s_bfe_u32 s13, s6, 0x40018		; GFX8-NEXT: s_bfe_u32 s12, s6, 0x40018
; GFX8-NEXT: s_bfe_u32 s14, s6, 0x40014		; GFX8-NEXT: s_bfe_u32 s13, s6, 0x40014
; GFX8-NEXT: s_bfe_u32 s15, s6, 0x40010		; GFX8-NEXT: s_bfe_u32 s14, s6, 0x40010
; GFX8-NEXT: s_bfe_u32 s16, s6, 0x4000c		; GFX8-NEXT: s_bfe_u32 s15, s6, 0x4000c
; GFX8-NEXT: s_bfe_u32 s17, s6, 0x40008		; GFX8-NEXT: s_bfe_u32 s16, s6, 0x40008
; GFX8-NEXT: s_bfe_u32 s18, s6, 0x40004		; GFX8-NEXT: s_bfe_u32 s17, s6, 0x40004
; GFX8-NEXT: s_and_b32 s6, s6, 15		; GFX8-NEXT: s_and_b32 s6, s6, 15
; GFX8-NEXT: s_lshr_b32 s4, s2, 28		; GFX8-NEXT: s_lshr_b32 s3, s2, 28
; GFX8-NEXT: s_bfe_u32 s5, s2, 0x40018		; GFX8-NEXT: s_bfe_u32 s4, s2, 0x40018
; GFX8-NEXT: s_bfe_u32 s8, s2, 0x40014		; GFX8-NEXT: s_bfe_u32 s5, s2, 0x40014
; GFX8-NEXT: s_bfe_u32 s9, s2, 0x40010		; GFX8-NEXT: s_bfe_u32 s8, s2, 0x40010
; GFX8-NEXT: s_bfe_u32 s10, s2, 0x4000c		; GFX8-NEXT: s_bfe_u32 s9, s2, 0x4000c
; GFX8-NEXT: s_bfe_u32 s11, s2, 0x40008		; GFX8-NEXT: s_bfe_u32 s10, s2, 0x40008
; GFX8-NEXT: s_bfe_u32 s12, s2, 0x40004		; GFX8-NEXT: s_bfe_u32 s11, s2, 0x40004
; GFX8-NEXT: s_and_b32 s2, s2, 15		; GFX8-NEXT: s_and_b32 s2, s2, 15
; GFX8-NEXT: v_mov_b32_e32 v0, s6		; GFX8-NEXT: v_mov_b32_e32 v0, s6
; GFX8-NEXT: v_mov_b32_e32 v1, s19
; GFX8-NEXT: v_mad_u32_u24 v0, s2, v0, v1
; GFX8-NEXT: v_mov_b32_e32 v1, s18		; GFX8-NEXT: v_mov_b32_e32 v1, s18
; GFX8-NEXT: v_mad_u32_u24 v0, s12, v1, v0		; GFX8-NEXT: v_mad_u32_u24 v0, s2, v0, v1
; GFX8-NEXT: v_mov_b32_e32 v1, s17		; GFX8-NEXT: v_mov_b32_e32 v1, s17
; GFX8-NEXT: v_mad_u32_u24 v0, s11, v1, v0		; GFX8-NEXT: v_mad_u32_u24 v0, s11, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v1, s16		; GFX8-NEXT: v_mov_b32_e32 v1, s16
; GFX8-NEXT: v_mad_u32_u24 v0, s10, v1, v0		; GFX8-NEXT: v_mad_u32_u24 v0, s10, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v1, s15		; GFX8-NEXT: v_mov_b32_e32 v1, s15
; GFX8-NEXT: v_mad_u32_u24 v0, s9, v1, v0		; GFX8-NEXT: v_mad_u32_u24 v0, s9, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v1, s14		; GFX8-NEXT: v_mov_b32_e32 v1, s14
; GFX8-NEXT: v_mad_u32_u24 v0, s8, v1, v0		; GFX8-NEXT: v_mad_u32_u24 v0, s8, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v1, s13		; GFX8-NEXT: v_mov_b32_e32 v1, s13
; GFX8-NEXT: v_mad_u32_u24 v0, s5, v1, v0		; GFX8-NEXT: v_mad_u32_u24 v0, s5, v1, v0
		; GFX8-NEXT: v_mov_b32_e32 v1, s12
		; GFX8-NEXT: v_mad_u32_u24 v0, s4, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v1, s7		; GFX8-NEXT: v_mov_b32_e32 v1, s7
; GFX8-NEXT: v_mad_u32_u24 v2, s4, v1, v0		; GFX8-NEXT: v_mad_u32_u24 v2, s3, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_store_dword v[0:1], v2		; GFX8-NEXT: flat_store_dword v[0:1], v2
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: udot8_acc32_vecMul:		; GFX9-LABEL: udot8_acc32_vecMul:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_load_dword s6, s[6:7], 0x0		; GFX9-NEXT: s_load_dword s6, s[6:7], 0x0
; GFX9-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX9-NEXT: s_load_dword s19, s[0:1], 0x0		; GFX9-NEXT: s_load_dword s18, s[0:1], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_lshr_b32 s7, s6, 28		; GFX9-NEXT: s_lshr_b32 s7, s6, 28
; GFX9-NEXT: s_bfe_u32 s13, s6, 0x40018		; GFX9-NEXT: s_bfe_u32 s12, s6, 0x40018
; GFX9-NEXT: s_bfe_u32 s14, s6, 0x40014		; GFX9-NEXT: s_bfe_u32 s13, s6, 0x40014
; GFX9-NEXT: s_bfe_u32 s15, s6, 0x40010		; GFX9-NEXT: s_bfe_u32 s14, s6, 0x40010
; GFX9-NEXT: s_bfe_u32 s16, s6, 0x4000c		; GFX9-NEXT: s_bfe_u32 s15, s6, 0x4000c
; GFX9-NEXT: s_bfe_u32 s17, s6, 0x40008		; GFX9-NEXT: s_bfe_u32 s16, s6, 0x40008
; GFX9-NEXT: s_bfe_u32 s18, s6, 0x40004		; GFX9-NEXT: s_bfe_u32 s17, s6, 0x40004
; GFX9-NEXT: s_and_b32 s6, s6, 15		; GFX9-NEXT: s_and_b32 s6, s6, 15
; GFX9-NEXT: s_lshr_b32 s4, s2, 28		; GFX9-NEXT: s_lshr_b32 s3, s2, 28
; GFX9-NEXT: s_bfe_u32 s5, s2, 0x40018		; GFX9-NEXT: s_bfe_u32 s4, s2, 0x40018
; GFX9-NEXT: s_bfe_u32 s8, s2, 0x40014		; GFX9-NEXT: s_bfe_u32 s5, s2, 0x40014
; GFX9-NEXT: s_bfe_u32 s9, s2, 0x40010		; GFX9-NEXT: s_bfe_u32 s8, s2, 0x40010
; GFX9-NEXT: s_bfe_u32 s10, s2, 0x4000c		; GFX9-NEXT: s_bfe_u32 s9, s2, 0x4000c
; GFX9-NEXT: s_bfe_u32 s11, s2, 0x40008		; GFX9-NEXT: s_bfe_u32 s10, s2, 0x40008
; GFX9-NEXT: s_bfe_u32 s12, s2, 0x40004		; GFX9-NEXT: s_bfe_u32 s11, s2, 0x40004
; GFX9-NEXT: s_and_b32 s2, s2, 15		; GFX9-NEXT: s_and_b32 s2, s2, 15
; GFX9-NEXT: v_mov_b32_e32 v0, s6		; GFX9-NEXT: v_mov_b32_e32 v0, s6
; GFX9-NEXT: v_mov_b32_e32 v1, s19
; GFX9-NEXT: v_mad_u32_u24 v0, s2, v0, v1
; GFX9-NEXT: v_mov_b32_e32 v1, s18		; GFX9-NEXT: v_mov_b32_e32 v1, s18
; GFX9-NEXT: v_mad_u32_u24 v0, s12, v1, v0		; GFX9-NEXT: v_mad_u32_u24 v0, s2, v0, v1
; GFX9-NEXT: v_mov_b32_e32 v1, s17		; GFX9-NEXT: v_mov_b32_e32 v1, s17
; GFX9-NEXT: v_mad_u32_u24 v0, s11, v1, v0		; GFX9-NEXT: v_mad_u32_u24 v0, s11, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v1, s16		; GFX9-NEXT: v_mov_b32_e32 v1, s16
; GFX9-NEXT: v_mad_u32_u24 v0, s10, v1, v0		; GFX9-NEXT: v_mad_u32_u24 v0, s10, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v1, s15		; GFX9-NEXT: v_mov_b32_e32 v1, s15
; GFX9-NEXT: v_mad_u32_u24 v0, s9, v1, v0		; GFX9-NEXT: v_mad_u32_u24 v0, s9, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v1, s14		; GFX9-NEXT: v_mov_b32_e32 v1, s14
; GFX9-NEXT: v_mad_u32_u24 v0, s8, v1, v0		; GFX9-NEXT: v_mad_u32_u24 v0, s8, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v1, s13		; GFX9-NEXT: v_mov_b32_e32 v1, s13
; GFX9-NEXT: v_mad_u32_u24 v0, s5, v1, v0		; GFX9-NEXT: v_mad_u32_u24 v0, s5, v1, v0
		; GFX9-NEXT: v_mov_b32_e32 v1, s12
		; GFX9-NEXT: v_mad_u32_u24 v0, s4, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v1, s7		; GFX9-NEXT: v_mov_b32_e32 v1, s7
; GFX9-NEXT: v_mad_u32_u24 v2, s4, v1, v0		; GFX9-NEXT: v_mad_u32_u24 v2, s3, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: global_store_dword v[0:1], v2, off		; GFX9-NEXT: global_store_dword v[0:1], v2, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX9-DL-LABEL: udot8_acc32_vecMul:		; GFX9-DL-LABEL: udot8_acc32_vecMul:
; GFX9-DL: ; %bb.0: ; %entry		; GFX9-DL: ; %bb.0: ; %entry
; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_load_dword s2, s[6:7], 0x0		; GFX9-DL-NEXT: s_load_dword s2, s[6:7], 0x0
; GFX9-DL-NEXT: s_load_dword s6, s[0:1], 0x0		; GFX9-DL-NEXT: s_load_dword s3, s[0:1], 0x0
; GFX9-DL-NEXT: s_load_dword s4, s[4:5], 0x0		; GFX9-DL-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s2		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s2
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s6		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s3
; GFX9-DL-NEXT: v_dot8_u32_u4 v2, s4, v0, v1		; GFX9-DL-NEXT: v_dot8_u32_u4 v2, s4, v0, v1
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off		; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: udot8_acc32_vecMul:		; GFX10-DL-LABEL: udot8_acc32_vecMul:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_load_dword s0, s[8:9], 0x0		; GFX10-DL-NEXT: s_load_dword s6, s[4:5], 0x0
; GFX10-DL-NEXT: s_load_dword s1, s[4:5], 0x0		; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
; GFX10-DL-NEXT: s_load_dword s2, s[6:7], 0x0		; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s6
; GFX10-DL-NEXT: v_dot8_u32_u4 v2, s1, s2, v0		; GFX10-DL-NEXT: v_dot8_u32_u4 v2, s0, s1, v0
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s8		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s4
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s9		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s5
; GFX10-DL-NEXT: global_store_dword v[0:1], v2, off		; GFX10-DL-NEXT: global_store_dword v[0:1], v2, off
; GFX10-DL-NEXT: s_endpgm		; GFX10-DL-NEXT: s_endpgm
<8 x i4> addrspace(1)* %src2,		<8 x i4> addrspace(1)* %src2,
i32 addrspace(1)* nocapture %dst) {		i32 addrspace(1)* nocapture %dst) {
entry:		entry:
%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1		%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1
%vec2 = load <8 x i4>, <8 x i4> addrspace(1)* %src2		%vec2 = load <8 x i4>, <8 x i4> addrspace(1)* %src2

	entry:
ret void		ret void
}		}
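The GFX7, GFX8 and GFX9 checks for udot8_acc32_vecMul spell out the unsigned 8 x i4 dot product as eight nibble extracts per operand followed by a chain of v_mad_u32_u24. GFX9-DL and GFX10-DL fold the whole computation into a single v_dot8_u32_u4. The sketch below is a hypothetical Python model, not part of the patch, that checks the two forms agree. It assumes that S_BFE_U32 reads the bit offset from bits [4:0] and the width from bits [22:16] of its second operand, so 0x40018 extracts 4 bits starting at bit 24. It also assumes that V_MAD_U32_U24 multiplies the low 24 bits of its first two operands and adds the third modulo 2^32. All function names are illustrative only.

```python
# Hypothetical model (not part of the patch): the scalar expansion emitted on
# GFX7/GFX8/GFX9 for udot8_acc32_vecMul versus the single v_dot8_u32_u4
# selected on GFX9-DL/GFX10-DL. Instruction semantics are assumptions.

MASK32 = 0xFFFFFFFF


def s_bfe_u32(src: int, field: int) -> int:
    """Unsigned bitfield extract: offset in field[4:0], width in field[22:16]."""
    offset = field & 0x1F
    width = (field >> 16) & 0x7F
    return (src >> offset) & ((1 << width) - 1)


def v_mad_u32_u24(a: int, b: int, c: int) -> int:
    """Multiply the low 24 bits of a and b, add c, wrap to 32 bits."""
    return ((a & 0xFFFFFF) * (b & 0xFFFFFF) + c) & MASK32


def nibbles(x: int) -> list:
    """Split a 32-bit value the way the checks do.

    Nibble 0 uses s_and_b32 x, 15; nibbles 1..6 use s_bfe_u32 with
    0x40004 ... 0x40018 (width 4, offset 4*i); nibble 7 uses s_lshr_b32 x, 28.
    """
    middle = [s_bfe_u32(x, 0x40000 | (4 * i)) for i in range(1, 7)]
    return [x & 15] + middle + [(x & MASK32) >> 28]


def expanded_udot8(src1: int, src2: int, acc: int) -> int:
    """Accumulate the eight nibble products with a v_mad_u32_u24 chain."""
    for a, b in zip(nibbles(src1), nibbles(src2)):
        acc = v_mad_u32_u24(a, b, acc)
    return acc


def v_dot8_u32_u4(src0: int, src1: int, src2: int) -> int:
    """Reference: sum of eight 4-bit x 4-bit products plus src2, mod 2^32."""
    total = src2
    for i in range(8):
        total += ((src0 >> (4 * i)) & 15) * ((src1 >> (4 * i)) & 15)
    return total & MASK32


if __name__ == "__main__":
    a, b, acc = 0x12345678, 0x87654321, 10
    print(expanded_udot8(a, b, acc) == v_dot8_u32_u4(a, b, acc))  # True
```

Because 32-bit addition wraps the same way whether the reduction happens per step or at the end, the order in which the checks accumulate the nibble products does not change the result.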

; TODO: Clean up the code(by default pk_mad_I16 should be generated), then		; TODO: Clean up the code(by default pk_mad_I16 should be generated), then
; support the pattern.		; support the pattern.
define amdgpu_kernel void @udot8_acc16_vecMul(<8 x i4> addrspace(1)* %src1,		define amdgpu_kernel void @udot8_acc16_vecMul(<8 x i4> addrspace(1)* %src1,
; GFX7-LABEL: udot8_acc16_vecMul:		; GFX7-LABEL: udot8_acc16_vecMul:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s7, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s6, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: buffer_load_ushort v0, off, s[4:7], 0		; GFX7-NEXT: buffer_load_ushort v0, off, s[0:3], 0
; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0		; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX7-NEXT: s_load_dword s1, s[10:11], 0x0		; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_bfe_u32 s11, s0, 0x40004		; GFX7-NEXT: s_bfe_u32 s10, s4, 0x40004
; GFX7-NEXT: s_bfe_u32 s18, s1, 0x40004		; GFX7-NEXT: s_bfe_u32 s17, s5, 0x40004
; GFX7-NEXT: s_bfe_u32 s20, s1, 0x4000c		; GFX7-NEXT: s_bfe_u32 s19, s5, 0x4000c
; GFX7-NEXT: v_mov_b32_e32 v4, s18		; GFX7-NEXT: v_mov_b32_e32 v4, s17
; GFX7-NEXT: s_bfe_u32 s15, s1, 0x40018		; GFX7-NEXT: s_bfe_u32 s14, s5, 0x40018
; GFX7-NEXT: s_bfe_u32 s16, s1, 0x40014		; GFX7-NEXT: s_bfe_u32 s15, s5, 0x40014
; GFX7-NEXT: s_bfe_u32 s17, s1, 0x40010		; GFX7-NEXT: s_bfe_u32 s16, s5, 0x40010
; GFX7-NEXT: s_and_b32 s19, s1, 15		; GFX7-NEXT: s_and_b32 s18, s5, 15
; GFX7-NEXT: s_lshr_b32 s14, s1, 28		; GFX7-NEXT: s_lshr_b32 s13, s5, 28
; GFX7-NEXT: s_bfe_u32 s1, s1, 0x40008		; GFX7-NEXT: s_bfe_u32 s5, s5, 0x40008
; GFX7-NEXT: s_bfe_u32 s13, s0, 0x4000c		; GFX7-NEXT: s_bfe_u32 s12, s4, 0x4000c
; GFX7-NEXT: v_mov_b32_e32 v2, s20		; GFX7-NEXT: v_mov_b32_e32 v2, s19
; GFX7-NEXT: v_mul_u32_u24_e32 v2, s13, v2		; GFX7-NEXT: v_mul_u32_u24_e32 v2, s12, v2
; GFX7-NEXT: v_mul_u32_u24_e32 v4, s11, v4		; GFX7-NEXT: v_mul_u32_u24_e32 v4, s10, v4
; GFX7-NEXT: s_lshr_b32 s2, s0, 28		; GFX7-NEXT: s_lshr_b32 s6, s4, 28
; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40018		; GFX7-NEXT: s_bfe_u32 s7, s4, 0x40018
; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40014		; GFX7-NEXT: s_bfe_u32 s8, s4, 0x40014
; GFX7-NEXT: s_bfe_u32 s10, s0, 0x40010		; GFX7-NEXT: s_bfe_u32 s9, s4, 0x40010
; GFX7-NEXT: s_and_b32 s12, s0, 15		; GFX7-NEXT: s_and_b32 s11, s4, 15
; GFX7-NEXT: v_mov_b32_e32 v3, s19		; GFX7-NEXT: v_mov_b32_e32 v3, s18
; GFX7-NEXT: s_bfe_u32 s0, s0, 0x40008		; GFX7-NEXT: s_bfe_u32 s4, s4, 0x40008
; GFX7-NEXT: v_mov_b32_e32 v1, s1		; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mul_u32_u24_e32 v1, s0, v1		; GFX7-NEXT: v_mul_u32_u24_e32 v1, s4, v1
; GFX7-NEXT: v_lshlrev_b32_e32 v2, 16, v2		; GFX7-NEXT: v_lshlrev_b32_e32 v2, 16, v2
; GFX7-NEXT: v_mul_u32_u24_e32 v3, s12, v3		; GFX7-NEXT: v_mul_u32_u24_e32 v3, s11, v3
; GFX7-NEXT: v_lshlrev_b32_e32 v4, 16, v4		; GFX7-NEXT: v_lshlrev_b32_e32 v4, 16, v4
; GFX7-NEXT: v_or_b32_e32 v1, v1, v2		; GFX7-NEXT: v_or_b32_e32 v1, v1, v2
; GFX7-NEXT: v_or_b32_e32 v2, v3, v4		; GFX7-NEXT: v_or_b32_e32 v2, v3, v4
; GFX7-NEXT: v_alignbit_b32 v3, v1, v2, 16		; GFX7-NEXT: v_alignbit_b32 v3, v1, v2, 16
; GFX7-NEXT: v_lshrrev_b32_e32 v4, 16, v1		; GFX7-NEXT: v_lshrrev_b32_e32 v4, 16, v1
; GFX7-NEXT: v_mov_b32_e32 v5, s17		; GFX7-NEXT: v_mov_b32_e32 v5, s16
; GFX7-NEXT: v_mov_b32_e32 v6, s16		; GFX7-NEXT: v_mov_b32_e32 v6, s15
; GFX7-NEXT: v_mov_b32_e32 v7, s15		; GFX7-NEXT: v_mov_b32_e32 v7, s14
; GFX7-NEXT: s_waitcnt vmcnt(0)		; GFX7-NEXT: s_waitcnt vmcnt(0)
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v2		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v2
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v3, v0		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v3, v0
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v1		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v1
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v4, v0		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v4, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s10, v5, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s9, v5, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s9, v6, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s8, v6, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s8, v7, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s7, v7, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s14		; GFX7-NEXT: v_mov_b32_e32 v1, s13
; GFX7-NEXT: v_mad_u32_u24 v0, s2, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s6, v1, v0
; GFX7-NEXT: buffer_store_short v0, off, s[4:7], 0		; GFX7-NEXT: buffer_store_short v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: udot8_acc16_vecMul:		; GFX8-LABEL: udot8_acc16_vecMul:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_load_ushort v2, v[0:1]		; GFX8-NEXT: flat_load_ushort v2, v[0:1]
; GFX8-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX8-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX8-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX8-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_lshr_b32 s2, s0, 28		; GFX8-NEXT: s_lshr_b32 s2, s0, 28
; GFX8-NEXT: s_bfe_u32 s11, s1, 0x40018		; GFX8-NEXT: s_bfe_u32 s10, s1, 0x40018
; GFX8-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX8-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX8-NEXT: s_bfe_u32 s13, s1, 0x40010		; GFX8-NEXT: s_bfe_u32 s12, s1, 0x40010
; GFX8-NEXT: s_bfe_u32 s14, s1, 0x4000c		; GFX8-NEXT: s_bfe_u32 s13, s1, 0x4000c
; GFX8-NEXT: s_bfe_u32 s15, s1, 0x40008		; GFX8-NEXT: s_bfe_u32 s14, s1, 0x40008
; GFX8-NEXT: s_bfe_u32 s16, s1, 0x40004		; GFX8-NEXT: s_bfe_u32 s15, s1, 0x40004
; GFX8-NEXT: s_lshr_b32 s10, s1, 28		; GFX8-NEXT: s_lshr_b32 s9, s1, 28
; GFX8-NEXT: s_and_b32 s1, s1, 15		; GFX8-NEXT: s_and_b32 s1, s1, 15
; GFX8-NEXT: s_bfe_u32 s4, s0, 0x40018		; GFX8-NEXT: s_bfe_u32 s3, s0, 0x40018
; GFX8-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX8-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX8-NEXT: s_bfe_u32 s6, s0, 0x40010		; GFX8-NEXT: s_bfe_u32 s5, s0, 0x40010
; GFX8-NEXT: s_bfe_u32 s7, s0, 0x4000c		; GFX8-NEXT: s_bfe_u32 s6, s0, 0x4000c
; GFX8-NEXT: s_bfe_u32 s8, s0, 0x40008		; GFX8-NEXT: s_bfe_u32 s7, s0, 0x40008
; GFX8-NEXT: s_bfe_u32 s9, s0, 0x40004		; GFX8-NEXT: s_bfe_u32 s8, s0, 0x40004
; GFX8-NEXT: s_and_b32 s0, s0, 15		; GFX8-NEXT: s_and_b32 s0, s0, 15
; GFX8-NEXT: v_mov_b32_e32 v3, s1		; GFX8-NEXT: v_mov_b32_e32 v3, s1
; GFX8-NEXT: v_mov_b32_e32 v4, s16		; GFX8-NEXT: v_mov_b32_e32 v4, s15
; GFX8-NEXT: v_mov_b32_e32 v5, s15		; GFX8-NEXT: v_mov_b32_e32 v5, s14
; GFX8-NEXT: v_mov_b32_e32 v6, s14		; GFX8-NEXT: v_mov_b32_e32 v6, s13
; GFX8-NEXT: v_mov_b32_e32 v7, s13		; GFX8-NEXT: v_mov_b32_e32 v7, s12
; GFX8-NEXT: v_mov_b32_e32 v8, s12		; GFX8-NEXT: v_mov_b32_e32 v8, s11
; GFX8-NEXT: v_mov_b32_e32 v9, s11		; GFX8-NEXT: v_mov_b32_e32 v9, s10
; GFX8-NEXT: s_waitcnt vmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0)
; GFX8-NEXT: v_mad_u32_u24 v2, s0, v3, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s0, v3, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s9, v4, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s8, v4, v2
; GFX8-NEXT: v_and_b32_e32 v2, 0xffff, v2		; GFX8-NEXT: v_and_b32_e32 v2, 0xffff, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s8, v5, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s7, v5, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s7, v6, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s6, v6, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s6, v7, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s5, v7, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s5, v8, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s4, v8, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s4, v9, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s3, v9, v2
; GFX8-NEXT: v_mov_b32_e32 v3, s10		; GFX8-NEXT: v_mov_b32_e32 v3, s9
; GFX8-NEXT: v_mad_u32_u24 v2, s2, v3, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s2, v3, v2
; GFX8-NEXT: flat_store_short v[0:1], v2		; GFX8-NEXT: flat_store_short v[0:1], v2
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: udot8_acc16_vecMul:		; GFX9-LABEL: udot8_acc16_vecMul:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_load_dword s6, s[6:7], 0x0		; GFX9-NEXT: s_load_dword s6, s[6:7], 0x0
; GFX9-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_bfe_u32 s7, s6, 0x40018		; GFX9-NEXT: s_bfe_u32 s7, s6, 0x40018
; GFX9-NEXT: s_lshr_b32 s13, s6, 28		; GFX9-NEXT: s_lshr_b32 s12, s6, 28
; GFX9-NEXT: s_pack_ll_b32_b16 s7, s7, s13		; GFX9-NEXT: s_pack_ll_b32_b16 s7, s7, s12
; GFX9-NEXT: s_bfe_u32 s4, s2, 0x40018		; GFX9-NEXT: s_bfe_u32 s3, s2, 0x40018
; GFX9-NEXT: s_lshr_b32 s5, s2, 28		; GFX9-NEXT: s_lshr_b32 s4, s2, 28
; GFX9-NEXT: s_bfe_u32 s14, s6, 0x40010		; GFX9-NEXT: s_bfe_u32 s13, s6, 0x40010
; GFX9-NEXT: s_bfe_u32 s15, s6, 0x40014		; GFX9-NEXT: s_bfe_u32 s14, s6, 0x40014
; GFX9-NEXT: s_pack_ll_b32_b16 s4, s4, s5		; GFX9-NEXT: s_pack_ll_b32_b16 s3, s3, s4
; GFX9-NEXT: v_mov_b32_e32 v0, s7		; GFX9-NEXT: v_mov_b32_e32 v0, s7
; GFX9-NEXT: v_pk_mul_lo_u16 v2, s4, v0		; GFX9-NEXT: v_pk_mul_lo_u16 v2, s3, v0
; GFX9-NEXT: s_pack_ll_b32_b16 s4, s14, s15		; GFX9-NEXT: s_pack_ll_b32_b16 s3, s13, s14
; GFX9-NEXT: s_bfe_u32 s8, s2, 0x40010		; GFX9-NEXT: s_bfe_u32 s5, s2, 0x40010
; GFX9-NEXT: s_bfe_u32 s9, s2, 0x40014		; GFX9-NEXT: s_bfe_u32 s8, s2, 0x40014
; GFX9-NEXT: s_bfe_u32 s16, s6, 0x40008		; GFX9-NEXT: s_bfe_u32 s15, s6, 0x40008
; GFX9-NEXT: s_bfe_u32 s17, s6, 0x4000c		; GFX9-NEXT: s_bfe_u32 s16, s6, 0x4000c
; GFX9-NEXT: s_and_b32 s18, s6, 15		; GFX9-NEXT: s_and_b32 s17, s6, 15
; GFX9-NEXT: v_mov_b32_e32 v0, s4		; GFX9-NEXT: v_mov_b32_e32 v0, s3
; GFX9-NEXT: s_pack_ll_b32_b16 s5, s8, s9		; GFX9-NEXT: s_pack_ll_b32_b16 s4, s5, s8
; GFX9-NEXT: s_bfe_u32 s10, s2, 0x40008		; GFX9-NEXT: s_bfe_u32 s9, s2, 0x40008
; GFX9-NEXT: s_bfe_u32 s11, s2, 0x4000c		; GFX9-NEXT: s_bfe_u32 s10, s2, 0x4000c
; GFX9-NEXT: s_bfe_u32 s6, s6, 0x40004		; GFX9-NEXT: s_bfe_u32 s6, s6, 0x40004
; GFX9-NEXT: s_pack_ll_b32_b16 s4, s16, s17		; GFX9-NEXT: s_pack_ll_b32_b16 s3, s15, s16
; GFX9-NEXT: v_pk_mul_lo_u16 v3, s5, v0		; GFX9-NEXT: v_pk_mul_lo_u16 v3, s4, v0
; GFX9-NEXT: s_and_b32 s12, s2, 15		; GFX9-NEXT: s_and_b32 s11, s2, 15
; GFX9-NEXT: s_bfe_u32 s2, s2, 0x40004		; GFX9-NEXT: s_bfe_u32 s2, s2, 0x40004
; GFX9-NEXT: v_mov_b32_e32 v0, s4		; GFX9-NEXT: v_mov_b32_e32 v0, s3
; GFX9-NEXT: s_pack_ll_b32_b16 s5, s10, s11		; GFX9-NEXT: s_pack_ll_b32_b16 s4, s9, s10
; GFX9-NEXT: s_pack_ll_b32_b16 s4, s18, s6		; GFX9-NEXT: s_pack_ll_b32_b16 s3, s17, s6
; GFX9-NEXT: v_pk_mul_lo_u16 v4, s5, v0		; GFX9-NEXT: v_pk_mul_lo_u16 v4, s4, v0
; GFX9-NEXT: s_pack_ll_b32_b16 s2, s12, s2		; GFX9-NEXT: s_pack_ll_b32_b16 s2, s11, s2
; GFX9-NEXT: v_mov_b32_e32 v0, s4		; GFX9-NEXT: v_mov_b32_e32 v0, s3
; GFX9-NEXT: v_pk_mul_lo_u16 v5, s2, v0		; GFX9-NEXT: v_pk_mul_lo_u16 v5, s2, v0
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: global_load_ushort v6, v[0:1], off		; GFX9-NEXT: global_load_ushort v6, v[0:1], off
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: v_add_u32_e32 v6, v5, v6		; GFX9-NEXT: v_add_u32_e32 v6, v5, v6
; GFX9-NEXT: v_add_u32_sdwa v5, v6, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX9-NEXT: v_add_u32_sdwa v5, v6, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX9-NEXT: v_add_u32_sdwa v5, v5, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:BYTE_0		; GFX9-NEXT: v_add_u32_sdwa v5, v5, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:BYTE_0
; GFX9-DL: ; %bb.0: ; %entry		; GFX9-DL: ; %bb.0: ; %entry
; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_load_dword s6, s[6:7], 0x0		; GFX9-DL-NEXT: s_load_dword s6, s[6:7], 0x0
; GFX9-DL-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX9-DL-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_bfe_u32 s7, s6, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s7, s6, 0x40018
; GFX9-DL-NEXT: s_lshr_b32 s13, s6, 28		; GFX9-DL-NEXT: s_lshr_b32 s12, s6, 28
; GFX9-DL-NEXT: s_pack_ll_b32_b16 s7, s7, s13		; GFX9-DL-NEXT: s_pack_ll_b32_b16 s7, s7, s12
; GFX9-DL-NEXT: s_bfe_u32 s4, s2, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s3, s2, 0x40018
; GFX9-DL-NEXT: s_lshr_b32 s5, s2, 28		; GFX9-DL-NEXT: s_lshr_b32 s4, s2, 28
; GFX9-DL-NEXT: s_bfe_u32 s14, s6, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s13, s6, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s15, s6, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s14, s6, 0x40014
; GFX9-DL-NEXT: s_pack_ll_b32_b16 s4, s4, s5		; GFX9-DL-NEXT: s_pack_ll_b32_b16 s3, s3, s4
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s7		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s7
; GFX9-DL-NEXT: v_pk_mul_lo_u16 v2, s4, v0		; GFX9-DL-NEXT: v_pk_mul_lo_u16 v2, s3, v0
; GFX9-DL-NEXT: s_pack_ll_b32_b16 s4, s14, s15		; GFX9-DL-NEXT: s_pack_ll_b32_b16 s3, s13, s14
; GFX9-DL-NEXT: s_bfe_u32 s8, s2, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s5, s2, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s9, s2, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s8, s2, 0x40014
; GFX9-DL-NEXT: s_bfe_u32 s16, s6, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s15, s6, 0x40008
; GFX9-DL-NEXT: s_bfe_u32 s17, s6, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s16, s6, 0x4000c
; GFX9-DL-NEXT: s_and_b32 s18, s6, 15		; GFX9-DL-NEXT: s_and_b32 s17, s6, 15
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s4		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s3
; GFX9-DL-NEXT: s_pack_ll_b32_b16 s5, s8, s9		; GFX9-DL-NEXT: s_pack_ll_b32_b16 s4, s5, s8
; GFX9-DL-NEXT: s_bfe_u32 s10, s2, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s9, s2, 0x40008
; GFX9-DL-NEXT: s_bfe_u32 s11, s2, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s10, s2, 0x4000c
; GFX9-DL-NEXT: s_bfe_u32 s6, s6, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s6, s6, 0x40004
; GFX9-DL-NEXT: s_pack_ll_b32_b16 s4, s16, s17		; GFX9-DL-NEXT: s_pack_ll_b32_b16 s3, s15, s16
; GFX9-DL-NEXT: v_pk_mul_lo_u16 v3, s5, v0		; GFX9-DL-NEXT: v_pk_mul_lo_u16 v3, s4, v0
; GFX9-DL-NEXT: s_and_b32 s12, s2, 15		; GFX9-DL-NEXT: s_and_b32 s11, s2, 15
; GFX9-DL-NEXT: s_bfe_u32 s2, s2, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s2, s2, 0x40004
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s4		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s3
; GFX9-DL-NEXT: s_pack_ll_b32_b16 s5, s10, s11		; GFX9-DL-NEXT: s_pack_ll_b32_b16 s4, s9, s10
; GFX9-DL-NEXT: s_pack_ll_b32_b16 s4, s18, s6		; GFX9-DL-NEXT: s_pack_ll_b32_b16 s3, s17, s6
; GFX9-DL-NEXT: v_pk_mul_lo_u16 v4, s5, v0		; GFX9-DL-NEXT: v_pk_mul_lo_u16 v4, s4, v0
; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s12, s2		; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s11, s2
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s4		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s3
; GFX9-DL-NEXT: v_pk_mul_lo_u16 v5, s2, v0		; GFX9-DL-NEXT: v_pk_mul_lo_u16 v5, s2, v0
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: global_load_ushort v6, v[0:1], off		; GFX9-DL-NEXT: global_load_ushort v6, v[0:1], off
; GFX9-DL-NEXT: s_waitcnt vmcnt(0)		; GFX9-DL-NEXT: s_waitcnt vmcnt(0)
; GFX9-DL-NEXT: v_add_u32_e32 v6, v5, v6		; GFX9-DL-NEXT: v_add_u32_e32 v6, v5, v6
; GFX9-DL-NEXT: v_add_u32_sdwa v5, v6, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX9-DL-NEXT: v_add_u32_sdwa v5, v6, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX9-DL-NEXT: v_add_u32_sdwa v5, v5, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:BYTE_0		; GFX9-DL-NEXT: v_add_u32_sdwa v5, v5, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:BYTE_0
; GFX9-DL-NEXT: v_add_u32_sdwa v4, v5, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX9-DL-NEXT: v_add_u32_sdwa v4, v5, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX9-DL-NEXT: v_add_u32_e32 v4, v4, v3		; GFX9-DL-NEXT: v_add_u32_e32 v4, v4, v3
; GFX9-DL-NEXT: v_add_u32_sdwa v3, v4, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX9-DL-NEXT: v_add_u32_sdwa v3, v4, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX9-DL-NEXT: v_add_u32_e32 v3, v3, v2		; GFX9-DL-NEXT: v_add_u32_e32 v3, v3, v2
; GFX9-DL-NEXT: v_add_u32_sdwa v2, v3, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX9-DL-NEXT: v_add_u32_sdwa v2, v3, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX9-DL-NEXT: global_store_short v[0:1], v2, off		; GFX9-DL-NEXT: global_store_short v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: udot8_acc16_vecMul:		; GFX10-DL-LABEL: udot8_acc16_vecMul:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s4		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s2
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s5		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s3
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX10-DL-NEXT: global_load_ushort v2, v[0:1], off		; GFX10-DL-NEXT: global_load_ushort v2, v[0:1], off
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_and_b32 s2, s0, 15		; GFX10-DL-NEXT: s_and_b32 s2, s0, 15
; GFX10-DL-NEXT: s_bfe_u32 s6, s0, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40004
; GFX10-DL-NEXT: s_and_b32 s4, s1, 15		; GFX10-DL-NEXT: s_and_b32 s3, s1, 15
; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x40004
; GFX10-DL-NEXT: s_bfe_u32 s7, s1, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x4000c
; GFX10-DL-NEXT: s_bfe_u32 s8, s0, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s7, s0, 0x4000c
; GFX10-DL-NEXT: s_pack_ll_b32_b16 s2, s2, s6		; GFX10-DL-NEXT: s_pack_ll_b32_b16 s2, s2, s5
; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x40008
; GFX10-DL-NEXT: s_pack_ll_b32_b16 s4, s4, s5		; GFX10-DL-NEXT: s_pack_ll_b32_b16 s3, s3, s4
; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x40008
; GFX10-DL-NEXT: v_pk_mul_lo_u16 v3, s2, s4		; GFX10-DL-NEXT: v_pk_mul_lo_u16 v3, s2, s3
; GFX10-DL-NEXT: s_pack_ll_b32_b16 s4, s6, s7		; GFX10-DL-NEXT: s_pack_ll_b32_b16 s3, s5, s6
; GFX10-DL-NEXT: s_pack_ll_b32_b16 s5, s5, s8		; GFX10-DL-NEXT: s_pack_ll_b32_b16 s4, s4, s7
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40010
; GFX10-DL-NEXT: s_bfe_u32 s6, s0, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40014
; GFX10-DL-NEXT: s_bfe_u32 s7, s1, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40010
; GFX10-DL-NEXT: s_bfe_u32 s8, s1, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s7, s1, 0x40014
; GFX10-DL-NEXT: v_pk_mul_lo_u16 v4, s5, s4		; GFX10-DL-NEXT: v_pk_mul_lo_u16 v4, s4, s3
; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s3, s0, 0x40018
; GFX10-DL-NEXT: s_pack_ll_b32_b16 s2, s2, s6		; GFX10-DL-NEXT: s_pack_ll_b32_b16 s2, s2, s5
; GFX10-DL-NEXT: s_lshr_b32 s0, s0, 28		; GFX10-DL-NEXT: s_lshr_b32 s0, s0, 28
; GFX10-DL-NEXT: s_pack_ll_b32_b16 s5, s7, s8		; GFX10-DL-NEXT: s_pack_ll_b32_b16 s4, s6, s7
; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x40018
; GFX10-DL-NEXT: s_lshr_b32 s1, s1, 28		; GFX10-DL-NEXT: s_lshr_b32 s1, s1, 28
; GFX10-DL-NEXT: s_pack_ll_b32_b16 s0, s4, s0		; GFX10-DL-NEXT: s_pack_ll_b32_b16 s0, s3, s0
; GFX10-DL-NEXT: s_pack_ll_b32_b16 s1, s6, s1		; GFX10-DL-NEXT: s_pack_ll_b32_b16 s1, s5, s1
; GFX10-DL-NEXT: s_waitcnt vmcnt(0)		; GFX10-DL-NEXT: s_waitcnt vmcnt(0)
; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v3, v2		; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v3, v2
; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX10-DL-NEXT: v_pk_mul_lo_u16 v3, s2, s5		; GFX10-DL-NEXT: v_pk_mul_lo_u16 v3, s2, s4
; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:BYTE_0		; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:BYTE_0
; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX10-DL-NEXT: v_pk_mul_lo_u16 v4, s0, s1		; GFX10-DL-NEXT: v_pk_mul_lo_u16 v4, s0, s1
; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v2, v3		; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v2, v3
; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v2, v4		; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v2, v4
; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX10-DL-NEXT: global_store_short v[0:1], v2, off		; GFX10-DL-NEXT: global_store_short v[0:1], v2, off
entry:
store i16 %add8, i16 addrspace(1)* %dst, align 4		store i16 %add8, i16 addrspace(1)* %dst, align 4
ret void		ret void
}		}
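In these checks, each 4-bit lane of the loaded `<8 x i4>` words is pulled out in one of two ways. Some lanes use `s_and_b32 ..., 15` (lane 0) or `s_lshr_b32 ..., 28` (lane 7). The others use `s_bfe_u32 dst, src, imm` with immediates such as `0x40004` or `0x4000c`. As I understand the scalar BFE encoding, the immediate packs the bit offset in bits [4:0] and the field width in bits [22:16]. So `0x40004` means "4 bits starting at bit 4", which is lane 1. The sketch below models only that decoding, to make the checks easier to read. `bfe_u32` and `nibbles` are hypothetical helper names, not LLVM or AMDGPU APIs.

```python
# Hypothetical helpers (not LLVM APIs): a software model of how the
# s_bfe_u32 immediates in these checks select one 4-bit lane.

def bfe_u32(src: int, imm: int) -> int:
    """Unsigned bitfield extract: offset = imm[4:0], width = imm[22:16]."""
    offset = imm & 0x1F
    width = (imm >> 16) & 0x7F
    return (src >> offset) & ((1 << width) - 1)


def nibbles(word: int) -> list:
    """Return the eight 4-bit lanes of a 32-bit word, lane 0 in bits [3:0]."""
    return [bfe_u32(word, (4 << 16) | (4 * i)) for i in range(8)]


# 0x40004 -> width 4, offset 4 (lane 1); 0x4000c -> width 4, offset 12 (lane 3).
print(hex(bfe_u32(0x12345678, 0x40004)))  # 0x7
print(nibbles(0x12345678))                # [8, 7, 6, 5, 4, 3, 2, 1]
```

Lane 0 (`s_and_b32 ..., 15`) and lane 7 (`s_lshr_b32 ..., 28`) match `bfe_u32` with offsets 0 and 28. The compiler simply picks the cheaper instruction for those two lanes.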

; TODO: Cleanup the code to generate MAD; pattern should be recognized then.		; TODO: Cleanup the code to generate MAD; pattern should be recognized then.
define amdgpu_kernel void @udot8_acc8_vecMul(<8 x i4> addrspace(1)* %src1,		define amdgpu_kernel void @udot8_acc8_vecMul(<8 x i4> addrspace(1)* %src1,
; GFX7-LABEL: udot8_acc8_vecMul:		; GFX7-LABEL: udot8_acc8_vecMul:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s7, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s6, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: buffer_load_ubyte v0, off, s[4:7], 0		; GFX7-NEXT: buffer_load_ubyte v0, off, s[0:3], 0
; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0		; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX7-NEXT: s_load_dword s1, s[10:11], 0x0		; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_bfe_u32 s2, s0, 0x4000c		; GFX7-NEXT: s_bfe_u32 s6, s4, 0x4000c
; GFX7-NEXT: s_bfe_u32 s14, s1, 0x4000c		; GFX7-NEXT: s_bfe_u32 s13, s5, 0x4000c
; GFX7-NEXT: s_bfe_u32 s16, s1, 0x40004		; GFX7-NEXT: s_bfe_u32 s15, s5, 0x40004
; GFX7-NEXT: s_lshr_b32 s18, s1, 28		; GFX7-NEXT: s_lshr_b32 s17, s5, 28
; GFX7-NEXT: v_mov_b32_e32 v8, s14		; GFX7-NEXT: v_mov_b32_e32 v8, s13
; GFX7-NEXT: s_bfe_u32 s15, s1, 0x40008		; GFX7-NEXT: s_bfe_u32 s14, s5, 0x40008
; GFX7-NEXT: s_and_b32 s17, s1, 15		; GFX7-NEXT: s_and_b32 s16, s5, 15
; GFX7-NEXT: s_bfe_u32 s19, s1, 0x40018		; GFX7-NEXT: s_bfe_u32 s18, s5, 0x40018
; GFX7-NEXT: s_bfe_u32 s20, s1, 0x40014		; GFX7-NEXT: s_bfe_u32 s19, s5, 0x40014
; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40004		; GFX7-NEXT: s_bfe_u32 s8, s4, 0x40004
; GFX7-NEXT: v_mov_b32_e32 v6, s16		; GFX7-NEXT: v_mov_b32_e32 v6, s15
; GFX7-NEXT: s_lshr_b32 s11, s0, 28		; GFX7-NEXT: s_lshr_b32 s10, s4, 28
; GFX7-NEXT: v_mov_b32_e32 v4, s18		; GFX7-NEXT: v_mov_b32_e32 v4, s17
; GFX7-NEXT: v_mul_u32_u24_e32 v4, s11, v4		; GFX7-NEXT: v_mul_u32_u24_e32 v4, s10, v4
; GFX7-NEXT: v_mul_u32_u24_e32 v6, s9, v6		; GFX7-NEXT: v_mul_u32_u24_e32 v6, s8, v6
; GFX7-NEXT: v_mul_u32_u24_e32 v8, s2, v8		; GFX7-NEXT: v_mul_u32_u24_e32 v8, s6, v8
; GFX7-NEXT: s_bfe_u32 s1, s1, 0x40010		; GFX7-NEXT: s_bfe_u32 s5, s5, 0x40010
; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40008		; GFX7-NEXT: s_bfe_u32 s7, s4, 0x40008
; GFX7-NEXT: v_mov_b32_e32 v7, s15		; GFX7-NEXT: v_mov_b32_e32 v7, s14
; GFX7-NEXT: s_and_b32 s10, s0, 15		; GFX7-NEXT: s_and_b32 s9, s4, 15
; GFX7-NEXT: v_mov_b32_e32 v5, s17		; GFX7-NEXT: v_mov_b32_e32 v5, s16
; GFX7-NEXT: s_bfe_u32 s12, s0, 0x40018		; GFX7-NEXT: s_bfe_u32 s11, s4, 0x40018
; GFX7-NEXT: v_mov_b32_e32 v3, s19		; GFX7-NEXT: v_mov_b32_e32 v3, s18
; GFX7-NEXT: s_bfe_u32 s13, s0, 0x40014		; GFX7-NEXT: s_bfe_u32 s12, s4, 0x40014
; GFX7-NEXT: v_mov_b32_e32 v2, s20		; GFX7-NEXT: v_mov_b32_e32 v2, s19
; GFX7-NEXT: v_mul_u32_u24_e32 v2, s13, v2		; GFX7-NEXT: v_mul_u32_u24_e32 v2, s12, v2
; GFX7-NEXT: s_bfe_u32 s0, s0, 0x40010		; GFX7-NEXT: s_bfe_u32 s4, s4, 0x40010
; GFX7-NEXT: v_mov_b32_e32 v1, s1		; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mul_u32_u24_e32 v3, s12, v3		; GFX7-NEXT: v_mul_u32_u24_e32 v3, s11, v3
; GFX7-NEXT: v_lshlrev_b32_e32 v4, 8, v4		; GFX7-NEXT: v_lshlrev_b32_e32 v4, 8, v4
; GFX7-NEXT: v_mul_u32_u24_e32 v5, s10, v5		; GFX7-NEXT: v_mul_u32_u24_e32 v5, s9, v5
; GFX7-NEXT: v_mul_u32_u24_e32 v7, s8, v7		; GFX7-NEXT: v_mul_u32_u24_e32 v7, s7, v7
; GFX7-NEXT: v_lshlrev_b32_e32 v6, 8, v6		; GFX7-NEXT: v_lshlrev_b32_e32 v6, 8, v6
; GFX7-NEXT: v_lshlrev_b32_e32 v8, 8, v8		; GFX7-NEXT: v_lshlrev_b32_e32 v8, 8, v8
; GFX7-NEXT: v_or_b32_e32 v3, v3, v4		; GFX7-NEXT: v_or_b32_e32 v3, v3, v4
; GFX7-NEXT: v_or_b32_e32 v4, v5, v6		; GFX7-NEXT: v_or_b32_e32 v4, v5, v6
; GFX7-NEXT: v_or_b32_e32 v5, v7, v8		; GFX7-NEXT: v_or_b32_e32 v5, v7, v8
; GFX7-NEXT: v_mul_u32_u24_e32 v9, s0, v1		; GFX7-NEXT: v_mul_u32_u24_e32 v9, s4, v1
; GFX7-NEXT: v_lshlrev_b32_e32 v2, 8, v2		; GFX7-NEXT: v_lshlrev_b32_e32 v2, 8, v2
; GFX7-NEXT: v_or_b32_e32 v2, v9, v2		; GFX7-NEXT: v_or_b32_e32 v2, v9, v2
; GFX7-NEXT: v_lshlrev_b32_e32 v3, 16, v3		; GFX7-NEXT: v_lshlrev_b32_e32 v3, 16, v3
; GFX7-NEXT: v_lshlrev_b32_e32 v5, 16, v5		; GFX7-NEXT: v_lshlrev_b32_e32 v5, 16, v5
; GFX7-NEXT: v_or_b32_e32 v2, v2, v3		; GFX7-NEXT: v_or_b32_e32 v2, v2, v3
; GFX7-NEXT: v_or_b32_e32 v3, v4, v5		; GFX7-NEXT: v_or_b32_e32 v3, v4, v5
; GFX7-NEXT: v_alignbit_b32 v4, v2, v3, 8		; GFX7-NEXT: v_alignbit_b32 v4, v2, v3, 8
; GFX7-NEXT: v_alignbit_b32 v5, v2, v3, 16		; GFX7-NEXT: v_alignbit_b32 v5, v2, v3, 16
; GFX7-NEXT: v_lshrrev_b32_e32 v6, 24, v3		; GFX7-NEXT: v_lshrrev_b32_e32 v6, 24, v3
; GFX7-NEXT: v_lshrrev_b32_e32 v7, 8, v2		; GFX7-NEXT: v_lshrrev_b32_e32 v7, 8, v2
; GFX7-NEXT: v_lshrrev_b32_e32 v8, 16, v2		; GFX7-NEXT: v_lshrrev_b32_e32 v8, 16, v2
; GFX7-NEXT: v_lshrrev_b32_e32 v2, 24, v2		; GFX7-NEXT: v_lshrrev_b32_e32 v2, 24, v2
; GFX7-NEXT: s_waitcnt vmcnt(0)		; GFX7-NEXT: s_waitcnt vmcnt(0)
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v3		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v3
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v4, v0		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v4, v0
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v5, v0		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v5, v0
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v6, v0		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v6, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s0, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s4, v1, v0
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v7		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v7
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v8		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v8
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v2		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v2
; GFX7-NEXT: buffer_store_byte v0, off, s[4:7], 0		; GFX7-NEXT: buffer_store_byte v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: udot8_acc8_vecMul:		; GFX8-LABEL: udot8_acc8_vecMul:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_load_ubyte v2, v[0:1]		; GFX8-NEXT: flat_load_ubyte v2, v[0:1]
; GFX8-NEXT: s_load_dword s1, s[4:5], 0x0		; GFX8-NEXT: s_load_dword s1, s[4:5], 0x0
; GFX8-NEXT: s_load_dword s2, s[6:7], 0x0		; GFX8-NEXT: s_load_dword s2, s[6:7], 0x0
; GFX8-NEXT: s_mov_b32 s0, 0xffff		; GFX8-NEXT: s_mov_b32 s0, 0xffff
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_bfe_u32 s8, s1, 0x40004		; GFX8-NEXT: s_bfe_u32 s7, s1, 0x40004
; GFX8-NEXT: s_bfe_u32 s10, s1, 0x4000c		; GFX8-NEXT: s_bfe_u32 s9, s1, 0x4000c
; GFX8-NEXT: s_bfe_u32 s15, s2, 0x40004		; GFX8-NEXT: s_bfe_u32 s14, s2, 0x40004
; GFX8-NEXT: s_and_b32 s16, s2, 15		; GFX8-NEXT: s_and_b32 s15, s2, 15
; GFX8-NEXT: s_bfe_u32 s17, s2, 0x4000c		; GFX8-NEXT: s_bfe_u32 s16, s2, 0x4000c
; GFX8-NEXT: s_bfe_u32 s4, s1, 0x40014		; GFX8-NEXT: s_bfe_u32 s3, s1, 0x40014
; GFX8-NEXT: s_lshr_b32 s6, s1, 28		; GFX8-NEXT: s_lshr_b32 s5, s1, 28
; GFX8-NEXT: s_bfe_u32 s11, s2, 0x40014		; GFX8-NEXT: s_bfe_u32 s10, s2, 0x40014
; GFX8-NEXT: s_bfe_u32 s12, s2, 0x40010		; GFX8-NEXT: s_bfe_u32 s11, s2, 0x40010
; GFX8-NEXT: s_lshr_b32 s13, s2, 28		; GFX8-NEXT: s_lshr_b32 s12, s2, 28
; GFX8-NEXT: s_bfe_u32 s14, s2, 0x40018		; GFX8-NEXT: s_bfe_u32 s13, s2, 0x40018
; GFX8-NEXT: s_bfe_u32 s2, s2, 0x40008		; GFX8-NEXT: s_bfe_u32 s2, s2, 0x40008
; GFX8-NEXT: s_and_b32 s9, s1, 15		; GFX8-NEXT: s_and_b32 s8, s1, 15
; GFX8-NEXT: v_mov_b32_e32 v4, s17		; GFX8-NEXT: v_mov_b32_e32 v4, s16
; GFX8-NEXT: v_mov_b32_e32 v5, s10		; GFX8-NEXT: v_mov_b32_e32 v5, s9
; GFX8-NEXT: v_mov_b32_e32 v6, s16		; GFX8-NEXT: v_mov_b32_e32 v6, s15
; GFX8-NEXT: v_mov_b32_e32 v7, s15		; GFX8-NEXT: v_mov_b32_e32 v7, s14
; GFX8-NEXT: v_mov_b32_e32 v8, s8		; GFX8-NEXT: v_mov_b32_e32 v8, s7
; GFX8-NEXT: v_mul_u32_u24_sdwa v4, v5, v4 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX8-NEXT: v_mul_u32_u24_sdwa v4, v5, v4 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX8-NEXT: v_mul_u32_u24_e32 v5, s9, v6		; GFX8-NEXT: v_mul_u32_u24_e32 v5, s8, v6
; GFX8-NEXT: v_mul_u32_u24_sdwa v6, v8, v7 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX8-NEXT: v_mul_u32_u24_sdwa v6, v8, v7 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX8-NEXT: s_bfe_u32 s5, s1, 0x40010		; GFX8-NEXT: s_bfe_u32 s4, s1, 0x40010
; GFX8-NEXT: s_bfe_u32 s7, s1, 0x40018		; GFX8-NEXT: s_bfe_u32 s6, s1, 0x40018
; GFX8-NEXT: v_mov_b32_e32 v9, s14		; GFX8-NEXT: v_mov_b32_e32 v9, s13
; GFX8-NEXT: s_bfe_u32 s1, s1, 0x40008		; GFX8-NEXT: s_bfe_u32 s1, s1, 0x40008
; GFX8-NEXT: v_mov_b32_e32 v3, s2		; GFX8-NEXT: v_mov_b32_e32 v3, s2
; GFX8-NEXT: v_mov_b32_e32 v10, s13		; GFX8-NEXT: v_mov_b32_e32 v10, s12
; GFX8-NEXT: v_mov_b32_e32 v11, s6		; GFX8-NEXT: v_mov_b32_e32 v11, s5
; GFX8-NEXT: v_mov_b32_e32 v12, s12		; GFX8-NEXT: v_mov_b32_e32 v12, s11
; GFX8-NEXT: v_mov_b32_e32 v13, s11		; GFX8-NEXT: v_mov_b32_e32 v13, s10
; GFX8-NEXT: v_mov_b32_e32 v14, s4		; GFX8-NEXT: v_mov_b32_e32 v14, s3
; GFX8-NEXT: v_mul_u32_u24_e32 v3, s1, v3		; GFX8-NEXT: v_mul_u32_u24_e32 v3, s1, v3
; GFX8-NEXT: v_or_b32_e32 v5, v5, v6		; GFX8-NEXT: v_or_b32_e32 v5, v5, v6
; GFX8-NEXT: v_mul_u32_u24_e32 v7, s7, v9		; GFX8-NEXT: v_mul_u32_u24_e32 v7, s6, v9
; GFX8-NEXT: v_mul_u32_u24_sdwa v8, v11, v10 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX8-NEXT: v_mul_u32_u24_sdwa v8, v11, v10 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX8-NEXT: v_mul_u32_u24_e32 v9, s5, v12		; GFX8-NEXT: v_mul_u32_u24_e32 v9, s4, v12
; GFX8-NEXT: v_mul_u32_u24_sdwa v10, v14, v13 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX8-NEXT: v_mul_u32_u24_sdwa v10, v14, v13 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX8-NEXT: v_and_b32_e32 v5, s0, v5		; GFX8-NEXT: v_and_b32_e32 v5, s0, v5
; GFX8-NEXT: v_or_b32_sdwa v3, v3, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX8-NEXT: v_or_b32_sdwa v3, v3, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX8-NEXT: v_or_b32_e32 v9, v9, v10		; GFX8-NEXT: v_or_b32_e32 v9, v9, v10
; GFX8-NEXT: v_or_b32_sdwa v7, v7, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX8-NEXT: v_or_b32_sdwa v7, v7, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX8-NEXT: v_and_b32_e32 v4, s0, v9		; GFX8-NEXT: v_and_b32_e32 v4, s0, v9
; GFX8-NEXT: v_or_b32_e32 v3, v5, v3		; GFX8-NEXT: v_or_b32_e32 v3, v5, v3
; GFX8-NEXT: v_or_b32_e32 v6, v4, v7		; GFX8-NEXT: v_or_b32_e32 v6, v4, v7
; GFX9-NEXT: s_mov_b32 s2, 0xffff		; GFX9-NEXT: s_mov_b32 s2, 0xffff
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: global_load_ubyte v2, v[0:1], off		; GFX9-NEXT: global_load_ubyte v2, v[0:1], off
; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX9-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_bfe_u32 s4, s0, 0x40010		; GFX9-NEXT: s_bfe_u32 s3, s0, 0x40010
; GFX9-NEXT: s_bfe_u32 s11, s1, 0x40010		; GFX9-NEXT: s_bfe_u32 s10, s1, 0x40010
; GFX9-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX9-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX9-NEXT: s_bfe_u32 s13, s1, 0x40018		; GFX9-NEXT: s_bfe_u32 s12, s1, 0x40018
; GFX9-NEXT: s_lshr_b32 s14, s1, 28		; GFX9-NEXT: s_lshr_b32 s13, s1, 28
; GFX9-NEXT: s_and_b32 s15, s1, 15		; GFX9-NEXT: s_and_b32 s14, s1, 15
; GFX9-NEXT: s_bfe_u32 s16, s1, 0x40004		; GFX9-NEXT: s_bfe_u32 s15, s1, 0x40004
; GFX9-NEXT: s_bfe_u32 s17, s1, 0x40008		; GFX9-NEXT: s_bfe_u32 s16, s1, 0x40008
; GFX9-NEXT: v_mov_b32_e32 v3, s11		; GFX9-NEXT: v_mov_b32_e32 v3, s10
; GFX9-NEXT: s_bfe_u32 s1, s1, 0x4000c		; GFX9-NEXT: s_bfe_u32 s1, s1, 0x4000c
; GFX9-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX9-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX9-NEXT: v_mov_b32_e32 v4, s12		; GFX9-NEXT: v_mov_b32_e32 v4, s11
; GFX9-NEXT: s_bfe_u32 s6, s0, 0x40018		; GFX9-NEXT: s_bfe_u32 s5, s0, 0x40018
; GFX9-NEXT: v_mov_b32_e32 v5, s13		; GFX9-NEXT: v_mov_b32_e32 v5, s12
; GFX9-NEXT: s_lshr_b32 s7, s0, 28		; GFX9-NEXT: s_lshr_b32 s6, s0, 28
; GFX9-NEXT: v_mov_b32_e32 v6, s14		; GFX9-NEXT: v_mov_b32_e32 v6, s13
; GFX9-NEXT: s_and_b32 s8, s0, 15		; GFX9-NEXT: s_and_b32 s7, s0, 15
; GFX9-NEXT: v_mov_b32_e32 v7, s15		; GFX9-NEXT: v_mov_b32_e32 v7, s14
; GFX9-NEXT: s_bfe_u32 s9, s0, 0x40004		; GFX9-NEXT: s_bfe_u32 s8, s0, 0x40004
; GFX9-NEXT: v_mov_b32_e32 v8, s16		; GFX9-NEXT: v_mov_b32_e32 v8, s15
; GFX9-NEXT: s_bfe_u32 s10, s0, 0x40008		; GFX9-NEXT: s_bfe_u32 s9, s0, 0x40008
; GFX9-NEXT: v_mov_b32_e32 v9, s17		; GFX9-NEXT: v_mov_b32_e32 v9, s16
; GFX9-NEXT: s_bfe_u32 s0, s0, 0x4000c		; GFX9-NEXT: s_bfe_u32 s0, s0, 0x4000c
; GFX9-NEXT: v_mov_b32_e32 v10, s1		; GFX9-NEXT: v_mov_b32_e32 v10, s1
; GFX9-NEXT: v_mul_lo_u16_e32 v3, s4, v3		; GFX9-NEXT: v_mul_lo_u16_e32 v3, s3, v3
; GFX9-NEXT: v_mul_lo_u16_sdwa v4, s5, v4 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX9-NEXT: v_mul_lo_u16_sdwa v4, s4, v4 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX9-NEXT: v_mul_lo_u16_e32 v5, s6, v5		; GFX9-NEXT: v_mul_lo_u16_e32 v5, s5, v5
; GFX9-NEXT: v_mul_lo_u16_sdwa v6, s7, v6 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX9-NEXT: v_mul_lo_u16_sdwa v6, s6, v6 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX9-NEXT: v_mul_lo_u16_e32 v7, s8, v7		; GFX9-NEXT: v_mul_lo_u16_e32 v7, s7, v7
; GFX9-NEXT: v_mul_lo_u16_sdwa v8, s9, v8 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX9-NEXT: v_mul_lo_u16_sdwa v8, s8, v8 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX9-NEXT: v_or_b32_e32 v3, v3, v4		; GFX9-NEXT: v_or_b32_e32 v3, v3, v4
; GFX9-NEXT: v_or_b32_sdwa v4, v5, v6 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX9-NEXT: v_or_b32_sdwa v4, v5, v6 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX9-NEXT: v_or_b32_e32 v5, v7, v8		; GFX9-NEXT: v_or_b32_e32 v5, v7, v8
; GFX9-NEXT: v_mul_lo_u16_e32 v9, s10, v9		; GFX9-NEXT: v_mul_lo_u16_e32 v9, s9, v9
; GFX9-NEXT: v_mul_lo_u16_sdwa v10, s0, v10 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX9-NEXT: v_mul_lo_u16_sdwa v10, s0, v10 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX9-NEXT: v_and_b32_e32 v5, s2, v5		; GFX9-NEXT: v_and_b32_e32 v5, s2, v5
; GFX9-NEXT: v_or_b32_sdwa v6, v9, v10 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX9-NEXT: v_or_b32_sdwa v6, v9, v10 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX9-NEXT: v_or_b32_e32 v6, v5, v6		; GFX9-NEXT: v_or_b32_e32 v6, v5, v6
; GFX9-NEXT: v_lshrrev_b32_e32 v7, 8, v6		; GFX9-NEXT: v_lshrrev_b32_e32 v7, 8, v6
; GFX9-NEXT: v_and_b32_e32 v3, s2, v3		; GFX9-NEXT: v_and_b32_e32 v3, s2, v3
; GFX9-NEXT: v_or_b32_e32 v4, v3, v4		; GFX9-NEXT: v_or_b32_e32 v4, v3, v4
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-DL-NEXT: s_mov_b32 s2, 0xffff		; GFX9-DL-NEXT: s_mov_b32 s2, 0xffff
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: global_load_ubyte v2, v[0:1], off		; GFX9-DL-NEXT: global_load_ubyte v2, v[0:1], off
; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX9-DL-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_bfe_u32 s4, s0, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s3, s0, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s11, s1, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s10, s1, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX9-DL-NEXT: s_bfe_u32 s13, s1, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s12, s1, 0x40018
; GFX9-DL-NEXT: s_lshr_b32 s14, s1, 28		; GFX9-DL-NEXT: s_lshr_b32 s13, s1, 28
; GFX9-DL-NEXT: s_and_b32 s15, s1, 15		; GFX9-DL-NEXT: s_and_b32 s14, s1, 15
; GFX9-DL-NEXT: s_bfe_u32 s16, s1, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s15, s1, 0x40004
; GFX9-DL-NEXT: s_bfe_u32 s17, s1, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s16, s1, 0x40008
; GFX9-DL-NEXT: v_mov_b32_e32 v3, s11		; GFX9-DL-NEXT: v_mov_b32_e32 v3, s10
; GFX9-DL-NEXT: s_bfe_u32 s1, s1, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s1, s1, 0x4000c
; GFX9-DL-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX9-DL-NEXT: v_mov_b32_e32 v4, s12		; GFX9-DL-NEXT: v_mov_b32_e32 v4, s11
; GFX9-DL-NEXT: s_bfe_u32 s6, s0, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s5, s0, 0x40018
; GFX9-DL-NEXT: v_mov_b32_e32 v5, s13		; GFX9-DL-NEXT: v_mov_b32_e32 v5, s12
; GFX9-DL-NEXT: s_lshr_b32 s7, s0, 28		; GFX9-DL-NEXT: s_lshr_b32 s6, s0, 28
; GFX9-DL-NEXT: v_mov_b32_e32 v6, s14		; GFX9-DL-NEXT: v_mov_b32_e32 v6, s13
; GFX9-DL-NEXT: s_and_b32 s8, s0, 15		; GFX9-DL-NEXT: s_and_b32 s7, s0, 15
; GFX9-DL-NEXT: v_mov_b32_e32 v7, s15		; GFX9-DL-NEXT: v_mov_b32_e32 v7, s14
; GFX9-DL-NEXT: s_bfe_u32 s9, s0, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s8, s0, 0x40004
; GFX9-DL-NEXT: v_mov_b32_e32 v8, s16		; GFX9-DL-NEXT: v_mov_b32_e32 v8, s15
; GFX9-DL-NEXT: s_bfe_u32 s10, s0, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s9, s0, 0x40008
; GFX9-DL-NEXT: v_mov_b32_e32 v9, s17		; GFX9-DL-NEXT: v_mov_b32_e32 v9, s16
; GFX9-DL-NEXT: s_bfe_u32 s0, s0, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s0, s0, 0x4000c
; GFX9-DL-NEXT: v_mov_b32_e32 v10, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v10, s1
; GFX9-DL-NEXT: v_mul_lo_u16_e32 v3, s4, v3		; GFX9-DL-NEXT: v_mul_lo_u16_e32 v3, s3, v3
; GFX9-DL-NEXT: v_mul_lo_u16_sdwa v4, s5, v4 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX9-DL-NEXT: v_mul_lo_u16_sdwa v4, s4, v4 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX9-DL-NEXT: v_mul_lo_u16_e32 v5, s6, v5		; GFX9-DL-NEXT: v_mul_lo_u16_e32 v5, s5, v5
; GFX9-DL-NEXT: v_mul_lo_u16_sdwa v6, s7, v6 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX9-DL-NEXT: v_mul_lo_u16_sdwa v6, s6, v6 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX9-DL-NEXT: v_mul_lo_u16_e32 v7, s8, v7		; GFX9-DL-NEXT: v_mul_lo_u16_e32 v7, s7, v7
; GFX9-DL-NEXT: v_mul_lo_u16_sdwa v8, s9, v8 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX9-DL-NEXT: v_mul_lo_u16_sdwa v8, s8, v8 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX9-DL-NEXT: v_or_b32_e32 v3, v3, v4		; GFX9-DL-NEXT: v_or_b32_e32 v3, v3, v4
; GFX9-DL-NEXT: v_or_b32_sdwa v4, v5, v6 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX9-DL-NEXT: v_or_b32_sdwa v4, v5, v6 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX9-DL-NEXT: v_or_b32_e32 v5, v7, v8		; GFX9-DL-NEXT: v_or_b32_e32 v5, v7, v8
; GFX9-DL-NEXT: v_mul_lo_u16_e32 v9, s10, v9		; GFX9-DL-NEXT: v_mul_lo_u16_e32 v9, s9, v9
; GFX9-DL-NEXT: v_mul_lo_u16_sdwa v10, s0, v10 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX9-DL-NEXT: v_mul_lo_u16_sdwa v10, s0, v10 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX9-DL-NEXT: v_and_b32_e32 v5, s2, v5		; GFX9-DL-NEXT: v_and_b32_e32 v5, s2, v5
; GFX9-DL-NEXT: v_or_b32_sdwa v6, v9, v10 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX9-DL-NEXT: v_or_b32_sdwa v6, v9, v10 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX9-DL-NEXT: v_or_b32_e32 v6, v5, v6		; GFX9-DL-NEXT: v_or_b32_e32 v6, v5, v6
; GFX9-DL-NEXT: v_lshrrev_b32_e32 v7, 8, v6		; GFX9-DL-NEXT: v_lshrrev_b32_e32 v7, 8, v6
; GFX9-DL-NEXT: v_and_b32_e32 v3, s2, v3		; GFX9-DL-NEXT: v_and_b32_e32 v3, s2, v3
; GFX9-DL-NEXT: v_or_b32_e32 v4, v3, v4		; GFX9-DL-NEXT: v_or_b32_e32 v4, v3, v4
; GFX9-DL-NEXT: s_waitcnt vmcnt(0)		; GFX9-DL-NEXT: s_waitcnt vmcnt(0)
; GFX9-DL-NEXT: v_add_u32_e32 v2, v5, v2		; GFX9-DL-NEXT: v_add_u32_e32 v2, v5, v2
; GFX9-DL-NEXT: v_add_u32_e32 v2, v2, v7		; GFX9-DL-NEXT: v_add_u32_e32 v2, v2, v7
; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v6 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_2		; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v6 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_2
; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v6 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_3		; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v6 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_3
; GFX9-DL-NEXT: v_add_u32_e32 v2, v2, v3		; GFX9-DL-NEXT: v_add_u32_e32 v2, v2, v3
; GFX9-DL-NEXT: v_lshrrev_b32_e32 v3, 8, v4		; GFX9-DL-NEXT: v_lshrrev_b32_e32 v3, 8, v4
; GFX9-DL-NEXT: v_add_u32_e32 v2, v2, v3		; GFX9-DL-NEXT: v_add_u32_e32 v2, v2, v3
; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_3		; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_3
; GFX9-DL-NEXT: global_store_byte v[0:1], v2, off		; GFX9-DL-NEXT: global_store_byte v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: udot8_acc8_vecMul:		; GFX10-DL-LABEL: udot8_acc8_vecMul:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s4		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s2
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s5		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s3
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off		; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40004
; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s3, s1, 0x40004
; GFX10-DL-NEXT: s_and_b32 s5, s0, 15		; GFX10-DL-NEXT: s_and_b32 s4, s0, 15
; GFX10-DL-NEXT: s_and_b32 s7, s1, 15		; GFX10-DL-NEXT: s_and_b32 s6, s1, 15
; GFX10-DL-NEXT: s_bfe_u32 s6, s0, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x4000c
; GFX10-DL-NEXT: s_bfe_u32 s8, s1, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s7, s1, 0x4000c
; GFX10-DL-NEXT: v_mul_lo_u16_e64 v3, s2, s4		; GFX10-DL-NEXT: v_mul_lo_u16_e64 v3, s2, s3
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40008
; GFX10-DL-NEXT: v_mul_lo_u16_e64 v4, s5, s7		; GFX10-DL-NEXT: v_mul_lo_u16_e64 v4, s4, s6
; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s3, s1, 0x40008
; GFX10-DL-NEXT: v_mul_lo_u16_e64 v5, s6, s8		; GFX10-DL-NEXT: v_mul_lo_u16_e64 v5, s5, s7
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v3, 8, v3		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v3, 8, v3
; GFX10-DL-NEXT: s_mov_b32 s5, 0xffff		; GFX10-DL-NEXT: s_mov_b32 s4, 0xffff
; GFX10-DL-NEXT: s_bfe_u32 s7, s1, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40014
; GFX10-DL-NEXT: v_mul_lo_u16_e64 v6, s2, s4		; GFX10-DL-NEXT: v_mul_lo_u16_e64 v6, s2, s3
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v5, 8, v5		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v5, 8, v5
; GFX10-DL-NEXT: v_or_b32_e32 v3, v4, v3		; GFX10-DL-NEXT: v_or_b32_e32 v3, v4, v3
; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s3, s0, 0x40014
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40010
; GFX10-DL-NEXT: s_bfe_u32 s6, s0, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40018
; GFX10-DL-NEXT: v_or_b32_sdwa v4, v6, v5 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX10-DL-NEXT: v_or_b32_sdwa v4, v6, v5 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX10-DL-NEXT: v_and_b32_e32 v3, s5, v3		; GFX10-DL-NEXT: v_and_b32_e32 v3, s4, v3
; GFX10-DL-NEXT: s_bfe_u32 s8, s1, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s7, s1, 0x40010
; GFX10-DL-NEXT: s_lshr_b32 s9, s1, 28		; GFX10-DL-NEXT: s_lshr_b32 s8, s1, 28
; GFX10-DL-NEXT: s_lshr_b32 s0, s0, 28		; GFX10-DL-NEXT: s_lshr_b32 s0, s0, 28
; GFX10-DL-NEXT: v_mul_lo_u16_e64 v5, s4, s7		; GFX10-DL-NEXT: v_mul_lo_u16_e64 v5, s3, s6
; GFX10-DL-NEXT: v_or_b32_e32 v4, v3, v4		; GFX10-DL-NEXT: v_or_b32_e32 v4, v3, v4
; GFX10-DL-NEXT: s_bfe_u32 s1, s1, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s1, s1, 0x40018
; GFX10-DL-NEXT: v_mul_lo_u16_e64 v6, s2, s8		; GFX10-DL-NEXT: v_mul_lo_u16_e64 v6, s2, s7
; GFX10-DL-NEXT: v_mul_lo_u16_e64 v7, s0, s9		; GFX10-DL-NEXT: v_mul_lo_u16_e64 v7, s0, s8
; GFX10-DL-NEXT: v_lshrrev_b32_e32 v8, 8, v4		; GFX10-DL-NEXT: v_lshrrev_b32_e32 v8, 8, v4
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v7, 8, v7		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v7, 8, v7
; GFX10-DL-NEXT: s_waitcnt vmcnt(0)		; GFX10-DL-NEXT: s_waitcnt vmcnt(0)
; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v3, v2		; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v3, v2
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v3, 8, v5		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v3, 8, v5
; GFX10-DL-NEXT: v_mul_lo_u16_e64 v5, s6, s1		; GFX10-DL-NEXT: v_mul_lo_u16_e64 v5, s5, s1
; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v2, v8		; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v2, v8
; GFX10-DL-NEXT: v_or_b32_e32 v3, v6, v3		; GFX10-DL-NEXT: v_or_b32_e32 v3, v6, v3
; GFX10-DL-NEXT: v_or_b32_sdwa v5, v5, v7 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX10-DL-NEXT: v_or_b32_sdwa v5, v5, v7 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_2		; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_2
; GFX10-DL-NEXT: v_and_b32_e32 v3, s5, v3		; GFX10-DL-NEXT: v_and_b32_e32 v3, s4, v3
; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_3		; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_3
; GFX10-DL-NEXT: v_or_b32_e32 v4, v3, v5		; GFX10-DL-NEXT: v_or_b32_e32 v4, v3, v5
; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v2, v3		; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v2, v3
; GFX10-DL-NEXT: v_lshrrev_b32_e32 v3, 8, v4		; GFX10-DL-NEXT: v_lshrrev_b32_e32 v3, 8, v4
; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v2, v3		; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v2, v3
; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_3		; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_3
; GFX10-DL-NEXT: global_store_byte v[0:1], v2, off		; GFX10-DL-NEXT: global_store_byte v[0:1], v2, off
entry:
store i8 %add8, i8 addrspace(1)* %dst, align 4		store i8 %add8, i8 addrspace(1)* %dst, align 4
ret void		ret void
}		}
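The IR body of `udot8_acc8_vecMul` is collapsed in this view. Judging only from the checked GFX7 code, the kernel does the following:

- It loads one byte from `%dst` (`buffer_load_ubyte`).
- It multiplies the eight unsigned 4-bit lanes of the two source words pairwise (`v_mul_u32_u24`).
- It adds the products into that byte (`v_add_i32`).
- It stores the low byte (`buffer_store_byte`).

Even the `-DL` targets still form each product separately rather than emitting a single dot-product instruction, which is consistent with the TODO above the function. The reference model below is an assumption reconstructed from the assembly, not the test's actual IR.

```python
# Hypothetical reference model (inferred from the checked GFX7 code, not
# copied from the test's IR): eight unsigned 4-bit x 4-bit products are
# added to the byte loaded from %dst, and the result is stored as one byte.

def udot8_acc8(a: int, b: int, acc: int) -> int:
    total = acc
    for i in range(8):
        ai = (a >> (4 * i)) & 0xF  # lane i of the first source word
        bi = (b >> (4 * i)) & 0xF  # lane i of the second source word
        total += ai * bi           # each product is at most 225, so it fits in a byte
    return total & 0xFF            # i8 adds wrap; masking once at the end is equivalent


print(udot8_acc8(0xFFFFFFFF, 0xFFFFFFFF, 0))  # 8  (8 * 225 = 1800 = 7 * 256 + 8)
print(udot8_acc8(0x11111111, 0x22222222, 3))  # 19
```

Wrapping once at the end gives the same result as wrapping after every 8-bit add, because addition modulo 256 is associative. That is why the GFX9/GFX10 code can sum in 32-bit registers and store only the low byte.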

; TODO: Once the additional "and+add" are removed, the pattern will be recognized.		; TODO: Once the additional "and+add" are removed, the pattern will be recognized.
define amdgpu_kernel void @udot8_acc4_vecMul(<8 x i4> addrspace(1)* %src1,		define amdgpu_kernel void @udot8_acc4_vecMul(<8 x i4> addrspace(1)* %src1,
; GFX7-LABEL: udot8_acc4_vecMul:		; GFX7-LABEL: udot8_acc4_vecMul:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s7, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s6, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: buffer_load_ubyte v0, off, s[4:7], 0		; GFX7-NEXT: buffer_load_ubyte v0, off, s[0:3], 0
; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0		; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX7-NEXT: s_load_dword s1, s[10:11], 0x0		; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_lshr_b32 s2, s0, 28		; GFX7-NEXT: s_lshr_b32 s6, s4, 28
; GFX7-NEXT: s_bfe_u32 s15, s1, 0x40018		; GFX7-NEXT: s_bfe_u32 s14, s5, 0x40018
; GFX7-NEXT: s_bfe_u32 s16, s1, 0x40014		; GFX7-NEXT: s_bfe_u32 s15, s5, 0x40014
; GFX7-NEXT: s_bfe_u32 s17, s1, 0x40010		; GFX7-NEXT: s_bfe_u32 s16, s5, 0x40010
; GFX7-NEXT: s_bfe_u32 s18, s1, 0x4000c		; GFX7-NEXT: s_bfe_u32 s17, s5, 0x4000c
; GFX7-NEXT: s_bfe_u32 s19, s1, 0x40008		; GFX7-NEXT: s_bfe_u32 s18, s5, 0x40008
; GFX7-NEXT: s_bfe_u32 s20, s1, 0x40004		; GFX7-NEXT: s_bfe_u32 s19, s5, 0x40004
; GFX7-NEXT: s_lshr_b32 s14, s1, 28		; GFX7-NEXT: s_lshr_b32 s13, s5, 28
; GFX7-NEXT: s_and_b32 s1, s1, 15		; GFX7-NEXT: s_and_b32 s5, s5, 15
; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40018		; GFX7-NEXT: s_bfe_u32 s7, s4, 0x40018
; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40014		; GFX7-NEXT: s_bfe_u32 s8, s4, 0x40014
; GFX7-NEXT: s_bfe_u32 s10, s0, 0x40010		; GFX7-NEXT: s_bfe_u32 s9, s4, 0x40010
; GFX7-NEXT: s_bfe_u32 s11, s0, 0x4000c		; GFX7-NEXT: s_bfe_u32 s10, s4, 0x4000c
; GFX7-NEXT: s_bfe_u32 s12, s0, 0x40008		; GFX7-NEXT: s_bfe_u32 s11, s4, 0x40008
; GFX7-NEXT: s_bfe_u32 s13, s0, 0x40004		; GFX7-NEXT: s_bfe_u32 s12, s4, 0x40004
; GFX7-NEXT: s_and_b32 s0, s0, 15		; GFX7-NEXT: s_and_b32 s4, s4, 15
; GFX7-NEXT: v_mov_b32_e32 v1, s1		; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s20		; GFX7-NEXT: v_mov_b32_e32 v2, s19
; GFX7-NEXT: v_mov_b32_e32 v3, s19		; GFX7-NEXT: v_mov_b32_e32 v3, s18
; GFX7-NEXT: v_mov_b32_e32 v4, s18		; GFX7-NEXT: v_mov_b32_e32 v4, s17
; GFX7-NEXT: v_mov_b32_e32 v5, s17		; GFX7-NEXT: v_mov_b32_e32 v5, s16
; GFX7-NEXT: v_mov_b32_e32 v6, s16		; GFX7-NEXT: v_mov_b32_e32 v6, s15
; GFX7-NEXT: v_mov_b32_e32 v7, s15		; GFX7-NEXT: v_mov_b32_e32 v7, s14
; GFX7-NEXT: s_waitcnt vmcnt(0)		; GFX7-NEXT: s_waitcnt vmcnt(0)
; GFX7-NEXT: v_mad_u32_u24 v0, s0, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s4, v1, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s13, v2, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s12, v2, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s12, v3, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s11, v3, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s11, v4, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s10, v4, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s10, v5, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s9, v5, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s9, v6, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s8, v6, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s8, v7, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s7, v7, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s14		; GFX7-NEXT: v_mov_b32_e32 v1, s13
; GFX7-NEXT: v_mad_u32_u24 v0, s2, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s6, v1, v0
; GFX7-NEXT: v_and_b32_e32 v0, 15, v0		; GFX7-NEXT: v_and_b32_e32 v0, 15, v0
; GFX7-NEXT: buffer_store_byte v0, off, s[4:7], 0		; GFX7-NEXT: buffer_store_byte v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: udot8_acc4_vecMul:		; GFX8-LABEL: udot8_acc4_vecMul:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_load_ubyte v2, v[0:1]		; GFX8-NEXT: flat_load_ubyte v2, v[0:1]
; GFX8-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX8-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX8-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX8-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_and_b32 s9, s0, 15		; GFX8-NEXT: s_and_b32 s8, s0, 15
; GFX8-NEXT: s_and_b32 s16, s1, 15		; GFX8-NEXT: s_and_b32 s15, s1, 15
; GFX8-NEXT: s_bfe_u32 s15, s1, 0x40004		; GFX8-NEXT: s_bfe_u32 s14, s1, 0x40004
; GFX8-NEXT: v_mov_b32_e32 v4, s16		; GFX8-NEXT: v_mov_b32_e32 v4, s15
; GFX8-NEXT: s_bfe_u32 s11, s1, 0x40018		; GFX8-NEXT: s_bfe_u32 s10, s1, 0x40018
; GFX8-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX8-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX8-NEXT: s_bfe_u32 s13, s1, 0x40010		; GFX8-NEXT: s_bfe_u32 s12, s1, 0x40010
; GFX8-NEXT: s_bfe_u32 s14, s1, 0x40008		; GFX8-NEXT: s_bfe_u32 s13, s1, 0x40008
; GFX8-NEXT: s_lshr_b32 s10, s1, 28		; GFX8-NEXT: s_lshr_b32 s9, s1, 28
; GFX8-NEXT: s_bfe_u32 s1, s1, 0x4000c		; GFX8-NEXT: s_bfe_u32 s1, s1, 0x4000c
; GFX8-NEXT: s_bfe_u32 s8, s0, 0x40004		; GFX8-NEXT: s_bfe_u32 s7, s0, 0x40004
; GFX8-NEXT: v_mov_b32_e32 v5, s15		; GFX8-NEXT: v_mov_b32_e32 v5, s14
; GFX8-NEXT: s_lshr_b32 s2, s0, 28		; GFX8-NEXT: s_lshr_b32 s2, s0, 28
; GFX8-NEXT: s_bfe_u32 s4, s0, 0x40018		; GFX8-NEXT: s_bfe_u32 s3, s0, 0x40018
; GFX8-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX8-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX8-NEXT: s_bfe_u32 s6, s0, 0x40010		; GFX8-NEXT: s_bfe_u32 s5, s0, 0x40010
; GFX8-NEXT: s_bfe_u32 s7, s0, 0x40008		; GFX8-NEXT: s_bfe_u32 s6, s0, 0x40008
; GFX8-NEXT: s_bfe_u32 s0, s0, 0x4000c		; GFX8-NEXT: s_bfe_u32 s0, s0, 0x4000c
; GFX8-NEXT: v_mov_b32_e32 v3, s1		; GFX8-NEXT: v_mov_b32_e32 v3, s1
; GFX8-NEXT: v_mov_b32_e32 v6, s14		; GFX8-NEXT: v_mov_b32_e32 v6, s13
; GFX8-NEXT: v_mul_u32_u24_e32 v3, s0, v3		; GFX8-NEXT: v_mul_u32_u24_e32 v3, s0, v3
; GFX8-NEXT: v_and_b32_e32 v3, 15, v3		; GFX8-NEXT: v_and_b32_e32 v3, 15, v3
; GFX8-NEXT: v_mov_b32_e32 v7, s13		; GFX8-NEXT: v_mov_b32_e32 v7, s12
; GFX8-NEXT: v_mov_b32_e32 v8, s12		; GFX8-NEXT: v_mov_b32_e32 v8, s11
; GFX8-NEXT: v_mov_b32_e32 v9, s11		; GFX8-NEXT: v_mov_b32_e32 v9, s10
; GFX8-NEXT: s_waitcnt vmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0)
; GFX8-NEXT: v_mad_u32_u24 v2, s9, v4, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s8, v4, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s8, v5, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s7, v5, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s7, v6, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s6, v6, v2
; GFX8-NEXT: v_and_b32_e32 v2, 15, v2		; GFX8-NEXT: v_and_b32_e32 v2, 15, v2
; GFX8-NEXT: v_add_u32_e32 v2, vcc, v3, v2		; GFX8-NEXT: v_add_u32_e32 v2, vcc, v3, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s6, v7, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s5, v7, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s5, v8, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s4, v8, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s4, v9, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s3, v9, v2
; GFX8-NEXT: v_mov_b32_e32 v3, s10		; GFX8-NEXT: v_mov_b32_e32 v3, s9
; GFX8-NEXT: v_mad_u32_u24 v2, s2, v3, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s2, v3, v2
; GFX8-NEXT: v_and_b32_e32 v2, 15, v2		; GFX8-NEXT: v_and_b32_e32 v2, 15, v2
; GFX8-NEXT: flat_store_byte v[0:1], v2		; GFX8-NEXT: flat_store_byte v[0:1], v2
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: udot8_acc4_vecMul:		; GFX9-LABEL: udot8_acc4_vecMul:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: global_load_ubyte v2, v[0:1], off		; GFX9-NEXT: global_load_ubyte v2, v[0:1], off
; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX9-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_and_b32 s9, s0, 15		; GFX9-NEXT: s_and_b32 s8, s0, 15
; GFX9-NEXT: s_and_b32 s16, s1, 15		; GFX9-NEXT: s_and_b32 s15, s1, 15
; GFX9-NEXT: s_bfe_u32 s15, s1, 0x40004		; GFX9-NEXT: s_bfe_u32 s14, s1, 0x40004
; GFX9-NEXT: v_mov_b32_e32 v4, s16		; GFX9-NEXT: v_mov_b32_e32 v4, s15
; GFX9-NEXT: s_bfe_u32 s11, s1, 0x40018		; GFX9-NEXT: s_bfe_u32 s10, s1, 0x40018
; GFX9-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX9-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX9-NEXT: s_bfe_u32 s13, s1, 0x40010		; GFX9-NEXT: s_bfe_u32 s12, s1, 0x40010
; GFX9-NEXT: s_bfe_u32 s14, s1, 0x40008		; GFX9-NEXT: s_bfe_u32 s13, s1, 0x40008
; GFX9-NEXT: s_lshr_b32 s10, s1, 28		; GFX9-NEXT: s_lshr_b32 s9, s1, 28
; GFX9-NEXT: s_bfe_u32 s1, s1, 0x4000c		; GFX9-NEXT: s_bfe_u32 s1, s1, 0x4000c
; GFX9-NEXT: s_bfe_u32 s8, s0, 0x40004		; GFX9-NEXT: s_bfe_u32 s7, s0, 0x40004
; GFX9-NEXT: v_mov_b32_e32 v5, s15		; GFX9-NEXT: v_mov_b32_e32 v5, s14
; GFX9-NEXT: s_lshr_b32 s2, s0, 28		; GFX9-NEXT: s_lshr_b32 s2, s0, 28
; GFX9-NEXT: s_bfe_u32 s4, s0, 0x40018		; GFX9-NEXT: s_bfe_u32 s3, s0, 0x40018
; GFX9-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX9-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX9-NEXT: s_bfe_u32 s6, s0, 0x40010		; GFX9-NEXT: s_bfe_u32 s5, s0, 0x40010
; GFX9-NEXT: s_bfe_u32 s7, s0, 0x40008		; GFX9-NEXT: s_bfe_u32 s6, s0, 0x40008
; GFX9-NEXT: s_bfe_u32 s0, s0, 0x4000c		; GFX9-NEXT: s_bfe_u32 s0, s0, 0x4000c
; GFX9-NEXT: v_mov_b32_e32 v3, s1		; GFX9-NEXT: v_mov_b32_e32 v3, s1
; GFX9-NEXT: v_mov_b32_e32 v6, s14		; GFX9-NEXT: v_mov_b32_e32 v6, s13
; GFX9-NEXT: v_mul_u32_u24_e32 v3, s0, v3		; GFX9-NEXT: v_mul_u32_u24_e32 v3, s0, v3
; GFX9-NEXT: v_and_b32_e32 v3, 15, v3		; GFX9-NEXT: v_and_b32_e32 v3, 15, v3
; GFX9-NEXT: v_mov_b32_e32 v7, s13		; GFX9-NEXT: v_mov_b32_e32 v7, s12
; GFX9-NEXT: v_mov_b32_e32 v8, s12		; GFX9-NEXT: v_mov_b32_e32 v8, s11
; GFX9-NEXT: v_mov_b32_e32 v9, s11		; GFX9-NEXT: v_mov_b32_e32 v9, s10
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: v_mad_u32_u24 v2, s9, v4, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s8, v4, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s8, v5, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s7, v5, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s7, v6, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s6, v6, v2
; GFX9-NEXT: v_and_b32_e32 v2, 15, v2		; GFX9-NEXT: v_and_b32_e32 v2, 15, v2
; GFX9-NEXT: v_add_u32_e32 v2, v2, v3		; GFX9-NEXT: v_add_u32_e32 v2, v2, v3
; GFX9-NEXT: v_mad_u32_u24 v2, s6, v7, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s5, v7, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s5, v8, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s4, v8, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s4, v9, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s3, v9, v2
; GFX9-NEXT: v_mov_b32_e32 v3, s10		; GFX9-NEXT: v_mov_b32_e32 v3, s9
; GFX9-NEXT: v_mad_u32_u24 v2, s2, v3, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s2, v3, v2
; GFX9-NEXT: v_and_b32_e32 v2, 15, v2		; GFX9-NEXT: v_and_b32_e32 v2, 15, v2
; GFX9-NEXT: global_store_byte v[0:1], v2, off		; GFX9-NEXT: global_store_byte v[0:1], v2, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX9-DL-LABEL: udot8_acc4_vecMul:		; GFX9-DL-LABEL: udot8_acc4_vecMul:
; GFX9-DL: ; %bb.0: ; %entry		; GFX9-DL: ; %bb.0: ; %entry
; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: global_load_ubyte v2, v[0:1], off		; GFX9-DL-NEXT: global_load_ubyte v2, v[0:1], off
; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX9-DL-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_and_b32 s9, s0, 15		; GFX9-DL-NEXT: s_and_b32 s8, s0, 15
; GFX9-DL-NEXT: s_and_b32 s16, s1, 15		; GFX9-DL-NEXT: s_and_b32 s15, s1, 15
; GFX9-DL-NEXT: s_bfe_u32 s15, s1, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s14, s1, 0x40004
; GFX9-DL-NEXT: v_mov_b32_e32 v4, s16		; GFX9-DL-NEXT: v_mov_b32_e32 v4, s15
; GFX9-DL-NEXT: s_bfe_u32 s11, s1, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s10, s1, 0x40018
; GFX9-DL-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX9-DL-NEXT: s_bfe_u32 s13, s1, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s12, s1, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s14, s1, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s13, s1, 0x40008
; GFX9-DL-NEXT: s_lshr_b32 s10, s1, 28		; GFX9-DL-NEXT: s_lshr_b32 s9, s1, 28
; GFX9-DL-NEXT: s_bfe_u32 s1, s1, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s1, s1, 0x4000c
; GFX9-DL-NEXT: s_bfe_u32 s8, s0, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s7, s0, 0x40004
; GFX9-DL-NEXT: v_mov_b32_e32 v5, s15		; GFX9-DL-NEXT: v_mov_b32_e32 v5, s14
; GFX9-DL-NEXT: s_lshr_b32 s2, s0, 28		; GFX9-DL-NEXT: s_lshr_b32 s2, s0, 28
; GFX9-DL-NEXT: s_bfe_u32 s4, s0, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s3, s0, 0x40018
; GFX9-DL-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX9-DL-NEXT: s_bfe_u32 s6, s0, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s5, s0, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s7, s0, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s6, s0, 0x40008
; GFX9-DL-NEXT: s_bfe_u32 s0, s0, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s0, s0, 0x4000c
; GFX9-DL-NEXT: v_mov_b32_e32 v3, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v3, s1
; GFX9-DL-NEXT: v_mov_b32_e32 v6, s14		; GFX9-DL-NEXT: v_mov_b32_e32 v6, s13
; GFX9-DL-NEXT: v_mul_u32_u24_e32 v3, s0, v3		; GFX9-DL-NEXT: v_mul_u32_u24_e32 v3, s0, v3
; GFX9-DL-NEXT: v_and_b32_e32 v3, 15, v3		; GFX9-DL-NEXT: v_and_b32_e32 v3, 15, v3
; GFX9-DL-NEXT: v_mov_b32_e32 v7, s13		; GFX9-DL-NEXT: v_mov_b32_e32 v7, s12
; GFX9-DL-NEXT: v_mov_b32_e32 v8, s12		; GFX9-DL-NEXT: v_mov_b32_e32 v8, s11
; GFX9-DL-NEXT: v_mov_b32_e32 v9, s11		; GFX9-DL-NEXT: v_mov_b32_e32 v9, s10
; GFX9-DL-NEXT: s_waitcnt vmcnt(0)		; GFX9-DL-NEXT: s_waitcnt vmcnt(0)
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s9, v4, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s8, v4, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s8, v5, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s7, v5, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s7, v6, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s6, v6, v2
; GFX9-DL-NEXT: v_and_b32_e32 v2, 15, v2		; GFX9-DL-NEXT: v_and_b32_e32 v2, 15, v2
; GFX9-DL-NEXT: v_add_u32_e32 v2, v2, v3		; GFX9-DL-NEXT: v_add_u32_e32 v2, v2, v3
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s6, v7, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s5, v7, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s5, v8, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s4, v8, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s4, v9, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s3, v9, v2
; GFX9-DL-NEXT: v_mov_b32_e32 v3, s10		; GFX9-DL-NEXT: v_mov_b32_e32 v3, s9
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s2, v3, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s2, v3, v2
; GFX9-DL-NEXT: v_and_b32_e32 v2, 15, v2		; GFX9-DL-NEXT: v_and_b32_e32 v2, 15, v2
; GFX9-DL-NEXT: global_store_byte v[0:1], v2, off		; GFX9-DL-NEXT: global_store_byte v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: udot8_acc4_vecMul:		; GFX10-DL-LABEL: udot8_acc4_vecMul:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s4		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s2
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s5		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s3
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off		; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_and_b32 s2, s0, 15		; GFX10-DL-NEXT: s_and_b32 s2, s0, 15
; GFX10-DL-NEXT: s_and_b32 s4, s1, 15		; GFX10-DL-NEXT: s_and_b32 s3, s1, 15
; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x40004
; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x40004
; GFX10-DL-NEXT: s_bfe_u32 s7, s1, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40008
; GFX10-DL-NEXT: s_waitcnt vmcnt(0)		; GFX10-DL-NEXT: s_waitcnt vmcnt(0)
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40008
; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s3, s0, 0x4000c
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s5, s6, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s4, s5, v2
; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x4000c
; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x40014
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s7, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s6, v2
; GFX10-DL-NEXT: v_mul_u32_u24_e64 v3, s4, s5		; GFX10-DL-NEXT: v_mul_u32_u24_e64 v3, s3, s4
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40010
; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s3, s1, 0x40010
; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX10-DL-NEXT: v_and_b32_e32 v2, 15, v2		; GFX10-DL-NEXT: v_and_b32_e32 v2, 15, v2
; GFX10-DL-NEXT: v_and_b32_e32 v3, 15, v3		; GFX10-DL-NEXT: v_and_b32_e32 v3, 15, v3
; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v2, v3		; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v2, v3
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40018
; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s3, s1, 0x40018
; GFX10-DL-NEXT: s_lshr_b32 s0, s0, 28		; GFX10-DL-NEXT: s_lshr_b32 s0, s0, 28
; GFX10-DL-NEXT: s_lshr_b32 s1, s1, 28		; GFX10-DL-NEXT: s_lshr_b32 s1, s1, 28
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s5, s6, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s4, s5, v2
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s0, s1, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s0, s1, v2
; GFX10-DL-NEXT: v_and_b32_e32 v2, 15, v2		; GFX10-DL-NEXT: v_and_b32_e32 v2, 15, v2
; GFX10-DL-NEXT: global_store_byte v[0:1], v2, off		; GFX10-DL-NEXT: global_store_byte v[0:1], v2, off
; GFX10-DL-NEXT: s_endpgm		; GFX10-DL-NEXT: s_endpgm
<8 x i4> addrspace(1)* %src2,		<8 x i4> addrspace(1)* %src2,
i4 addrspace(1)* nocapture %dst) {		i4 addrspace(1)* nocapture %dst) {
entry:		entry:
%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1		%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1
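The `udot8_acc4_vecMul` checks above unpack each 4-bit lane in one of three ways. Lane 0 uses `s_and_b32 …, 15`, lanes 1 to 6 use `s_bfe_u32 …, 0x4000N`, and lane 7 uses `s_lshr_b32 …, 28`. The lane products are then chained through `v_mad_u32_u24`, and the result is masked with `v_and_b32 …, 15` before the byte store. The Python sketch below models that arithmetic and rests on two assumptions. First, the S_BFE_U32 control operand encodes the bit offset in bits [4:0] and the field width in bits [22:16], so `0x40018` reads 4 bits starting at bit 24. Second, the elided IR body sums the eight lane-wise products into the i4 accumulator, which is what the mad chain suggests. The helper names `bfe_u32`, `lanes_as_in_checks` and `udot8_acc4` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def bfe_u32(src: int, ctrl: int) -> int:
    """Unsigned bitfield extract modelled on S_BFE_U32 (assumed encoding):
    bit offset in ctrl[4:0], field width in ctrl[22:16]."""
    offset = ctrl & 0x1F
    width = (ctrl >> 16) & 0x7F
    return ((src & MASK32) >> offset) & ((1 << width) - 1)


def lanes_as_in_checks(x: int) -> list[int]:
    """Decode eight 4-bit lanes the same way the GFX7-GFX10 checks do."""
    x &= MASK32
    lane0 = x & 15                                              # s_and_b32 sN, x, 15
    mid = [bfe_u32(x, 0x40000 | (4 * i)) for i in range(1, 7)]  # s_bfe_u32 sN, x, 0x4000N
    lane7 = x >> 28                                             # s_lshr_b32 sN, x, 28
    return [lane0, *mid, lane7]


def udot8_acc4(a: int, b: int, acc: int) -> int:
    """Sum of the eight 4-bit lane products plus the accumulator, truncated to i4."""
    total = acc
    for pa, pb in zip(lanes_as_in_checks(a), lanes_as_in_checks(b)):
        total += pa * pb  # one v_mad_u32_u24 per lane
    return total & 15     # final v_and_b32_e32 vN, 15, vN


print(udot8_acc4(0x11111111, 0x11111111, 0))  # prints 8
```

Every intermediate `& 15` in the GFX8, GFX9 and GFX10 sequences is a reduction modulo 16. Masking partial sums early, as those targets do, therefore produces the same stored byte as masking once at the end.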

llvm/test/CodeGen/AMDGPU/indirect-addressing-term.ll

define amdgpu_kernel void @extract_w_offset_vgpr(i32 addrspace(1)* %out) {
; GCN-LABEL: name: extract_w_offset_vgpr		; GCN-LABEL: name: extract_w_offset_vgpr
; GCN: bb.0.entry:		; GCN: bb.0.entry:
; GCN: successors: %bb.1(0x80000000)		; GCN: successors: %bb.1(0x80000000)
; GCN: liveins: $vgpr0, $sgpr0_sgpr1		; GCN: liveins: $vgpr0, $sgpr0_sgpr1
; GCN: renamable $sgpr0_sgpr1 = S_LOAD_DWORDX2_IMM killed renamable $sgpr0_sgpr1, 36, 0, 0 :: (dereferenceable invariant load 8 from %ir.out.kernarg.offset.cast, align 4, addrspace 4)		; GCN: renamable $sgpr0_sgpr1 = S_LOAD_DWORDX2_IMM killed renamable $sgpr0_sgpr1, 36, 0, 0 :: (dereferenceable invariant load 8 from %ir.out.kernarg.offset.cast, align 4, addrspace 4)
; GCN: renamable $sgpr2 = COPY renamable $sgpr1		; GCN: renamable $sgpr2 = COPY renamable $sgpr1
; GCN: renamable $sgpr0 = COPY renamable $sgpr0, implicit killed $sgpr0_sgpr1		; GCN: renamable $sgpr0 = COPY renamable $sgpr0, implicit killed $sgpr0_sgpr1
; GCN: renamable $sgpr1 = S_MOV_B32 61440		; GCN: renamable $sgpr1 = S_MOV_B32 61440
; GCN: renamable $sgpr4 = S_MOV_B32 -1		; GCN: renamable $sgpr3 = S_MOV_B32 -1
; GCN: undef renamable $sgpr8 = COPY killed renamable $sgpr0, implicit-def $sgpr8_sgpr9_sgpr10_sgpr11		; GCN: undef renamable $sgpr4 = COPY killed renamable $sgpr0, implicit-def $sgpr4_sgpr5_sgpr6_sgpr7
; GCN: renamable $sgpr9 = COPY killed renamable $sgpr2		; GCN: renamable $sgpr5 = COPY killed renamable $sgpr2
; GCN: renamable $sgpr10 = COPY killed renamable $sgpr4		; GCN: renamable $sgpr6 = COPY killed renamable $sgpr3
; GCN: renamable $sgpr11 = COPY killed renamable $sgpr1		; GCN: renamable $sgpr7 = COPY killed renamable $sgpr1
; GCN: renamable $sgpr0 = S_MOV_B32 16		; GCN: renamable $sgpr0 = S_MOV_B32 16
; GCN: renamable $sgpr1 = S_MOV_B32 15		; GCN: renamable $sgpr1 = S_MOV_B32 15
; GCN: renamable $sgpr2 = S_MOV_B32 14		; GCN: renamable $sgpr2 = S_MOV_B32 14
; GCN: renamable $sgpr4 = S_MOV_B32 13		; GCN: renamable $sgpr3 = S_MOV_B32 13
; GCN: renamable $sgpr5 = S_MOV_B32 12		; GCN: renamable $sgpr8 = S_MOV_B32 12
; GCN: renamable $sgpr6 = S_MOV_B32 11		; GCN: renamable $sgpr9 = S_MOV_B32 11
; GCN: renamable $sgpr7 = S_MOV_B32 10		; GCN: renamable $sgpr10 = S_MOV_B32 10
; GCN: renamable $sgpr12 = S_MOV_B32 9		; GCN: renamable $sgpr11 = S_MOV_B32 9
; GCN: renamable $sgpr13 = S_MOV_B32 8		; GCN: renamable $sgpr12 = S_MOV_B32 8
; GCN: renamable $sgpr14 = S_MOV_B32 7		; GCN: renamable $sgpr13 = S_MOV_B32 7
; GCN: renamable $sgpr15 = S_MOV_B32 6		; GCN: renamable $sgpr14 = S_MOV_B32 6
; GCN: renamable $sgpr16 = S_MOV_B32 5		; GCN: renamable $sgpr15 = S_MOV_B32 5
; GCN: renamable $sgpr17 = S_MOV_B32 3		; GCN: renamable $sgpr16 = S_MOV_B32 3
; GCN: renamable $sgpr18 = S_MOV_B32 2		; GCN: renamable $sgpr17 = S_MOV_B32 2
; GCN: renamable $sgpr19 = S_MOV_B32 1		; GCN: renamable $sgpr18 = S_MOV_B32 1
; GCN: renamable $sgpr20 = S_MOV_B32 0		; GCN: renamable $sgpr19 = S_MOV_B32 0
; GCN: renamable $vgpr1 = COPY killed renamable $sgpr20		; GCN: renamable $vgpr1 = COPY killed renamable $sgpr19
; GCN: renamable $vgpr2 = COPY killed renamable $sgpr19		; GCN: renamable $vgpr2 = COPY killed renamable $sgpr18
; GCN: renamable $vgpr3 = COPY killed renamable $sgpr18		; GCN: renamable $vgpr3 = COPY killed renamable $sgpr17
; GCN: renamable $vgpr4 = COPY killed renamable $sgpr17		; GCN: renamable $vgpr4 = COPY killed renamable $sgpr16
; GCN: renamable $vgpr5 = COPY killed renamable $sgpr16		; GCN: renamable $vgpr5 = COPY killed renamable $sgpr15
; GCN: renamable $vgpr6 = COPY killed renamable $sgpr15		; GCN: renamable $vgpr6 = COPY killed renamable $sgpr14
; GCN: renamable $vgpr7 = COPY killed renamable $sgpr14		; GCN: renamable $vgpr7 = COPY killed renamable $sgpr13
; GCN: renamable $vgpr8 = COPY killed renamable $sgpr13		; GCN: renamable $vgpr8 = COPY killed renamable $sgpr12
; GCN: renamable $vgpr9 = COPY killed renamable $sgpr12		; GCN: renamable $vgpr9 = COPY killed renamable $sgpr11
; GCN: renamable $vgpr10 = COPY killed renamable $sgpr7		; GCN: renamable $vgpr10 = COPY killed renamable $sgpr10
; GCN: renamable $vgpr11 = COPY killed renamable $sgpr6		; GCN: renamable $vgpr11 = COPY killed renamable $sgpr9
; GCN: renamable $vgpr12 = COPY killed renamable $sgpr5		; GCN: renamable $vgpr12 = COPY killed renamable $sgpr8
; GCN: renamable $vgpr13 = COPY killed renamable $sgpr4		; GCN: renamable $vgpr13 = COPY killed renamable $sgpr3
; GCN: renamable $vgpr14 = COPY killed renamable $sgpr2		; GCN: renamable $vgpr14 = COPY killed renamable $sgpr2
; GCN: renamable $vgpr15 = COPY killed renamable $sgpr1		; GCN: renamable $vgpr15 = COPY killed renamable $sgpr1
; GCN: renamable $vgpr16 = COPY killed renamable $sgpr0		; GCN: renamable $vgpr16 = COPY killed renamable $sgpr0
; GCN: undef renamable $vgpr17 = COPY killed renamable $vgpr1, implicit-def $vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31_vgpr32		; GCN: undef renamable $vgpr17 = COPY killed renamable $vgpr1, implicit-def $vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31_vgpr32
; GCN: renamable $vgpr18 = COPY killed renamable $vgpr2		; GCN: renamable $vgpr18 = COPY killed renamable $vgpr2
; GCN: renamable $vgpr19 = COPY killed renamable $vgpr3		; GCN: renamable $vgpr19 = COPY killed renamable $vgpr3
; GCN: renamable $vgpr20 = COPY killed renamable $vgpr4		; GCN: renamable $vgpr20 = COPY killed renamable $vgpr4
; GCN: renamable $vgpr21 = COPY killed renamable $vgpr5		; GCN: renamable $vgpr21 = COPY killed renamable $vgpr5
; GCN: renamable $vgpr22 = COPY killed renamable $vgpr6		; GCN: renamable $vgpr22 = COPY killed renamable $vgpr6
; GCN: renamable $vgpr23 = COPY killed renamable $vgpr7		; GCN: renamable $vgpr23 = COPY killed renamable $vgpr7
; GCN: renamable $vgpr24 = COPY killed renamable $vgpr8		; GCN: renamable $vgpr24 = COPY killed renamable $vgpr8
; GCN: renamable $vgpr25 = COPY killed renamable $vgpr9		; GCN: renamable $vgpr25 = COPY killed renamable $vgpr9
; GCN: renamable $vgpr26 = COPY killed renamable $vgpr10		; GCN: renamable $vgpr26 = COPY killed renamable $vgpr10
; GCN: renamable $vgpr27 = COPY killed renamable $vgpr11		; GCN: renamable $vgpr27 = COPY killed renamable $vgpr11
; GCN: renamable $vgpr28 = COPY killed renamable $vgpr12		; GCN: renamable $vgpr28 = COPY killed renamable $vgpr12
; GCN: renamable $vgpr29 = COPY killed renamable $vgpr13		; GCN: renamable $vgpr29 = COPY killed renamable $vgpr13
; GCN: renamable $vgpr30 = COPY killed renamable $vgpr14		; GCN: renamable $vgpr30 = COPY killed renamable $vgpr14
; GCN: renamable $vgpr31 = COPY killed renamable $vgpr15		; GCN: renamable $vgpr31 = COPY killed renamable $vgpr15
; GCN: renamable $vgpr32 = COPY killed renamable $vgpr16		; GCN: renamable $vgpr32 = COPY killed renamable $vgpr16
; GCN: renamable $sgpr22_sgpr23 = S_MOV_B64 $exec		; GCN: renamable $sgpr20_sgpr21 = S_MOV_B64 $exec
; GCN: renamable $vgpr1 = IMPLICIT_DEF		; GCN: renamable $vgpr1 = IMPLICIT_DEF
; GCN: renamable $sgpr24_sgpr25 = IMPLICIT_DEF		; GCN: renamable $sgpr22_sgpr23 = IMPLICIT_DEF
; GCN: SI_SPILL_V32_SAVE killed $vgpr0, %stack.0, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr3, 0, implicit $exec :: (store 4 into %stack.0, addrspace 5)		; GCN: SI_SPILL_V32_SAVE killed $vgpr0, %stack.0, $sgpr96_sgpr97_sgpr98_sgpr99, 0, 0, implicit $exec :: (store 4 into %stack.0, addrspace 5)
; GCN: SI_SPILL_S128_SAVE killed $sgpr8_sgpr9_sgpr10_sgpr11, %stack.1, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr3 :: (store 16 into %stack.1, align 4, addrspace 5)		; GCN: SI_SPILL_S128_SAVE killed $sgpr4_sgpr5_sgpr6_sgpr7, %stack.1, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99 :: (store 16 into %stack.1, align 4, addrspace 5)
; GCN: SI_SPILL_V512_SAVE killed $vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31_vgpr32, %stack.2, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr3, 0, implicit $exec :: (store 64 into %stack.2, align 4, addrspace 5)		; GCN: SI_SPILL_V512_SAVE killed $vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31_vgpr32, %stack.2, $sgpr96_sgpr97_sgpr98_sgpr99, 0, 0, implicit $exec :: (store 64 into %stack.2, align 4, addrspace 5)
; GCN: SI_SPILL_S64_SAVE killed $sgpr22_sgpr23, %stack.3, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr3 :: (store 8 into %stack.3, align 4, addrspace 5)		; GCN: SI_SPILL_S64_SAVE killed $sgpr20_sgpr21, %stack.3, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99 :: (store 8 into %stack.3, align 4, addrspace 5)
; GCN: SI_SPILL_V32_SAVE killed $vgpr1, %stack.4, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr3, 0, implicit $exec :: (store 4 into %stack.4, addrspace 5)		; GCN: SI_SPILL_V32_SAVE killed $vgpr1, %stack.4, $sgpr96_sgpr97_sgpr98_sgpr99, 0, 0, implicit $exec :: (store 4 into %stack.4, addrspace 5)
; GCN: SI_SPILL_S64_SAVE killed $sgpr24_sgpr25, %stack.5, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr3 :: (store 8 into %stack.5, align 4, addrspace 5)		; GCN: SI_SPILL_S64_SAVE killed $sgpr22_sgpr23, %stack.5, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99 :: (store 8 into %stack.5, align 4, addrspace 5)
; GCN: bb.1:		; GCN: bb.1:
; GCN: successors: %bb.1(0x40000000), %bb.2(0x40000000)		; GCN: successors: %bb.1(0x40000000), %bb.2(0x40000000)
; GCN: $sgpr0_sgpr1 = SI_SPILL_S64_RESTORE %stack.5, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr3 :: (load 8 from %stack.5, align 4, addrspace 5)		; GCN: $sgpr0_sgpr1 = SI_SPILL_S64_RESTORE %stack.5, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99 :: (load 8 from %stack.5, align 4, addrspace 5)
; GCN: $vgpr0 = SI_SPILL_V32_RESTORE %stack.4, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr3, 0, implicit $exec :: (load 4 from %stack.4, addrspace 5)		; GCN: $vgpr0 = SI_SPILL_V32_RESTORE %stack.4, $sgpr96_sgpr97_sgpr98_sgpr99, 0, 0, implicit $exec :: (load 4 from %stack.4, addrspace 5)
; GCN: $vgpr1 = SI_SPILL_V32_RESTORE %stack.0, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr3, 0, implicit $exec :: (load 4 from %stack.0, addrspace 5)		; GCN: $vgpr1 = SI_SPILL_V32_RESTORE %stack.0, $sgpr96_sgpr97_sgpr98_sgpr99, 0, 0, implicit $exec :: (load 4 from %stack.0, addrspace 5)
; GCN: renamable $sgpr2 = V_READFIRSTLANE_B32 $vgpr1, implicit $exec		; GCN: renamable $sgpr2 = V_READFIRSTLANE_B32 $vgpr1, implicit $exec
; GCN: renamable $sgpr4_sgpr5 = V_CMP_EQ_U32_e64 $sgpr2, killed $vgpr1, implicit $exec		; GCN: renamable $sgpr4_sgpr5 = V_CMP_EQ_U32_e64 $sgpr2, killed $vgpr1, implicit $exec
; GCN: renamable $sgpr4_sgpr5 = S_AND_SAVEEXEC_B64 killed renamable $sgpr4_sgpr5, implicit-def $exec, implicit-def $scc, implicit $exec		; GCN: renamable $sgpr4_sgpr5 = S_AND_SAVEEXEC_B64 killed renamable $sgpr4_sgpr5, implicit-def $exec, implicit-def $scc, implicit $exec
; GCN: S_SET_GPR_IDX_ON killed renamable $sgpr2, 1, implicit-def $m0, implicit undef $m0		; GCN: S_SET_GPR_IDX_ON killed renamable $sgpr2, 1, implicit-def $m0, implicit undef $m0
; GCN: $vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17 = SI_SPILL_V512_RESTORE %stack.2, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr3, 0, implicit $exec :: (load 64 from %stack.2, align 4, addrspace 5)		; GCN: $vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17 = SI_SPILL_V512_RESTORE %stack.2, $sgpr96_sgpr97_sgpr98_sgpr99, 0, 0, implicit $exec :: (load 64 from %stack.2, align 4, addrspace 5)
; GCN: renamable $vgpr18 = V_MOV_B32_e32 undef $vgpr3, implicit $exec, implicit killed $vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17, implicit $m0		; GCN: renamable $vgpr18 = V_MOV_B32_e32 undef $vgpr3, implicit $exec, implicit killed $vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17, implicit $m0
; GCN: S_SET_GPR_IDX_OFF		; GCN: S_SET_GPR_IDX_OFF
; GCN: renamable $vgpr19 = COPY renamable $vgpr18		; GCN: renamable $vgpr19 = COPY renamable $vgpr18
; GCN: renamable $sgpr6_sgpr7 = COPY renamable $sgpr4_sgpr5		; GCN: renamable $sgpr6_sgpr7 = COPY renamable $sgpr4_sgpr5
; GCN: SI_SPILL_S64_SAVE killed $sgpr6_sgpr7, %stack.5, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr3 :: (store 8 into %stack.5, align 4, addrspace 5)		; GCN: SI_SPILL_S64_SAVE killed $sgpr6_sgpr7, %stack.5, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99 :: (store 8 into %stack.5, align 4, addrspace 5)
; GCN: SI_SPILL_S64_SAVE killed $sgpr0_sgpr1, %stack.6, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr3 :: (store 8 into %stack.6, align 4, addrspace 5)		; GCN: SI_SPILL_S64_SAVE killed $sgpr0_sgpr1, %stack.6, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99 :: (store 8 into %stack.6, align 4, addrspace 5)
; GCN: SI_SPILL_V32_SAVE killed $vgpr19, %stack.4, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr3, 0, implicit $exec :: (store 4 into %stack.4, addrspace 5)		; GCN: SI_SPILL_V32_SAVE killed $vgpr19, %stack.4, $sgpr96_sgpr97_sgpr98_sgpr99, 0, 0, implicit $exec :: (store 4 into %stack.4, addrspace 5)
; GCN: SI_SPILL_V32_SAVE killed $vgpr0, %stack.7, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr3, 0, implicit $exec :: (store 4 into %stack.7, addrspace 5)		; GCN: SI_SPILL_V32_SAVE killed $vgpr0, %stack.7, $sgpr96_sgpr97_sgpr98_sgpr99, 0, 0, implicit $exec :: (store 4 into %stack.7, addrspace 5)
; GCN: SI_SPILL_V32_SAVE killed $vgpr18, %stack.8, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr3, 0, implicit $exec :: (store 4 into %stack.8, addrspace 5)		; GCN: SI_SPILL_V32_SAVE killed $vgpr18, %stack.8, $sgpr96_sgpr97_sgpr98_sgpr99, 0, 0, implicit $exec :: (store 4 into %stack.8, addrspace 5)
; GCN: $exec = S_XOR_B64_term $exec, killed renamable $sgpr4_sgpr5, implicit-def $scc		; GCN: $exec = S_XOR_B64_term $exec, killed renamable $sgpr4_sgpr5, implicit-def $scc
; GCN: S_CBRANCH_EXECNZ %bb.1, implicit $exec		; GCN: S_CBRANCH_EXECNZ %bb.1, implicit $exec
; GCN: bb.2:		; GCN: bb.2:
; GCN: $sgpr0_sgpr1 = SI_SPILL_S64_RESTORE %stack.3, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr3 :: (load 8 from %stack.3, align 4, addrspace 5)		; GCN: $sgpr0_sgpr1 = SI_SPILL_S64_RESTORE %stack.3, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99 :: (load 8 from %stack.3, align 4, addrspace 5)
; GCN: $exec = S_MOV_B64 renamable $sgpr0_sgpr1		; GCN: $exec = S_MOV_B64 renamable $sgpr0_sgpr1
; GCN: $vgpr0 = SI_SPILL_V32_RESTORE %stack.8, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr3, 0, implicit $exec :: (load 4 from %stack.8, addrspace 5)		; GCN: $vgpr0 = SI_SPILL_V32_RESTORE %stack.8, $sgpr96_sgpr97_sgpr98_sgpr99, 0, 0, implicit $exec :: (load 4 from %stack.8, addrspace 5)
; GCN: $sgpr4_sgpr5_sgpr6_sgpr7 = SI_SPILL_S128_RESTORE %stack.1, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr3 :: (load 16 from %stack.1, align 4, addrspace 5)		; GCN: $sgpr4_sgpr5_sgpr6_sgpr7 = SI_SPILL_S128_RESTORE %stack.1, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99 :: (load 16 from %stack.1, align 4, addrspace 5)
; GCN: BUFFER_STORE_DWORD_OFFSET renamable $vgpr0, renamable $sgpr4_sgpr5_sgpr6_sgpr7, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4 into %ir.out.load, addrspace 1)		; GCN: BUFFER_STORE_DWORD_OFFSET renamable $vgpr0, renamable $sgpr4_sgpr5_sgpr6_sgpr7, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4 into %ir.out.load, addrspace 1)
; GCN: S_ENDPGM 0		; GCN: S_ENDPGM 0
entry:		entry:
%id = call i32 @llvm.amdgcn.workitem.id.x() #1		%id = call i32 @llvm.amdgcn.workitem.id.x() #1
%index = add i32 %id, 1		%index = add i32 %id, 1
%value = extractelement <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16>, i32 %index		%value = extractelement <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16>, i32 %index
store i32 %value, i32 addrspace(1)* %out		store i32 %value, i32 addrspace(1)* %out
ret void		ret void
}		}

llvm/test/CodeGen/AMDGPU/insert_vector_elt.ll

	}			}

	define amdgpu_kernel void @dynamic_insertelement_v8f64(<8 x double> addrspace(1)* %out, <8 x double> %a, i32 %b) #0 {			define amdgpu_kernel void @dynamic_insertelement_v8f64(<8 x double> addrspace(1)* %out, <8 x double> %a, i32 %b) #0 {
	; SI-LABEL: dynamic_insertelement_v8f64:			; SI-LABEL: dynamic_insertelement_v8f64:
	; SI: ; %bb.0:			; SI: ; %bb.0:
	; SI-NEXT: s_load_dwordx2 s[8:9], s[4:5], 0x0			; SI-NEXT: s_load_dwordx2 s[8:9], s[4:5], 0x0
	; SI-NEXT: s_load_dwordx16 s[12:27], s[4:5], 0x10			; SI-NEXT: s_load_dwordx16 s[12:27], s[4:5], 0x10
	; SI-NEXT: s_load_dword s4, s[4:5], 0x20			; SI-NEXT: s_load_dword s4, s[4:5], 0x20
				; SI-NEXT: s_add_u32 s0, s0, s7
				; SI-NEXT: s_addc_u32 s1, s1, 0
	; SI-NEXT: v_mov_b32_e32 v16, 64			; SI-NEXT: v_mov_b32_e32 v16, 64
	; SI-NEXT: s_mov_b32 s11, 0x100f000
	; SI-NEXT: s_mov_b32 s10, -1
	; SI-NEXT: s_waitcnt lgkmcnt(0)			; SI-NEXT: s_waitcnt lgkmcnt(0)
	; SI-NEXT: v_mov_b32_e32 v0, s12			; SI-NEXT: v_mov_b32_e32 v0, s12
	; SI-NEXT: s_and_b32 s4, s4, 7			; SI-NEXT: s_and_b32 s4, s4, 7
	; SI-NEXT: s_lshl_b32 s4, s4, 3			; SI-NEXT: s_lshl_b32 s4, s4, 3
	; SI-NEXT: v_mov_b32_e32 v1, s13			; SI-NEXT: v_mov_b32_e32 v1, s13
	; SI-NEXT: v_mov_b32_e32 v12, s24			; SI-NEXT: v_mov_b32_e32 v12, s24
	; SI-NEXT: v_mov_b32_e32 v13, s25			; SI-NEXT: v_mov_b32_e32 v13, s25
	; SI-NEXT: v_mov_b32_e32 v14, s26			; SI-NEXT: v_mov_b32_e32 v14, s26
	; SI-NEXT: v_mov_b32_e32 v15, s27			; SI-NEXT: v_mov_b32_e32 v15, s27
	; SI-NEXT: v_mov_b32_e32 v2, s14			; SI-NEXT: v_mov_b32_e32 v2, s14
	; SI-NEXT: v_mov_b32_e32 v3, s15			; SI-NEXT: v_mov_b32_e32 v3, s15
	; SI-NEXT: v_mov_b32_e32 v4, s16			; SI-NEXT: v_mov_b32_e32 v4, s16
	; SI-NEXT: v_mov_b32_e32 v5, s17			; SI-NEXT: v_mov_b32_e32 v5, s17
	; SI-NEXT: v_mov_b32_e32 v6, s18			; SI-NEXT: v_mov_b32_e32 v6, s18
	; SI-NEXT: v_mov_b32_e32 v7, s19			; SI-NEXT: v_mov_b32_e32 v7, s19
	; SI-NEXT: v_mov_b32_e32 v8, s20			; SI-NEXT: v_mov_b32_e32 v8, s20
	; SI-NEXT: v_mov_b32_e32 v9, s21			; SI-NEXT: v_mov_b32_e32 v9, s21
	; SI-NEXT: v_mov_b32_e32 v10, s22			; SI-NEXT: v_mov_b32_e32 v10, s22
	; SI-NEXT: v_mov_b32_e32 v11, s23			; SI-NEXT: v_mov_b32_e32 v11, s23
	; SI-NEXT: buffer_store_dwordx4 v[12:15], off, s[0:3], s7 offset:112			; SI-NEXT: buffer_store_dwordx4 v[12:15], off, s[0:3], 0 offset:112
	; SI-NEXT: buffer_store_dwordx4 v[8:11], off, s[0:3], s7 offset:96			; SI-NEXT: buffer_store_dwordx4 v[8:11], off, s[0:3], 0 offset:96
	; SI-NEXT: buffer_store_dwordx4 v[4:7], off, s[0:3], s7 offset:80			; SI-NEXT: buffer_store_dwordx4 v[4:7], off, s[0:3], 0 offset:80
	; SI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], s7 offset:64			; SI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:64
	; SI-NEXT: v_or_b32_e32 v16, s4, v16			; SI-NEXT: v_or_b32_e32 v16, s4, v16
	; SI-NEXT: v_mov_b32_e32 v0, 0			; SI-NEXT: v_mov_b32_e32 v0, 0
	; SI-NEXT: v_mov_b32_e32 v1, 0x40200000			; SI-NEXT: v_mov_b32_e32 v1, 0x40200000
	; SI-NEXT: buffer_store_dwordx2 v[0:1], v16, s[0:3], s7 offen			; SI-NEXT: buffer_store_dwordx2 v[0:1], v16, s[0:3], 0 offen
	; SI-NEXT: buffer_load_dwordx4 v[0:3], off, s[0:3], s7 offset:64			; SI-NEXT: buffer_load_dwordx4 v[0:3], off, s[0:3], 0 offset:64
	; SI-NEXT: buffer_load_dwordx4 v[4:7], off, s[0:3], s7 offset:80			; SI-NEXT: buffer_load_dwordx4 v[4:7], off, s[0:3], 0 offset:80
	; SI-NEXT: buffer_load_dwordx4 v[8:11], off, s[0:3], s7 offset:96			; SI-NEXT: buffer_load_dwordx4 v[8:11], off, s[0:3], 0 offset:96
	; SI-NEXT: buffer_load_dwordx4 v[12:15], off, s[0:3], s7 offset:112			; SI-NEXT: buffer_load_dwordx4 v[12:15], off, s[0:3], 0 offset:112
				; SI-NEXT: s_mov_b32 s11, 0x100f000
				; SI-NEXT: s_mov_b32 s10, -1
	; SI-NEXT: s_waitcnt vmcnt(0)			; SI-NEXT: s_waitcnt vmcnt(0)
	; SI-NEXT: buffer_store_dwordx4 v[12:15], off, s[8:11], 0 offset:48			; SI-NEXT: buffer_store_dwordx4 v[12:15], off, s[8:11], 0 offset:48
	; SI-NEXT: buffer_store_dwordx4 v[8:11], off, s[8:11], 0 offset:32			; SI-NEXT: buffer_store_dwordx4 v[8:11], off, s[8:11], 0 offset:32
	; SI-NEXT: buffer_store_dwordx4 v[4:7], off, s[8:11], 0 offset:16			; SI-NEXT: buffer_store_dwordx4 v[4:7], off, s[8:11], 0 offset:16
	; SI-NEXT: buffer_store_dwordx4 v[0:3], off, s[8:11], 0			; SI-NEXT: buffer_store_dwordx4 v[0:3], off, s[8:11], 0
	; SI-NEXT: s_endpgm			; SI-NEXT: s_endpgm
	;			;
	; VI-LABEL: dynamic_insertelement_v8f64:			; VI-LABEL: dynamic_insertelement_v8f64:
	; VI: ; %bb.0:			; VI: ; %bb.0:
	; VI-NEXT: s_load_dwordx2 s[8:9], s[4:5], 0x0			; VI-NEXT: s_load_dwordx2 s[8:9], s[4:5], 0x0
	; VI-NEXT: s_load_dwordx16 s[12:27], s[4:5], 0x40			; VI-NEXT: s_load_dwordx16 s[12:27], s[4:5], 0x40
	; VI-NEXT: s_load_dword s4, s[4:5], 0x80			; VI-NEXT: s_load_dword s4, s[4:5], 0x80
				; VI-NEXT: s_add_u32 s0, s0, s7
				; VI-NEXT: s_addc_u32 s1, s1, 0
	; VI-NEXT: v_mov_b32_e32 v16, 64			; VI-NEXT: v_mov_b32_e32 v16, 64
	; VI-NEXT: s_mov_b32 s11, 0x1100f000
	; VI-NEXT: s_mov_b32 s10, -1
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: v_mov_b32_e32 v0, s12			; VI-NEXT: v_mov_b32_e32 v0, s12
	; VI-NEXT: s_and_b32 s4, s4, 7			; VI-NEXT: s_and_b32 s4, s4, 7
	; VI-NEXT: s_lshl_b32 s4, s4, 3			; VI-NEXT: s_lshl_b32 s4, s4, 3
	; VI-NEXT: v_mov_b32_e32 v1, s13			; VI-NEXT: v_mov_b32_e32 v1, s13
	; VI-NEXT: v_mov_b32_e32 v12, s24			; VI-NEXT: v_mov_b32_e32 v12, s24
	; VI-NEXT: v_mov_b32_e32 v13, s25			; VI-NEXT: v_mov_b32_e32 v13, s25
	; VI-NEXT: v_mov_b32_e32 v14, s26			; VI-NEXT: v_mov_b32_e32 v14, s26
	; VI-NEXT: v_mov_b32_e32 v15, s27			; VI-NEXT: v_mov_b32_e32 v15, s27
	; VI-NEXT: v_mov_b32_e32 v2, s14			; VI-NEXT: v_mov_b32_e32 v2, s14
	; VI-NEXT: v_mov_b32_e32 v3, s15			; VI-NEXT: v_mov_b32_e32 v3, s15
	; VI-NEXT: v_mov_b32_e32 v4, s16			; VI-NEXT: v_mov_b32_e32 v4, s16
	; VI-NEXT: v_mov_b32_e32 v5, s17			; VI-NEXT: v_mov_b32_e32 v5, s17
	; VI-NEXT: v_mov_b32_e32 v6, s18			; VI-NEXT: v_mov_b32_e32 v6, s18
	; VI-NEXT: v_mov_b32_e32 v7, s19			; VI-NEXT: v_mov_b32_e32 v7, s19
	; VI-NEXT: v_mov_b32_e32 v8, s20			; VI-NEXT: v_mov_b32_e32 v8, s20
	; VI-NEXT: v_mov_b32_e32 v9, s21			; VI-NEXT: v_mov_b32_e32 v9, s21
	; VI-NEXT: v_mov_b32_e32 v10, s22			; VI-NEXT: v_mov_b32_e32 v10, s22
	; VI-NEXT: v_mov_b32_e32 v11, s23			; VI-NEXT: v_mov_b32_e32 v11, s23
	; VI-NEXT: buffer_store_dwordx4 v[12:15], off, s[0:3], s7 offset:112			; VI-NEXT: buffer_store_dwordx4 v[12:15], off, s[0:3], 0 offset:112
	; VI-NEXT: buffer_store_dwordx4 v[8:11], off, s[0:3], s7 offset:96			; VI-NEXT: buffer_store_dwordx4 v[8:11], off, s[0:3], 0 offset:96
	; VI-NEXT: buffer_store_dwordx4 v[4:7], off, s[0:3], s7 offset:80			; VI-NEXT: buffer_store_dwordx4 v[4:7], off, s[0:3], 0 offset:80
	; VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], s7 offset:64			; VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:64
	; VI-NEXT: v_or_b32_e32 v16, s4, v16			; VI-NEXT: v_or_b32_e32 v16, s4, v16
	; VI-NEXT: v_mov_b32_e32 v0, 0			; VI-NEXT: v_mov_b32_e32 v0, 0
	; VI-NEXT: v_mov_b32_e32 v1, 0x40200000			; VI-NEXT: v_mov_b32_e32 v1, 0x40200000
	; VI-NEXT: buffer_store_dwordx2 v[0:1], v16, s[0:3], s7 offen			; VI-NEXT: buffer_store_dwordx2 v[0:1], v16, s[0:3], 0 offen
	; VI-NEXT: buffer_load_dwordx4 v[0:3], off, s[0:3], s7 offset:64			; VI-NEXT: buffer_load_dwordx4 v[0:3], off, s[0:3], 0 offset:64
	; VI-NEXT: buffer_load_dwordx4 v[4:7], off, s[0:3], s7 offset:80			; VI-NEXT: buffer_load_dwordx4 v[4:7], off, s[0:3], 0 offset:80
	; VI-NEXT: buffer_load_dwordx4 v[8:11], off, s[0:3], s7 offset:96			; VI-NEXT: buffer_load_dwordx4 v[8:11], off, s[0:3], 0 offset:96
	; VI-NEXT: buffer_load_dwordx4 v[12:15], off, s[0:3], s7 offset:112			; VI-NEXT: buffer_load_dwordx4 v[12:15], off, s[0:3], 0 offset:112
				; VI-NEXT: s_mov_b32 s11, 0x1100f000
				; VI-NEXT: s_mov_b32 s10, -1
	; VI-NEXT: s_waitcnt vmcnt(0)			; VI-NEXT: s_waitcnt vmcnt(0)
	; VI-NEXT: buffer_store_dwordx4 v[12:15], off, s[8:11], 0 offset:48			; VI-NEXT: buffer_store_dwordx4 v[12:15], off, s[8:11], 0 offset:48
	; VI-NEXT: buffer_store_dwordx4 v[8:11], off, s[8:11], 0 offset:32			; VI-NEXT: buffer_store_dwordx4 v[8:11], off, s[8:11], 0 offset:32
	; VI-NEXT: buffer_store_dwordx4 v[4:7], off, s[8:11], 0 offset:16			; VI-NEXT: buffer_store_dwordx4 v[4:7], off, s[8:11], 0 offset:16
	; VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[8:11], 0			; VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[8:11], 0
	; VI-NEXT: s_endpgm			; VI-NEXT: s_endpgm
	%vecins = insertelement <8 x double> %a, double 8.0, i32 %b			%vecins = insertelement <8 x double> %a, double 8.0, i32 %b
	store <8 x double> %vecins, <8 x double> addrspace(1)* %out, align 16			store <8 x double> %vecins, <8 x double> addrspace(1)* %out, align 16
	ret void			ret void
	}			}

	declare <4 x float> @llvm.amdgcn.image.gather4.lz.2d.v4f32.f32(i32, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.gather4.lz.2d.v4f32.f32(i32, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1

	attributes #0 = { nounwind }			attributes #0 = { nounwind }
	attributes #1 = { nounwind readnone }			attributes #1 = { nounwind readnone }

llvm/test/CodeGen/AMDGPU/ipra.ll

	; GCN: flat_load_dword v8			; GCN: flat_load_dword v8
	; GCN: s_swappc_b64			; GCN: s_swappc_b64
	; GCN-NOT: buffer_store			; GCN-NOT: buffer_store
	; GCN-NOT: buffer_load			; GCN-NOT: buffer_load
	; GCN-NOT: readlane			; GCN-NOT: readlane
	; GCN-NOT: writelane			; GCN-NOT: writelane
	; GCN: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, v8			; GCN: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, v8

	; GCN: ; NumSgprs: 38			; GCN: ; NumSgprs: 37
	; GCN: ; NumVgprs: 9			; GCN: ; NumVgprs: 9
	define amdgpu_kernel void @kernel_call() #0 {			define amdgpu_kernel void @kernel_call() #0 {
	%vgpr = load volatile i32, i32 addrspace(1)* undef			%vgpr = load volatile i32, i32 addrspace(1)* undef
	tail call void @func()			tail call void @func()
	store volatile i32 %vgpr, i32 addrspace(1)* undef			store volatile i32 %vgpr, i32 addrspace(1)* undef
	ret void			ret void
	}			}


llvm/test/CodeGen/AMDGPU/large-alloca-compute.ll

	; GCNHSA: private_segment_alignment = 4			; GCNHSA: private_segment_alignment = 4
	; GCNHSA: .end_amd_kernel_code_t			; GCNHSA: .end_amd_kernel_code_t

	; GFX10HSA: s_add_u32 [[FLAT_SCR_LO:s[0-9]+]], s{{[0-9]+}}, s{{[0-9]+}}			; GFX10HSA: s_add_u32 [[FLAT_SCR_LO:s[0-9]+]], s{{[0-9]+}}, s{{[0-9]+}}
	; GFX10HSA-DAG: s_addc_u32 [[FLAT_SCR_HI:s[0-9]+]], s{{[0-9]+}}, 0			; GFX10HSA-DAG: s_addc_u32 [[FLAT_SCR_HI:s[0-9]+]], s{{[0-9]+}}, 0
	; GFX10HSA-DAG: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), [[FLAT_SCR_LO]]			; GFX10HSA-DAG: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), [[FLAT_SCR_LO]]
	; GFX10HSA-DAG: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), [[FLAT_SCR_HI]]			; GFX10HSA-DAG: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), [[FLAT_SCR_HI]]

	; GCNHSA: buffer_store_dword {{v[0-9]+}}, {{v[0-9]+}}, s[0:3], s9 offen			; GCNHSA: buffer_store_dword {{v[0-9]+}}, {{v[0-9]+}}, s[0:3], 0 offen
	; GCNHSA: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, s[0:3], s9 offen			; GCNHSA: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, s[0:3], 0 offen

	; Scratch size = alloca size + emergency stack slot, align {{.*}}, addrspace(5)			; Scratch size = alloca size + emergency stack slot, align {{.*}}, addrspace(5)
	; ALL: ; ScratchSize: 32772			; ALL: ; ScratchSize: 32772
	define amdgpu_kernel void @large_alloca_compute_shader(i32 %x, i32 %y) #0 {			define amdgpu_kernel void @large_alloca_compute_shader(i32 %x, i32 %y) #0 {
	%large = alloca [8192 x i32], align 4, addrspace(5)			%large = alloca [8192 x i32], align 4, addrspace(5)
	%gep = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %large, i32 0, i32 8191			%gep = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %large, i32 0, i32 8191
	store volatile i32 %x, i32 addrspace(5)* %gep			store volatile i32 %x, i32 addrspace(5)* %gep
	%gep1 = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %large, i32 0, i32 %y			%gep1 = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %large, i32 0, i32 %y
	%val = load volatile i32, i32 addrspace(5)* %gep1			%val = load volatile i32, i32 addrspace(5)* %gep1
	store volatile i32 %val, i32 addrspace(1)* undef			store volatile i32 %val, i32 addrspace(1)* undef
	ret void			ret void
	}			}

	attributes #0 = { nounwind }			attributes #0 = { nounwind }
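The `ScratchSize: 32772` check above follows from the test's own formula (alloca size plus emergency stack slot). A quick Python check of the arithmetic, assuming the emergency stack slot is a single 4-byte dword (inferred from the checked total, not stated explicitly in the test):

```python
# Recompute the ScratchSize expected by large_alloca_compute_shader.
ALLOCA_ELEMENTS = 8192      # [8192 x i32]
I32_BYTES = 4               # size of one i32 element
EMERGENCY_SLOT_BYTES = 4    # assumption: one dword reserved as the emergency stack slot

alloca_bytes = ALLOCA_ELEMENTS * I32_BYTES
scratch_size = alloca_bytes + EMERGENCY_SLOT_BYTES
print(f"alloca={alloca_bytes} scratch={scratch_size}")  # alloca=32768 scratch=32772
```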

llvm/test/CodeGen/AMDGPU/large-alloca-graphics.ll

	; RUN: llc -march=amdgcn -mcpu=bonaire < %s | FileCheck -check-prefix=GCN -check-prefix=CI -check-prefix=ALL %s			; RUN: llc -march=amdgcn -mcpu=bonaire < %s | FileCheck -check-prefix=GCN -check-prefix=CI -check-prefix=ALL %s
	; RUN: llc -march=amdgcn -mcpu=carrizo -mattr=-flat-for-global < %s | FileCheck -check-prefix=GCN -check-prefix=VI -check-prefix=ALL %s			; RUN: llc -march=amdgcn -mcpu=carrizo -mattr=-flat-for-global < %s | FileCheck -check-prefix=GCN -check-prefix=VI -check-prefix=ALL %s
	; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=-flat-for-global < %s | FileCheck -check-prefix=GCN -check-prefix=GFX9 -check-prefix=ALL %s			; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=-flat-for-global < %s | FileCheck -check-prefix=GCN -check-prefix=GFX9 -check-prefix=ALL %s

	; ALL-LABEL: {{^}}large_alloca_pixel_shader:			; ALL-LABEL: {{^}}large_alloca_pixel_shader:
	; GCN-DAG: s_mov_b32 s8, SCRATCH_RSRC_DWORD0			; GCN-DAG: s_mov_b32 s4, SCRATCH_RSRC_DWORD0
	; GCN-DAG: s_mov_b32 s9, SCRATCH_RSRC_DWORD1			; GCN-DAG: s_mov_b32 s5, SCRATCH_RSRC_DWORD1
	; GCN-DAG: s_mov_b32 s10, -1			; GCN-DAG: s_mov_b32 s6, -1
	; CI-DAG: s_mov_b32 s11, 0xe8f000
	; VI-DAG: s_mov_b32 s11, 0xe80000			; CI-DAG: s_mov_b32 s7, 0xe8f000
	; GFX9-DAG: s_mov_b32 s11, 0xe00000			; VI-DAG: s_mov_b32 s7, 0xe80000
				; GFX9-DAG: s_mov_b32 s7, 0xe00000

	; GCN: buffer_store_dword {{v[0-9]+}}, {{v[0-9]+}}, s[8:11], s0 offen			; GCN: s_add_u32 s4, s4, s0
	; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, s[8:11], s0 offen			; GCN: s_addc_u32 s5, s5, 0

				; GCN: buffer_store_dword {{v[0-9]+}}, {{v[0-9]+}}, s[4:7], 0 offen
				; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, s[4:7], 0 offen

	; ALL: ; ScratchSize: 32772			; ALL: ; ScratchSize: 32772
	define amdgpu_ps void @large_alloca_pixel_shader(i32 %x, i32 %y) #0 {			define amdgpu_ps void @large_alloca_pixel_shader(i32 %x, i32 %y) #0 {
	%large = alloca [8192 x i32], align 4, addrspace(5)			%large = alloca [8192 x i32], align 4, addrspace(5)
	%gep = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %large, i32 0, i32 8191			%gep = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %large, i32 0, i32 8191
	store volatile i32 %x, i32 addrspace(5)* %gep			store volatile i32 %x, i32 addrspace(5)* %gep
	%gep1 = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %large, i32 0, i32 %y			%gep1 = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %large, i32 0, i32 %y
	%val = load volatile i32, i32 addrspace(5)* %gep1			%val = load volatile i32, i32 addrspace(5)* %gep1
	store volatile i32 %val, i32 addrspace(1)* undef			store volatile i32 %val, i32 addrspace(1)* undef
	ret void			ret void
	}			}

	; ALL-LABEL: {{^}}large_alloca_pixel_shader_inreg:			; ALL-LABEL: {{^}}large_alloca_pixel_shader_inreg:
	; GCN-DAG: s_mov_b32 s8, SCRATCH_RSRC_DWORD0			; GCN-DAG: s_mov_b32 s4, SCRATCH_RSRC_DWORD0
	; GCN-DAG: s_mov_b32 s9, SCRATCH_RSRC_DWORD1			; GCN-DAG: s_mov_b32 s5, SCRATCH_RSRC_DWORD1
	; GCN-DAG: s_mov_b32 s10, -1			; GCN-DAG: s_mov_b32 s6, -1
	; CI-DAG: s_mov_b32 s11, 0xe8f000
	; VI-DAG: s_mov_b32 s11, 0xe80000			; CI-DAG: s_mov_b32 s7, 0xe8f000
	; GFX9-DAG: s_mov_b32 s11, 0xe00000			; VI-DAG: s_mov_b32 s7, 0xe80000
				; GFX9-DAG: s_mov_b32 s7, 0xe00000

				; GCN: s_add_u32 s4, s4, s2
				; GCN: s_addc_u32 s5, s5, 0

	; GCN: buffer_store_dword {{v[0-9]+}}, {{v[0-9]+}}, s[8:11], s2 offen			; GCN: buffer_store_dword {{v[0-9]+}}, {{v[0-9]+}}, s[4:7], 0 offen
	; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, s[8:11], s2 offen			; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, s[4:7], 0 offen

	; ALL: ; ScratchSize: 32772			; ALL: ; ScratchSize: 32772
	define amdgpu_ps void @large_alloca_pixel_shader_inreg(i32 inreg %x, i32 inreg %y) #0 {			define amdgpu_ps void @large_alloca_pixel_shader_inreg(i32 inreg %x, i32 inreg %y) #0 {
	%large = alloca [8192 x i32], align 4, addrspace(5)			%large = alloca [8192 x i32], align 4, addrspace(5)
	%gep = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %large, i32 0, i32 8191			%gep = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %large, i32 0, i32 8191
	store volatile i32 %x, i32 addrspace(5)* %gep			store volatile i32 %x, i32 addrspace(5)* %gep
	%gep1 = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %large, i32 0, i32 %y			%gep1 = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %large, i32 0, i32 %y
	%val = load volatile i32, i32 addrspace(5)* %gep1			%val = load volatile i32, i32 addrspace(5)* %gep1
	store volatile i32 %val, i32 addrspace(1)* undef			store volatile i32 %val, i32 addrspace(1)* undef
	ret void			ret void
	}			}

	attributes #0 = { nounwind }			attributes #0 = { nounwind }
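Compared with the old checks, this file shows the scratch wave offset (`s0`, or `s2` in the `inreg` variant) no longer being passed as the `soffset` operand of the buffer instructions. Instead it is added into the base address held in the first two dwords of the scratch resource descriptor with `s_add_u32`/`s_addc_u32`, and `soffset` becomes `0`. The Python sketch below checks that both forms reach the same byte. It assumes a simplified address model (base + soffset + voffset + immediate offset) and treats dwords 0 and 1 as a plain 64-bit base, ignoring the stride field, swizzling, and range checking; the register values are hypothetical and chosen to force a carry.

```python
MASK32 = 0xFFFFFFFF


def s_add_u32(a, b):
    """32-bit add; returns (result, carry_out) like s_add_u32 setting SCC."""
    total = (a & MASK32) + (b & MASK32)
    return total & MASK32, total >> 32


def s_addc_u32(a, b, carry_in):
    """32-bit add with carry-in; returns (result, carry_out)."""
    total = (a & MASK32) + (b & MASK32) + carry_in
    return total & MASK32, total >> 32


def scratch_address(base_lo, base_hi, soffset, voffset, imm_offset):
    """Simplified buffer address model (assumption): 64-bit base plus offsets."""
    base = (base_hi << 32) | base_lo
    return base + soffset + voffset + imm_offset


def fold_wave_offset(base_lo, base_hi, wave_offset):
    """Add the wave offset into the descriptor base, as in
    's_add_u32 s4, s4, s0' followed by 's_addc_u32 s5, s5, 0'."""
    lo, carry = s_add_u32(base_lo, wave_offset)
    hi, _ = s_addc_u32(base_hi, 0, carry)
    return lo, hi


# Hypothetical descriptor base and wave offset.
base_lo, base_hi, wave_offset = 0xFFFFFF00, 0x1, 0x200

# Old form: wave offset passed as soffset.
old_addr = scratch_address(base_lo, base_hi, soffset=wave_offset, voffset=64, imm_offset=0)

# New form: wave offset folded into the base, soffset = 0.
new_lo, new_hi = fold_wave_offset(base_lo, base_hi, wave_offset)
new_addr = scratch_address(new_lo, new_hi, soffset=0, voffset=64, imm_offset=0)

print(old_addr == new_addr)  # True
```

The carry propagation through `s_addc_u32` is what keeps the two forms equivalent when the low dword of the base overflows.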

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.implicit.buffer.ptr.ll

	; RUN: llc -mtriple=amdgcn-mesa-mesa3d -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -mtriple=amdgcn-mesa-mesa3d -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

	; FIXME: Requires stack object to not assert			; FIXME: Requires stack object to not assert
	; GCN-LABEL: {{^}}test_ps:			; GCN-LABEL: {{^}}test_ps:
	; GCN: s_load_dwordx2 s[4:5], s[0:1], 0x0			; GCN: s_load_dwordx2 s[4:5], s[0:1], 0x0
	; GCN: buffer_store_dword v0, off, s[4:7], s2 offset:4			; GCN: buffer_store_dword v0, off, s[4:7], 0 offset:4
	; GCN: s_load_dword s{{[0-9]+}}, s[0:1], 0x0			; GCN: s_load_dword s{{[0-9]+}}, s[0:1], 0x0
	; GCN-NEXT: s_waitcnt			; GCN-NEXT: s_waitcnt
	; GCN-NEXT: ; return			; GCN-NEXT: ; return
	define amdgpu_ps i32 @test_ps() #1 {			define amdgpu_ps i32 @test_ps() #1 {
	%alloca = alloca i32, addrspace(5)			%alloca = alloca i32, addrspace(5)
	store volatile i32 0, i32 addrspace(5)* %alloca			store volatile i32 0, i32 addrspace(5)* %alloca
	%implicit_buffer_ptr = call i8 addrspace(4)* @llvm.amdgcn.implicit.buffer.ptr()			%implicit_buffer_ptr = call i8 addrspace(4)* @llvm.amdgcn.implicit.buffer.ptr()
	%buffer_ptr = bitcast i8 addrspace(4)* %implicit_buffer_ptr to i32 addrspace(4)*			%buffer_ptr = bitcast i8 addrspace(4)* %implicit_buffer_ptr to i32 addrspace(4)*
	%value = load volatile i32, i32 addrspace(4)* %buffer_ptr			%value = load volatile i32, i32 addrspace(4)* %buffer_ptr
	ret i32 %value			ret i32 %value
	}			}

	; GCN-LABEL: {{^}}test_cs:			; GCN-LABEL: {{^}}test_cs:
	; GCN: s_mov_b64 s[4:5], s[0:1]			; GCN: s_mov_b64 s[4:5], s[0:1]
	; GCN: buffer_store_dword v{{[0-9]+}}, off, s[4:7], s2 offset:4			; GCN: buffer_store_dword v{{[0-9]+}}, off, s[4:7], 0 offset:4
	; GCN: s_load_dword s0, s[0:1], 0x0			; GCN: s_load_dword s0, s[0:1], 0x0
	define amdgpu_cs i32 @test_cs() #1 {			define amdgpu_cs i32 @test_cs() #1 {
	%alloca = alloca i32, addrspace(5)			%alloca = alloca i32, addrspace(5)
	store volatile i32 0, i32 addrspace(5)* %alloca			store volatile i32 0, i32 addrspace(5)* %alloca
	%implicit_buffer_ptr = call i8 addrspace(4)* @llvm.amdgcn.implicit.buffer.ptr()			%implicit_buffer_ptr = call i8 addrspace(4)* @llvm.amdgcn.implicit.buffer.ptr()
	%buffer_ptr = bitcast i8 addrspace(4)* %implicit_buffer_ptr to i32 addrspace(4)*			%buffer_ptr = bitcast i8 addrspace(4)* %implicit_buffer_ptr to i32 addrspace(4)*
	%value = load volatile i32, i32 addrspace(4)* %buffer_ptr			%value = load volatile i32, i32 addrspace(4)* %buffer_ptr
	ret i32 %value			ret i32 %value
	}			}

	declare i8 addrspace(4)* @llvm.amdgcn.implicit.buffer.ptr() #0			declare i8 addrspace(4)* @llvm.amdgcn.implicit.buffer.ptr() #0

	attributes #0 = { nounwind readnone speculatable }			attributes #0 = { nounwind readnone speculatable }
	attributes #1 = { nounwind }			attributes #1 = { nounwind }

llvm/test/CodeGen/AMDGPU/load-hi16.ll

%build0 = insertelement <2 x half> undef, half %reg, i32 0		%build0 = insertelement <2 x half> undef, half %reg, i32 0
%build1 = insertelement <2 x half> %build0, half %load, i32 1		%build1 = insertelement <2 x half> %build0, half %load, i32 1
store <2 x half> %build1, <2 x half> addrspace(1)* undef		store <2 x half> %build1, <2 x half> addrspace(1)* undef
ret void		ret void
}		}

; GCN-LABEL: {{^}}load_private_hi_v2i16_reglo_vreg_nooff:		; GCN-LABEL: {{^}}load_private_hi_v2i16_reglo_vreg_nooff:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GFX900: buffer_load_short_d16_hi v0, off, s[0:3], s33 offset:4094{{$}}		; GFX900: buffer_load_short_d16_hi v0, off, s[0:3], 0 offset:4094{{$}}
; GFX900: s_waitcnt		; GFX900: s_waitcnt
; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v0		; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v0
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: s_setpc_b64		; GFX900-NEXT: s_setpc_b64

; NO-D16-HI: buffer_load_ushort v{{[0-9]+}}, off, s[0:3], s33 offset:4094{{$}}		; NO-D16-HI: buffer_load_ushort v{{[0-9]+}}, off, s[0:3], 0 offset:4094{{$}}
define void @load_private_hi_v2i16_reglo_vreg_nooff(i16 addrspace(5)* byval %in, i16 %reg) #0 {		define void @load_private_hi_v2i16_reglo_vreg_nooff(i16 addrspace(5)* byval %in, i16 %reg) #0 {
entry:		entry:
%load = load volatile i16, i16 addrspace(5)* inttoptr (i32 4094 to i16 addrspace(5)*)		%load = load volatile i16, i16 addrspace(5)* inttoptr (i32 4094 to i16 addrspace(5)*)
%build0 = insertelement <2 x i16> undef, i16 %reg, i32 0		%build0 = insertelement <2 x i16> undef, i16 %reg, i32 0
%build1 = insertelement <2 x i16> %build0, i16 %load, i32 1		%build1 = insertelement <2 x i16> %build0, i16 %load, i32 1
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

; GCN-LABEL: {{^}}load_private_hi_v2f16_reglo_vreg_nooff:		; GCN-LABEL: {{^}}load_private_hi_v2f16_reglo_vreg_nooff:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GFX900-NEXT: buffer_load_short_d16_hi v1, off, s[0:3], s33 offset:4094{{$}}		; GFX900-NEXT: buffer_load_short_d16_hi v1, off, s[0:3], 0 offset:4094{{$}}
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v1		; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v1
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: s_setpc_b64		; GFX900-NEXT: s_setpc_b64

; NO-D16-HI: buffer_load_ushort v{{[0-9]+}}, off, s[0:3], s33 offset:4094{{$}}		; NO-D16-HI: buffer_load_ushort v{{[0-9]+}}, off, s[0:3], 0 offset:4094{{$}}
define void @load_private_hi_v2f16_reglo_vreg_nooff(half addrspace(5)* %in, half %reg) #0 {		define void @load_private_hi_v2f16_reglo_vreg_nooff(half addrspace(5)* %in, half %reg) #0 {
entry:		entry:
%load = load volatile half, half addrspace(5)* inttoptr (i32 4094 to half addrspace(5)*)		%load = load volatile half, half addrspace(5)* inttoptr (i32 4094 to half addrspace(5)*)
%build0 = insertelement <2 x half> undef, half %reg, i32 0		%build0 = insertelement <2 x half> undef, half %reg, i32 0
%build1 = insertelement <2 x half> %build0, half %load, i32 1		%build1 = insertelement <2 x half> %build0, half %load, i32 1
store <2 x half> %build1, <2 x half> addrspace(1)* undef		store <2 x half> %build1, <2 x half> addrspace(1)* undef
ret void		ret void
}		}
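The `nooff` tests load from the constant private address 4094. In the updated checks that address is encoded entirely in the instruction: `off` for the VGPR address, `0` for `soffset`, and `offset:4094` as the immediate. A small Python sketch of that operand choice, assuming the MUBUF immediate offset is a 12-bit unsigned field (0 to 4095) on these targets; the helper names are hypothetical:

```python
MUBUF_IMM_OFFSET_BITS = 12  # assumption: width of the MUBUF immediate offset field


def fits_mubuf_imm_offset(addr: int) -> bool:
    """True if addr can be encoded directly as the unsigned immediate offset."""
    return 0 <= addr < (1 << MUBUF_IMM_OFFSET_BITS)


def constant_private_operands(addr: int) -> tuple:
    """Return hypothetical (vaddr, soffset, offset) operands for a constant
    private address, mirroring checks such as
    'buffer_load_short_d16_hi v0, off, s[0:3], 0 offset:4094'."""
    if not fits_mubuf_imm_offset(addr):
        # Larger addresses would need a register operand; not modeled here.
        raise ValueError(f"address {addr} does not fit in the immediate offset")
    return ("off", 0, addr)


print(constant_private_operands(4094))  # ('off', 0, 4094)
```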
%build0 = insertelement <2 x i16> undef, i16 %reg, i32 0		%build0 = insertelement <2 x i16> undef, i16 %reg, i32 0
%build1 = insertelement <2 x i16> %build0, i16 %ext, i32 1		%build1 = insertelement <2 x i16> %build0, i16 %ext, i32 1
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

; GCN-LABEL: {{^}}load_private_hi_v2i16_reglo_vreg_nooff_zexti8:		; GCN-LABEL: {{^}}load_private_hi_v2i16_reglo_vreg_nooff_zexti8:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GFX900-NEXT: buffer_load_ubyte_d16_hi v1, off, s[0:3], s33 offset:4094{{$}}		; GFX900-NEXT: buffer_load_ubyte_d16_hi v1, off, s[0:3], 0 offset:4094{{$}}
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v1		; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v1
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: s_setpc_b64		; GFX900-NEXT: s_setpc_b64

; NO-D16-HI: buffer_load_ubyte v0, off, s[0:3], s33 offset:4094{{$}}		; NO-D16-HI: buffer_load_ubyte v0, off, s[0:3], 0 offset:4094{{$}}
define void @load_private_hi_v2i16_reglo_vreg_nooff_zexti8(i8 addrspace(5)* %in, i16 %reg) #0 {		define void @load_private_hi_v2i16_reglo_vreg_nooff_zexti8(i8 addrspace(5)* %in, i16 %reg) #0 {
entry:		entry:
%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 4094 to i8 addrspace(5)*)		%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 4094 to i8 addrspace(5)*)
%ext = zext i8 %load to i16		%ext = zext i8 %load to i16
%build0 = insertelement <2 x i16> undef, i16 %reg, i32 0		%build0 = insertelement <2 x i16> undef, i16 %reg, i32 0
%build1 = insertelement <2 x i16> %build0, i16 %ext, i32 1		%build1 = insertelement <2 x i16> %build0, i16 %ext, i32 1
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

; GCN-LABEL: {{^}}load_private_hi_v2i16_reglo_vreg_nooff_sexti8:		; GCN-LABEL: {{^}}load_private_hi_v2i16_reglo_vreg_nooff_sexti8:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GFX900-NEXT: buffer_load_sbyte_d16_hi v1, off, s[0:3], s33 offset:4094{{$}}		; GFX900-NEXT: buffer_load_sbyte_d16_hi v1, off, s[0:3], 0 offset:4094{{$}}
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v1		; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v1
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: s_setpc_b64		; GFX900-NEXT: s_setpc_b64

; NO-D16-HI: buffer_load_sbyte v0, off, s[0:3], s33 offset:4094{{$}}		; NO-D16-HI: buffer_load_sbyte v0, off, s[0:3], 0 offset:4094{{$}}
define void @load_private_hi_v2i16_reglo_vreg_nooff_sexti8(i8 addrspace(5)* %in, i16 %reg) #0 {		define void @load_private_hi_v2i16_reglo_vreg_nooff_sexti8(i8 addrspace(5)* %in, i16 %reg) #0 {
entry:		entry:
%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 4094 to i8 addrspace(5)*)		%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 4094 to i8 addrspace(5)*)
%ext = sext i8 %load to i16		%ext = sext i8 %load to i16
%build0 = insertelement <2 x i16> undef, i16 %reg, i32 0		%build0 = insertelement <2 x i16> undef, i16 %reg, i32 0
%build1 = insertelement <2 x i16> %build0, i16 %ext, i32 1		%build1 = insertelement <2 x i16> %build0, i16 %ext, i32 1
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

; GCN-LABEL: {{^}}load_private_hi_v2f16_reglo_vreg_nooff_zexti8:		; GCN-LABEL: {{^}}load_private_hi_v2f16_reglo_vreg_nooff_zexti8:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GFX900-NEXT: buffer_load_ubyte_d16_hi v1, off, s[0:3], s33 offset:4094{{$}}		; GFX900-NEXT: buffer_load_ubyte_d16_hi v1, off, s[0:3], 0 offset:4094{{$}}
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v1		; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v1
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: s_setpc_b64		; GFX900-NEXT: s_setpc_b64

; NO-D16-HI: buffer_load_ubyte v0, off, s[0:3], s33 offset:4094{{$}}		; NO-D16-HI: buffer_load_ubyte v0, off, s[0:3], 0 offset:4094{{$}}
define void @load_private_hi_v2f16_reglo_vreg_nooff_zexti8(i8 addrspace(5)* %in, half %reg) #0 {		define void @load_private_hi_v2f16_reglo_vreg_nooff_zexti8(i8 addrspace(5)* %in, half %reg) #0 {
entry:		entry:
%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 4094 to i8 addrspace(5)*)		%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 4094 to i8 addrspace(5)*)
%ext = zext i8 %load to i16		%ext = zext i8 %load to i16
%bc.ext = bitcast i16 %ext to half		%bc.ext = bitcast i16 %ext to half
%build0 = insertelement <2 x half> undef, half %reg, i32 0		%build0 = insertelement <2 x half> undef, half %reg, i32 0
%build1 = insertelement <2 x half> %build0, half %bc.ext, i32 1		%build1 = insertelement <2 x half> %build0, half %bc.ext, i32 1
store <2 x half> %build1, <2 x half> addrspace(1)* undef		store <2 x half> %build1, <2 x half> addrspace(1)* undef

llvm/test/CodeGen/AMDGPU/load-lo16.ll

store <2 x half> %build1, <2 x half> addrspace(1)* undef		store <2 x half> %build1, <2 x half> addrspace(1)* undef
ret void		ret void
}		}

define void @load_private_lo_v2i16_reglo_vreg_nooff(i16 addrspace(5)* %in, i32 %reg) #0 {		define void @load_private_lo_v2i16_reglo_vreg_nooff(i16 addrspace(5)* %in, i32 %reg) #0 {
; GFX900-LABEL: load_private_lo_v2i16_reglo_vreg_nooff:		; GFX900-LABEL: load_private_lo_v2i16_reglo_vreg_nooff:
; GFX900: ; %bb.0: ; %entry		; GFX900: ; %bb.0: ; %entry
; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX900-NEXT: buffer_load_short_d16 v1, off, s[0:3], s33 offset:4094		; GFX900-NEXT: buffer_load_short_d16 v1, off, s[0:3], 0 offset:4094
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: global_store_dword v[0:1], v1, off		; GFX900-NEXT: global_store_dword v[0:1], v1, off
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: s_setpc_b64 s[30:31]		; GFX900-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX906-LABEL: load_private_lo_v2i16_reglo_vreg_nooff:		; GFX906-LABEL: load_private_lo_v2i16_reglo_vreg_nooff:
; GFX906: ; %bb.0: ; %entry		; GFX906: ; %bb.0: ; %entry
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX906-NEXT: buffer_load_ushort v0, off, s[0:3], s33 offset:4094		; GFX906-NEXT: buffer_load_ushort v0, off, s[0:3], 0 offset:4094
; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff		; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: v_bfi_b32 v0, v2, v0, v1		; GFX906-NEXT: v_bfi_b32 v0, v2, v0, v1
; GFX906-NEXT: global_store_dword v[0:1], v0, off		; GFX906-NEXT: global_store_dword v[0:1], v0, off
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: s_setpc_b64 s[30:31]		; GFX906-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX803-LABEL: load_private_lo_v2i16_reglo_vreg_nooff:		; GFX803-LABEL: load_private_lo_v2i16_reglo_vreg_nooff:
; GFX803: ; %bb.0: ; %entry		; GFX803: ; %bb.0: ; %entry
; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX803-NEXT: buffer_load_ushort v0, off, s[0:3], s33 offset:4094		; GFX803-NEXT: buffer_load_ushort v0, off, s[0:3], 0 offset:4094
; GFX803-NEXT: v_and_b32_e32 v1, 0xffff0000, v1		; GFX803-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: v_or_b32_e32 v0, v0, v1		; GFX803-NEXT: v_or_b32_e32 v0, v0, v1
; GFX803-NEXT: flat_store_dword v[0:1], v0		; GFX803-NEXT: flat_store_dword v[0:1], v0
; GFX803-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX803-NEXT: s_setpc_b64 s[30:31]		; GFX803-NEXT: s_setpc_b64 s[30:31]
entry:		entry:
%reg.bc = bitcast i32 %reg to <2 x i16>		%reg.bc = bitcast i32 %reg to <2 x i16>
%load = load volatile i16, i16 addrspace(5)* inttoptr (i32 4094 to i16 addrspace(5)*)		%load = load volatile i16, i16 addrspace(5)* inttoptr (i32 4094 to i16 addrspace(5)*)
%build1 = insertelement <2 x i16> %reg.bc, i16 %load, i32 0		%build1 = insertelement <2 x i16> %reg.bc, i16 %load, i32 0
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

define void @load_private_lo_v2i16_reghi_vreg_nooff(i16 addrspace(5)* %in, i32 %reg) #0 {		define void @load_private_lo_v2i16_reghi_vreg_nooff(i16 addrspace(5)* %in, i32 %reg) #0 {
; GFX900-LABEL: load_private_lo_v2i16_reghi_vreg_nooff:		; GFX900-LABEL: load_private_lo_v2i16_reghi_vreg_nooff:
; GFX900: ; %bb.0: ; %entry		; GFX900: ; %bb.0: ; %entry
; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX900-NEXT: buffer_load_short_d16 v1, off, s[0:3], s33 offset:4094		; GFX900-NEXT: buffer_load_short_d16 v1, off, s[0:3], 0 offset:4094
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: global_store_dword v[0:1], v1, off		; GFX900-NEXT: global_store_dword v[0:1], v1, off
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: s_setpc_b64 s[30:31]		; GFX900-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX906-LABEL: load_private_lo_v2i16_reghi_vreg_nooff:		; GFX906-LABEL: load_private_lo_v2i16_reghi_vreg_nooff:
; GFX906: ; %bb.0: ; %entry		; GFX906: ; %bb.0: ; %entry
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX906-NEXT: buffer_load_ushort v0, off, s[0:3], s33 offset:4094		; GFX906-NEXT: buffer_load_ushort v0, off, s[0:3], 0 offset:4094
; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff		; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: v_bfi_b32 v0, v2, v0, v1		; GFX906-NEXT: v_bfi_b32 v0, v2, v0, v1
; GFX906-NEXT: global_store_dword v[0:1], v0, off		; GFX906-NEXT: global_store_dword v[0:1], v0, off
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: s_setpc_b64 s[30:31]		; GFX906-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX803-LABEL: load_private_lo_v2i16_reghi_vreg_nooff:		; GFX803-LABEL: load_private_lo_v2i16_reghi_vreg_nooff:
; GFX803: ; %bb.0: ; %entry		; GFX803: ; %bb.0: ; %entry
; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX803-NEXT: buffer_load_ushort v0, off, s[0:3], s33 offset:4094		; GFX803-NEXT: buffer_load_ushort v0, off, s[0:3], 0 offset:4094
; GFX803-NEXT: v_and_b32_e32 v1, 0xffff0000, v1		; GFX803-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: v_or_b32_e32 v0, v0, v1		; GFX803-NEXT: v_or_b32_e32 v0, v0, v1
; GFX803-NEXT: flat_store_dword v[0:1], v0		; GFX803-NEXT: flat_store_dword v[0:1], v0
; GFX803-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX803-NEXT: s_setpc_b64 s[30:31]		; GFX803-NEXT: s_setpc_b64 s[30:31]
entry:		entry:
%reg.bc = bitcast i32 %reg to <2 x i16>		%reg.bc = bitcast i32 %reg to <2 x i16>
%load = load volatile i16, i16 addrspace(5)* inttoptr (i32 4094 to i16 addrspace(5)*)		%load = load volatile i16, i16 addrspace(5)* inttoptr (i32 4094 to i16 addrspace(5)*)
%build1 = insertelement <2 x i16> %reg.bc, i16 %load, i32 0		%build1 = insertelement <2 x i16> %reg.bc, i16 %load, i32 0
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

define void @load_private_lo_v2f16_reglo_vreg_nooff(half addrspace(5)* %in, i32 %reg) #0 {		define void @load_private_lo_v2f16_reglo_vreg_nooff(half addrspace(5)* %in, i32 %reg) #0 {
; GFX900-LABEL: load_private_lo_v2f16_reglo_vreg_nooff:		; GFX900-LABEL: load_private_lo_v2f16_reglo_vreg_nooff:
; GFX900: ; %bb.0: ; %entry		; GFX900: ; %bb.0: ; %entry
; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX900-NEXT: buffer_load_short_d16 v1, off, s[0:3], s33 offset:4094		; GFX900-NEXT: buffer_load_short_d16 v1, off, s[0:3], 0 offset:4094
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: global_store_dword v[0:1], v1, off		; GFX900-NEXT: global_store_dword v[0:1], v1, off
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: s_setpc_b64 s[30:31]		; GFX900-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX906-LABEL: load_private_lo_v2f16_reglo_vreg_nooff:		; GFX906-LABEL: load_private_lo_v2f16_reglo_vreg_nooff:
; GFX906: ; %bb.0: ; %entry		; GFX906: ; %bb.0: ; %entry
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX906-NEXT: buffer_load_ushort v0, off, s[0:3], s33 offset:4094		; GFX906-NEXT: buffer_load_ushort v0, off, s[0:3], 0 offset:4094
; GFX906-NEXT: v_lshrrev_b32_e32 v1, 16, v1		; GFX906-NEXT: v_lshrrev_b32_e32 v1, 16, v1
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: v_and_b32_e32 v0, 0xffff, v0		; GFX906-NEXT: v_and_b32_e32 v0, 0xffff, v0
; GFX906-NEXT: v_lshl_or_b32 v0, v1, 16, v0		; GFX906-NEXT: v_lshl_or_b32 v0, v1, 16, v0
; GFX906-NEXT: global_store_dword v[0:1], v0, off		; GFX906-NEXT: global_store_dword v[0:1], v0, off
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: s_setpc_b64 s[30:31]		; GFX906-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX803-LABEL: load_private_lo_v2f16_reglo_vreg_nooff:		; GFX803-LABEL: load_private_lo_v2f16_reglo_vreg_nooff:
; GFX803: ; %bb.0: ; %entry		; GFX803: ; %bb.0: ; %entry
; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX803-NEXT: buffer_load_ushort v0, off, s[0:3], s33 offset:4094		; GFX803-NEXT: buffer_load_ushort v0, off, s[0:3], 0 offset:4094
; GFX803-NEXT: v_and_b32_e32 v1, 0xffff0000, v1		; GFX803-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: v_or_b32_e32 v0, v0, v1		; GFX803-NEXT: v_or_b32_e32 v0, v0, v1
; GFX803-NEXT: flat_store_dword v[0:1], v0		; GFX803-NEXT: flat_store_dword v[0:1], v0
; GFX803-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX803-NEXT: s_setpc_b64 s[30:31]		; GFX803-NEXT: s_setpc_b64 s[30:31]
entry:		entry:
%reg.bc = bitcast i32 %reg to <2 x half>		%reg.bc = bitcast i32 %reg to <2 x half>
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

define void @load_private_lo_v2i16_reglo_vreg_nooff_zexti8(i8 addrspace(5)* %in, i32 %reg) #0 {		define void @load_private_lo_v2i16_reglo_vreg_nooff_zexti8(i8 addrspace(5)* %in, i32 %reg) #0 {
; GFX900-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_zexti8:		; GFX900-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_zexti8:
; GFX900: ; %bb.0: ; %entry		; GFX900: ; %bb.0: ; %entry
; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX900-NEXT: buffer_load_ubyte_d16 v1, off, s[0:3], s33 offset:4094		; GFX900-NEXT: buffer_load_ubyte_d16 v1, off, s[0:3], 0 offset:4094
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: global_store_dword v[0:1], v1, off		; GFX900-NEXT: global_store_dword v[0:1], v1, off
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: s_setpc_b64 s[30:31]		; GFX900-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX906-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_zexti8:		; GFX906-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_zexti8:
; GFX906: ; %bb.0: ; %entry		; GFX906: ; %bb.0: ; %entry
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX906-NEXT: buffer_load_ubyte v0, off, s[0:3], s33 offset:4094		; GFX906-NEXT: buffer_load_ubyte v0, off, s[0:3], 0 offset:4094
; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff		; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: v_bfi_b32 v0, v2, v0, v1		; GFX906-NEXT: v_bfi_b32 v0, v2, v0, v1
; GFX906-NEXT: global_store_dword v[0:1], v0, off		; GFX906-NEXT: global_store_dword v[0:1], v0, off
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: s_setpc_b64 s[30:31]		; GFX906-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX803-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_zexti8:		; GFX803-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_zexti8:
; GFX803: ; %bb.0: ; %entry		; GFX803: ; %bb.0: ; %entry
; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX803-NEXT: v_lshrrev_b32_e32 v0, 16, v1		; GFX803-NEXT: v_lshrrev_b32_e32 v0, 16, v1
; GFX803-NEXT: buffer_load_ubyte v1, off, s[0:3], s33 offset:4094		; GFX803-NEXT: buffer_load_ubyte v1, off, s[0:3], 0 offset:4094
; GFX803-NEXT: s_mov_b32 s4, 0x5040c00		; GFX803-NEXT: s_mov_b32 s4, 0x5040c00
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: v_perm_b32 v0, v0, v1, s4		; GFX803-NEXT: v_perm_b32 v0, v0, v1, s4
; GFX803-NEXT: flat_store_dword v[0:1], v0		; GFX803-NEXT: flat_store_dword v[0:1], v0
; GFX803-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX803-NEXT: s_setpc_b64 s[30:31]		; GFX803-NEXT: s_setpc_b64 s[30:31]
entry:		entry:
%reg.bc = bitcast i32 %reg to <2 x i16>		%reg.bc = bitcast i32 %reg to <2 x i16>
%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 4094 to i8 addrspace(5)*)		%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 4094 to i8 addrspace(5)*)
%ext = zext i8 %load to i16		%ext = zext i8 %load to i16
%build1 = insertelement <2 x i16> %reg.bc, i16 %ext, i32 0		%build1 = insertelement <2 x i16> %reg.bc, i16 %ext, i32 0
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

define void @load_private_lo_v2i16_reglo_vreg_nooff_sexti8(i8 addrspace(5)* %in, i32 %reg) #0 {		define void @load_private_lo_v2i16_reglo_vreg_nooff_sexti8(i8 addrspace(5)* %in, i32 %reg) #0 {
; GFX900-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_sexti8:		; GFX900-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_sexti8:
; GFX900: ; %bb.0: ; %entry		; GFX900: ; %bb.0: ; %entry
; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX900-NEXT: buffer_load_sbyte_d16 v1, off, s[0:3], s33 offset:4094		; GFX900-NEXT: buffer_load_sbyte_d16 v1, off, s[0:3], 0 offset:4094
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: global_store_dword v[0:1], v1, off		; GFX900-NEXT: global_store_dword v[0:1], v1, off
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: s_setpc_b64 s[30:31]		; GFX900-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX906-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_sexti8:		; GFX906-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_sexti8:
; GFX906: ; %bb.0: ; %entry		; GFX906: ; %bb.0: ; %entry
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX906-NEXT: buffer_load_sbyte v0, off, s[0:3], s33 offset:4094		; GFX906-NEXT: buffer_load_sbyte v0, off, s[0:3], 0 offset:4094
; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff		; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: v_bfi_b32 v0, v2, v0, v1		; GFX906-NEXT: v_bfi_b32 v0, v2, v0, v1
; GFX906-NEXT: global_store_dword v[0:1], v0, off		; GFX906-NEXT: global_store_dword v[0:1], v0, off
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: s_setpc_b64 s[30:31]		; GFX906-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX803-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_sexti8:		; GFX803-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_sexti8:
; GFX803: ; %bb.0: ; %entry		; GFX803: ; %bb.0: ; %entry
; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX803-NEXT: buffer_load_sbyte v0, off, s[0:3], s33 offset:4094		; GFX803-NEXT: buffer_load_sbyte v0, off, s[0:3], 0 offset:4094
; GFX803-NEXT: v_and_b32_e32 v1, 0xffff0000, v1		; GFX803-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: v_or_b32_sdwa v0, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD		; GFX803-NEXT: v_or_b32_sdwa v0, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD
; GFX803-NEXT: flat_store_dword v[0:1], v0		; GFX803-NEXT: flat_store_dword v[0:1], v0
; GFX803-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX803-NEXT: s_setpc_b64 s[30:31]		; GFX803-NEXT: s_setpc_b64 s[30:31]
entry:		entry:
%reg.bc = bitcast i32 %reg to <2 x i16>		%reg.bc = bitcast i32 %reg to <2 x i16>
%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 4094 to i8 addrspace(5)*)		%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 4094 to i8 addrspace(5)*)
%ext = sext i8 %load to i16		%ext = sext i8 %load to i16
%build1 = insertelement <2 x i16> %reg.bc, i16 %ext, i32 0		%build1 = insertelement <2 x i16> %reg.bc, i16 %ext, i32 0
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

define void @load_private_lo_v2f16_reglo_vreg_nooff_zexti8(i8 addrspace(5)* %in, i32 %reg) #0 {		define void @load_private_lo_v2f16_reglo_vreg_nooff_zexti8(i8 addrspace(5)* %in, i32 %reg) #0 {
; GFX900-LABEL: load_private_lo_v2f16_reglo_vreg_nooff_zexti8:		; GFX900-LABEL: load_private_lo_v2f16_reglo_vreg_nooff_zexti8:
; GFX900: ; %bb.0: ; %entry		; GFX900: ; %bb.0: ; %entry
; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX900-NEXT: buffer_load_ubyte_d16 v1, off, s[0:3], s33 offset:4094		; GFX900-NEXT: buffer_load_ubyte_d16 v1, off, s[0:3], 0 offset:4094
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: global_store_dword v[0:1], v1, off		; GFX900-NEXT: global_store_dword v[0:1], v1, off
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: s_setpc_b64 s[30:31]		; GFX900-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX906-LABEL: load_private_lo_v2f16_reglo_vreg_nooff_zexti8:		; GFX906-LABEL: load_private_lo_v2f16_reglo_vreg_nooff_zexti8:
; GFX906: ; %bb.0: ; %entry		; GFX906: ; %bb.0: ; %entry
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX906-NEXT: buffer_load_ubyte v0, off, s[0:3], s33 offset:4094		; GFX906-NEXT: buffer_load_ubyte v0, off, s[0:3], 0 offset:4094
; GFX906-NEXT: v_lshrrev_b32_e32 v1, 16, v1		; GFX906-NEXT: v_lshrrev_b32_e32 v1, 16, v1
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: v_and_b32_e32 v0, 0xffff, v0		; GFX906-NEXT: v_and_b32_e32 v0, 0xffff, v0
; GFX906-NEXT: v_lshl_or_b32 v0, v1, 16, v0		; GFX906-NEXT: v_lshl_or_b32 v0, v1, 16, v0
; GFX906-NEXT: global_store_dword v[0:1], v0, off		; GFX906-NEXT: global_store_dword v[0:1], v0, off
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: s_setpc_b64 s[30:31]		; GFX906-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX803-LABEL: load_private_lo_v2f16_reglo_vreg_nooff_zexti8:		; GFX803-LABEL: load_private_lo_v2f16_reglo_vreg_nooff_zexti8:
; GFX803: ; %bb.0: ; %entry		; GFX803: ; %bb.0: ; %entry
; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX803-NEXT: v_lshrrev_b32_e32 v0, 16, v1		; GFX803-NEXT: v_lshrrev_b32_e32 v0, 16, v1
; GFX803-NEXT: buffer_load_ubyte v1, off, s[0:3], s33 offset:4094		; GFX803-NEXT: buffer_load_ubyte v1, off, s[0:3], 0 offset:4094
; GFX803-NEXT: s_mov_b32 s4, 0x5040c00		; GFX803-NEXT: s_mov_b32 s4, 0x5040c00
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: v_perm_b32 v0, v0, v1, s4		; GFX803-NEXT: v_perm_b32 v0, v0, v1, s4
; GFX803-NEXT: flat_store_dword v[0:1], v0		; GFX803-NEXT: flat_store_dword v[0:1], v0
; GFX803-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX803-NEXT: s_setpc_b64 s[30:31]		; GFX803-NEXT: s_setpc_b64 s[30:31]
entry:		entry:
%reg.bc = bitcast i32 %reg to <2 x half>		%reg.bc = bitcast i32 %reg to <2 x half>

llvm/test/CodeGen/AMDGPU/memory-legalizer-load.ll

define amdgpu_kernel void @wavefront_one_as_seq_cst(		define amdgpu_kernel void @wavefront_one_as_seq_cst(
i32* %in, i32* %out) {		i32* %in, i32* %out) {
entry:		entry:
%val = load atomic i32, i32* %in syncscope("wavefront-one-as") seq_cst, align 4		%val = load atomic i32, i32* %in syncscope("wavefront-one-as") seq_cst, align 4
store i32 %val, i32* %out		store i32 %val, i32* %out
ret void		ret void
}		}

; GCN-LABEL: {{^}}nontemporal_private_0:		; GCN-LABEL: {{^}}nontemporal_private_0:
; GFX89: buffer_load_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], s{{[0-9]+}} offen glc slc{{$}}		; GFX89: buffer_load_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], 0 offen glc slc{{$}}
; GFX10: buffer_load_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], s{{[0-9]+}} offen slc{{$}}		; GFX10: buffer_load_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], 0 offen slc{{$}}
; GFX10: .amdhsa_kernel nontemporal_private_0		; GFX10: .amdhsa_kernel nontemporal_private_0
; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0		; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0
; GFX10CU: .amdhsa_workgroup_processor_mode 0		; GFX10CU: .amdhsa_workgroup_processor_mode 0
; GFX10-NOT: .amdhsa_memory_ordered 0		; GFX10-NOT: .amdhsa_memory_ordered 0
define amdgpu_kernel void @nontemporal_private_0(		define amdgpu_kernel void @nontemporal_private_0(
i32 addrspace(5)* %in, i32* %out) {		i32 addrspace(5)* %in, i32* %out) {
entry:		entry:
%val = load i32, i32 addrspace(5)* %in, align 4, !nontemporal !0		%val = load i32, i32 addrspace(5)* %in, align 4, !nontemporal !0
store i32 %val, i32* %out		store i32 %val, i32* %out
ret void		ret void
}		}

; GCN-LABEL: {{^}}nontemporal_private_1:		; GCN-LABEL: {{^}}nontemporal_private_1:
; GFX89: buffer_load_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], s{{[0-9]+}} offen glc slc{{$}}		; GFX89: buffer_load_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], 0 offen glc slc{{$}}
; GFX10: buffer_load_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], s{{[0-9]+}} offen slc{{$}}		; GFX10: buffer_load_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], 0 offen slc{{$}}
; GFX10: .amdhsa_kernel nontemporal_private_1		; GFX10: .amdhsa_kernel nontemporal_private_1
; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0		; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0
; GFX10CU: .amdhsa_workgroup_processor_mode 0		; GFX10CU: .amdhsa_workgroup_processor_mode 0
; GFX10-NOT: .amdhsa_memory_ordered 0		; GFX10-NOT: .amdhsa_memory_ordered 0
define amdgpu_kernel void @nontemporal_private_1(		define amdgpu_kernel void @nontemporal_private_1(
i32 addrspace(5)* %in, i32* %out) {		i32 addrspace(5)* %in, i32* %out) {
entry:		entry:
%tid = call i32 @llvm.amdgcn.workitem.id.x()		%tid = call i32 @llvm.amdgcn.workitem.id.x()

llvm/test/CodeGen/AMDGPU/memory-legalizer-store.ll

	define amdgpu_kernel void @wavefront_one_as_seq_cst(			define amdgpu_kernel void @wavefront_one_as_seq_cst(
	i32 %in, i32* %out) {			i32 %in, i32* %out) {
	entry:			entry:
	store atomic i32 %in, i32* %out syncscope("wavefront-one-as") seq_cst, align 4			store atomic i32 %in, i32* %out syncscope("wavefront-one-as") seq_cst, align 4
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}nontemporal_private_0:			; GCN-LABEL: {{^}}nontemporal_private_0:
	; GFX89: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], s{{[0-9]+}} offen glc slc{{$}}			; GFX89: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], 0 offen glc slc{{$}}
	; GFX10: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], s{{[0-9]+}} offen slc{{$}}			; GFX10: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], 0 offen slc{{$}}
	; GFX10: .amdhsa_kernel nontemporal_private_0			; GFX10: .amdhsa_kernel nontemporal_private_0
	; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0			; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0
	; GFX10CU: .amdhsa_workgroup_processor_mode 0			; GFX10CU: .amdhsa_workgroup_processor_mode 0
	; GFX10-NOT: .amdhsa_memory_ordered 0			; GFX10-NOT: .amdhsa_memory_ordered 0
	define amdgpu_kernel void @nontemporal_private_0(			define amdgpu_kernel void @nontemporal_private_0(
	i32* %in, i32 addrspace(5)* %out) {			i32* %in, i32 addrspace(5)* %out) {
	entry:			entry:
	%val = load i32, i32* %in, align 4			%val = load i32, i32* %in, align 4
	store i32 %val, i32 addrspace(5)* %out, !nontemporal !0			store i32 %val, i32 addrspace(5)* %out, !nontemporal !0
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}nontemporal_private_1:			; GCN-LABEL: {{^}}nontemporal_private_1:
	; GFX89: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], s{{[0-9]+}} offen glc slc{{$}}			; GFX89: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], 0 offen glc slc{{$}}
	; GFX10: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], s{{[0-9]+}} offen slc{{$}}			; GFX10: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], 0 offen slc{{$}}
	; GFX10: .amdhsa_kernel nontemporal_private_1			; GFX10: .amdhsa_kernel nontemporal_private_1
	; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0			; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0
	; GFX10CU: .amdhsa_workgroup_processor_mode 0			; GFX10CU: .amdhsa_workgroup_processor_mode 0
	; GFX10-NOT: .amdhsa_memory_ordered 0			; GFX10-NOT: .amdhsa_memory_ordered 0
	define amdgpu_kernel void @nontemporal_private_1(			define amdgpu_kernel void @nontemporal_private_1(
	i32* %in, i32 addrspace(5)* %out) {			i32* %in, i32 addrspace(5)* %out) {
	entry:			entry:
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()

llvm/test/CodeGen/AMDGPU/memory_clause.ll


	define void @mubuf_clause(<4 x i32> addrspace(5)* noalias nocapture readonly %arg, <4 x i32> addrspace(5)* noalias nocapture %arg1) {			define void @mubuf_clause(<4 x i32> addrspace(5)* noalias nocapture readonly %arg, <4 x i32> addrspace(5)* noalias nocapture %arg1) {
	; GCN-LABEL: mubuf_clause:			; GCN-LABEL: mubuf_clause:
	; GCN: ; %bb.0: ; %bb			; GCN: ; %bb.0: ; %bb
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: v_and_b32_e32 v2, 0x3ff, v2			; GCN-NEXT: v_and_b32_e32 v2, 0x3ff, v2
	; GCN-NEXT: v_lshlrev_b32_e32 v2, 4, v2			; GCN-NEXT: v_lshlrev_b32_e32 v2, 4, v2
	; GCN-NEXT: v_add_u32_e32 v0, v0, v2			; GCN-NEXT: v_add_u32_e32 v0, v0, v2
	; GCN-NEXT: s_nop 0
	; GCN-NEXT: s_nop 0
	; GCN-NEXT: buffer_load_dword v3, v0, s[0:3], s33 offen
	; GCN-NEXT: buffer_load_dword v4, v0, s[0:3], s33 offen offset:4
	; GCN-NEXT: buffer_load_dword v5, v0, s[0:3], s33 offen offset:8
	; GCN-NEXT: buffer_load_dword v6, v0, s[0:3], s33 offen offset:12
	; GCN-NEXT: buffer_load_dword v7, v0, s[0:3], s33 offen offset:16
	; GCN-NEXT: buffer_load_dword v8, v0, s[0:3], s33 offen offset:20
	; GCN-NEXT: buffer_load_dword v9, v0, s[0:3], s33 offen offset:24
	; GCN-NEXT: buffer_load_dword v10, v0, s[0:3], s33 offen offset:28
	; GCN-NEXT: buffer_load_dword v11, v0, s[0:3], s33 offen offset:32
	; GCN-NEXT: buffer_load_dword v12, v0, s[0:3], s33 offen offset:36
	; GCN-NEXT: buffer_load_dword v13, v0, s[0:3], s33 offen offset:40
	; GCN-NEXT: buffer_load_dword v14, v0, s[0:3], s33 offen offset:44
	; GCN-NEXT: buffer_load_dword v15, v0, s[0:3], s33 offen offset:48
	; GCN-NEXT: buffer_load_dword v16, v0, s[0:3], s33 offen offset:52
	; GCN-NEXT: buffer_load_dword v17, v0, s[0:3], s33 offen offset:56
	; GCN-NEXT: v_add_u32_e32 v1, v1, v2			; GCN-NEXT: v_add_u32_e32 v1, v1, v2
	; GCN-NEXT: s_nop 0			; GCN-NEXT: s_nop 0
	; GCN-NEXT: s_nop 0			; GCN-NEXT: s_nop 0
	; GCN-NEXT: buffer_load_dword v0, v0, s[0:3], s33 offen offset:60			; GCN-NEXT: buffer_load_dword v6, v0, s[0:3], 0 offen offset:20
	; GCN-NEXT: s_nop 0			; GCN-NEXT: buffer_load_dword v7, v0, s[0:3], 0 offen offset:24
	; GCN-NEXT: s_waitcnt vmcnt(15)			; GCN-NEXT: buffer_load_dword v8, v0, s[0:3], 0 offen offset:28
	; GCN-NEXT: s_nop 0			; GCN-NEXT: buffer_load_dword v9, v0, s[0:3], 0 offen offset:32
	; GCN-NEXT: buffer_store_dword v3, v1, s[0:3], s33 offen			; GCN-NEXT: buffer_load_dword v10, v0, s[0:3], 0 offen offset:36
	; GCN-NEXT: s_waitcnt vmcnt(15)			; GCN-NEXT: buffer_load_dword v11, v0, s[0:3], 0 offen offset:40
	; GCN-NEXT: buffer_store_dword v4, v1, s[0:3], s33 offen offset:4			; GCN-NEXT: buffer_load_dword v12, v0, s[0:3], 0 offen offset:44
	; GCN-NEXT: s_waitcnt vmcnt(15)			; GCN-NEXT: buffer_load_dword v13, v0, s[0:3], 0 offen offset:48
	; GCN-NEXT: buffer_store_dword v5, v1, s[0:3], s33 offen offset:8			; GCN-NEXT: buffer_load_dword v14, v0, s[0:3], 0 offen offset:52
	; GCN-NEXT: s_waitcnt vmcnt(15)			; GCN-NEXT: buffer_load_dword v15, v0, s[0:3], 0 offen offset:56
	; GCN-NEXT: buffer_store_dword v6, v1, s[0:3], s33 offen offset:12			; GCN-NEXT: buffer_load_dword v16, v0, s[0:3], 0 offen offset:60
	; GCN-NEXT: s_waitcnt vmcnt(15)			; GCN-NEXT: buffer_load_dword v2, v0, s[0:3], 0 offen
	; GCN-NEXT: buffer_store_dword v7, v1, s[0:3], s33 offen offset:16			; GCN-NEXT: buffer_load_dword v3, v0, s[0:3], 0 offen offset:4
	; GCN-NEXT: s_waitcnt vmcnt(15)			; GCN-NEXT: buffer_load_dword v4, v0, s[0:3], 0 offen offset:8
	; GCN-NEXT: buffer_store_dword v8, v1, s[0:3], s33 offen offset:20			; GCN-NEXT: buffer_load_dword v5, v0, s[0:3], 0 offen offset:12
	; GCN-NEXT: s_waitcnt vmcnt(15)			; GCN-NEXT: s_nop 0
	; GCN-NEXT: buffer_store_dword v9, v1, s[0:3], s33 offen offset:24			; GCN-NEXT: s_nop 0
	; GCN-NEXT: s_waitcnt vmcnt(15)			; GCN-NEXT: buffer_load_dword v0, v0, s[0:3], 0 offen offset:16
	; GCN-NEXT: buffer_store_dword v10, v1, s[0:3], s33 offen offset:28			; GCN-NEXT: s_nop 0
	; GCN-NEXT: s_waitcnt vmcnt(15)			; GCN-NEXT: s_waitcnt vmcnt(4)
	; GCN-NEXT: buffer_store_dword v11, v1, s[0:3], s33 offen offset:32			; GCN-NEXT: s_nop 0
	; GCN-NEXT: s_waitcnt vmcnt(15)			; GCN-NEXT: buffer_store_dword v2, v1, s[0:3], 0 offen
	; GCN-NEXT: buffer_store_dword v12, v1, s[0:3], s33 offen offset:36			; GCN-NEXT: s_waitcnt vmcnt(4)
	; GCN-NEXT: s_waitcnt vmcnt(15)			; GCN-NEXT: buffer_store_dword v3, v1, s[0:3], 0 offen offset:4
	; GCN-NEXT: buffer_store_dword v13, v1, s[0:3], s33 offen offset:40			; GCN-NEXT: s_waitcnt vmcnt(4)
	; GCN-NEXT: s_waitcnt vmcnt(15)			; GCN-NEXT: buffer_store_dword v4, v1, s[0:3], 0 offen offset:8
	; GCN-NEXT: buffer_store_dword v14, v1, s[0:3], s33 offen offset:44			; GCN-NEXT: s_waitcnt vmcnt(4)
	; GCN-NEXT: s_waitcnt vmcnt(15)			; GCN-NEXT: buffer_store_dword v5, v1, s[0:3], 0 offen offset:12
	; GCN-NEXT: buffer_store_dword v15, v1, s[0:3], s33 offen offset:48			; GCN-NEXT: s_waitcnt vmcnt(4)
	; GCN-NEXT: s_waitcnt vmcnt(15)			; GCN-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen offset:16
	; GCN-NEXT: buffer_store_dword v16, v1, s[0:3], s33 offen offset:52			; GCN-NEXT: buffer_store_dword v6, v1, s[0:3], 0 offen offset:20
	; GCN-NEXT: s_waitcnt vmcnt(15)			; GCN-NEXT: buffer_store_dword v7, v1, s[0:3], 0 offen offset:24
	; GCN-NEXT: buffer_store_dword v17, v1, s[0:3], s33 offen offset:56			; GCN-NEXT: buffer_store_dword v8, v1, s[0:3], 0 offen offset:28
	; GCN-NEXT: s_waitcnt vmcnt(15)			; GCN-NEXT: buffer_store_dword v9, v1, s[0:3], 0 offen offset:32
	; GCN-NEXT: buffer_store_dword v0, v1, s[0:3], s33 offen offset:60			; GCN-NEXT: buffer_store_dword v10, v1, s[0:3], 0 offen offset:36
				; GCN-NEXT: buffer_store_dword v11, v1, s[0:3], 0 offen offset:40
				; GCN-NEXT: buffer_store_dword v12, v1, s[0:3], 0 offen offset:44
				; GCN-NEXT: buffer_store_dword v13, v1, s[0:3], 0 offen offset:48
				; GCN-NEXT: buffer_store_dword v14, v1, s[0:3], 0 offen offset:52
				; GCN-NEXT: buffer_store_dword v15, v1, s[0:3], 0 offen offset:56
				; GCN-NEXT: buffer_store_dword v16, v1, s[0:3], 0 offen offset:60
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	bb:			bb:
	%tmp = tail call i32 @llvm.amdgcn.workitem.id.x()			%tmp = tail call i32 @llvm.amdgcn.workitem.id.x()
	%tmp2 = getelementptr inbounds <4 x i32>, <4 x i32> addrspace(5)* %arg, i32 %tmp			%tmp2 = getelementptr inbounds <4 x i32>, <4 x i32> addrspace(5)* %arg, i32 %tmp
	%tmp3 = load <4 x i32>, <4 x i32> addrspace(5)* %tmp2, align 16			%tmp3 = load <4 x i32>, <4 x i32> addrspace(5)* %tmp2, align 16
	%tmp4 = getelementptr inbounds <4 x i32>, <4 x i32> addrspace(5)* %arg1, i32 %tmp			%tmp4 = getelementptr inbounds <4 x i32>, <4 x i32> addrspace(5)* %arg1, i32 %tmp
	%tmp5 = add nuw nsw i32 %tmp, 1			%tmp5 = add nuw nsw i32 %tmp, 1
	[125 lines not shown]

llvm/test/CodeGen/AMDGPU/mesa3d.ll

	; RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=tahiti -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN %s			; RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=tahiti -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN %s

	; GCN-LABEL: {{^}}scratch_ps:			; GCN-LABEL: {{^}}scratch_ps:
	; GCN: s_load_dwordx2 s[4:5], s[0:1], 0x0{{$}}			; GCN: s_load_dwordx2 s[4:5], s[0:1], 0x0{{$}}
	; GCN-DAG: s_mov_b32 s6, -1{{$}}			; GCN-DAG: s_mov_b32 s6, -1{{$}}
	; GCN-DAG: s_mov_b32 s7, 0xe8f000			; GCN-DAG: s_mov_b32 s7, 0xe8f000
	; GCN-DAG: v_mov_b32_e32 [[V:v[0-9]+]], 2			; GCN-DAG: v_mov_b32_e32 [[V:v[0-9]+]], 2
	; GCN: buffer_store_dword [[V]], off, s[4:7], s2 offset:4			; GCN: buffer_store_dword [[V]], off, s[4:7], 0 offset:4
	define amdgpu_ps void @scratch_ps(i32 addrspace(1)* %out, i32 %in) {			define amdgpu_ps void @scratch_ps(i32 addrspace(1)* %out, i32 %in) {
	entry:			entry:
	%alloca = alloca i32, addrspace(5)			%alloca = alloca i32, addrspace(5)
	store volatile i32 2, i32 addrspace(5)* %alloca			store volatile i32 2, i32 addrspace(5)* %alloca
	ret void			ret void
	}			}

llvm/test/CodeGen/AMDGPU/mir-print-dead-csr-fi.mir

	[9 lines not shown]
	name: csr_sgpr			name: csr_sgpr
	tracksRegLiveness: true			tracksRegLiveness: true
	liveins:			liveins:
	- { reg: '$sgpr30_sgpr31' }			- { reg: '$sgpr30_sgpr31' }
	frameInfo:			frameInfo:
	maxAlignment: 4			maxAlignment: 4
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'			scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
	scratchWaveOffsetReg: '$sgpr4'
	frameOffsetReg: '$sgpr5'			frameOffsetReg: '$sgpr5'
	stackPtrOffsetReg: '$sgpr32'			stackPtrOffsetReg: '$sgpr32'
	body: |			body: |
	bb.0:			bb.0:
	liveins: $sgpr30_sgpr31			liveins: $sgpr30_sgpr31

	INLINEASM &"; clobber s42", 1, 12, implicit-def dead early-clobber $sgpr42			INLINEASM &"; clobber s42", 1, 12, implicit-def dead early-clobber $sgpr42
	S_SETPC_B64_return $sgpr30_sgpr31			S_SETPC_B64_return $sgpr30_sgpr31

	...			...

llvm/test/CodeGen/AMDGPU/misched-killflags.mir

	# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=fiji -verify-machineinstrs -run-pass=post-RA-sched -o - %s | FileCheck %s			# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=fiji -verify-machineinstrs -run-pass=post-RA-sched -o - %s | FileCheck %s
	# Make sure ScheduleDAGInstrs::fixupKills does not produce invalid kill flags.			# Make sure ScheduleDAGInstrs::fixupKills does not produce invalid kill flags.
	---			---
	name: func0			name: func0
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: true			isEntryFunction: true
	scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'			scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
	scratchWaveOffsetReg: '$sgpr7'
	frameOffsetReg: '$sgpr7'			frameOffsetReg: '$sgpr7'
	body: |			body: |
	bb.0:			bb.0:

	$sgpr33 = S_MOV_B32 $sgpr7			$sgpr33 = S_MOV_B32 $sgpr7
	$sgpr32 = S_MOV_B32 $sgpr33			$sgpr32 = S_MOV_B32 $sgpr33
	$sgpr10 = S_MOV_B32 5			$sgpr10 = S_MOV_B32 5
	$sgpr9 = S_MOV_B32 4			$sgpr9 = S_MOV_B32 4
	[32 lines not shown]

llvm/test/CodeGen/AMDGPU/mubuf-offset-private.ll

	; RUN: llc -march=amdgcn -mattr=+max-private-element-size-16 < %s | FileCheck -enable-var-scope -check-prefixes=GCN,SICIVI %s			; RUN: llc -march=amdgcn -mattr=+max-private-element-size-16 < %s | FileCheck -enable-var-scope -check-prefixes=GCN,SICIVI %s
	; RUN: llc -march=amdgcn -mcpu=fiji -mattr=+max-private-element-size-16 < %s | FileCheck -enable-var-scope -check-prefixes=GCN,SICIVI %s			; RUN: llc -march=amdgcn -mcpu=fiji -mattr=+max-private-element-size-16 < %s | FileCheck -enable-var-scope -check-prefixes=GCN,SICIVI %s
	; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=+max-private-element-size-16 < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX9 %s			; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=+max-private-element-size-16 < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX9 %s

	; Test addressing modes when the scratch base is not a frame index.			; Test addressing modes when the scratch base is not a frame index.

	; GCN-LABEL: {{^}}store_private_offset_i8:			; GCN-LABEL: {{^}}store_private_offset_i8:
	; GCN: buffer_store_byte v{{[0-9]+}}, off, s[4:7], s2 offset:8			; GCN: buffer_store_byte v{{[0-9]+}}, off, s[4:7], 0 offset:8
	define amdgpu_kernel void @store_private_offset_i8() #0 {			define amdgpu_kernel void @store_private_offset_i8() #0 {
	store volatile i8 5, i8 addrspace(5)* inttoptr (i32 8 to i8 addrspace(5)*)			store volatile i8 5, i8 addrspace(5)* inttoptr (i32 8 to i8 addrspace(5)*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_private_offset_i16:			; GCN-LABEL: {{^}}store_private_offset_i16:
	; GCN: buffer_store_short v{{[0-9]+}}, off, s[4:7], s2 offset:8			; GCN: buffer_store_short v{{[0-9]+}}, off, s[4:7], 0 offset:8
	define amdgpu_kernel void @store_private_offset_i16() #0 {			define amdgpu_kernel void @store_private_offset_i16() #0 {
	store volatile i16 5, i16 addrspace(5)* inttoptr (i32 8 to i16 addrspace(5)*)			store volatile i16 5, i16 addrspace(5)* inttoptr (i32 8 to i16 addrspace(5)*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_private_offset_i32:			; GCN-LABEL: {{^}}store_private_offset_i32:
	; GCN: buffer_store_dword v{{[0-9]+}}, off, s[4:7], s2 offset:8			; GCN: buffer_store_dword v{{[0-9]+}}, off, s[4:7], 0 offset:8
	define amdgpu_kernel void @store_private_offset_i32() #0 {			define amdgpu_kernel void @store_private_offset_i32() #0 {
	store volatile i32 5, i32 addrspace(5)* inttoptr (i32 8 to i32 addrspace(5)*)			store volatile i32 5, i32 addrspace(5)* inttoptr (i32 8 to i32 addrspace(5)*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_private_offset_v2i32:			; GCN-LABEL: {{^}}store_private_offset_v2i32:
	; GCN: buffer_store_dwordx2 v{{\[[0-9]+:[0-9]+\]}}, off, s[4:7], s2 offset:8			; GCN: buffer_store_dwordx2 v{{\[[0-9]+:[0-9]+\]}}, off, s[4:7], 0 offset:8
	define amdgpu_kernel void @store_private_offset_v2i32() #0 {			define amdgpu_kernel void @store_private_offset_v2i32() #0 {
	store volatile <2 x i32> <i32 5, i32 10>, <2 x i32> addrspace(5)* inttoptr (i32 8 to <2 x i32> addrspace(5)*)			store volatile <2 x i32> <i32 5, i32 10>, <2 x i32> addrspace(5)* inttoptr (i32 8 to <2 x i32> addrspace(5)*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_private_offset_v4i32:			; GCN-LABEL: {{^}}store_private_offset_v4i32:
	; GCN: buffer_store_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, off, s[4:7], s2 offset:8			; GCN: buffer_store_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, off, s[4:7], 0 offset:8
	define amdgpu_kernel void @store_private_offset_v4i32() #0 {			define amdgpu_kernel void @store_private_offset_v4i32() #0 {
	store volatile <4 x i32> <i32 5, i32 10, i32 15, i32 0>, <4 x i32> addrspace(5)* inttoptr (i32 8 to <4 x i32> addrspace(5)*)			store volatile <4 x i32> <i32 5, i32 10, i32 15, i32 0>, <4 x i32> addrspace(5)* inttoptr (i32 8 to <4 x i32> addrspace(5)*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}load_private_offset_i8:			; GCN-LABEL: {{^}}load_private_offset_i8:
	; GCN: buffer_load_ubyte v{{[0-9]+}}, off, s[4:7], s2 offset:8			; GCN: buffer_load_ubyte v{{[0-9]+}}, off, s[4:7], 0 offset:8
	define amdgpu_kernel void @load_private_offset_i8() #0 {			define amdgpu_kernel void @load_private_offset_i8() #0 {
	%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 8 to i8 addrspace(5)*)			%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 8 to i8 addrspace(5)*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}sextload_private_offset_i8:			; GCN-LABEL: {{^}}sextload_private_offset_i8:
	; GCN: buffer_load_sbyte v{{[0-9]+}}, off, s[4:7], s8 offset:8			; GCN: buffer_load_sbyte v{{[0-9]+}}, off, s[4:7], 0 offset:8
	define amdgpu_kernel void @sextload_private_offset_i8(i32 addrspace(1)* %out) #0 {			define amdgpu_kernel void @sextload_private_offset_i8(i32 addrspace(1)* %out) #0 {
	%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 8 to i8 addrspace(5)*)			%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 8 to i8 addrspace(5)*)
	%sextload = sext i8 %load to i32			%sextload = sext i8 %load to i32
	store i32 %sextload, i32 addrspace(1)* undef			store i32 %sextload, i32 addrspace(1)* undef
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}zextload_private_offset_i8:			; GCN-LABEL: {{^}}zextload_private_offset_i8:
	; GCN: buffer_load_ubyte v{{[0-9]+}}, off, s[4:7], s8 offset:8			; GCN: buffer_load_ubyte v{{[0-9]+}}, off, s[4:7], 0 offset:8
	define amdgpu_kernel void @zextload_private_offset_i8(i32 addrspace(1)* %out) #0 {			define amdgpu_kernel void @zextload_private_offset_i8(i32 addrspace(1)* %out) #0 {
	%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 8 to i8 addrspace(5)*)			%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 8 to i8 addrspace(5)*)
	%zextload = zext i8 %load to i32			%zextload = zext i8 %load to i32
	store i32 %zextload, i32 addrspace(1)* undef			store i32 %zextload, i32 addrspace(1)* undef
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}load_private_offset_i16:			; GCN-LABEL: {{^}}load_private_offset_i16:
	; GCN: buffer_load_ushort v{{[0-9]+}}, off, s[4:7], s2 offset:8			; GCN: buffer_load_ushort v{{[0-9]+}}, off, s[4:7], 0 offset:8
	define amdgpu_kernel void @load_private_offset_i16() #0 {			define amdgpu_kernel void @load_private_offset_i16() #0 {
	%load = load volatile i16, i16 addrspace(5)* inttoptr (i32 8 to i16 addrspace(5)*)			%load = load volatile i16, i16 addrspace(5)* inttoptr (i32 8 to i16 addrspace(5)*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}sextload_private_offset_i16:			; GCN-LABEL: {{^}}sextload_private_offset_i16:
	; GCN: buffer_load_sshort v{{[0-9]+}}, off, s[4:7], s8 offset:8			; GCN: buffer_load_sshort v{{[0-9]+}}, off, s[4:7], 0 offset:8
	define amdgpu_kernel void @sextload_private_offset_i16(i32 addrspace(1)* %out) #0 {			define amdgpu_kernel void @sextload_private_offset_i16(i32 addrspace(1)* %out) #0 {
	%load = load volatile i16, i16 addrspace(5)* inttoptr (i32 8 to i16 addrspace(5)*)			%load = load volatile i16, i16 addrspace(5)* inttoptr (i32 8 to i16 addrspace(5)*)
	%sextload = sext i16 %load to i32			%sextload = sext i16 %load to i32
	store i32 %sextload, i32 addrspace(1)* undef			store i32 %sextload, i32 addrspace(1)* undef
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}zextload_private_offset_i16:			; GCN-LABEL: {{^}}zextload_private_offset_i16:
	; GCN: buffer_load_ushort v{{[0-9]+}}, off, s[4:7], s8 offset:8			; GCN: buffer_load_ushort v{{[0-9]+}}, off, s[4:7], 0 offset:8
	define amdgpu_kernel void @zextload_private_offset_i16(i32 addrspace(1)* %out) #0 {			define amdgpu_kernel void @zextload_private_offset_i16(i32 addrspace(1)* %out) #0 {
	%load = load volatile i16, i16 addrspace(5)* inttoptr (i32 8 to i16 addrspace(5)*)			%load = load volatile i16, i16 addrspace(5)* inttoptr (i32 8 to i16 addrspace(5)*)
	%zextload = zext i16 %load to i32			%zextload = zext i16 %load to i32
	store i32 %zextload, i32 addrspace(1)* undef			store i32 %zextload, i32 addrspace(1)* undef
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}load_private_offset_i32:			; GCN-LABEL: {{^}}load_private_offset_i32:
	; GCN: buffer_load_dword v{{[0-9]+}}, off, s[4:7], s2 offset:8			; GCN: buffer_load_dword v{{[0-9]+}}, off, s[4:7], 0 offset:8
	define amdgpu_kernel void @load_private_offset_i32() #0 {			define amdgpu_kernel void @load_private_offset_i32() #0 {
	%load = load volatile i32, i32 addrspace(5)* inttoptr (i32 8 to i32 addrspace(5)*)			%load = load volatile i32, i32 addrspace(5)* inttoptr (i32 8 to i32 addrspace(5)*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}load_private_offset_v2i32:			; GCN-LABEL: {{^}}load_private_offset_v2i32:
	; GCN: buffer_load_dwordx2 v{{\[[0-9]+:[0-9]+\]}}, off, s[4:7], s2 offset:8			; GCN: buffer_load_dwordx2 v{{\[[0-9]+:[0-9]+\]}}, off, s[4:7], 0 offset:8
	define amdgpu_kernel void @load_private_offset_v2i32() #0 {			define amdgpu_kernel void @load_private_offset_v2i32() #0 {
	%load = load volatile <2 x i32>, <2 x i32> addrspace(5)* inttoptr (i32 8 to <2 x i32> addrspace(5)*)			%load = load volatile <2 x i32>, <2 x i32> addrspace(5)* inttoptr (i32 8 to <2 x i32> addrspace(5)*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}load_private_offset_v4i32:			; GCN-LABEL: {{^}}load_private_offset_v4i32:
	; GCN: buffer_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, off, s[4:7], s2 offset:8			; GCN: buffer_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, off, s[4:7], 0 offset:8
	define amdgpu_kernel void @load_private_offset_v4i32() #0 {			define amdgpu_kernel void @load_private_offset_v4i32() #0 {
	%load = load volatile <4 x i32>, <4 x i32> addrspace(5)* inttoptr (i32 8 to <4 x i32> addrspace(5)*)			%load = load volatile <4 x i32>, <4 x i32> addrspace(5)* inttoptr (i32 8 to <4 x i32> addrspace(5)*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_private_offset_i8_max_offset:			; GCN-LABEL: {{^}}store_private_offset_i8_max_offset:
	; GCN: buffer_store_byte v{{[0-9]+}}, off, s[4:7], s2 offset:4095			; GCN: buffer_store_byte v{{[0-9]+}}, off, s[4:7], 0 offset:4095
	define amdgpu_kernel void @store_private_offset_i8_max_offset() #0 {			define amdgpu_kernel void @store_private_offset_i8_max_offset() #0 {
	store volatile i8 5, i8 addrspace(5)* inttoptr (i32 4095 to i8 addrspace(5)*)			store volatile i8 5, i8 addrspace(5)* inttoptr (i32 4095 to i8 addrspace(5)*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_private_offset_i8_max_offset_plus1:			; GCN-LABEL: {{^}}store_private_offset_i8_max_offset_plus1:
	; GCN: v_mov_b32_e32 [[OFFSET:v[0-9]+]], 0x1000			; GCN: v_mov_b32_e32 [[OFFSET:v[0-9]+]], 0x1000
	; GCN: buffer_store_byte v{{[0-9]+}}, [[OFFSET]], s[4:7], s2 offen{{$}}			; GCN: buffer_store_byte v{{[0-9]+}}, [[OFFSET]], s[4:7], 0 offen{{$}}
	define amdgpu_kernel void @store_private_offset_i8_max_offset_plus1() #0 {			define amdgpu_kernel void @store_private_offset_i8_max_offset_plus1() #0 {
	store volatile i8 5, i8 addrspace(5)* inttoptr (i32 4096 to i8 addrspace(5)*)			store volatile i8 5, i8 addrspace(5)* inttoptr (i32 4096 to i8 addrspace(5)*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_private_offset_i8_max_offset_plus2:			; GCN-LABEL: {{^}}store_private_offset_i8_max_offset_plus2:
	; GCN: v_mov_b32_e32 [[OFFSET:v[0-9]+]], 0x1000			; GCN: v_mov_b32_e32 [[OFFSET:v[0-9]+]], 0x1000
	; GCN: buffer_store_byte v{{[0-9]+}}, [[OFFSET]], s[4:7], s2 offen offset:1{{$}}			; GCN: buffer_store_byte v{{[0-9]+}}, [[OFFSET]], s[4:7], 0 offen offset:1{{$}}
	define amdgpu_kernel void @store_private_offset_i8_max_offset_plus2() #0 {			define amdgpu_kernel void @store_private_offset_i8_max_offset_plus2() #0 {
	store volatile i8 5, i8 addrspace(5)* inttoptr (i32 4097 to i8 addrspace(5)*)			store volatile i8 5, i8 addrspace(5)* inttoptr (i32 4097 to i8 addrspace(5)*)
	ret void			ret void
	}			}

	; MUBUF used for stack access has bounds checking enabled before gfx9,			; MUBUF used for stack access has bounds checking enabled before gfx9,
	; so a possibly negative base index can't be used for the vgpr offset.			; so a possibly negative base index can't be used for the vgpr offset.

	; GCN-LABEL: {{^}}store_private_unknown_bits_vaddr:			; GCN-LABEL: {{^}}store_private_unknown_bits_vaddr:
	; SICIVI: v_add_{{i|u}}32_e32 [[ADDR0:v[0-9]+]], vcc, 4			; SICIVI: v_add_{{i|u}}32_e32 [[ADDR0:v[0-9]+]], vcc, 4
	; SICIVI: v_add_{{i|u}}32_e32 [[ADDR1:v[0-9]+]], vcc, 32, [[ADDR0]]			; SICIVI: v_add_{{i|u}}32_e32 [[ADDR1:v[0-9]+]], vcc, 32, [[ADDR0]]
	; SICIVI: buffer_store_dword v{{[0-9]+}}, [[ADDR1]], s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offen{{$}}			; SICIVI: buffer_store_dword v{{[0-9]+}}, [[ADDR1]], s{{\[[0-9]+:[0-9]+\]}}, 0 offen{{$}}

	; GFX9: v_add_u32_e32 [[ADDR:v[0-9]+]], 4,			; GFX9: v_add_u32_e32 [[ADDR:v[0-9]+]], 4,
	; GFX9: buffer_store_dword v{{[0-9]+}}, [[ADDR]], s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offen offset:32			; GFX9: buffer_store_dword v{{[0-9]+}}, [[ADDR]], s{{\[[0-9]+:[0-9]+\]}}, 0 offen offset:32
	define amdgpu_kernel void @store_private_unknown_bits_vaddr() #0 {			define amdgpu_kernel void @store_private_unknown_bits_vaddr() #0 {
	%alloca = alloca [16 x i32], align 4, addrspace(5)			%alloca = alloca [16 x i32], align 4, addrspace(5)
	%vaddr = load volatile i32, i32 addrspace(1)* undef			%vaddr = load volatile i32, i32 addrspace(1)* undef
	%vaddr.off = add i32 %vaddr, 8			%vaddr.off = add i32 %vaddr, 8
	%gep = getelementptr inbounds [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 0, i32 %vaddr.off			%gep = getelementptr inbounds [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 0, i32 %vaddr.off
	store volatile i32 9, i32 addrspace(5)* %gep			store volatile i32 9, i32 addrspace(5)* %gep
	ret void			ret void
	}			}

	attributes #0 = { nounwind }			attributes #0 = { nounwind }
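The constant-address checks in this file are consistent with the private address being split at the 12-bit MUBUF immediate boundary: 4095 fits entirely in `offset:`, 4096 becomes `v_mov_b32 ... 0x1000` plus `offen`, and 4097 becomes `0x1000` in the VGPR plus `offset:1`. The Python sketch below models that split and the resulting offset arithmetic, with `soffset` fixed at 0 as in the updated checks. It is an illustrative model, not LLVM's implementation: the function names are hypothetical, the offset formula ignores swizzling and bounds checking, and `fold_allowed` simply restates the test comment about pre-gfx9 bounds checking.

```python
MUBUF_MAX_IMM = 4095  # largest value of the 12-bit unsigned MUBUF immediate offset


def split_constant_private_address(addr):
    """Hypothetical helper: split a constant private address (0 <= addr < 2**32)
    into (vaddr, imm).

    vaddr is None when the whole address fits in the immediate (the `off` form);
    otherwise the high bits go into a VGPR (`offen`) and the low 12 bits stay
    in the immediate field.
    """
    if addr <= MUBUF_MAX_IMM:
        return None, addr
    return addr & ~MUBUF_MAX_IMM, addr & MUBUF_MAX_IMM


def effective_offset(vaddr, imm, soffset=0):
    """Simplified byte offset: soffset + (vaddr when offen) + imm.

    Ignores swizzling and bounds checking; soffset defaults to the literal 0
    that the updated checks use.
    """
    return soffset + (vaddr or 0) + imm


def fold_allowed(is_gfx9_or_later, vaddr_known_nonnegative):
    """Restates the test comment: before gfx9, MUBUF stack accesses are
    bounds-checked, so a constant may only be moved from the VGPR add into
    the immediate when the remaining VGPR offset cannot be negative."""
    return is_gfx9_or_later or vaddr_known_nonnegative
```

For example, `split_constant_private_address(4097)` returns `(4096, 1)`, matching `0x1000` in the VGPR and `offset:1`. In `store_private_unknown_bits_vaddr` the loaded index is not known to be non-negative, so under this model only the GFX9 run may fold the `32` into `offset:32`, which is what the SICIVI and GFX9 checks show.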

llvm/test/CodeGen/AMDGPU/optimize-exec-masking-pre-ra.mir

	# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	# RUN: llc -mtriple=amdgcn-mesa-mesa3d -run-pass=si-optimize-exec-masking-pre-ra -verify-machineinstrs %s -o - | FileCheck -check-prefix=GCN %s			# RUN: llc -mtriple=amdgcn-mesa-mesa3d -run-pass=si-optimize-exec-masking-pre-ra -verify-machineinstrs %s -o - | FileCheck -check-prefix=GCN %s

	# Check for regression from assuming an instruction was a copy after			# Check for regression from assuming an instruction was a copy after
	# dropping the opcode check.			# dropping the opcode check.
	---			---
	name: exec_src1_is_not_copy			name: exec_src1_is_not_copy
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: true			isEntryFunction: true
	scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'			scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'
	scratchWaveOffsetReg: '$sgpr101'
	frameOffsetReg: '$sgpr101'			frameOffsetReg: '$sgpr101'
	body: |			body: |
	; GCN-LABEL: name: exec_src1_is_not_copy			; GCN-LABEL: name: exec_src1_is_not_copy
	; GCN: bb.0:			; GCN: bb.0:
	; GCN: successors: %bb.1(0x40000000), %bb.2(0x40000000)			; GCN: successors: %bb.1(0x40000000), %bb.2(0x40000000)
	; GCN: liveins: $vgpr0			; GCN: liveins: $vgpr0
	; GCN: [[COPY:%[0-9]+]]:sreg_64 = COPY $exec			; GCN: [[COPY:%[0-9]+]]:sreg_64 = COPY $exec
	; GCN: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF			; GCN: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
	[100 lines not shown]

llvm/test/CodeGen/AMDGPU/partial-sgpr-to-vgpr-spills.ll

	[406 lines not shown]
	; GCN-NEXT: v_writelane_b32 v0, s13, 57			; GCN-NEXT: v_writelane_b32 v0, s13, 57
	; GCN-NEXT: v_writelane_b32 v0, s14, 58			; GCN-NEXT: v_writelane_b32 v0, s14, 58
	; GCN-NEXT: v_writelane_b32 v0, s15, 59			; GCN-NEXT: v_writelane_b32 v0, s15, 59
	; GCN-NEXT: v_writelane_b32 v0, s16, 60			; GCN-NEXT: v_writelane_b32 v0, s16, 60
	; GCN-NEXT: v_writelane_b32 v0, s17, 61			; GCN-NEXT: v_writelane_b32 v0, s17, 61
	; GCN-NEXT: v_writelane_b32 v0, s18, 62			; GCN-NEXT: v_writelane_b32 v0, s18, 62
	; GCN-NEXT: v_writelane_b32 v0, s19, 63			; GCN-NEXT: v_writelane_b32 v0, s19, 63

	; GCN: v_readlane_b32 s4, v0, 48			; GCN: v_readlane_b32 s0, v0, 48
	; GCN-NEXT: v_readlane_b32 s5, v0, 49			; GCN-NEXT: v_readlane_b32 s1, v0, 49
	; GCN-NEXT: v_readlane_b32 s6, v0, 50			; GCN-NEXT: v_readlane_b32 s2, v0, 50
	; GCN-NEXT: v_readlane_b32 s7, v0, 51			; GCN-NEXT: v_readlane_b32 s3, v0, 51
	; GCN-NEXT: v_readlane_b32 s8, v0, 52			; GCN-NEXT: v_readlane_b32 s4, v0, 52
	; GCN-NEXT: v_readlane_b32 s9, v0, 53			; GCN-NEXT: v_readlane_b32 s5, v0, 53
	; GCN-NEXT: v_readlane_b32 s10, v0, 54			; GCN-NEXT: v_readlane_b32 s6, v0, 54
	; GCN-NEXT: v_readlane_b32 s11, v0, 55			; GCN-NEXT: v_readlane_b32 s7, v0, 55
	; GCN-NEXT: v_readlane_b32 s12, v0, 56			; GCN-NEXT: v_readlane_b32 s8, v0, 56
	; GCN-NEXT: v_readlane_b32 s13, v0, 57			; GCN-NEXT: v_readlane_b32 s9, v0, 57
	; GCN-NEXT: v_readlane_b32 s14, v0, 58			; GCN-NEXT: v_readlane_b32 s10, v0, 58
	; GCN-NEXT: v_readlane_b32 s15, v0, 59			; GCN-NEXT: v_readlane_b32 s11, v0, 59
	; GCN-NEXT: v_readlane_b32 s16, v0, 60			; GCN-NEXT: v_readlane_b32 s12, v0, 60
	; GCN-NEXT: v_readlane_b32 s17, v0, 61			; GCN-NEXT: v_readlane_b32 s13, v0, 61
	; GCN-NEXT: v_readlane_b32 s18, v0, 62			; GCN-NEXT: v_readlane_b32 s14, v0, 62
	; GCN-NEXT: v_readlane_b32 s19, v0, 63			; GCN-NEXT: v_readlane_b32 s15, v0, 63
				; GCN: use s[0:15]
	define amdgpu_kernel void @split_sgpr_spill_2_vgprs(i32 addrspace(1)* %out, i32 %in) #1 {			define amdgpu_kernel void @split_sgpr_spill_2_vgprs(i32 addrspace(1)* %out, i32 %in) #1 {
	%wide.sgpr0 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0			%wide.sgpr0 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0
	%wide.sgpr1 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0			%wide.sgpr1 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0
	%wide.sgpr2 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0			%wide.sgpr2 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0
	%wide.sgpr5 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0			%wide.sgpr5 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0
	%wide.sgpr3 = call <8 x i32> asm sideeffect "; def $0", "=s" () #0			%wide.sgpr3 = call <8 x i32> asm sideeffect "; def $0", "=s" () #0
	%wide.sgpr4 = call <2 x i32> asm sideeffect "; def $0", "=s" () #0			%wide.sgpr4 = call <2 x i32> asm sideeffect "; def $0", "=s" () #0

	[68 lines not shown]
	; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 43			; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 43
	; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 44			; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 44
	; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 45			; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 45
	; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 46			; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 46
	; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 47			; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 47
	; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 48			; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 48
	; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 49			; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 49

	; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}}			; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0
	; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}}			; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0
	; GCN: s_cbranch_scc1			; GCN: s_cbranch_scc1


	; GCN: v_readlane_b32 s[[USE_TMP_LO:[0-9]+]], v23, 0			; GCN: v_readlane_b32 s[[USE_TMP_LO:[0-9]+]], v23, 0
	; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 1			; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 1
	; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 2			; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 2
	; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 3			; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 3
	; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 4			; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 4
	[42 lines not shown]
	; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 26			; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 26
	; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 27			; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 27
	; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 28			; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 28
	; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 29			; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 29
	; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 30			; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 30
	; GCN-NEXT: v_readlane_b32 s[[USE_TMP_HI:[0-9]+]], v23, 31			; GCN-NEXT: v_readlane_b32 s[[USE_TMP_HI:[0-9]+]], v23, 31
	; GCN: ; use s{{\[}}[[USE_TMP_LO]]:[[USE_TMP_HI]]{{\]}}			; GCN: ; use s{{\[}}[[USE_TMP_LO]]:[[USE_TMP_HI]]{{\]}}

	; GCN: buffer_load_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}}			; GCN: buffer_load_dword [[V_TMP:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, 0
	; GCN: buffer_load_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}}			; GCN: v_readfirstlane_b32 s[[USE_TMP_LO:[0-9]+]], [[V_TMP]]
				; GCN: buffer_load_dword [[V_TMP:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, 0
	; GCN: v_readfirstlane_b32 s1, v0			; GCN: v_readfirstlane_b32 s[[USE_TMP_HI:[0-9]+]], [[V_TMP]]
	; GCN: ;;#ASMSTART			; GCN: ;;#ASMSTART
	; GCN: ; use s[0:1]			; GCN: ; use s{{\[}}[[USE_TMP_LO]]:[[USE_TMP_HI]]{{\]}}
	define amdgpu_kernel void @no_vgprs_last_sgpr_spill(i32 addrspace(1)* %out, i32 %in) #1 {			define amdgpu_kernel void @no_vgprs_last_sgpr_spill(i32 addrspace(1)* %out, i32 %in) #1 {
	call void asm sideeffect "", "~{v[0:7]}" () #0			call void asm sideeffect "", "~{v[0:7]}" () #0
	call void asm sideeffect "", "~{v[8:15]}" () #0			call void asm sideeffect "", "~{v[8:15]}" () #0
	call void asm sideeffect "", "~{v[16:19]}"() #0			call void asm sideeffect "", "~{v[16:19]}"() #0
	call void asm sideeffect "", "~{v[20:21]}"() #0			call void asm sideeffect "", "~{v[20:21]}"() #0
	call void asm sideeffect "", "~{v22}"() #0			call void asm sideeffect "", "~{v22}"() #0

	%wide.sgpr0 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0			%wide.sgpr0 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0
	[21 lines not shown]
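The checks in this file exercise SGPR spilling into VGPR lanes: each spilled 32-bit SGPR is written to one lane of a VGPR with `v_writelane_b32` and read back with `v_readlane_b32`. The test names suggest a wide tuple can be split across two VGPRs, and when no VGPR is free (`no_vgprs_last_sgpr_spill`) the value goes through scratch memory instead (`buffer_store_dword`/`buffer_load_dword` plus `v_readfirstlane_b32`). The Python sketch below is a simplified, hypothetical allocator in that spirit. It assumes a wave size of 64 and that a spill is placed in lanes entirely or not at all; both are assumptions of the sketch, not facts established by these tests.

```python
WAVE_SIZE = 64          # assumed wavefront size (wave64)
MASK32 = 0xFFFF_FFFF


class SgprLaneSpiller:
    """Hypothetical model of spilling SGPR dwords into VGPR lanes."""

    def __init__(self, num_free_vgprs):
        # Each free VGPR is modelled as a list of WAVE_SIZE 32-bit lanes.
        self.vgprs = [[0] * WAVE_SIZE for _ in range(num_free_vgprs)]
        self.next_slot = 0  # flat index: vgpr * WAVE_SIZE + lane

    def allocate(self, num_dwords):
        """Return one (vgpr, lane) slot per dword, or None if they do not all fit.

        A None result models falling back to a scratch-memory spill; no lanes
        are consumed in that case (all-or-nothing assumption).
        """
        capacity = len(self.vgprs) * WAVE_SIZE
        if self.next_slot + num_dwords > capacity:
            return None
        slots = [divmod(self.next_slot + i, WAVE_SIZE) for i in range(num_dwords)]
        self.next_slot += num_dwords
        return slots

    def spill(self, values, slots):
        # One v_writelane_b32 per dword: store the SGPR value into its lane.
        for value, (vgpr, lane) in zip(values, slots):
            self.vgprs[vgpr][lane] = value & MASK32

    def restore(self, slots):
        # One v_readlane_b32 per dword: read each lane back into an SGPR.
        return [self.vgprs[vgpr][lane] for vgpr, lane in slots]
```

For instance, once 48 lanes are taken, a 16-dword tuple lands in lanes 48 through 63 of the first VGPR, the lane range the `split_sgpr_spill_2_vgprs` checks show for `v0`; a larger request at that point spills over into lane 0 of the next VGPR.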

llvm/test/CodeGen/AMDGPU/pei-reg-scavenger-position.mir

	[11 lines not shown]

	# Force a frame larger than the immediate field with a large alignment.			# Force a frame larger than the immediate field with a large alignment.
	stack:			stack:
	- { id: 0, type: default, offset: 4096, size: 4, alignment: 8192 }			- { id: 0, type: default, offset: 4096, size: 4, alignment: 8192 }

	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: true			isEntryFunction: true
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr33
	frameOffsetReg: $sgpr5
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32
				argumentInfo:
				privateSegmentWaveByteOffset: { reg: '$sgpr4' }

	body: |			body: |
	; CHECK-LABEL: name: scavenge_register_position			; CHECK-LABEL: name: scavenge_register_position
	; CHECK: bb.0:			; CHECK: bb.0:
	; CHECK: successors: %bb.1(0x80000000)			; CHECK: successors: %bb.1(0x80000000)
	; CHECK: liveins: $sgpr33, $sgpr0_sgpr1_sgpr2_sgpr3			; CHECK: liveins: $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4
	; CHECK: $sgpr4 = S_ADD_U32 $sgpr32, 524288, implicit-def $scc			; CHECK: $sgpr0 = S_ADD_U32 $sgpr0, killed $sgpr4, implicit-def $scc, implicit-def $sgpr0_sgpr1_sgpr2_sgpr3
				; CHECK: $sgpr1 = S_ADDC_U32 $sgpr1, 0, implicit-def $scc, implicit $scc, implicit-def $sgpr0_sgpr1_sgpr2_sgpr3
				; CHECK: $sgpr4 = S_MOV_B32 524288
	; CHECK: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4 from %stack.0, align 8192, addrspace 5)			; CHECK: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4 from %stack.0, align 8192, addrspace 5)
	; CHECK: S_BRANCH %bb.1			; CHECK: S_BRANCH %bb.1
	; CHECK: bb.1:			; CHECK: bb.1:
	; CHECK: liveins: $sgpr0_sgpr1_sgpr2_sgpr3			; CHECK: liveins: $sgpr0_sgpr1_sgpr2_sgpr3
	; CHECK: $sgpr4 = S_ADD_U32 $sgpr32, 524288, implicit-def $scc			; CHECK: $sgpr4 = S_MOV_B32 524288
	; CHECK: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4 from %stack.0, align 8192, addrspace 5)			; CHECK: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4 from %stack.0, align 8192, addrspace 5)
	; CHECK: S_ENDPGM 0, implicit $vgpr0			; CHECK: S_ENDPGM 0, implicit $vgpr0
	bb.0:			bb.0:
	$vgpr0 = SI_SPILL_V32_RESTORE %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (load 4 from %stack.0, addrspace 5)			$vgpr0 = SI_SPILL_V32_RESTORE %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, implicit $exec :: (load 4 from %stack.0, addrspace 5)
	S_BRANCH %bb.1			S_BRANCH %bb.1

	bb.1:			bb.1:
	$vgpr0 = SI_SPILL_V32_RESTORE %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (load 4 from %stack.0, addrspace 5)			$vgpr0 = SI_SPILL_V32_RESTORE %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, implicit $exec :: (load 4 from %stack.0, addrspace 5)
	S_ENDPGM 0, implicit $vgpr0			S_ENDPGM 0, implicit $vgpr0
	...			...
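In the updated checks for `bb.0`, the wave's private segment byte offset (passed in `$sgpr4` via `privateSegmentWaveByteOffset`) is no longer used as `soffset`. Instead it is added into the low dwords of the scratch resource descriptor with an `S_ADD_U32`/`S_ADDC_U32` pair, and the 524288-byte frame offset, too large for the 12-bit immediate, is materialized with `S_MOV_B32` into `$sgpr4` and passed as `soffset`. The Python sketch below models only that 32-bit carry arithmetic and the operand choice. The helper names are hypothetical, and the sketch treats dwords 0 and 1 as a plain 64-bit value even though the real descriptor packs other fields into the upper bits of dword 1.

```python
MASK32 = 0xFFFF_FFFF
MUBUF_MAX_IMM = 4095  # 12-bit unsigned MUBUF immediate offset


def s_add_u32(a, b):
    """32-bit add; returns (result, carry-out), the carry modelling SCC."""
    total = (a & MASK32) + (b & MASK32)
    return total & MASK32, total >> 32


def s_addc_u32(a, b, scc):
    """32-bit add with carry-in from SCC; returns (result, carry-out)."""
    total = (a & MASK32) + (b & MASK32) + scc
    return total & MASK32, total >> 32


def fold_wave_offset_into_rsrc(rsrc, wave_byte_offset):
    """Add the wave byte offset to dwords 0-1 of a 4-dword descriptor,
    mirroring the S_ADD_U32 / S_ADDC_U32 pair in the CHECK lines.
    Returns a new list; dwords 2-3 are passed through unchanged."""
    lo, scc = s_add_u32(rsrc[0], wave_byte_offset)
    hi, _ = s_addc_u32(rsrc[1], 0, scc)
    return [lo, hi, rsrc[2], rsrc[3]]


def soffset_and_imm(frame_offset):
    """Hypothetical operand choice for a constant frame offset.

    Returns (soffset_kind, soffset_value, imm): offsets that fit the immediate
    use an inline 0 soffset; larger ones are moved into an SGPR (S_MOV_B32)
    that becomes soffset, with a zero immediate.
    """
    if 0 <= frame_offset <= MUBUF_MAX_IMM:
        return ("inline", 0, frame_offset)
    return ("sgpr", frame_offset, 0)
```

With this model, `soffset_and_imm(524288)` gives `("sgpr", 524288, 0)`, which corresponds to `$sgpr4 = S_MOV_B32 524288` followed by a load using `killed $sgpr4` as `soffset` and a zero immediate.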

llvm/test/CodeGen/AMDGPU/pei-scavenge-sgpr-carry-out.mir

	[13 lines not shown]

	stack:			stack:
	- { id: 0, type: default, offset: 0, size: 4, alignment: 8192 }			- { id: 0, type: default, offset: 0, size: 4, alignment: 8192 }
	- { id: 1, type: default, offset: 0, size: 4, alignment: 8192 }			- { id: 1, type: default, offset: 0, size: 4, alignment: 8192 }

	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: false			isEntryFunction: false
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr34			frameOffsetReg: $sgpr34
	frameOffsetReg: $sgpr33
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr1			liveins: $vgpr1

	; CHECK-LABEL: name: scavenge_sgpr_pei_no_sgprs			; CHECK-LABEL: name: scavenge_sgpr_pei_no_sgprs
	; CHECK: liveins: $vgpr1			; CHECK: liveins: $vgpr1
	; CHECK: $sgpr27 = frame-setup COPY $sgpr33			; CHECK: $sgpr27 = frame-setup COPY $sgpr34
	; CHECK: $sgpr4 = frame-setup S_ADD_U32 $sgpr32, 524224, implicit-def $scc			; CHECK: $sgpr4 = frame-setup S_ADD_U32 $sgpr32, 524224, implicit-def $scc
	; CHECK: $sgpr33 = frame-setup S_AND_B32 killed $sgpr4, 4294443008, implicit-def $scc			; CHECK: $sgpr34 = frame-setup S_AND_B32 killed $sgpr4, 4294443008, implicit-def $scc
	; CHECK: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 1572864, implicit-def $scc			; CHECK: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 1572864, implicit-def $scc
	; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc			; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
	; CHECK: $sgpr33 = S_SUB_U32 $sgpr33, $sgpr34, implicit-def $scc			; CHECK: $sgpr34 = S_LSHR_B32 $sgpr34, 6, implicit-def $scc
	; CHECK: $sgpr33 = S_LSHR_B32 killed $sgpr33, 6, implicit-def $scc			; CHECK: $sgpr34 = S_ADD_U32 killed $sgpr34, 8192, implicit-def $scc
	; CHECK: $sgpr33 = S_ADD_U32 killed $sgpr33, 8192, implicit-def $scc			; CHECK: $vgpr2 = COPY killed $sgpr34
	; CHECK: $vgpr2 = COPY killed $sgpr33			; CHECK: $sgpr34 = S_SUB_U32 killed $sgpr34, 8192, implicit-def $scc
	; CHECK: $sgpr33 = S_SUB_U32 killed $sgpr33, 8192, implicit-def $scc			; CHECK: $sgpr34 = S_LSHL_B32 $sgpr34, 6, implicit-def $scc
	; CHECK: $sgpr33 = S_LSHL_B32 killed $sgpr33, 6, implicit-def $scc
	; CHECK: $sgpr33 = S_ADD_U32 $sgpr33, $sgpr34, implicit-def $scc
	; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31			; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31
	; CHECK: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc			; CHECK: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc
	; CHECK: $sgpr33 = frame-setup COPY $sgpr27			; CHECK: $sgpr34 = frame-setup COPY $sgpr27
	; CHECK: S_ENDPGM 0, implicit $vcc			; CHECK: S_ENDPGM 0, implicit $vcc
	S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc			S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
	$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31			$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31
	S_ENDPGM 0, implicit $vcc			S_ENDPGM 0, implicit $vcc
	...			...

	# One 32-bit SGPR is available for the intermediate scale computation,			# One 32-bit SGPR is available for the intermediate scale computation,
	# so only an extra copy to VALU is necessary.			# so only an extra copy to VALU is necessary.

	---			---
	name: scavenge_sgpr_pei_one_sgpr			name: scavenge_sgpr_pei_one_sgpr
	tracksRegLiveness: true			tracksRegLiveness: true

	stack:			stack:
	- { id: 0, type: default, offset: 0, size: 4, alignment: 8192 }			- { id: 0, type: default, offset: 0, size: 4, alignment: 8192 }
	- { id: 1, type: default, offset: 0, size: 4, alignment: 8192 }			- { id: 1, type: default, offset: 0, size: 4, alignment: 8192 }

	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: false			isEntryFunction: false
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr34			frameOffsetReg: $sgpr34
	frameOffsetReg: $sgpr33
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr1			liveins: $vgpr1

	; CHECK-LABEL: name: scavenge_sgpr_pei_one_sgpr			; CHECK-LABEL: name: scavenge_sgpr_pei_one_sgpr
	; CHECK: liveins: $vgpr1			; CHECK: liveins: $vgpr1
	; CHECK: $sgpr27 = frame-setup COPY $sgpr33			; CHECK: $sgpr27 = frame-setup COPY $sgpr34
	; CHECK: $sgpr4 = frame-setup S_ADD_U32 $sgpr32, 524224, implicit-def $scc			; CHECK: $sgpr4 = frame-setup S_ADD_U32 $sgpr32, 524224, implicit-def $scc
	; CHECK: $sgpr33 = frame-setup S_AND_B32 killed $sgpr4, 4294443008, implicit-def $scc			; CHECK: $sgpr34 = frame-setup S_AND_B32 killed $sgpr4, 4294443008, implicit-def $scc
	; CHECK: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 1572864, implicit-def $scc			; CHECK: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 1572864, implicit-def $scc
	; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc			; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
	; CHECK: $sgpr29 = S_SUB_U32 $sgpr33, $sgpr34, implicit-def $scc			; CHECK: $sgpr29 = S_LSHR_B32 $sgpr34, 6, implicit-def $scc
	; CHECK: $sgpr29 = S_LSHR_B32 killed $sgpr29, 6, implicit-def $scc
	; CHECK: $sgpr29 = S_ADD_U32 killed $sgpr29, 8192, implicit-def $scc			; CHECK: $sgpr29 = S_ADD_U32 killed $sgpr29, 8192, implicit-def $scc
	; CHECK: $vgpr2 = COPY killed $sgpr29			; CHECK: $vgpr2 = COPY killed $sgpr29
	; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr31			; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr31
	; CHECK: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc			; CHECK: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc
	; CHECK: $sgpr33 = frame-setup COPY $sgpr27			; CHECK: $sgpr34 = frame-setup COPY $sgpr27
	; CHECK: S_ENDPGM 0, implicit $vcc			; CHECK: S_ENDPGM 0, implicit $vcc
	S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc			S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
	$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr31			$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr31
	S_ENDPGM 0, implicit $vcc			S_ENDPGM 0, implicit $vcc
	...			...

	# When only one 64-bit SGPR is available for the unused carry out pre gfx9,			# When only one 64-bit SGPR is available for the unused carry out pre gfx9,
	# we must reuse one of the 32-bit SGPR sub-regs to materialize the offset.			# we must reuse one of the 32-bit SGPR sub-regs to materialize the offset.

	---			---
	name: scavenge_sgpr_pei_one_sgpr_64			name: scavenge_sgpr_pei_one_sgpr_64
	tracksRegLiveness: true			tracksRegLiveness: true

	stack:			stack:
	- { id: 0, type: default, offset: 0, size: 4, alignment: 8192 }			- { id: 0, type: default, offset: 0, size: 4, alignment: 8192 }
	- { id: 1, type: default, offset: 0, size: 4, alignment: 8192 }			- { id: 1, type: default, offset: 0, size: 4, alignment: 8192 }

	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: false			isEntryFunction: false
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr34			frameOffsetReg: $sgpr34
	frameOffsetReg: $sgpr33
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr1			liveins: $vgpr1

	; CHECK-LABEL: name: scavenge_sgpr_pei_one_sgpr_64			; CHECK-LABEL: name: scavenge_sgpr_pei_one_sgpr_64
	; CHECK: liveins: $vgpr1			; CHECK: liveins: $vgpr1
	; CHECK: $sgpr27 = frame-setup COPY $sgpr33			; CHECK: $sgpr27 = frame-setup COPY $sgpr34
	; CHECK: $sgpr4 = frame-setup S_ADD_U32 $sgpr32, 524224, implicit-def $scc			; CHECK: $sgpr4 = frame-setup S_ADD_U32 $sgpr32, 524224, implicit-def $scc
	; CHECK: $sgpr33 = frame-setup S_AND_B32 killed $sgpr4, 4294443008, implicit-def $scc			; CHECK: $sgpr34 = frame-setup S_AND_B32 killed $sgpr4, 4294443008, implicit-def $scc
	; CHECK: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 1572864, implicit-def $scc			; CHECK: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 1572864, implicit-def $scc
	; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc			; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
	; CHECK: $sgpr28 = S_SUB_U32 $sgpr33, $sgpr34, implicit-def $scc			; CHECK: $vgpr3 = V_LSHRREV_B32_e64 6, $sgpr34, implicit $exec
	; CHECK: $vgpr3 = V_LSHRREV_B32_e64 6, killed $sgpr28, implicit $exec
	; CHECK: $sgpr28 = S_MOV_B32 8192			; CHECK: $sgpr28 = S_MOV_B32 8192
	; CHECK: $vgpr2, dead $sgpr28_sgpr29 = V_ADD_I32_e64 killed $sgpr28, killed $vgpr3, 0, implicit $exec			; CHECK: $vgpr2, dead $sgpr28_sgpr29 = V_ADD_I32_e64 killed $sgpr28, killed $vgpr3, 0, implicit $exec
	; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr31			; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr31
	; CHECK: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc			; CHECK: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc
	; CHECK: $sgpr33 = frame-setup COPY $sgpr27			; CHECK: $sgpr34 = frame-setup COPY $sgpr27
	; CHECK: S_ENDPGM 0, implicit $vcc			; CHECK: S_ENDPGM 0, implicit $vcc
	S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc			S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
	$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr31			$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr31
	S_ENDPGM 0, implicit $vcc			S_ENDPGM 0, implicit $vcc
	...			...

	# Prefer to use vcc as unused carry out.			# Prefer to use vcc as unused carry out.

	---			---
	name: scavenge_sgpr_pei_prefer_vcc			name: scavenge_sgpr_pei_prefer_vcc
	tracksRegLiveness: true			tracksRegLiveness: true

	stack:			stack:
	- { id: 0, type: default, offset: 0, size: 4, alignment: 8192 }			- { id: 0, type: default, offset: 0, size: 4, alignment: 8192 }
	- { id: 1, type: default, offset: 0, size: 4, alignment: 8192 }			- { id: 1, type: default, offset: 0, size: 4, alignment: 8192 }

	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: false			isEntryFunction: false
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr34			frameOffsetReg: $sgpr34
	frameOffsetReg: $sgpr33
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr1			liveins: $vgpr1

	; CHECK-LABEL: name: scavenge_sgpr_pei_prefer_vcc			; CHECK-LABEL: name: scavenge_sgpr_pei_prefer_vcc
	; CHECK: liveins: $vgpr1			; CHECK: liveins: $vgpr1
	; CHECK: $sgpr27 = frame-setup COPY $sgpr33			; CHECK: $sgpr27 = frame-setup COPY $sgpr34
	; CHECK: $sgpr4 = frame-setup S_ADD_U32 $sgpr32, 524224, implicit-def $scc			; CHECK: $sgpr4 = frame-setup S_ADD_U32 $sgpr32, 524224, implicit-def $scc
	; CHECK: $sgpr33 = frame-setup S_AND_B32 killed $sgpr4, 4294443008, implicit-def $scc			; CHECK: $sgpr34 = frame-setup S_AND_B32 killed $sgpr4, 4294443008, implicit-def $scc
	; CHECK: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 1572864, implicit-def $scc			; CHECK: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 1572864, implicit-def $scc
	; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr30, implicit-def $sgpr31			; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr30, implicit-def $sgpr31
	; CHECK: $vcc_hi = S_SUB_U32 $sgpr33, $sgpr34, implicit-def $scc			; CHECK: $vgpr3 = V_LSHRREV_B32_e64 6, $sgpr34, implicit $exec
	; CHECK: $vgpr3 = V_LSHRREV_B32_e64 6, killed $vcc_hi, implicit $exec
	; CHECK: $vcc_lo = S_MOV_B32 8192			; CHECK: $vcc_lo = S_MOV_B32 8192
	; CHECK: $vgpr2, dead $vcc = V_ADD_I32_e64 killed $vcc_lo, killed $vgpr3, 0, implicit $exec			; CHECK: $vgpr2, dead $vcc = V_ADD_I32_e64 killed $vcc_lo, killed $vgpr3, 0, implicit $exec
	; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr31			; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr31
	; CHECK: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc			; CHECK: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc
	; CHECK: $sgpr33 = frame-setup COPY $sgpr27			; CHECK: $sgpr34 = frame-setup COPY $sgpr27
	; CHECK: S_ENDPGM 0			; CHECK: S_ENDPGM 0
	S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr30, implicit-def $sgpr31			S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr30, implicit-def $sgpr31
	$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr31			$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr31
	S_ENDPGM 0			S_ENDPGM 0
	...			...
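The frame-setup CHECK lines in these tests realign the stack with `S_ADD_U32 $sgpr32, 524224` followed by `S_AND_B32 ..., 4294443008`, then bump `$sgpr32` by 1572864. The following is a minimal Python sketch that reproduces those immediates. It assumes wave64 and that SP/FP hold offsets scaled by the wavefront size (bytes per lane × 64). The per-lane frame size passed in is read off the CHECK lines (1572864 / 64 = 24576) rather than recomputed from the stack objects. `realign_constants` is a hypothetical helper, not an LLVM API.

```python
WAVE_SIZE = 64  # assumption: wave64 targets (hawaii, gfx900)
MASK32 = 0xFFFFFFFF


def realign_constants(max_align, frame_bytes_per_lane, wave_size=WAVE_SIZE):
    """Hypothetical helper: immediates used by the realigning prologue.

    SP/FP are assumed to hold wave-scaled offsets, so every per-lane byte
    quantity is multiplied by the wavefront size.
    """
    add_imm = (max_align - 1) * wave_size          # S_ADD_U32 $sgpr32, add_imm
    and_mask = -(max_align * wave_size) & MASK32   # S_AND_B32 ..., and_mask
    sp_bump = frame_bytes_per_lane * wave_size     # S_ADD_U32 $sgpr32, sp_bump
    return add_imm, and_mask, sp_bump


# 8192-byte aligned objects (carry-out and gfx9 tests)
print(realign_constants(8192, 24576))
# 4096-byte aligned object (pei-scavenge-sgpr.mir)
print(realign_constants(4096, 8192))
```

The mask is written as the two's complement of the scaled alignment. Clearing those low bits after adding `(align - 1) * wave_size` rounds SP up to the next aligned, wave-scaled boundary.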

llvm/test/CodeGen/AMDGPU/pei-scavenge-sgpr-gfx9.mir

	# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs -run-pass=prologepilog %s -o - | FileCheck %s			# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs -run-pass=prologepilog %s -o - | FileCheck %s

	# Test what happens when an SGPR is unavailable for the unused add. The non-inline constant needs to be folded into the add instruction and not materialized in a register.			# Test what happens when an SGPR is unavailable for the unused add. The non-inline constant needs to be folded into the add instruction and not materialized in a register.

	---			---
	name: scavenge_sgpr_pei_no_sgprs			name: scavenge_sgpr_pei_no_sgprs
	tracksRegLiveness: true			tracksRegLiveness: true

	stack:			stack:
	- { id: 0, type: default, offset: 0, size: 4, alignment: 8192 }			- { id: 0, type: default, offset: 0, size: 4, alignment: 8192 }
	- { id: 1, type: default, offset: 0, size: 4, alignment: 8192 }			- { id: 1, type: default, offset: 0, size: 4, alignment: 8192 }

	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: false			isEntryFunction: false
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr34
	frameOffsetReg: $sgpr33			frameOffsetReg: $sgpr33
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr1			liveins: $vgpr1

	; CHECK-LABEL: name: scavenge_sgpr_pei_no_sgprs			; CHECK-LABEL: name: scavenge_sgpr_pei_no_sgprs
	; CHECK: liveins: $vgpr1			; CHECK: liveins: $vgpr1
	; CHECK: $sgpr27 = frame-setup COPY $sgpr33			; CHECK: $sgpr27 = frame-setup COPY $sgpr33
	; CHECK: $sgpr4 = frame-setup S_ADD_U32 $sgpr32, 524224, implicit-def $scc			; CHECK: $sgpr4 = frame-setup S_ADD_U32 $sgpr32, 524224, implicit-def $scc
	; CHECK: $sgpr33 = frame-setup S_AND_B32 killed $sgpr4, 4294443008, implicit-def $scc			; CHECK: $sgpr33 = frame-setup S_AND_B32 killed $sgpr4, 4294443008, implicit-def $scc
	; CHECK: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 1572864, implicit-def $scc			; CHECK: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 1572864, implicit-def $scc
	; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc			; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
	; CHECK: $sgpr33 = S_SUB_U32 $sgpr33, $sgpr34, implicit-def $scc			; CHECK: $vgpr3 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec
	; CHECK: $vgpr3 = V_LSHRREV_B32_e64 6, killed $sgpr33, implicit $exec
	; CHECK: $vgpr2 = V_ADD_U32_e32 8192, killed $vgpr3, implicit $exec			; CHECK: $vgpr2 = V_ADD_U32_e32 8192, killed $vgpr3, implicit $exec
	; CHECK: $sgpr33 = S_ADD_U32 $sgpr33, $sgpr34, implicit-def $scc
	; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31			; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31
	; CHECK: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc			; CHECK: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc
	; CHECK: $sgpr33 = frame-setup COPY $sgpr27			; CHECK: $sgpr33 = frame-setup COPY $sgpr27
	; CHECK: S_ENDPGM 0, implicit $vcc			; CHECK: S_ENDPGM 0, implicit $vcc
	S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc			S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
	$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31			$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31
	S_ENDPGM 0, implicit $vcc			S_ENDPGM 0, implicit $vcc
	...			...
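The main change visible in these CHECK lines is how a frame index becomes a per-lane offset. On the left, the frame register minus the scratch wave offset (`$sgpr33 - $sgpr34`) is shifted right by 6, and then 8192 is added. On the right, where `scratchWaveOffsetReg` is gone, the frame register is shifted directly. The sketch below models that arithmetic in Python as plain 32-bit integer operations. It assumes wave64 and ignores SCC and SALU/VALU differences. The FP value is hypothetical and the function names are illustrative only.

```python
MASK32 = 0xFFFFFFFF
LOG2_WAVE = 6  # assumption: wave64, so wave-scaled -> per-lane is a shift by 6


def fi_offset_old(fp, wave_offset, obj_offset):
    """Left-hand CHECK lines: subtract the scratch wave offset first."""
    t = (fp - wave_offset) & MASK32   # S_SUB_U32 $sgpr33, $sgpr34
    t >>= LOG2_WAVE                   # S_LSHR_B32 / V_LSHRREV_B32 by 6
    return (t + obj_offset) & MASK32  # add the object offset (8192)


def fi_offset_new(fp, obj_offset):
    """Right-hand CHECK lines: FP is used directly, no wave-offset register."""
    return ((fp >> LOG2_WAVE) + obj_offset) & MASK32


def restore_fp_in_place(value, obj_offset):
    """Undo the new sequence when FP itself was clobbered (no free SGPR)."""
    t = (value - obj_offset) & MASK32  # S_SUB_U32 killed $fp, 8192
    return (t << LOG2_WAVE) & MASK32   # S_LSHL_B32 $fp, 6


fp = 7 * 524288  # hypothetical FP, aligned as the S_AND_B32 mask guarantees
off = fi_offset_new(fp, 8192)
print(off)
print(restore_fp_in_place(off, 8192) == fp)
```

The in-place restore on the right-hand side round-trips only because the realigned FP has its low six bits clear. With an unaligned value, the right-shift/left-shift pair would drop those bits.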

llvm/test/CodeGen/AMDGPU/pei-scavenge-sgpr.mir

	# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -verify-machineinstrs -run-pass=prologepilog %s -o - | FileCheck %s			# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -verify-machineinstrs -run-pass=prologepilog %s -o - | FileCheck %s

	# Frame virtual SGPRs should not be used, as the register scavenger cannot usefully spill them anymore.			# Frame virtual SGPRs should not be used, as the register scavenger cannot usefully spill them anymore.
	# Spilling is also worse than increment and restore of a frame register. There should be no spills remaining.			# Spilling is also worse than increment and restore of a frame register. There should be no spills remaining.

	---			---
	name: scavenge_sgpr_pei			name: scavenge_sgpr_pei
	tracksRegLiveness: true			tracksRegLiveness: true

	stack:			stack:
	- { id: 0, type: default, size: 4, alignment: 4096 }			- { id: 0, type: default, size: 4, alignment: 4096 }

	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: false			isEntryFunction: false
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr34
	frameOffsetReg: $sgpr33			frameOffsetReg: $sgpr33
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr1			liveins: $vgpr1

	; CHECK-LABEL: name: scavenge_sgpr_pei			; CHECK-LABEL: name: scavenge_sgpr_pei
	; CHECK: liveins: $vgpr1			; CHECK: liveins: $vgpr1
	; CHECK: $sgpr27 = frame-setup COPY $sgpr33			; CHECK: $sgpr27 = frame-setup COPY $sgpr33
	; CHECK: $sgpr4 = frame-setup S_ADD_U32 $sgpr32, 262080, implicit-def $scc			; CHECK: $sgpr4 = frame-setup S_ADD_U32 $sgpr32, 262080, implicit-def $scc
	; CHECK: $sgpr33 = frame-setup S_AND_B32 killed $sgpr4, 4294705152, implicit-def $scc			; CHECK: $sgpr33 = frame-setup S_AND_B32 killed $sgpr4, 4294705152, implicit-def $scc
	; CHECK: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 524288, implicit-def $scc			; CHECK: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 524288, implicit-def $scc
	; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc			; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
	; CHECK: $sgpr33 = S_SUB_U32 $sgpr33, $sgpr34, implicit-def $scc
	; CHECK: $vgpr2 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec			; CHECK: $vgpr2 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec
	; CHECK: $sgpr33 = S_ADD_U32 $sgpr33, $sgpr34, implicit-def $scc
	; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31			; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31
	; CHECK: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 524288, implicit-def $scc			; CHECK: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 524288, implicit-def $scc
	; CHECK: $sgpr33 = frame-setup COPY $sgpr27			; CHECK: $sgpr33 = frame-setup COPY $sgpr27
	; CHECK: S_ENDPGM 0, implicit $vcc			; CHECK: S_ENDPGM 0, implicit $vcc
	S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc			S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
	$vgpr0 = V_OR_B32_e32 %stack.0, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31			$vgpr0 = V_OR_B32_e32 %stack.0, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31
	S_ENDPGM 0, implicit $vcc			S_ENDPGM 0, implicit $vcc
	...			...
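The frame-setup CHECK lines in pei-scavenge-sgpr.mir realign the frame pointer for the 4096-byte-aligned %stack.0. The following is a minimal sketch of that arithmetic. It assumes, as the constants suggest, that $sgpr32/$sgpr33 hold wave-scaled offsets with a wavefront size of 64, so per-lane byte counts are multiplied by 64. The helper names (`scaled`, `realign_fp`, `fp_to_lane_offset`) are hypothetical and only model the new-side CHECK lines. The SP adjustment 524288 is simply treated as 8192 bytes per lane.

```python
WAVE_SIZE = 64       # assumed wavefront size for hawaii
OBJ_ALIGN = 4096     # per-lane alignment of %stack.0, from the "stack:" entry
MASK32 = 0xFFFFFFFF  # S_ADD_U32 / S_AND_B32 operate on 32-bit values


def scaled(nbytes: int) -> int:
    """Convert a per-lane byte count into a wave-scaled offset."""
    return nbytes * WAVE_SIZE


def realign_fp(sp: int) -> int:
    """Model of: $sgpr4 = S_ADD_U32 $sgpr32, 262080
                 $sgpr33 = S_AND_B32 $sgpr4, 4294705152"""
    add_imm = scaled(OBJ_ALIGN - 1)           # 4095 * 64 = 262080
    and_mask = (-scaled(OBJ_ALIGN)) & MASK32  # 0xFFFC0000 = 4294705152
    return ((sp + add_imm) & MASK32) & and_mask


def fp_to_lane_offset(fp: int) -> int:
    """Model of: $vgpr2 = V_LSHRREV_B32_e64 6, $sgpr33 (shift right by log2(64))."""
    return fp >> 6


if __name__ == "__main__":
    print(scaled(OBJ_ALIGN - 1))          # immediate of the frame-setup S_ADD_U32
    print((-scaled(OBJ_ALIGN)) & MASK32)  # mask of the frame-setup S_AND_B32
    print(scaled(8192))                   # SP increment / decrement
    fp = realign_fp(0x1234)
    print(fp, fp_to_lane_offset(fp))      # realigned FP and its per-lane value
```

Under these assumptions, rounding up by (alignment - 1) * 64 and masking with -(alignment * 64) yields a wave-scaled FP whose per-lane value is a multiple of 4096. The removed expectations subtracted and re-added $sgpr34 around the shift. This seems to follow from dropping scratchWaveOffsetReg from machineFunctionInfo, though the diff itself does not state why.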

llvm/test/CodeGen/AMDGPU/private-access-no-objects.ll

	; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=fiji -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=VI -check-prefix=OPT %s			; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=fiji -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=VI -check-prefix=OPT %s
	; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=hawaii -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=CI -check-prefix=OPT %s			; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=hawaii -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=CI -check-prefix=OPT %s
	; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=iceland -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=VI -check-prefix=OPT %s			; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=iceland -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=VI -check-prefix=OPT %s
	; RUN: llc -O0 -mtriple=amdgcn--amdhsa -mcpu=fiji -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=OPTNONE %s			; RUN: llc -O0 -mtriple=amdgcn--amdhsa -mcpu=fiji -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=OPTNONE %s

	; There are no stack objects, but still a private memory access. The			; There are no stack objects, but still a private memory access. The
	; private access registers need to be correctly initialized anyway, and			; private access registers need to be correctly initialized anyway, and
	; shifted down to the end of the used registers.			; shifted down to the end of the used registers.

	; GCN-LABEL: {{^}}store_to_undef:			; GCN-LABEL: {{^}}store_to_undef:
	; OPT-DAG: s_mov_b64 s{{\[}}[[RSRC_LO:[0-9]+]]:{{[0-9]+\]}}, s[0:1]			; OPT-DAG: s_mov_b64 s{{\[}}[[RSRC_LO:[0-9]+]]:{{[0-9]+\]}}, s[0:1]
	; OPT-DAG: s_mov_b64 s{{\[[0-9]+}}:[[RSRC_HI:[0-9]+]]{{\]}}, s[2:3]			; OPT-DAG: s_mov_b64 s{{\[[0-9]+}}:[[RSRC_HI:[0-9]+]]{{\]}}, s[2:3]
	; OPT-DAG: s_mov_b32 [[SOFFSET:s[0-9]+]], s5{{$}}			; OPT: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s{{\[}}[[RSRC_LO]]:[[RSRC_HI]]{{\]}}, 0 offen{{$}}
	; OPT: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s{{\[}}[[RSRC_LO]]:[[RSRC_HI]]{{\]}}, [[SOFFSET]] offen{{$}}

	; -O0 should assume spilling, so the input scratch resource descriptor			; -O0 should assume spilling, so the input scratch resource descriptor
	; should be used directly without any copies.			; should be used directly without any copies.

	; OPTNONE-NOT: s_mov_b32			; OPTNONE-NOT: s_mov_b32
	; OPTNONE: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], s5 offen{{$}}			; OPTNONE: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen{{$}}
	define amdgpu_kernel void @store_to_undef() #0 {			define amdgpu_kernel void @store_to_undef() #0 {
	store volatile i32 0, i32 addrspace(5)* undef			store volatile i32 0, i32 addrspace(5)* undef
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_to_inttoptr:			; GCN-LABEL: {{^}}store_to_inttoptr:
	; OPT-DAG: s_mov_b64 s{{\[}}[[RSRC_LO:[0-9]+]]:{{[0-9]+\]}}, s[0:1]			; OPT-DAG: s_mov_b64 s{{\[}}[[RSRC_LO:[0-9]+]]:{{[0-9]+\]}}, s[0:1]
	; OPT-DAG: s_mov_b64 s{{\[[0-9]+}}:[[RSRC_HI:[0-9]+]]{{\]}}, s[2:3]			; OPT-DAG: s_mov_b64 s{{\[[0-9]+}}:[[RSRC_HI:[0-9]+]]{{\]}}, s[2:3]
	; OPT-DAG: s_mov_b32 [[SOFFSET:s[0-9]+]], s5{{$}}			; OPT: buffer_store_dword v{{[0-9]+}}, off, s{{\[}}[[RSRC_LO]]:[[RSRC_HI]]{{\]}}, 0 offset:124{{$}}
	; OPT: buffer_store_dword v{{[0-9]+}}, off, s{{\[}}[[RSRC_LO]]:[[RSRC_HI]]{{\]}}, [[SOFFSET]] offset:124{{$}}
	define amdgpu_kernel void @store_to_inttoptr() #0 {			define amdgpu_kernel void @store_to_inttoptr() #0 {
	store volatile i32 0, i32 addrspace(5)* inttoptr (i32 124 to i32 addrspace(5)*)			store volatile i32 0, i32 addrspace(5)* inttoptr (i32 124 to i32 addrspace(5)*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}load_from_undef:			; GCN-LABEL: {{^}}load_from_undef:
	; OPT-DAG: s_mov_b64 s{{\[}}[[RSRC_LO:[0-9]+]]:{{[0-9]+\]}}, s[0:1]			; OPT-DAG: s_mov_b64 s{{\[}}[[RSRC_LO:[0-9]+]]:{{[0-9]+\]}}, s[0:1]
	; OPT-DAG: s_mov_b64 s{{\[[0-9]+}}:[[RSRC_HI:[0-9]+]]{{\]}}, s[2:3]			; OPT-DAG: s_mov_b64 s{{\[[0-9]+}}:[[RSRC_HI:[0-9]+]]{{\]}}, s[2:3]
	; OPT-DAG: s_mov_b32 [[SOFFSET:s[0-9]+]], s5{{$}}			; OPT: buffer_load_dword v{{[0-9]+}}, v{{[0-9]+}}, s{{\[}}[[RSRC_LO]]:[[RSRC_HI]]{{\]}}, 0 offen{{$}}
	; OPT: buffer_load_dword v{{[0-9]+}}, v{{[0-9]+}}, s{{\[}}[[RSRC_LO]]:[[RSRC_HI]]{{\]}}, [[SOFFSET]] offen{{$}}
	define amdgpu_kernel void @load_from_undef() #0 {			define amdgpu_kernel void @load_from_undef() #0 {
	%ld = load volatile i32, i32 addrspace(5)* undef			%ld = load volatile i32, i32 addrspace(5)* undef
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}load_from_inttoptr:			; GCN-LABEL: {{^}}load_from_inttoptr:
	; OPT-DAG: s_mov_b64 s{{\[}}[[RSRC_LO:[0-9]+]]:{{[0-9]+\]}}, s[0:1]			; OPT-DAG: s_mov_b64 s{{\[}}[[RSRC_LO:[0-9]+]]:{{[0-9]+\]}}, s[0:1]
	; OPT-DAG: s_mov_b64 s{{\[[0-9]+}}:[[RSRC_HI:[0-9]+]]{{\]}}, s[2:3]			; OPT-DAG: s_mov_b64 s{{\[[0-9]+}}:[[RSRC_HI:[0-9]+]]{{\]}}, s[2:3]
	; OPT-DAG: s_mov_b32 [[SOFFSET:s[0-9]+]], s5{{$}}			; OPT: buffer_load_dword v{{[0-9]+}}, off, s{{\[}}[[RSRC_LO]]:[[RSRC_HI]]{{\]}}, 0 offset:124{{$}}
	; OPT: buffer_load_dword v{{[0-9]+}}, off, s{{\[}}[[RSRC_LO]]:[[RSRC_HI]]{{\]}}, [[SOFFSET]] offset:124{{$}}
	define amdgpu_kernel void @load_from_inttoptr() #0 {			define amdgpu_kernel void @load_from_inttoptr() #0 {
	%ld = load volatile i32, i32 addrspace(5)* inttoptr (i32 124 to i32 addrspace(5)*)			%ld = load volatile i32, i32 addrspace(5)* inttoptr (i32 124 to i32 addrspace(5)*)
	ret void			ret void
	}			}

	attributes #0 = { nounwind }			attributes #0 = { nounwind }
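In private-access-no-objects.ll, the new expectations pass the inline constant 0 as the soffset operand. Previously, s5 was first copied into an SGPR ([[SOFFSET]]). The sketch below is a deliberately simplified model of how the MUBUF operands combine, intended only to show what each operand in these CHECK lines contributes. The helper `mubuf_address` is hypothetical. It ignores the swizzling and bounds checking that real scratch accesses involve. It also does not claim where the wave offset is accounted for after this change, since the diff shown here does not say.

```python
from typing import Optional


def mubuf_address(base: int, soffset: int, voffset: Optional[int], imm_offset: int = 0) -> int:
    """Simplified, unswizzled MUBUF address (hypothetical model).

    base:       base address taken from the resource descriptor (e.g. s[0:3])
    soffset:    scalar offset operand, an SGPR such as s5 or an inline constant such as 0
    voffset:    VGPR offset when 'offen' is set; None models the 'off' operand
    imm_offset: the instruction's 'offset:N' immediate
    """
    vgpr_part = voffset if voffset is not None else 0
    return base + soffset + vgpr_part + imm_offset


if __name__ == "__main__":
    base = 0x1000  # arbitrary example descriptor base
    # store_to_inttoptr / load_from_inttoptr, new form: "off, s[...], 0 offset:124"
    print(mubuf_address(base, 0, None, 124))
    # store_to_undef / load_from_undef, new form: "v{{..}}, s[...], 0 offen"
    print(mubuf_address(base, 0, 8))
```

With soffset fixed at 0, the address in this model depends only on the descriptor base, the VGPR (for offen), and the immediate. This may be why the OPT checks no longer need a separate s_mov_b32 for [[SOFFSET]].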

llvm/test/CodeGen/AMDGPU/private-element-size.ll

	; RUN: llc -march=amdgcn -mtriple=amdgcn-unknown-amdhsa -mattr=-promote-alloca,+max-private-element-size-16 -verify-machineinstrs < %s | FileCheck -allow-deprecated-dag-overlap -check-prefix=ELT16 -check-prefix=HSA -check-prefix=HSA-ELT16 -check-prefix=ALL -check-prefix=HSA_ELTGE8 %s			; RUN: llc -march=amdgcn -mtriple=amdgcn-unknown-amdhsa -mattr=-promote-alloca,+max-private-element-size-16 -verify-machineinstrs < %s | FileCheck -allow-deprecated-dag-overlap -check-prefix=ELT16 -check-prefix=HSA -check-prefix=HSA-ELT16 -check-prefix=ALL -check-prefix=HSA_ELTGE8 %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn-unknown-amdhsa -mattr=-promote-alloca,+max-private-element-size-8 -verify-machineinstrs < %s | FileCheck -allow-deprecated-dag-overlap -check-prefix=ELT8 -check-prefix=HSA -check-prefix=HSA-ELT8 -check-prefix=ALL -check-prefix=HSA-ELTGE8 %s			; RUN: llc -march=amdgcn -mtriple=amdgcn-unknown-amdhsa -mattr=-promote-alloca,+max-private-element-size-8 -verify-machineinstrs < %s | FileCheck -allow-deprecated-dag-overlap -check-prefix=ELT8 -check-prefix=HSA -check-prefix=HSA-ELT8 -check-prefix=ALL -check-prefix=HSA-ELTGE8 %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn-unknown-amdhsa -mattr=-promote-alloca,+max-private-element-size-4 -verify-machineinstrs < %s | FileCheck -allow-deprecated-dag-overlap -check-prefix=ELT4 -check-prefix=HSA -check-prefix=HSA-ELT4 -check-prefix=ALL %s			; RUN: llc -march=amdgcn -mtriple=amdgcn-unknown-amdhsa -mattr=-promote-alloca,+max-private-element-size-4 -verify-machineinstrs < %s | FileCheck -allow-deprecated-dag-overlap -check-prefix=ELT4 -check-prefix=HSA -check-prefix=HSA-ELT4 -check-prefix=ALL %s


	; ALL-LABEL: {{^}}private_elt_size_v4i32:			; ALL-LABEL: {{^}}private_elt_size_v4i32:

	; HSA-ELT16: private_element_size = 3			; HSA-ELT16: private_element_size = 3
	; HSA-ELT8: private_element_size = 2			; HSA-ELT8: private_element_size = 2
	; HSA-ELT4: private_element_size = 1			; HSA-ELT4: private_element_size = 1


	; HSA-ELT16-DAG: buffer_store_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:16			; HSA-ELT16-DAG: buffer_store_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:16
	; HSA-ELT16-DAG: buffer_store_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:32			; HSA-ELT16-DAG: buffer_store_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:32
	; HSA-ELT16-DAG: buffer_load_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], s9 offen{{$}}			; HSA-ELT16-DAG: buffer_load_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], 0 offen{{$}}

	; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:24{{$}}			; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:24{{$}}
	; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:16			; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:16
	; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:32			; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:32
	; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:40			; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:40

	; HSA-ELT8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], s9 offen			; HSA-ELT8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], 0 offen
	; HSA-ELT8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], s9 offen			; HSA-ELT8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], 0 offen


	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:16{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:16{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:20{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:20{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:24{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:24{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:28{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:28{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:32{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:32{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:36{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:36{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:40{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:40{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:44{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:44{{$}}

	; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen{{$}}			; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen{{$}}
	; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:4{{$}}			; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen offset:4{{$}}
	; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:8{{$}}			; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen offset:8{{$}}
	; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:12{{$}}			; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen offset:12{{$}}
	define amdgpu_kernel void @private_elt_size_v4i32(<4 x i32> addrspace(1)* %out, i32 addrspace(1)* %index.array) #0 {			define amdgpu_kernel void @private_elt_size_v4i32(<4 x i32> addrspace(1)* %out, i32 addrspace(1)* %index.array) #0 {
	entry:			entry:
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%idxprom = sext i32 %tid to i64			%idxprom = sext i32 %tid to i64
	%gep.index = getelementptr inbounds i32, i32 addrspace(1)* %index.array, i64 %idxprom			%gep.index = getelementptr inbounds i32, i32 addrspace(1)* %index.array, i64 %idxprom
	%index.load = load i32, i32 addrspace(1)* %gep.index			%index.load = load i32, i32 addrspace(1)* %gep.index
	%index = and i32 %index.load, 2			%index = and i32 %index.load, 2
	%alloca = alloca [2 x <4 x i32>], align 16, addrspace(5)			%alloca = alloca [2 x <4 x i32>], align 16, addrspace(5)
	%gep0 = getelementptr inbounds [2 x <4 x i32>], [2 x <4 x i32>] addrspace(5)* %alloca, i32 0, i32 0			%gep0 = getelementptr inbounds [2 x <4 x i32>], [2 x <4 x i32>] addrspace(5)* %alloca, i32 0, i32 0
	%gep1 = getelementptr inbounds [2 x <4 x i32>], [2 x <4 x i32>] addrspace(5)* %alloca, i32 0, i32 1			%gep1 = getelementptr inbounds [2 x <4 x i32>], [2 x <4 x i32>] addrspace(5)* %alloca, i32 0, i32 1
	store <4 x i32> zeroinitializer, <4 x i32> addrspace(5)* %gep0			store <4 x i32> zeroinitializer, <4 x i32> addrspace(5)* %gep0
	store <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32> addrspace(5)* %gep1			store <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32> addrspace(5)* %gep1
	%gep2 = getelementptr inbounds [2 x <4 x i32>], [2 x <4 x i32>] addrspace(5)* %alloca, i32 0, i32 %index			%gep2 = getelementptr inbounds [2 x <4 x i32>], [2 x <4 x i32>] addrspace(5)* %alloca, i32 0, i32 %index
	%load = load <4 x i32>, <4 x i32> addrspace(5)* %gep2			%load = load <4 x i32>, <4 x i32> addrspace(5)* %gep2
	store <4 x i32> %load, <4 x i32> addrspace(1)* %out			store <4 x i32> %load, <4 x i32> addrspace(1)* %out
	ret void			ret void
	}			}
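The HSA-ELT* checks in private-element-size.ll encode the maximum private element size as 1, 2, or 3 for 4, 8, or 16 bytes. Each 16-byte <4 x i32> store is split into stores of that width. The sketch below reproduces both patterns. It assumes, based only on the CHECK offsets, that %gep0 and %gep1 of private_elt_size_v4i32 land at offsets 16 and 32 (and at 32 and 64 for the 32-byte elements of private_elt_size_v8i32). The helper names are hypothetical.

```python
from __future__ import annotations


def private_element_size_field(elt_bytes: int) -> int:
    """Encode a 4/8/16-byte max element size as 1/2/3, i.e. log2(bytes) - 1."""
    if elt_bytes not in (4, 8, 16):
        raise ValueError("max private element size must be 4, 8 or 16 bytes")
    return elt_bytes.bit_length() - 2  # 4 -> 1, 8 -> 2, 16 -> 3


def split_store_offsets(base: int, total_bytes: int, elt_bytes: int) -> list[int]:
    """Offsets of the stores that a total_bytes-wide store is split into, elt_bytes each."""
    return list(range(base, base + total_bytes, elt_bytes))


if __name__ == "__main__":
    for elt in (16, 8, 4):
        # %gep0 and %gep1 of private_elt_size_v4i32, assumed at offsets 16 and 32.
        offsets = split_store_offsets(16, 16, elt) + split_store_offsets(32, 16, elt)
        print(elt, private_element_size_field(elt), offsets)
```

For ELT16 this gives stores at 16 and 32, for ELT8 at 16, 24, 32 and 40, and for ELT4 at every 4 bytes from 16 to 44, matching the -DAG store offsets above. The same helper with a base of 32 and 64 bytes covers the v8i32 store offsets.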

	; ALL-LABEL: {{^}}private_elt_size_v8i32:			; ALL-LABEL: {{^}}private_elt_size_v8i32:
	; HSA-ELT16: private_element_size = 3			; HSA-ELT16: private_element_size = 3
	; HSA-ELT8: private_element_size = 2			; HSA-ELT8: private_element_size = 2
	; HSA-ELT4: private_element_size = 1			; HSA-ELT4: private_element_size = 1

	; HSA-ELT16-DAG: buffer_store_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:32			; HSA-ELT16-DAG: buffer_store_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:32
	; HSA-ELT16-DAG: buffer_store_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:48			; HSA-ELT16-DAG: buffer_store_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:48
	; HSA-ELT16-DAG: buffer_store_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:64			; HSA-ELT16-DAG: buffer_store_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:64
	; HSA-ELT16-DAG: buffer_store_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:80			; HSA-ELT16-DAG: buffer_store_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:80

	; HSA-ELT16-DAG: buffer_load_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], s9 offen{{$}}			; HSA-ELT16-DAG: buffer_load_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], 0 offen{{$}}
	; HSA-ELT16-DAG: buffer_load_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], s9 offen{{$}}			; HSA-ELT16-DAG: buffer_load_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], 0 offen{{$}}


	; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:32			; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:32
	; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:40			; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:40
	; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:48			; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:48
	; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:56			; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:56
	; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:88			; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:88
	; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:80			; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:80
	; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:72			; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:72
	; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:64			; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:64

	; HSA-ELT8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], s9 offen			; HSA-ELT8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], 0 offen
	; HSA-ELT8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], s9 offen			; HSA-ELT8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], 0 offen


	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:32{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:32{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:36{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:36{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:40{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:40{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:44{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:44{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:48{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:48{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:52{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:52{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:56{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:56{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:60{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:60{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:64{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:64{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:68{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:68{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:72{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:72{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:76{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:76{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:80{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:80{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:84{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:84{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:88{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:88{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:92{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:92{{$}}

	; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen{{$}}			; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen{{$}}
	; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:4{{$}}			; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen offset:4{{$}}
	; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:8{{$}}			; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen offset:8{{$}}
	; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:12{{$}}			; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen offset:12{{$}}
	; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:16{{$}}			; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen offset:16{{$}}
	; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:20{{$}}			; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen offset:20{{$}}
	; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:24{{$}}			; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen offset:24{{$}}
	; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:28{{$}}			; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen offset:28{{$}}
	define amdgpu_kernel void @private_elt_size_v8i32(<8 x i32> addrspace(1)* %out, i32 addrspace(1)* %index.array) #0 {			define amdgpu_kernel void @private_elt_size_v8i32(<8 x i32> addrspace(1)* %out, i32 addrspace(1)* %index.array) #0 {
	entry:			entry:
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%idxprom = sext i32 %tid to i64			%idxprom = sext i32 %tid to i64
	%gep.index = getelementptr inbounds i32, i32 addrspace(1)* %index.array, i64 %idxprom			%gep.index = getelementptr inbounds i32, i32 addrspace(1)* %index.array, i64 %idxprom
	%index.load = load i32, i32 addrspace(1)* %gep.index			%index.load = load i32, i32 addrspace(1)* %gep.index
	%index = and i32 %index.load, 2			%index = and i32 %index.load, 2
	%alloca = alloca [2 x <8 x i32>], align 16, addrspace(5)			%alloca = alloca [2 x <8 x i32>], align 16, addrspace(5)
	%gep0 = getelementptr inbounds [2 x <8 x i32>], [2 x <8 x i32>] addrspace(5)* %alloca, i32 0, i32 0			%gep0 = getelementptr inbounds [2 x <8 x i32>], [2 x <8 x i32>] addrspace(5)* %alloca, i32 0, i32 0
	%gep1 = getelementptr inbounds [2 x <8 x i32>], [2 x <8 x i32>] addrspace(5)* %alloca, i32 0, i32 1			%gep1 = getelementptr inbounds [2 x <8 x i32>], [2 x <8 x i32>] addrspace(5)* %alloca, i32 0, i32 1
	store <8 x i32> zeroinitializer, <8 x i32> addrspace(5)* %gep0			store <8 x i32> zeroinitializer, <8 x i32> addrspace(5)* %gep0
	store <8 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>, <8 x i32> addrspace(5)* %gep1			store <8 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>, <8 x i32> addrspace(5)* %gep1
	%gep2 = getelementptr inbounds [2 x <8 x i32>], [2 x <8 x i32>] addrspace(5)* %alloca, i32 0, i32 %index			%gep2 = getelementptr inbounds [2 x <8 x i32>], [2 x <8 x i32>] addrspace(5)* %alloca, i32 0, i32 %index
	%load = load <8 x i32>, <8 x i32> addrspace(5)* %gep2			%load = load <8 x i32>, <8 x i32> addrspace(5)* %gep2
	store <8 x i32> %load, <8 x i32> addrspace(1)* %out			store <8 x i32> %load, <8 x i32> addrspace(1)* %out
	ret void			ret void
	}			}


	; ALL-LABEL: {{^}}private_elt_size_i64:			; ALL-LABEL: {{^}}private_elt_size_i64:
	; HSA-ELT16: private_element_size = 3			; HSA-ELT16: private_element_size = 3
	; HSA-ELT8: private_element_size = 2			; HSA-ELT8: private_element_size = 2
	; HSA-ELT4: private_element_size = 1			; HSA-ELT4: private_element_size = 1

	; HSA-ELTGE8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, {{off|v[0-9]}}, s[0:3], s9 offset:1			; HSA-ELTGE8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, {{off|v[0-9]}}, s[0:3], 0 offset:1
	; HSA-ELTGE8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, {{off|v[0-9]}}, s[0:3], s9 offset:2			; HSA-ELTGE8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, {{off|v[0-9]}}, s[0:3], 0 offset:2

	; HSA-ELTGE8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], s9 offen			; HSA-ELTGE8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], 0 offen


	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:16{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:16{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:20{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:20{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:24{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:24{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:28{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:28{{$}}

	; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen{{$}}			; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen{{$}}
	; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:4{{$}}			; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen offset:4{{$}}
	define amdgpu_kernel void @private_elt_size_i64(i64 addrspace(1)* %out, i32 addrspace(1)* %index.array) #0 {			define amdgpu_kernel void @private_elt_size_i64(i64 addrspace(1)* %out, i32 addrspace(1)* %index.array) #0 {
	entry:			entry:
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%idxprom = sext i32 %tid to i64			%idxprom = sext i32 %tid to i64
	%gep.index = getelementptr inbounds i32, i32 addrspace(1)* %index.array, i64 %idxprom			%gep.index = getelementptr inbounds i32, i32 addrspace(1)* %index.array, i64 %idxprom
	%index.load = load i32, i32 addrspace(1)* %gep.index			%index.load = load i32, i32 addrspace(1)* %gep.index
	%index = and i32 %index.load, 2			%index = and i32 %index.load, 2
	%alloca = alloca [2 x i64], align 16, addrspace(5)			%alloca = alloca [2 x i64], align 16, addrspace(5)
	%gep0 = getelementptr inbounds [2 x i64], [2 x i64] addrspace(5)* %alloca, i32 0, i32 0			%gep0 = getelementptr inbounds [2 x i64], [2 x i64] addrspace(5)* %alloca, i32 0, i32 0
	%gep1 = getelementptr inbounds [2 x i64], [2 x i64] addrspace(5)* %alloca, i32 0, i32 1			%gep1 = getelementptr inbounds [2 x i64], [2 x i64] addrspace(5)* %alloca, i32 0, i32 1
	store i64 0, i64 addrspace(5)* %gep0			store i64 0, i64 addrspace(5)* %gep0
	store i64 34359738602, i64 addrspace(5)* %gep1			store i64 34359738602, i64 addrspace(5)* %gep1
	%gep2 = getelementptr inbounds [2 x i64], [2 x i64] addrspace(5)* %alloca, i32 0, i32 %index			%gep2 = getelementptr inbounds [2 x i64], [2 x i64] addrspace(5)* %alloca, i32 0, i32 %index
	%load = load i64, i64 addrspace(5)* %gep2			%load = load i64, i64 addrspace(5)* %gep2
	store i64 %load, i64 addrspace(1)* %out			store i64 %load, i64 addrspace(1)* %out
	ret void			ret void
	}			}

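The `private_element_size` values checked above (1 for the ELT4 run, 2 for ELT8, 3 for ELT16) fit an encoding of log2(bytes) - 1. The C++ sketch below models that mapping. `encodePrivateElementSize` is a hypothetical helper, not an LLVM API. The 2-byte entry is an assumption taken from the AMD kernel code descriptor's element-size enumeration; these tests do not check it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: maps a private (scratch) element size in bytes to the
// value the checks expect in the "private_element_size" field.
// Assumed encoding: log2(bytes) - 1, i.e. 2 -> 0, 4 -> 1, 8 -> 2, 16 -> 3.
// Returns -1 for any size outside that set.
int encodePrivateElementSize(uint32_t bytes) {
  switch (bytes) {
  case 2:  return 0;
  case 4:  return 1;  // HSA-ELT4:  private_element_size = 1
  case 8:  return 2;  // HSA-ELT8:  private_element_size = 2
  case 16: return 3;  // HSA-ELT16: private_element_size = 3
  default: return -1;
  }
}
```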
	; ALL-LABEL: {{^}}private_elt_size_f64:			; ALL-LABEL: {{^}}private_elt_size_f64:
	; HSA-ELT16: private_element_size = 3			; HSA-ELT16: private_element_size = 3
	; HSA-ELT8: private_element_size = 2			; HSA-ELT8: private_element_size = 2
	; HSA-ELT4: private_element_size = 1			; HSA-ELT4: private_element_size = 1

	; HSA-ELTGE8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:16			; HSA-ELTGE8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:16
	; HSA-ELTGE8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:24			; HSA-ELTGE8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:24

	; HSA-ELTGE8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], s9 offen			; HSA-ELTGE8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], 0 offen


	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:16{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:16{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:20{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:20{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:24{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:24{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:28{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:28{{$}}

	; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen{{$}}			; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen{{$}}
	; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:4{{$}}			; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen offset:4{{$}}
	define amdgpu_kernel void @private_elt_size_f64(double addrspace(1)* %out, i32 addrspace(1)* %index.array) #0 {			define amdgpu_kernel void @private_elt_size_f64(double addrspace(1)* %out, i32 addrspace(1)* %index.array) #0 {
	entry:			entry:
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%idxprom = sext i32 %tid to i64			%idxprom = sext i32 %tid to i64
	%gep.index = getelementptr inbounds i32, i32 addrspace(1)* %index.array, i64 %idxprom			%gep.index = getelementptr inbounds i32, i32 addrspace(1)* %index.array, i64 %idxprom
	%index.load = load i32, i32 addrspace(1)* %gep.index			%index.load = load i32, i32 addrspace(1)* %gep.index
	%index = and i32 %index.load, 2			%index = and i32 %index.load, 2
	%alloca = alloca [2 x double], align 16, addrspace(5)			%alloca = alloca [2 x double], align 16, addrspace(5)
	%gep0 = getelementptr inbounds [2 x double], [2 x double] addrspace(5)* %alloca, i32 0, i32 0			%gep0 = getelementptr inbounds [2 x double], [2 x double] addrspace(5)* %alloca, i32 0, i32 0
	%gep1 = getelementptr inbounds [2 x double], [2 x double] addrspace(5)* %alloca, i32 0, i32 1			%gep1 = getelementptr inbounds [2 x double], [2 x double] addrspace(5)* %alloca, i32 0, i32 1
	store double 0.0, double addrspace(5)* %gep0			store double 0.0, double addrspace(5)* %gep0
	store double 4.0, double addrspace(5)* %gep1			store double 4.0, double addrspace(5)* %gep1
	%gep2 = getelementptr inbounds [2 x double], [2 x double] addrspace(5)* %alloca, i32 0, i32 %index			%gep2 = getelementptr inbounds [2 x double], [2 x double] addrspace(5)* %alloca, i32 0, i32 %index
	%load = load double, double addrspace(5)* %gep2			%load = load double, double addrspace(5)* %gep2
	store double %load, double addrspace(1)* %out			store double %load, double addrspace(1)* %out
	ret void			ret void
	}			}

	; ALL-LABEL: {{^}}private_elt_size_v2i64:			; ALL-LABEL: {{^}}private_elt_size_v2i64:
	; HSA-ELT16: private_element_size = 3			; HSA-ELT16: private_element_size = 3
	; HSA-ELT8: private_element_size = 2			; HSA-ELT8: private_element_size = 2
	; HSA-ELT4: private_element_size = 1			; HSA-ELT4: private_element_size = 1

	; HSA-ELT16-DAG: buffer_store_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:16			; HSA-ELT16-DAG: buffer_store_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:16
	; HSA-ELT16-DAG: buffer_store_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:32			; HSA-ELT16-DAG: buffer_store_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:32
	; HSA-ELT16-DAG: buffer_load_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], s9 offen{{$}}			; HSA-ELT16-DAG: buffer_load_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], 0 offen{{$}}

	; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:16{{$}}			; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:16{{$}}
	; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:24			; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:24
	; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:40			; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:40
	; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:32			; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:32

	; HSA-ELT8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], s9 offen			; HSA-ELT8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], 0 offen
	; HSA-ELT8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], s9 offen			; HSA-ELT8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], 0 offen


	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:16{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:16{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:20{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:20{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:24{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:24{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:28{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:28{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:32{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:32{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:36{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:36{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:40{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:40{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:44{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:44{{$}}

	; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen{{$}}			; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen{{$}}
	; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:4{{$}}			; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen offset:4{{$}}
	; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:8{{$}}			; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen offset:8{{$}}
	; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:12{{$}}			; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen offset:12{{$}}
	define amdgpu_kernel void @private_elt_size_v2i64(<2 x i64> addrspace(1)* %out, i32 addrspace(1)* %index.array) #0 {			define amdgpu_kernel void @private_elt_size_v2i64(<2 x i64> addrspace(1)* %out, i32 addrspace(1)* %index.array) #0 {
	entry:			entry:
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%idxprom = sext i32 %tid to i64			%idxprom = sext i32 %tid to i64
	%gep.index = getelementptr inbounds i32, i32 addrspace(1)* %index.array, i64 %idxprom			%gep.index = getelementptr inbounds i32, i32 addrspace(1)* %index.array, i64 %idxprom
	%index.load = load i32, i32 addrspace(1)* %gep.index			%index.load = load i32, i32 addrspace(1)* %gep.index
	%index = and i32 %index.load, 2			%index = and i32 %index.load, 2
	%alloca = alloca [2 x <2 x i64>], align 16, addrspace(5)			%alloca = alloca [2 x <2 x i64>], align 16, addrspace(5)

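The ELT16, ELT8 and ELT4 store checks for `private_elt_size_v2i64` follow one pattern. The 32-byte `[2 x <2 x i64>]` alloca is written in pieces no wider than the private element size, giving 2, 4 or 8 stores. A minimal sketch of that pattern, assuming the base offset of 16 that the checks show. `splitPrivateStore` and `ScratchStore` are hypothetical names, not LLVM code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One emitted scratch store: the opcode and its immediate byte offset.
struct ScratchStore {
  std::string opcode;
  uint32_t offset;
};

// Hypothetical model: split `totalBytes` of private data starting at
// `baseOffset` into pieces of `eltBytes` (assumed to be 4, 8 or 16).
std::vector<ScratchStore> splitPrivateStore(uint32_t baseOffset,
                                            uint32_t totalBytes,
                                            uint32_t eltBytes) {
  const std::string opcode = eltBytes == 16 ? "buffer_store_dwordx4"
                           : eltBytes == 8  ? "buffer_store_dwordx2"
                                            : "buffer_store_dword";
  std::vector<ScratchStore> stores;
  for (uint32_t off = 0; off < totalBytes; off += eltBytes)
    stores.push_back({opcode, baseOffset + off});
  return stores;
}
```

With a base of 16 and 32 bytes, this yields offsets 16 and 32 for ELT16; 16, 24, 32 and 40 for ELT8; and 16 through 44 in steps of 4 for ELT4. These match the `-DAG` store checks, which accept the stores in any order.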
llvm/test/CodeGen/AMDGPU/rename-independent-subregs-mac-operands.mir

	# RUN: llc -march=amdgcn -verify-machineinstrs -run-pass=simple-register-coalescing,rename-independent-subregs -o - %s \| FileCheck -check-prefix=GCN %s			# RUN: llc -march=amdgcn -verify-machineinstrs -run-pass=simple-register-coalescing,rename-independent-subregs -o - %s \| FileCheck -check-prefix=GCN %s
	---			---

	# GCN-LABEL: name: mac_invalid_operands			# GCN-LABEL: name: mac_invalid_operands
	# GCN: undef %18.sub0:vreg_128 = V_MAC_F32_e32 undef %3:vgpr_32, undef %9:vgpr_32, undef %18.sub0, implicit $exec			# GCN: undef %18.sub0:vreg_128 = V_MAC_F32_e32 undef %3:vgpr_32, undef %9:vgpr_32, undef %18.sub0, implicit $exec

	name: mac_invalid_operands			name: mac_invalid_operands
	alignment: 1			alignment: 1
	exposesReturnsTwice: false			exposesReturnsTwice: false
	legalized: false			legalized: false
	regBankSelected: false			regBankSelected: false
	selected: false			selected: false
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'			scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
	scratchWaveOffsetReg: '$sgpr4'
	frameOffsetReg: '$sgpr4'			frameOffsetReg: '$sgpr4'

	registers:			registers:
	- { id: 0, class: vreg_128 }			- { id: 0, class: vreg_128 }
	- { id: 1, class: vreg_128 }			- { id: 1, class: vreg_128 }
	- { id: 2, class: sgpr_64 }			- { id: 2, class: sgpr_64 }
	- { id: 3, class: vgpr_32 }			- { id: 3, class: vgpr_32 }
	- { id: 4, class: vgpr_32 }			- { id: 4, class: vgpr_32 }
	alignment: 1			alignment: 1
	exposesReturnsTwice: false			exposesReturnsTwice: false
	legalized: false			legalized: false
	regBankSelected: false			regBankSelected: false
	selected: false			selected: false
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'			scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
	scratchWaveOffsetReg: '$sgpr4'
	frameOffsetReg: '$sgpr4'			frameOffsetReg: '$sgpr4'
	registers:			registers:
	- { id: 0, class: vgpr_32, preferred-register: '' }			- { id: 0, class: vgpr_32, preferred-register: '' }
	- { id: 1, class: vgpr_32, preferred-register: '' }			- { id: 1, class: vgpr_32, preferred-register: '' }
	- { id: 2, class: vgpr_32, preferred-register: '' }			- { id: 2, class: vgpr_32, preferred-register: '' }
	- { id: 3, class: vgpr_32, preferred-register: '' }			- { id: 3, class: vgpr_32, preferred-register: '' }
	- { id: 4, class: vgpr_32, preferred-register: '' }			- { id: 4, class: vgpr_32, preferred-register: '' }
	- { id: 5, class: sreg_64, preferred-register: '' }			- { id: 5, class: sreg_64, preferred-register: '' }

llvm/test/CodeGen/AMDGPU/sched-assert-dead-def-subreg-use-other-subreg.mir

	# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx906 -verify-machineinstrs -run-pass=machine-scheduler -verify-misched -o - %s \| FileCheck %s			# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx906 -verify-machineinstrs -run-pass=machine-scheduler -verify-misched -o - %s \| FileCheck %s

	# This would assert that a dead def should have no uses, but the dead			# This would assert that a dead def should have no uses, but the dead
	# def and use have different subreg indices.			# def and use have different subreg indices.

	---			---
	name: multi_def_dead_reg_subreg_check			name: multi_def_dead_reg_subreg_check
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: true			isEntryFunction: true
	scratchRSrcReg: '$sgpr24_sgpr25_sgpr26_sgpr27'			scratchRSrcReg: '$sgpr24_sgpr25_sgpr26_sgpr27'
	scratchWaveOffsetReg: '$sgpr32'
	frameOffsetReg: '$sgpr32'			frameOffsetReg: '$sgpr32'
	stackPtrOffsetReg: '$sgpr32'			stackPtrOffsetReg: '$sgpr32'
	argumentInfo:			argumentInfo:
	privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }			privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	privateSegmentWaveByteOffset: { reg: '$sgpr33' }			privateSegmentWaveByteOffset: { reg: '$sgpr33' }
	body: \|			body: \|
	; CHECK-LABEL: name: multi_def_dead_reg_subreg_check			; CHECK-LABEL: name: multi_def_dead_reg_subreg_check
	; CHECK: bb.0:			; CHECK: bb.0:

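The comment in sched-assert-dead-def-subreg-use-other-subreg.mir says the scheduler asserted that a dead def has no uses, even though the def and the use name different subregister indices. Treating each subregister index as a set of lanes shows why that assertion is too strong: a def of one lane set does not feed a use of a disjoint set. The sketch below is purely illustrative. The lane values and `deadDefConflictsWithUse` are hypothetical and do not reproduce LLVM's `LaneBitmask` values or the scheduler's actual check.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical lane encoding: one bit per 32-bit subregister.
using LaneMask = uint32_t;
constexpr LaneMask kSub0 = 0x1;
constexpr LaneMask kSub1 = 0x2;
constexpr LaneMask kSub0Sub1 = kSub0 | kSub1;

// A (dead) def only reaches a use if the two share at least one lane.
bool deadDefConflictsWithUse(LaneMask defLanes, LaneMask useLanes) {
  return (defLanes & useLanes) != 0;
}
```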
llvm/test/CodeGen/AMDGPU/sched-handleMoveUp-subreg-def-across-subreg-def.mir

	# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx906 -verify-machineinstrs -verify-misched -run-pass=machine-scheduler -o - %s \| FileCheck %s			# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx906 -verify-machineinstrs -verify-misched -run-pass=machine-scheduler -o - %s \| FileCheck %s

	---			---
	name: handleMoveUp_incorrect_interval			name: handleMoveUp_incorrect_interval
	tracksRegLiveness: true			tracksRegLiveness: true
	liveins:			liveins:
	- { reg: '$sgpr4_sgpr5', virtual-reg: '%0' }			- { reg: '$sgpr4_sgpr5', virtual-reg: '%0' }
	frameInfo:			frameInfo:
	maxAlignment: 1			maxAlignment: 1
	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: true			isEntryFunction: true
	scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'			scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'
	scratchWaveOffsetReg: '$sgpr101'
	frameOffsetReg: '$sgpr101'			frameOffsetReg: '$sgpr101'
	stackPtrOffsetReg: '$sgpr101'			stackPtrOffsetReg: '$sgpr101'
	argumentInfo:			argumentInfo:
	privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }			privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	kernargSegmentPtr: { reg: '$sgpr4_sgpr5' }			kernargSegmentPtr: { reg: '$sgpr4_sgpr5' }
	workGroupIDX: { reg: '$sgpr6' }			workGroupIDX: { reg: '$sgpr6' }
	privateSegmentWaveByteOffset: { reg: '$sgpr7' }			privateSegmentWaveByteOffset: { reg: '$sgpr7' }
	workItemIDX: { reg: '$vgpr0' }			workItemIDX: { reg: '$vgpr0' }

llvm/test/CodeGen/AMDGPU/scratch-buffer.ll

; RUN: llc -amdgpu-scalarize-global-loads=false -verify-machineinstrs -march=amdgcn < %s \| FileCheck -enable-var-scope -check-prefix=GCN %s		; RUN: llc -amdgpu-scalarize-global-loads=false -verify-machineinstrs -march=amdgcn < %s \| FileCheck -enable-var-scope -check-prefix=GCN %s
; RUN: llc -amdgpu-scalarize-global-loads=false -verify-machineinstrs -march=amdgcn -mcpu=tonga < %s \| FileCheck -enable-var-scope -check-prefix=GCN %s		; RUN: llc -amdgpu-scalarize-global-loads=false -verify-machineinstrs -march=amdgcn -mcpu=tonga < %s \| FileCheck -enable-var-scope -check-prefix=GCN %s

; When a frame index offset is more than 12-bits, make sure we don't store		; When a frame index offset is more than 12-bits, make sure we don't store
; it in mubuf's offset field.		; it in mubuf's offset field.

; Also, make sure we use the same register for storing the scratch buffer address		; Also, make sure we use the same register for storing the scratch buffer address
; for both stores. This register is allocated by the register scavenger, so we		; for both stores. This register is allocated by the register scavenger, so we
; should be able to reuse the same register for each scratch buffer access.		; should be able to reuse the same register for each scratch buffer access.

; GCN-LABEL: {{^}}legal_offset_fi:		; GCN-LABEL: {{^}}legal_offset_fi:
; GCN: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+}}:{{[0-9]+}}], s{{[0-9]+}} offset:4{{$}}		; GCN: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+}}:{{[0-9]+}}], 0 offset:4{{$}}
; GCN: v_mov_b32_e32 [[OFFSET:v[0-9]+]], 0x8004		; GCN: v_mov_b32_e32 [[OFFSET:v[0-9]+]], 0x8004
; GCN: buffer_store_dword v{{[0-9]+}}, [[OFFSET]], s[{{[0-9]+}}:{{[0-9]+}}], s{{[0-9]+}} offen{{$}}		; GCN: buffer_store_dword v{{[0-9]+}}, [[OFFSET]], s[{{[0-9]+}}:{{[0-9]+}}], 0 offen{{$}}

define amdgpu_kernel void @legal_offset_fi(i32 addrspace(1)* %out, i32 %cond, i32 %if_offset, i32 %else_offset) {		define amdgpu_kernel void @legal_offset_fi(i32 addrspace(1)* %out, i32 %cond, i32 %if_offset, i32 %else_offset) {
entry:		entry:
%scratch0 = alloca [8192 x i32], addrspace(5)		%scratch0 = alloca [8192 x i32], addrspace(5)
%scratch1 = alloca [8192 x i32], addrspace(5)		%scratch1 = alloca [8192 x i32], addrspace(5)

%scratchptr0 = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %scratch0, i32 0, i32 0		%scratchptr0 = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %scratch0, i32 0, i32 0
store i32 1, i32 addrspace(5)* %scratchptr0		store i32 1, i32 addrspace(5)* %scratchptr0
		done:
store i32 %value, i32 addrspace(1)* %out		store i32 %value, i32 addrspace(1)* %out
ret void		ret void

ret void		ret void

}		}

; GCN-LABEL: {{^}}legal_offset_fi_offset:		; GCN-LABEL: {{^}}legal_offset_fi_offset:
; GCN-DAG: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], s{{[0-9]+}} offen{{$}}		; GCN-DAG: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], 0 offen{{$}}
; This constant isn't folded, because it has multiple uses.		; This constant isn't folded, because it has multiple uses.
; GCN-DAG: v_mov_b32_e32 [[K8000:v[0-9]+]], 0x8004		; GCN-DAG: v_mov_b32_e32 [[K8000:v[0-9]+]], 0x8004
; GCN-DAG: v_add_{{[iu]}}32_e32 [[OFFSET:v[0-9]+]], vcc, [[K8000]]		; GCN-DAG: v_add_{{[iu]}}32_e32 [[OFFSET:v[0-9]+]], vcc, [[K8000]]
; GCN: buffer_store_dword v{{[0-9]+}}, [[OFFSET]], s[{{[0-9]+}}:{{[0-9]+}}], s{{[0-9]+}} offen{{$}}		; GCN: buffer_store_dword v{{[0-9]+}}, [[OFFSET]], s[{{[0-9]+}}:{{[0-9]+}}], 0 offen{{$}}

define amdgpu_kernel void @legal_offset_fi_offset(i32 addrspace(1)* %out, i32 %cond, i32 addrspace(1)* %offsets, i32 %if_offset, i32 %else_offset) {		define amdgpu_kernel void @legal_offset_fi_offset(i32 addrspace(1)* %out, i32 %cond, i32 addrspace(1)* %offsets, i32 %if_offset, i32 %else_offset) {
entry:		entry:
%scratch0 = alloca [8192 x i32], addrspace(5)		%scratch0 = alloca [8192 x i32], addrspace(5)
%scratch1 = alloca [8192 x i32], addrspace(5)		%scratch1 = alloca [8192 x i32], addrspace(5)

%offset0 = load i32, i32 addrspace(1)* %offsets		%offset0 = load i32, i32 addrspace(1)* %offsets
%scratchptr0 = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %scratch0, i32 0, i32 %offset0		%scratchptr0 = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %scratch0, i32 0, i32 %offset0
done:		done:
%value = phi i32 [%if_value, %if], [%else_value, %else]		%value = phi i32 [%if_value, %if], [%else_value, %else]
store i32 %value, i32 addrspace(1)* %out		store i32 %value, i32 addrspace(1)* %out
ret void		ret void
}		}

; GCN-LABEL: {{^}}neg_vaddr_offset_inbounds:		; GCN-LABEL: {{^}}neg_vaddr_offset_inbounds:
; GCN: v_add_{{[iu]}}32_e32 [[ADD:v[0-9]+]], vcc, 16, v{{[0-9]+}}		; GCN: v_add_{{[iu]}}32_e32 [[ADD:v[0-9]+]], vcc, 16, v{{[0-9]+}}
; GCN: buffer_store_dword v{{[0-9]+}}, [[ADD]], s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offen{{$}}		; GCN: buffer_store_dword v{{[0-9]+}}, [[ADD]], s[{{[0-9]+:[0-9]+}}], 0 offen{{$}}
define amdgpu_kernel void @neg_vaddr_offset_inbounds(i32 %offset) {		define amdgpu_kernel void @neg_vaddr_offset_inbounds(i32 %offset) {
entry:		entry:
%array = alloca [8192 x i32], addrspace(5)		%array = alloca [8192 x i32], addrspace(5)
%ptr_offset = add i32 %offset, 4		%ptr_offset = add i32 %offset, 4
%ptr = getelementptr inbounds [8192 x i32], [8192 x i32] addrspace(5)* %array, i32 0, i32 %ptr_offset		%ptr = getelementptr inbounds [8192 x i32], [8192 x i32] addrspace(5)* %array, i32 0, i32 %ptr_offset
store i32 0, i32 addrspace(5)* %ptr		store i32 0, i32 addrspace(5)* %ptr
ret void		ret void
}		}

; GCN-LABEL: {{^}}neg_vaddr_offset:		; GCN-LABEL: {{^}}neg_vaddr_offset:
; GCN: v_add_{{[iu]}}32_e32 [[ADD:v[0-9]+]], vcc, 16, v{{[0-9]+}}		; GCN: v_add_{{[iu]}}32_e32 [[ADD:v[0-9]+]], vcc, 16, v{{[0-9]+}}
; GCN: buffer_store_dword v{{[0-9]+}}, [[ADD]], s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offen{{$}}		; GCN: buffer_store_dword v{{[0-9]+}}, [[ADD]], s[{{[0-9]+:[0-9]+}}], 0 offen{{$}}
define amdgpu_kernel void @neg_vaddr_offset(i32 %offset) {		define amdgpu_kernel void @neg_vaddr_offset(i32 %offset) {
entry:		entry:
%array = alloca [8192 x i32], addrspace(5)		%array = alloca [8192 x i32], addrspace(5)
%ptr_offset = add i32 %offset, 4		%ptr_offset = add i32 %offset, 4
%ptr = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %array, i32 0, i32 %ptr_offset		%ptr = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %array, i32 0, i32 %ptr_offset
store i32 0, i32 addrspace(5)* %ptr		store i32 0, i32 addrspace(5)* %ptr
ret void		ret void
}		}

; GCN-LABEL: {{^}}pos_vaddr_offset:		; GCN-LABEL: {{^}}pos_vaddr_offset:
; GCN: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:20		; GCN: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], 0 offset:20
define amdgpu_kernel void @pos_vaddr_offset(i32 addrspace(1)* %out, i32 %offset) {		define amdgpu_kernel void @pos_vaddr_offset(i32 addrspace(1)* %out, i32 %offset) {
entry:		entry:
%array = alloca [8192 x i32], addrspace(5)		%array = alloca [8192 x i32], addrspace(5)
%ptr = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %array, i32 0, i32 4		%ptr = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %array, i32 0, i32 4
store i32 0, i32 addrspace(5)* %ptr		store i32 0, i32 addrspace(5)* %ptr
%load_ptr = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %array, i32 0, i32 %offset		%load_ptr = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %array, i32 0, i32 %offset
%val = load i32, i32 addrspace(5)* %load_ptr		%val = load i32, i32 addrspace(5)* %load_ptr
store i32 %val, i32 addrspace(1)* %out		store i32 %val, i32 addrspace(1)* %out
ret void		ret void
}		}

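The comment at the top of scratch-buffer.ll says a frame index offset wider than 12 bits must not be placed in the MUBUF offset field. The checks in `legal_offset_fi` show both cases. The first alloca is reached with `offset:4`. The second starts 8192 * 4 = 0x8000 bytes later, at 0x8004, so it is materialized with `v_mov_b32` and addressed with `offen`. The sketch below assumes the immediate is an unsigned 12-bit field and that the two allocas are laid out back to back starting at byte 4, as the checks suggest. `fitsMUBUFImmOffset` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the MUBUF immediate offset is an unsigned 12-bit field (0..4095).
bool fitsMUBUFImmOffset(uint32_t offset) {
  return offset < (1u << 12);
}

// Frame layout implied by the legal_offset_fi checks (assumed, not queried
// from LLVM): %scratch0 at byte 4, %scratch1 right after 8192 i32 elements.
constexpr uint32_t kScratch0Base = 4;
constexpr uint32_t kScratch1Base = kScratch0Base + 8192 * 4;
static_assert(kScratch1Base == 0x8004, "matches the v_mov_b32 constant");
```

`offset:4` fits in the field, but 0x8004 (32772) does not. That is why the checks expect the offset in a VGPR with `offen` rather than as an immediate.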
llvm/test/CodeGen/AMDGPU/scratch-simple.ll

	; GFX10_W64-DAG: s_mov_b32 s7, 0x31e16000			; GFX10_W64-DAG: s_mov_b32 s7, 0x31e16000
	; GCN-DAG: v_lshlrev_b32_e32 [[BYTES:v[0-9]+]], 2, v0			; GCN-DAG: v_lshlrev_b32_e32 [[BYTES:v[0-9]+]], 2, v0
	; GCN-DAG: v_and_b32_e32 [[CLAMP_IDX:v[0-9]+]], 0x1fc, [[BYTES]]			; GCN-DAG: v_and_b32_e32 [[CLAMP_IDX:v[0-9]+]], 0x1fc, [[BYTES]]
	; GCN-NOT: s_mov_b32 s0			; GCN-NOT: s_mov_b32 s0

	; GCN-DAG: v_or_b32_e32 [[LO_OFF:v[0-9]+]], 0x200, [[CLAMP_IDX]]			; GCN-DAG: v_or_b32_e32 [[LO_OFF:v[0-9]+]], 0x200, [[CLAMP_IDX]]
	; GCN-DAG: v_or_b32_e32 [[HI_OFF:v[0-9]+]], 0x400, [[CLAMP_IDX]]			; GCN-DAG: v_or_b32_e32 [[HI_OFF:v[0-9]+]], 0x400, [[CLAMP_IDX]]

	; GCN: buffer_load_dword {{v[0-9]+}}, [[LO_OFF]], {{s\[[0-9]+:[0-9]+\]}}, s0 offen			; GCN: buffer_load_dword {{v[0-9]+}}, [[LO_OFF]], {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	; GCN: buffer_load_dword {{v[0-9]+}}, [[HI_OFF]], {{s\[[0-9]+:[0-9]+\]}}, s0 offen			; GCN: buffer_load_dword {{v[0-9]+}}, [[HI_OFF]], {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	define amdgpu_ps float @ps_main(i32 %idx) {			define amdgpu_ps float @ps_main(i32 %idx) {
	%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx			%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx
	%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx			%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx
	%r = fadd float %v1, %v2			%r = fadd float %v1, %v2
	ret float %r			ret float %r
	}			}

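The `ps_main` checks describe the address arithmetic for the two dynamic `extractelement` loads. The index is shifted left by 2 to get a byte offset and masked with 0x1fc. Then 0x200 or 0x400 is ORed in to select one of the two scratch copies. Because the masked value is at most 0x1fc, which is below 0x200, the OR has the same effect as an add. The C++ sketch below mirrors that sequence. `psMainOffsets` is a hypothetical helper, and the constants are copied from the checks rather than derived from LLVM's frame layout.

```cpp
#include <cassert>
#include <cstdint>

struct LoadOffsets {
  uint32_t lo;  // [[LO_OFF]] in the checks
  uint32_t hi;  // [[HI_OFF]] in the checks
};

// Mirrors: v_lshlrev_b32 2, v0; v_and_b32 0x1fc; v_or_b32 0x200 / 0x400.
LoadOffsets psMainOffsets(uint32_t idx) {
  uint32_t bytes = idx << 2;          // element index -> byte offset
  uint32_t clampIdx = bytes & 0x1fc;  // keep the offset in [0, 0x1fc]
  // clampIdx < 0x200, so OR-ing in 0x200 or 0x400 equals adding it.
  return {clampIdx | 0x200, clampIdx | 0x400};
}
```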
	; GCN-LABEL: {{^}}vs_main:			; GCN-LABEL: {{^}}vs_main:
	; GCN-DAG: s_mov_b32 s4, SCRATCH_RSRC_DWORD0			; GCN-DAG: s_mov_b32 s4, SCRATCH_RSRC_DWORD0
	; GCN-NOT: s_mov_b32 s0			; GCN-NOT: s_mov_b32 s0
	; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s0 offen			; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s0 offen			; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	define amdgpu_vs float @vs_main(i32 %idx) {			define amdgpu_vs float @vs_main(i32 %idx) {
	%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx			%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx
	%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx			%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx
	%r = fadd float %v1, %v2			%r = fadd float %v1, %v2
	ret float %r			ret float %r
	}			}
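In this hunk, the new-side CHECK lines expect the last operand of each `buffer_load_dword` (the scratch offset) to be the inline constant `0` instead of an SGPR (`s0`, or `s5` for GFX9_10). To show how these patterns match, here is a simplified Python model of FileCheck matching: text inside `{{...}}` is treated as a regular expression and everything else is matched literally. `filecheck_to_regex` and `check_matches` are hypothetical helpers, not part of FileCheck. The assembly lines are invented samples, not real `llc` output.

```python
import re


def filecheck_to_regex(pattern: str) -> str:
    """Translate a FileCheck pattern into a Python regex (simplified model).

    Text inside {{...}} is used as a regex; all other text is matched
    literally. FileCheck features such as [[VAR]] captures and whitespace
    canonicalization are deliberately not modelled here.
    """
    parts = []
    pos = 0
    while True:
        start = pattern.find("{{", pos)
        if start == -1:
            parts.append(re.escape(pattern[pos:]))
            return "".join(parts)
        end = pattern.find("}}", start + 2)
        if end == -1:
            raise ValueError("unterminated {{ in pattern: " + pattern)
        parts.append(re.escape(pattern[pos:start]))
        # Wrap the embedded regex in a non-capturing group.
        parts.append("(?:" + pattern[start + 2:end] + ")")
        pos = end + 2


def check_matches(pattern: str, line: str) -> bool:
    # Like a plain CHECK: the pattern may match anywhere within the line.
    return re.search(filecheck_to_regex(pattern), line) is not None


OLD = r"buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s0 offen"
NEW = r"buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen"

# Hypothetical assembler output, before and after the change.
before = "buffer_load_dword v1, v0, s[4:7], s0 offen"
after = "buffer_load_dword v1, v0, s[4:7], 0 offen"

print(check_matches(OLD, before), check_matches(OLD, after))  # True False
print(check_matches(NEW, before), check_matches(NEW, after))  # False True
```

The old patterns therefore fail on output that uses an inline `0` offset, which is why every `s0 offen` / `s5 offen` check in these functions changes together.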

	; GCN-LABEL: {{^}}cs_main:			; GCN-LABEL: {{^}}cs_main:
	; GCN-DAG: s_mov_b32 s4, SCRATCH_RSRC_DWORD0			; GCN-DAG: s_mov_b32 s4, SCRATCH_RSRC_DWORD0
	; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s0 offen			; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s0 offen			; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	define amdgpu_cs float @cs_main(i32 %idx) {			define amdgpu_cs float @cs_main(i32 %idx) {
	%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx			%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx
	%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx			%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx
	%r = fadd float %v1, %v2			%r = fadd float %v1, %v2
	ret float %r			ret float %r
	}			}

	; GCN-LABEL: {{^}}hs_main:			; GCN-LABEL: {{^}}hs_main:
	; SIVI: s_mov_b32 s4, SCRATCH_RSRC_DWORD0			; SIVI: s_mov_b32 s4, SCRATCH_RSRC_DWORD0
	; SIVI-NOT: s_mov_b32 s0			; SIVI-NOT: s_mov_b32 s0
	; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s0 offen			; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s0 offen			; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen

	; GFX9_10: s_mov_b32 s0, SCRATCH_RSRC_DWORD0			; GFX9_10: s_mov_b32 s0, SCRATCH_RSRC_DWORD0
	; GFX9_10-NOT: s_mov_b32 s5			; GFX9_10-NOT: s_mov_b32 s5
	; GFX9_10: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s5 offen			; GFX9_10: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	; GFX9_10: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s5 offen			; GFX9_10: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	define amdgpu_hs float @hs_main(i32 %idx) {			define amdgpu_hs float @hs_main(i32 %idx) {
	%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx			%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx
	%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx			%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx
	%r = fadd float %v1, %v2			%r = fadd float %v1, %v2
	ret float %r			ret float %r
	}			}

	; GCN-LABEL: {{^}}gs_main:			; GCN-LABEL: {{^}}gs_main:
	; SIVI: s_mov_b32 s4, SCRATCH_RSRC_DWORD0			; SIVI: s_mov_b32 s4, SCRATCH_RSRC_DWORD0
	; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s0 offen			; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s0 offen			; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen

	; GFX9_10: s_mov_b32 s0, SCRATCH_RSRC_DWORD0			; GFX9_10: s_mov_b32 s0, SCRATCH_RSRC_DWORD0
	; GFX9_10: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s5 offen			; GFX9_10: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	; GFX9_10: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s5 offen			; GFX9_10: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	define amdgpu_gs float @gs_main(i32 %idx) {			define amdgpu_gs float @gs_main(i32 %idx) {
	%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx			%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx
	%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx			%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx
	%r = fadd float %v1, %v2			%r = fadd float %v1, %v2
	ret float %r			ret float %r
	}			}

	; GCN-LABEL: {{^}}hs_ir_uses_scratch_offset:			; FIXME: This change assumes the scratch wave offset is dead after being used
				arsenm: Can you add a comment elaborating on what this tests?
				scott.linder (Author): From discussion with @mareko, my understanding is that Mesa GS and HS shaders have the preloaded scratch wave offset SGPR fixed at SGPR5, and the `inreg` mechanism is used to reference it in the IR. So here, the shader snippet inserted after the SI_RETURN_TO_EPILOG wants to use the scratch wave offset; the IR gets at it by padding out the `inreg` arguments until it reaches the scratch wave offset's position, and then uses it in the return value. I'll add something to that effect in the test.
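The padding trick described above can be made concrete with a small model. This Python sketch is not LLVM's calling-convention code: it assumes (hypothetically) that each 32-bit `inreg` argument of these shader calling conventions takes the next SGPR starting at `s0`, other arguments take VGPRs from `v0`, and `i32` return elements come back in consecutive SGPRs. Under those assumptions the sixth `inreg` argument `%swo` lands in `s5` and return element 2 in `s2`, which lines up with the `GCN-DAG: s_mov_b32 s2, s5` check below. `assign_arg_regs` and `assign_return_regs` are hypothetical helpers.

```python
from typing import Dict, List, Tuple


def assign_arg_regs(args: List[Tuple[str, bool]]) -> Dict[str, str]:
    """Map 32-bit shader arguments to registers (hypothetical model).

    Assumes each `inreg` argument takes the next SGPR starting at s0 and
    every other argument takes the next VGPR starting at v0.
    """
    regs: Dict[str, str] = {}
    next_sgpr = next_vgpr = 0
    for name, inreg in args:
        if inreg:
            regs[name] = f"s{next_sgpr}"
            next_sgpr += 1
        else:
            regs[name] = f"v{next_vgpr}"
            next_vgpr += 1
    return regs


def assign_return_regs(elem_types: List[str]) -> List[str]:
    """Map return-struct elements to registers (hypothetical model).

    Assumes i32 elements are returned in consecutive SGPRs and float
    elements in consecutive VGPRs, each starting from register 0.
    """
    out: List[str] = []
    next_sgpr = next_vgpr = 0
    for ty in elem_types:
        if ty == "i32":
            out.append(f"s{next_sgpr}")
            next_sgpr += 1
        elif ty == "float":
            out.append(f"v{next_vgpr}")
            next_vgpr += 1
        else:
            raise ValueError(f"unsupported element type: {ty}")
    return out


# Signature of @hs_ir_uses_scratch_offset: five unnamed padding inreg
# arguments (%0..%4), then %swo, then the non-inreg %idx.
args = [(f"%{i}", True) for i in range(5)] + [("%swo", True), ("%idx", False)]
arg_regs = assign_arg_regs(args)
ret_regs = assign_return_regs(["i32", "i32", "i32", "float"])

# `insertvalue ..., i32 %swo, 2` places %swo in return element 2.
print(f"s_mov_b32 {ret_regs[2]}, {arg_regs['%swo']}")  # s_mov_b32 s2, s5
```

Whether `%swo` may legitimately alias the hardware scratch wave offset is exactly the open question in this thread; the model only shows where the argument would land.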
	; GCN: s_mov_b32 s8, SCRATCH_RSRC_DWORD0			; to update the scratch SRD, but this test previously used `inreg` to refer to
				; the scratch wave offset in cases where it has a fixed location (i.e. SGPR5
	; SIVI-NOT: s_mov_b32 s6			; for GFX9). What exactly is the test trying to verify, and is the change to
	; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s6 offen			; mark the scratch wave offset as "killed" by the new setup in the prologue OK?
				scott.linder (Author): @arsenm @nhaehnle Similar question as above with respect to how `inreg` should work. Is the `%swo` argument in these tests actually expected to be allowed to coincide with the scratch wave offset?
	; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s6 offen

	; GFX9_10-NOT: s_mov_b32 s5
	; GFX9_10-DAG: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s5 offen
	; GFX9_10-DAG: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s5 offen

	; GCN-DAG: s_mov_b32 s2, s5
	define amdgpu_hs <{i32, i32, i32, float}> @hs_ir_uses_scratch_offset(i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg %swo, i32 %idx) {
	%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx
	%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx
	%f = fadd float %v1, %v2
	%r1 = insertvalue <{i32, i32, i32, float}> undef, i32 %swo, 2
	%r2 = insertvalue <{i32, i32, i32, float}> %r1, float %f, 3
	ret <{i32, i32, i32, float}> %r2
	}

	; GCN-LABEL: {{^}}gs_ir_uses_scratch_offset:
	; GCN: s_mov_b32 s8, SCRATCH_RSRC_DWORD0

	; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s6 offen
	; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s6 offen

	; GFX9_10-DAG: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s5 offen
	; GFX9_10-DAG: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s5 offen

	; GCN-DAG: s_mov_b32 s2, s5
	define amdgpu_gs <{i32, i32, i32, float}> @gs_ir_uses_scratch_offset(i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg %swo, i32 %idx) {
	%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx
	%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx
	%f = fadd float %v1, %v2
	%r1 = insertvalue <{i32, i32, i32, float}> undef, i32 %swo, 2
	%r2 = insertvalue <{i32, i32, i32, float}> %r1, float %f, 3
	ret <{i32, i32, i32, float}> %r2
	}

llvm/test/CodeGen/AMDGPU/sgpr-spill-wrong-stack-id.mir

	# SHARE: stack-id: sgpr-spill, callee-saved-register: '', callee-saved-restored: true,			# SHARE: stack-id: sgpr-spill, callee-saved-register: '', callee-saved-restored: true,
	# SHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }			# SHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
	# SHARE: - { id: 2, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,			# SHARE: - { id: 2, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,
	# SHARE: stack-id: sgpr-spill, callee-saved-register: '', callee-saved-restored: true,			# SHARE: stack-id: sgpr-spill, callee-saved-register: '', callee-saved-restored: true,
	# SHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }			# SHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }

	# SHARE: SI_SPILL_S32_SAVE $sgpr32, %stack.2, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 4 into %stack.2, addrspace 5)			# SHARE: SI_SPILL_S32_SAVE $sgpr32, %stack.2, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 4 into %stack.2, addrspace 5)
	# SHARE: SI_SPILL_V32_SAVE killed $vgpr0, %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (store 4 into %stack.0, addrspace 5)			# SHARE: SI_SPILL_V32_SAVE killed $vgpr0, %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (store 4 into %stack.0, addrspace 5)
	# SHARE: SI_SPILL_S64_SAVE killed renamable $sgpr6_sgpr7, %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 8 into %stack.1, align 4, addrspace 5)			# SHARE: SI_SPILL_S64_SAVE killed renamable $sgpr4_sgpr5, %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 8 into %stack.1, align 4, addrspace 5)
	# SHARE: renamable $sgpr6_sgpr7 = SI_SPILL_S64_RESTORE %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 8 from %stack.1, align 4, addrspace 5)			# SHARE: renamable $sgpr4_sgpr5 = SI_SPILL_S64_RESTORE %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 8 from %stack.1, align 4, addrspace 5)
	# SHARE: dead $sgpr30_sgpr31 = SI_CALL killed renamable $sgpr6_sgpr7, @func, csr_amdgpu_highregs, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr4, implicit undef $vgpr0			# SHARE: dead $sgpr30_sgpr31 = SI_CALL killed renamable $sgpr4_sgpr5, @func, csr_amdgpu_highregs, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit undef $vgpr0
	# SHARE: $sgpr32 = SI_SPILL_S32_RESTORE %stack.2, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 4 from %stack.2, addrspace 5)			# SHARE: $sgpr32 = SI_SPILL_S32_RESTORE %stack.2, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 4 from %stack.2, addrspace 5)
	# SHARE: $vgpr0 = SI_SPILL_V32_RESTORE %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (load 4 from %stack.0, addrspace 5)			# SHARE: $vgpr0 = SI_SPILL_V32_RESTORE %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (load 4 from %stack.0, addrspace 5)
	# SHARE: renamable $sgpr6_sgpr7 = SI_SPILL_S64_RESTORE %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 8 from %stack.1, align 4, addrspace 5)			# SHARE: renamable $sgpr4_sgpr5 = SI_SPILL_S64_RESTORE %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 8 from %stack.1, align 4, addrspace 5)
	# SHARE: dead $sgpr30_sgpr31 = SI_CALL killed renamable $sgpr6_sgpr7, @func, csr_amdgpu_highregs, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr4, implicit $vgpr0			# SHARE: dead $sgpr30_sgpr31 = SI_CALL killed renamable $sgpr4_sgpr5, @func, csr_amdgpu_highregs, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $vgpr0
	# SHARE: $sgpr32 = SI_SPILL_S32_RESTORE %stack.2, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 4 from %stack.2, addrspace 5)			# SHARE: $sgpr32 = SI_SPILL_S32_RESTORE %stack.2, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 4 from %stack.2, addrspace 5)

	# NOSHARE: stack:			# NOSHARE: stack:
	# NOSHARE: - { id: 0, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,			# NOSHARE: - { id: 0, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,
	# NOSHARE: stack-id: default, callee-saved-register: '', callee-saved-restored: true,			# NOSHARE: stack-id: default, callee-saved-register: '', callee-saved-restored: true,
	# NOSHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }			# NOSHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
	# NOSHARE: - { id: 1, name: '', type: spill-slot, offset: 0, size: 8, alignment: 4,			# NOSHARE: - { id: 1, name: '', type: spill-slot, offset: 0, size: 8, alignment: 4,
	# NOSHARE: stack-id: sgpr-spill, callee-saved-register: '', callee-saved-restored: true,			# NOSHARE: stack-id: sgpr-spill, callee-saved-register: '', callee-saved-restored: true,
	# NOSHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }			# NOSHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
	# NOSHARE: - { id: 2, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,			# NOSHARE: - { id: 2, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,
	# NOSHARE: stack-id: sgpr-spill, callee-saved-register: '', callee-saved-restored: true,			# NOSHARE: stack-id: sgpr-spill, callee-saved-register: '', callee-saved-restored: true,
	# NOSHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }			# NOSHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
	# NOSHARE: - { id: 3, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,			# NOSHARE: - { id: 3, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,
	# NOSHARE: stack-id: sgpr-spill, callee-saved-register: '', callee-saved-restored: true,			# NOSHARE: stack-id: sgpr-spill, callee-saved-register: '', callee-saved-restored: true,
	# NOSHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }			# NOSHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }

	# NOSHARE: SI_SPILL_S32_SAVE $sgpr32, %stack.2, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 4 into %stack.2, addrspace 5)			# NOSHARE: SI_SPILL_S32_SAVE $sgpr32, %stack.2, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 4 into %stack.2, addrspace 5)
	# NOSHARE: SI_SPILL_V32_SAVE killed $vgpr0, %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (store 4 into %stack.0, addrspace 5)			# NOSHARE: SI_SPILL_V32_SAVE killed $vgpr0, %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (store 4 into %stack.0, addrspace 5)
	# NOSHARE: SI_SPILL_S64_SAVE killed renamable $sgpr6_sgpr7, %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 8 into %stack.1, align 4, addrspace 5)			# NOSHARE: SI_SPILL_S64_SAVE killed renamable $sgpr4_sgpr5, %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 8 into %stack.1, align 4, addrspace 5)
	# NOSHARE: renamable $sgpr6_sgpr7 = SI_SPILL_S64_RESTORE %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 8 from %stack.1, align 4, addrspace 5)			# NOSHARE: renamable $sgpr4_sgpr5 = SI_SPILL_S64_RESTORE %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 8 from %stack.1, align 4, addrspace 5)
	# NOSHARE: dead $sgpr30_sgpr31 = SI_CALL killed renamable $sgpr6_sgpr7, @func, csr_amdgpu_highregs, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr4, implicit undef $vgpr0			# NOSHARE: dead $sgpr30_sgpr31 = SI_CALL killed renamable $sgpr4_sgpr5, @func, csr_amdgpu_highregs, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit undef $vgpr0
	# NOSHARE: $sgpr32 = SI_SPILL_S32_RESTORE %stack.2, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 4 from %stack.2, addrspace 5)			# NOSHARE: $sgpr32 = SI_SPILL_S32_RESTORE %stack.2, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 4 from %stack.2, addrspace 5)
	# NOSHARE: SI_SPILL_S32_SAVE $sgpr32, %stack.3, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 4 into %stack.3, addrspace 5)			# NOSHARE: SI_SPILL_S32_SAVE $sgpr32, %stack.3, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 4 into %stack.3, addrspace 5)
	# NOSHARE: $vgpr0 = SI_SPILL_V32_RESTORE %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (load 4 from %stack.0, addrspace 5)			# NOSHARE: $vgpr0 = SI_SPILL_V32_RESTORE %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (load 4 from %stack.0, addrspace 5)
	# NOSHARE: renamable $sgpr6_sgpr7 = SI_SPILL_S64_RESTORE %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 8 from %stack.1, align 4, addrspace 5)			# NOSHARE: renamable $sgpr4_sgpr5 = SI_SPILL_S64_RESTORE %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 8 from %stack.1, align 4, addrspace 5)
	# NOSHARE: dead $sgpr30_sgpr31 = SI_CALL killed renamable $sgpr6_sgpr7, @func, csr_amdgpu_highregs, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr4, implicit $vgpr0			# NOSHARE: dead $sgpr30_sgpr31 = SI_CALL killed renamable $sgpr4_sgpr5, @func, csr_amdgpu_highregs, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $vgpr0
	# NOSHARE: $sgpr32 = SI_SPILL_S32_RESTORE %stack.3, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 4 from %stack.3, addrspace 5)			# NOSHARE: $sgpr32 = SI_SPILL_S32_RESTORE %stack.3, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 4 from %stack.3, addrspace 5)

	...			...

	name: sgpr_spill_wrong_stack_id			name: sgpr_spill_wrong_stack_id
	tracksRegLiveness: true			tracksRegLiveness: true
	frameInfo:			frameInfo:
	hasCalls: true			hasCalls: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	frameOffsetReg: $sgpr32			frameOffsetReg: $sgpr32
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32
	body: |			body: |
	bb.0:			bb.0:
	%0:sreg_32_xm0 = COPY $sgpr32			%0:sreg_32_xm0 = COPY $sgpr32
	%1:vreg_64 = IMPLICIT_DEF			%1:vreg_64 = IMPLICIT_DEF
	%2:vgpr_32 = FLAT_LOAD_DWORD %1, 0, 0, 0, 0, implicit $exec, implicit $flat_scr			%2:vgpr_32 = FLAT_LOAD_DWORD %1, 0, 0, 0, 0, implicit $exec, implicit $flat_scr
	%3:sreg_64 = SI_PC_ADD_REL_OFFSET target-flags(amdgpu-rel32-lo) @func + 4, target-flags(amdgpu-rel32-hi) @func + 4, implicit-def dead $scc			%3:sreg_64 = SI_PC_ADD_REL_OFFSET target-flags(amdgpu-rel32-lo) @func + 4, target-flags(amdgpu-rel32-hi) @func + 4, implicit-def dead $scc
	ADJCALLSTACKUP 0, 0, implicit-def $scc, implicit-def $sgpr32, implicit $sgpr32, implicit $sgpr32			ADJCALLSTACKUP 0, 0, implicit-def $scc, implicit-def $sgpr32, implicit $sgpr32, implicit $sgpr32
	dead $sgpr30_sgpr31 = SI_CALL %3, @func, csr_amdgpu_highregs, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr4, implicit undef $vgpr0			dead $sgpr30_sgpr31 = SI_CALL %3, @func, csr_amdgpu_highregs, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit undef $vgpr0
	$sgpr32 = COPY %0			$sgpr32 = COPY %0
	%4:sreg_32_xm0 = COPY $sgpr32			%4:sreg_32_xm0 = COPY $sgpr32
	ADJCALLSTACKDOWN 0, 0, implicit-def $scc, implicit-def $sgpr32, implicit $sgpr32, implicit $sgpr32			ADJCALLSTACKDOWN 0, 0, implicit-def $scc, implicit-def $sgpr32, implicit $sgpr32, implicit $sgpr32
	ADJCALLSTACKUP 0, 0, implicit-def $scc, implicit-def $sgpr32, implicit $sgpr32, implicit $sgpr32			ADJCALLSTACKUP 0, 0, implicit-def $scc, implicit-def $sgpr32, implicit $sgpr32, implicit $sgpr32
	$vgpr0 = COPY %2			$vgpr0 = COPY %2
	dead $sgpr30_sgpr31 = SI_CALL %3, @func, csr_amdgpu_highregs, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr4, implicit killed $vgpr0			dead $sgpr30_sgpr31 = SI_CALL %3, @func, csr_amdgpu_highregs, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit killed $vgpr0
	$sgpr32 = COPY %4			$sgpr32 = COPY %4
	ADJCALLSTACKDOWN 0, 0, implicit-def $scc, implicit-def $sgpr32, implicit $sgpr32, implicit $sgpr32			ADJCALLSTACKDOWN 0, 0, implicit-def $scc, implicit-def $sgpr32, implicit $sgpr32, implicit $sgpr32

	...			...

llvm/test/CodeGen/AMDGPU/shl_add_ptr.ll

define void @shl_add_ptr_combine_2use_both_max_lds_offset(i32 %idx) #0 {
%ptr1 = inttoptr i32 %shl1 to i32 addrspace(3)*		%ptr1 = inttoptr i32 %shl1 to i32 addrspace(3)*
store volatile i32 9, i32 addrspace(3)* %ptr0		store volatile i32 9, i32 addrspace(3)* %ptr0
store volatile i32 10, i32 addrspace(3)* %ptr1		store volatile i32 10, i32 addrspace(3)* %ptr1
ret void		ret void
}		}

; GCN-LABEL: {{^}}shl_add_ptr_combine_2use_private:		; GCN-LABEL: {{^}}shl_add_ptr_combine_2use_private:
; GCN: v_lshlrev_b32_e32 [[SCALE0:v[0-9]+]], 2, v0		; GCN: v_lshlrev_b32_e32 [[SCALE0:v[0-9]+]], 2, v0
; GCN: buffer_store_dword v{{[0-9]+}}, [[SCALE0]], s[0:3], s33 offen offset:16		; GCN: buffer_store_dword v{{[0-9]+}}, [[SCALE0]], s[0:3], 0 offen offset:16

; GCN: v_lshlrev_b32_e32 [[SCALE1:v[0-9]+]], 3, v0		; GCN: v_lshlrev_b32_e32 [[SCALE1:v[0-9]+]], 3, v0
; GCN: buffer_store_dword v{{[0-9]+}}, [[SCALE1]], s[0:3], s33 offen offset:32		; GCN: buffer_store_dword v{{[0-9]+}}, [[SCALE1]], s[0:3], 0 offen offset:32
define void @shl_add_ptr_combine_2use_private(i16 zeroext %idx.arg) #0 {		define void @shl_add_ptr_combine_2use_private(i16 zeroext %idx.arg) #0 {
%idx = zext i16 %idx.arg to i32		%idx = zext i16 %idx.arg to i32
%idx.add = add nuw i32 %idx, 4		%idx.add = add nuw i32 %idx, 4
%shl0 = shl i32 %idx.add, 2		%shl0 = shl i32 %idx.add, 2
%shl1 = shl i32 %idx.add, 3		%shl1 = shl i32 %idx.add, 3
%ptr0 = inttoptr i32 %shl0 to i32 addrspace(5)*		%ptr0 = inttoptr i32 %shl0 to i32 addrspace(5)*
%ptr1 = inttoptr i32 %shl1 to i32 addrspace(5)*		%ptr1 = inttoptr i32 %shl1 to i32 addrspace(5)*
store volatile i32 9, i32 addrspace(5)* %ptr0		store volatile i32 9, i32 addrspace(5)* %ptr0
store volatile i32 10, i32 addrspace(5)* %ptr1		store volatile i32 10, i32 addrspace(5)* %ptr1
ret void		ret void
}		}

; GCN-LABEL: {{^}}shl_add_ptr_combine_2use_max_private_offset:		; GCN-LABEL: {{^}}shl_add_ptr_combine_2use_max_private_offset:
; GCN-DAG: v_lshlrev_b32_e32 [[SCALE0:v[0-9]+]], 3, v0		; GCN-DAG: v_lshlrev_b32_e32 [[SCALE0:v[0-9]+]], 3, v0
; GCN-DAG: v_lshlrev_b32_e32 [[SCALE1:v[0-9]+]], 4, v0		; GCN-DAG: v_lshlrev_b32_e32 [[SCALE1:v[0-9]+]], 4, v0
; GCN-DAG: buffer_store_dword v{{[0-9]+}}, [[SCALE0]], s[0:3], s33 offen offset:4088		; GCN-DAG: buffer_store_dword v{{[0-9]+}}, [[SCALE0]], s[0:3], 0 offen offset:4088
; GCN-DAG: v_add_{{[iu]}}32_e32 [[ADD:v[0-9]+]], vcc, 0x1ff0, [[SCALE1]]		; GCN-DAG: v_add_{{[iu]}}32_e32 [[ADD:v[0-9]+]], vcc, 0x1ff0, [[SCALE1]]
; GCN: buffer_store_dword v{{[0-9]+}}, [[ADD]], s[0:3], s33 offen{{$}}		; GCN: buffer_store_dword v{{[0-9]+}}, [[ADD]], s[0:3], 0 offen{{$}}
define void @shl_add_ptr_combine_2use_max_private_offset(i16 zeroext %idx.arg) #0 {		define void @shl_add_ptr_combine_2use_max_private_offset(i16 zeroext %idx.arg) #0 {
%idx = zext i16 %idx.arg to i32		%idx = zext i16 %idx.arg to i32
%idx.add = add nuw i32 %idx, 511		%idx.add = add nuw i32 %idx, 511
%shl0 = shl i32 %idx.add, 3		%shl0 = shl i32 %idx.add, 3
%shl1 = shl i32 %idx.add, 4		%shl1 = shl i32 %idx.add, 4
%ptr0 = inttoptr i32 %shl0 to i32 addrspace(5)*		%ptr0 = inttoptr i32 %shl0 to i32 addrspace(5)*
%ptr1 = inttoptr i32 %shl1 to i32 addrspace(5)*		%ptr1 = inttoptr i32 %shl1 to i32 addrspace(5)*
store volatile i32 9, i32 addrspace(5)* %ptr0		store volatile i32 9, i32 addrspace(5)* %ptr0
store volatile i32 10, i32 addrspace(5)* %ptr1		store volatile i32 10, i32 addrspace(5)* %ptr1
ret void		ret void
}		}
; GCN-LABEL: {{^}}shl_add_ptr_combine_2use_both_max_private_offset:		; GCN-LABEL: {{^}}shl_add_ptr_combine_2use_both_max_private_offset:
; GCN: v_add_{{[iu]}}32_e32 [[ADD:v[0-9]+]], vcc, 0x100, v0		; GCN: v_add_{{[iu]}}32_e32 [[ADD:v[0-9]+]], vcc, 0x100, v0
; GCN-DAG: v_lshlrev_b32_e32 [[SCALE0:v[0-9]+]], 4, [[ADD]]		; GCN-DAG: v_lshlrev_b32_e32 [[SCALE0:v[0-9]+]], 4, [[ADD]]
; GCN-DAG: v_lshlrev_b32_e32 [[SCALE1:v[0-9]+]], 5, [[ADD]]		; GCN-DAG: v_lshlrev_b32_e32 [[SCALE1:v[0-9]+]], 5, [[ADD]]
; GCN-DAG: buffer_store_dword v{{[0-9]+}}, [[SCALE0]], s[0:3], s33 offen{{$}}		; GCN-DAG: buffer_store_dword v{{[0-9]+}}, [[SCALE0]], s[0:3], 0 offen{{$}}
; GCN: buffer_store_dword v{{[0-9]+}}, [[SCALE1]], s[0:3], s33 offen{{$}}		; GCN: buffer_store_dword v{{[0-9]+}}, [[SCALE1]], s[0:3], 0 offen{{$}}
define void @shl_add_ptr_combine_2use_both_max_private_offset(i16 zeroext %idx.arg) #0 {		define void @shl_add_ptr_combine_2use_both_max_private_offset(i16 zeroext %idx.arg) #0 {
%idx = zext i16 %idx.arg to i32		%idx = zext i16 %idx.arg to i32
%idx.add = add nuw i32 %idx, 256		%idx.add = add nuw i32 %idx, 256
%shl0 = shl i32 %idx.add, 4		%shl0 = shl i32 %idx.add, 4
%shl1 = shl i32 %idx.add, 5		%shl1 = shl i32 %idx.add, 5
%ptr0 = inttoptr i32 %shl0 to i32 addrspace(5)*		%ptr0 = inttoptr i32 %shl0 to i32 addrspace(5)*
%ptr1 = inttoptr i32 %shl1 to i32 addrspace(5)*		%ptr1 = inttoptr i32 %shl1 to i32 addrspace(5)*
store volatile i32 9, i32 addrspace(5)* %ptr0		store volatile i32 9, i32 addrspace(5)* %ptr0
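The immediate offsets expected by the `shl_add_ptr_combine_2use_*private*` checks in this file can be reproduced with simple arithmetic: `(idx + c) << s` equals `(idx << s) + (c << s)`, and the constant part can only be folded into the buffer instruction's `offset:` field if it fits. The sketch below assumes that the MUBUF immediate offset field is a 12-bit unsigned value (0 to 4095), which is consistent with the offsets in these checks. `split_shl_add` and `MUBUF_MAX_IMM_OFFSET` are hypothetical names for illustration, not LLVM APIs. The sketch models only the arithmetic, not the DAG combine itself.

```python
from typing import Tuple

# Assumption: the MUBUF immediate "offset" field is 12 bits unsigned (0..4095).
MUBUF_MAX_IMM_OFFSET = 4095


def split_shl_add(c: int, shift: int) -> Tuple[int, int]:
    """Hypothetical helper (not an LLVM API).

    Treats the address (idx + c) << shift as (idx << shift) + (c << shift)
    and reports where the constant part could go:
      (imm_offset, 0)  -> fits the immediate field, e.g. "offen offset:16"
      (0, vgpr_addend) -> too large, so it needs a separate v_add
    """
    const = c << shift
    if 0 <= const <= MUBUF_MAX_IMM_OFFSET:
        return const, 0
    return 0, const


# Constants and shift amounts taken from the shl_add_ptr private-address tests.
for c, shift in [(4, 2), (4, 3), (511, 3), (511, 4), (256, 4), (256, 5)]:
    imm, addend = split_shl_add(c, shift)
    print(f"(idx + {c}) << {shift}: offset:{imm}, v_add {hex(addend)}")
```

For example, `(idx + 511) << 3` yields 4088, matching `offset:4088`, while `(idx + 511) << 4` yields 8176 (`0x1ff0`), matching the separate `v_add` in the checks. In `shl_add_ptr_combine_2use_both_max_private_offset` neither constant fits (4096 and 8192), and the checks instead expect the `0x100` add to stay ahead of both shifts.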

llvm/test/CodeGen/AMDGPU/si-spill-sgpr-stack.ll

	; RUN: llc -march=amdgcn -mcpu=fiji -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefix=ALL -check-prefix=SGPR %s			; RUN: llc -march=amdgcn -mcpu=fiji -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefix=ALL -check-prefix=SGPR %s

	; Make sure this doesn't crash.			; Make sure this doesn't crash.
	; ALL-LABEL: {{^}}test:			; ALL-LABEL: {{^}}test:
	; ALL: s_mov_b32 s[[LO:[0-9]+]], SCRATCH_RSRC_DWORD0			; ALL: s_mov_b32 s[[LO:[0-9]+]], SCRATCH_RSRC_DWORD0
	; ALL: s_mov_b32 s[[OFF:[0-9]+]], s3
	; ALL: s_mov_b32 s[[HI:[0-9]+]], 0xe80000			; ALL: s_mov_b32 s[[HI:[0-9]+]], 0xe80000

	; Make sure we are handling hazards correctly.			; Make sure we are handling hazards correctly.
	; SGPR: buffer_load_dword [[VHI:v[0-9]+]], off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:16			; SGPR: buffer_load_dword [[VHI:v[0-9]+]], off, s[{{[0-9]+:[0-9]+}}], 0 offset:16
	; SGPR-NEXT: s_waitcnt vmcnt(0)			; SGPR-NEXT: s_waitcnt vmcnt(0)
	; SGPR-NEXT: v_readfirstlane_b32 s[[HI:[0-9]+]], [[VHI]]			; SGPR-NEXT: v_readfirstlane_b32 s[[HI:[0-9]+]], [[VHI]]
	; SGPR-NEXT: s_nop 4			; SGPR-NEXT: s_nop 4
	; SGPR-NEXT: buffer_store_dword v0, off, s[0:[[HI]]{{\]}}, 0			; SGPR-NEXT: buffer_store_dword v0, off, s[0:[[HI]]{{\]}}, 0

	; ALL: s_endpgm			; ALL: s_endpgm
	define amdgpu_kernel void @test(i32 addrspace(1)* %out, i32 %in) {			define amdgpu_kernel void @test(i32 addrspace(1)* %out, i32 %in) {
	call void asm sideeffect "", "~{s[0:7]}" ()			call void asm sideeffect "", "~{s[0:7]}" ()

llvm/test/CodeGen/AMDGPU/sibling-call.ll


	; Tail call disallowed with byval in parent, not callee. The stack			; Tail call disallowed with byval in parent, not callee. The stack
	; usage of incoming arguments must be <= the outgoing stack			; usage of incoming arguments must be <= the outgoing stack
	; arguments.			; arguments.

	; GCN-LABEL: {{^}}sibling_call_i32_fastcc_i32_byval_i32:			; GCN-LABEL: {{^}}sibling_call_i32_fastcc_i32_byval_i32:
	; GCN-NOT: v0			; GCN-NOT: v0
	; GCN-NOT: s32			; GCN-NOT: s32
	; GCN: buffer_load_dword v1, off, s[0:3], s33 offset:16			; GCN: buffer_load_dword v1, off, s[0:3], 0 offset:16
	; GCN: buffer_store_dword v1, off, s[0:3], s32{{$}}			; GCN: buffer_store_dword v1, off, s[0:3], s32{{$}}
	; GCN-NEXT: s_setpc_b64			; GCN-NEXT: s_setpc_b64
	define fastcc i32 @sibling_call_i32_fastcc_i32_byval_i32(i32 %a, [32 x i32] %large) #1 {			define fastcc i32 @sibling_call_i32_fastcc_i32_byval_i32(i32 %a, [32 x i32] %large) #1 {
	entry:			entry:
	%ret = tail call fastcc i32 @i32_fastcc_i32_byval_i32(i32 %a, i32 addrspace(5)* inttoptr (i32 16 to i32 addrspace(5)*))			%ret = tail call fastcc i32 @i32_fastcc_i32_byval_i32(i32 %a, i32 addrspace(5)* inttoptr (i32 16 to i32 addrspace(5)*))
	ret i32 %ret			ret i32 %ret
	}			}


llvm/test/CodeGen/AMDGPU/sp-too-many-input-sgprs.ll

This file was deleted.

	; RUN: llc -mtriple=amdgcn-mesa-mesa3d -verify-machineinstrs < %s | FileCheck -check-prefixes=MESA3D,ALL %s
	; RUN: llc -mtriple=amdgcn-- -verify-machineinstrs < %s | FileCheck -check-prefixes=UNKNOWN,ALL %s

	; Make sure shaders pick a workable SP with > 32 input SGPRs.
	; FIXME: Doesn't seem to be getting initial value from right register?

	; ALL-LABEL: {{^}}too_many_input_sgprs_32:
	; MESA3D-NOT: s34
	; MESA3D: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s34 offset:4

	; Happens to end up in s32 anyway
	; UNKNOWN-NOT: s32
	; UNKNOWN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s32 offset:4
	define amdgpu_ps i32 @too_many_input_sgprs_32(i32 inreg %arg, i32 inreg %arg1, i32 inreg %arg2, i32 inreg %arg3, i32 inreg %arg4, i32 inreg %arg5, i32 inreg %arg6, i32 inreg %arg7,
	i32 inreg %arg8, i32 inreg %arg9, i32 inreg %arg10, i32 inreg %arg11, i32 inreg %arg12, i32 inreg %arg13, i32 inreg %arg14, i32 inreg %arg15,
	i32 inreg %arg16, i32 inreg %arg17, i32 inreg %arg18, i32 inreg %arg19, i32 inreg %arg20, i32 inreg %arg21, i32 inreg %arg22, i32 inreg %arg23,
	i32 inreg %arg24, i32 inreg %arg25, i32 inreg %arg26, i32 inreg %arg27, i32 inreg %arg28, i32 inreg %arg29, i32 inreg %arg30, i32 inreg %arg31) {
	bb:
	%alloca = alloca i32, align 4, addrspace(5)
	store volatile i32 0, i32 addrspace(5)* %alloca
	%tmp = add i32 %arg, %arg1
	%tmp32 = add i32 %tmp, %arg2
	%tmp33 = add i32 %tmp32, %arg3
	%tmp34 = add i32 %tmp33, %arg4
	%tmp35 = add i32 %tmp34, %arg5
	%tmp36 = add i32 %tmp35, %arg6
	%tmp37 = add i32 %tmp36, %arg7
	%tmp38 = add i32 %tmp37, %arg8
	%tmp39 = add i32 %tmp38, %arg9
	%tmp40 = add i32 %tmp39, %arg10
	%tmp41 = add i32 %tmp40, %arg11
	%tmp42 = add i32 %tmp41, %arg12
	%tmp43 = add i32 %tmp42, %arg13
	%tmp44 = add i32 %tmp43, %arg14
	%tmp45 = add i32 %tmp44, %arg15
	%tmp46 = add i32 %tmp45, %arg16
	%tmp47 = add i32 %tmp46, %arg17
	%tmp48 = add i32 %tmp47, %arg18
	%tmp49 = add i32 %tmp48, %arg19
	%tmp50 = add i32 %tmp49, %arg20
	%tmp51 = add i32 %tmp50, %arg21
	%tmp52 = add i32 %tmp51, %arg22
	%tmp53 = add i32 %tmp52, %arg23
	%tmp54 = add i32 %tmp53, %arg24
	%tmp55 = add i32 %tmp54, %arg25
	%tmp56 = add i32 %tmp55, %arg26
	%tmp57 = add i32 %tmp56, %arg27
	%tmp58 = add i32 %tmp57, %arg28
	%tmp59 = add i32 %tmp58, %arg29
	%tmp60 = add i32 %tmp59, %arg30
	%tmp61 = add i32 %tmp60, %arg31
	ret i32 %tmp61
	}

	; ALL-LABEL: {{^}}too_many_input_sgprs_33:
	; MESA3D-NOT: s35
	; MESA3D: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s35 offset:4

	; UNKNOWN-NOT: s33
	; UNKNOWN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s33 offset:4
	define amdgpu_ps i32 @too_many_input_sgprs_33(i32 inreg %arg, i32 inreg %arg1, i32 inreg %arg2, i32 inreg %arg3, i32 inreg %arg4, i32 inreg %arg5, i32 inreg %arg6, i32 inreg %arg7,
	i32 inreg %arg8, i32 inreg %arg9, i32 inreg %arg10, i32 inreg %arg11, i32 inreg %arg12, i32 inreg %arg13, i32 inreg %arg14, i32 inreg %arg15,
	i32 inreg %arg16, i32 inreg %arg17, i32 inreg %arg18, i32 inreg %arg19, i32 inreg %arg20, i32 inreg %arg21, i32 inreg %arg22, i32 inreg %arg23,
	i32 inreg %arg24, i32 inreg %arg25, i32 inreg %arg26, i32 inreg %arg27, i32 inreg %arg28, i32 inreg %arg29, i32 inreg %arg30, i32 inreg %arg31,
	i32 inreg %arg32) {
	bb:
	%alloca = alloca i32, align 4, addrspace(5)
	store volatile i32 0, i32 addrspace(5)* %alloca
	%tmp = add i32 %arg, %arg1
	%tmp32 = add i32 %tmp, %arg2
	%tmp33 = add i32 %tmp32, %arg3
	%tmp34 = add i32 %tmp33, %arg4
	%tmp35 = add i32 %tmp34, %arg5
	%tmp36 = add i32 %tmp35, %arg6
	%tmp37 = add i32 %tmp36, %arg7
	%tmp38 = add i32 %tmp37, %arg8
	%tmp39 = add i32 %tmp38, %arg9
	%tmp40 = add i32 %tmp39, %arg10
	%tmp41 = add i32 %tmp40, %arg11
	%tmp42 = add i32 %tmp41, %arg12
	%tmp43 = add i32 %tmp42, %arg13
	%tmp44 = add i32 %tmp43, %arg14
	%tmp45 = add i32 %tmp44, %arg15
	%tmp46 = add i32 %tmp45, %arg16
	%tmp47 = add i32 %tmp46, %arg17
	%tmp48 = add i32 %tmp47, %arg18
	%tmp49 = add i32 %tmp48, %arg19
	%tmp50 = add i32 %tmp49, %arg20
	%tmp51 = add i32 %tmp50, %arg21
	%tmp52 = add i32 %tmp51, %arg22
	%tmp53 = add i32 %tmp52, %arg23
	%tmp54 = add i32 %tmp53, %arg24
	%tmp55 = add i32 %tmp54, %arg25
	%tmp56 = add i32 %tmp55, %arg26
	%tmp57 = add i32 %tmp56, %arg27
	%tmp58 = add i32 %tmp57, %arg28
	%tmp59 = add i32 %tmp58, %arg29
	%tmp60 = add i32 %tmp59, %arg30
	%tmp61 = add i32 %tmp60, %arg31
	%tmp62 = add i32 %tmp61, %arg32
	ret i32 %tmp62
	}

llvm/test/CodeGen/AMDGPU/spill-agpr.ll

; RUN: llc -march=amdgcn -mcpu=gfx908 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX908,A2V %s		; RUN: llc -march=amdgcn -mcpu=gfx908 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX908,A2V %s
; RUN: llc -march=amdgcn -mcpu=gfx908 -amdgpu-spill-vgpr-to-agpr=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX908,A2M %s		; RUN: llc -march=amdgcn -mcpu=gfx908 -amdgpu-spill-vgpr-to-agpr=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX908,A2M %s

; GCN-LABEL: {{^}}max_24regs_32a_used:		; GCN-LABEL: {{^}}max_24regs_32a_used:
; A2M-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD0		; A2M-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD0
; A2M-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD1		; A2M-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD1
; A2V-NOT: SCRATCH_RSRC		; A2V-NOT: SCRATCH_RSRC
; GFX908-DAG: v_accvgpr_read_b32 v[[VSPILL:[0-9]+]], a0		; GFX908-DAG: v_accvgpr_read_b32 v[[VSPILL:[0-9]+]], a0
; A2M: buffer_store_dword v[[VSPILL]], off, s[{{[0-9:]+}}], s{{[0-9]+}} offset:[[FI:[0-9]+]] ; 4-byte Folded Spill		; A2M: buffer_store_dword v[[VSPILL]], off, s[{{[0-9:]+}}], 0 offset:[[FI:[0-9]+]] ; 4-byte Folded Spill
; A2M: buffer_load_dword v[[VSPILL:[0-9]+]], off, s[{{[0-9:]+}}], s{{[0-9]+}} offset:[[FI]] ; 4-byte Folded Reload		; A2M: buffer_load_dword v[[VSPILL:[0-9]+]], off, s[{{[0-9:]+}}], 0 offset:[[FI]] ; 4-byte Folded Reload
; GFX908: v_accvgpr_write_b32 a{{[0-9]+}}, v[[VSPILL]]		; GFX908: v_accvgpr_write_b32 a{{[0-9]+}}, v[[VSPILL]]
; A2V: ScratchSize: 0		; A2V: ScratchSize: 0
define amdgpu_kernel void @max_24regs_32a_used(<16 x float> addrspace(1)* %arg, float addrspace(1)* %out) #0 {		define amdgpu_kernel void @max_24regs_32a_used(<16 x float> addrspace(1)* %arg, float addrspace(1)* %out) #0 {
bb:		bb:
%in.1 = load <16 x float>, <16 x float> addrspace(1)* %arg		%in.1 = load <16 x float>, <16 x float> addrspace(1)* %arg
%mai.1 = tail call <16 x float> @llvm.amdgcn.mfma.f32.16x16x1f32(float 1.0, float 1.0, <16 x float> %in.1, i32 0, i32 0, i32 0)		%mai.1 = tail call <16 x float> @llvm.amdgcn.mfma.f32.16x16x1f32(float 1.0, float 1.0, <16 x float> %in.1, i32 0, i32 0, i32 0)
%mai.2 = tail call <16 x float> @llvm.amdgcn.mfma.f32.16x16x1f32(float 1.0, float 1.0, <16 x float> %mai.1, i32 0, i32 0, i32 0)		%mai.2 = tail call <16 x float> @llvm.amdgcn.mfma.f32.16x16x1f32(float 1.0, float 1.0, <16 x float> %mai.1, i32 0, i32 0, i32 0)
%elt1 = extractelement <16 x float> %mai.2, i32 0		%elt1 = extractelement <16 x float> %mai.2, i32 0
ret void		ret void
}		}

; GCN-LABEL: {{^}}max_12regs_13a_used:		; GCN-LABEL: {{^}}max_12regs_13a_used:
; A2M-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD0		; A2M-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD0
; A2M-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD1		; A2M-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD1
; A2V-NOT: SCRATCH_RSRC		; A2V-NOT: SCRATCH_RSRC
; GFX908-DAG: v_accvgpr_read_b32 v[[VSPILL:[0-9]+]], a4		; GFX908-DAG: v_accvgpr_read_b32 v[[VSPILL:[0-9]+]], a4
; A2M: buffer_store_dword v[[VSPILL]], off, s[{{[0-9:]+}}], s{{[0-9]+}} offset:[[FI:[0-9]+]] ; 4-byte Folded Spill		; A2M: buffer_store_dword v[[VSPILL]], off, s[{{[0-9:]+}}], 0 offset:[[FI:[0-9]+]] ; 4-byte Folded Spill
; A2M: buffer_load_dword v[[VSPILL:[0-9]+]], off, s[{{[0-9:]+}}], s{{[0-9]+}} offset:[[FI]] ; 4-byte Folded Reload		; A2M: buffer_load_dword v[[VSPILL:[0-9]+]], off, s[{{[0-9:]+}}], 0 offset:[[FI]] ; 4-byte Folded Reload
; A2V: v_accvgpr_write_b32 a{{[0-9]+}}, v[[VSPILL]]		; A2V: v_accvgpr_write_b32 a{{[0-9]+}}, v[[VSPILL]]
; A2V: ScratchSize: 0		; A2V: ScratchSize: 0
define amdgpu_kernel void @max_12regs_13a_used(<4 x float> addrspace(1)* %arg, <4 x float> addrspace(1)* %out) #2 {		define amdgpu_kernel void @max_12regs_13a_used(<4 x float> addrspace(1)* %arg, <4 x float> addrspace(1)* %out) #2 {
bb:		bb:
%in.1 = load <4 x float>, <4 x float> addrspace(1)* %arg		%in.1 = load <4 x float>, <4 x float> addrspace(1)* %arg
%mai.1 = tail call <4 x float> @llvm.amdgcn.mfma.f32.4x4x1f32(float 1.0, float 1.0, <4 x float> %in.1, i32 0, i32 0, i32 0)		%mai.1 = tail call <4 x float> @llvm.amdgcn.mfma.f32.4x4x1f32(float 1.0, float 1.0, <4 x float> %in.1, i32 0, i32 0, i32 0)
%mai.2 = tail call <4 x float> @llvm.amdgcn.mfma.f32.4x4x1f32(float 1.0, float 1.0, <4 x float> %mai.1, i32 0, i32 0, i32 0)		%mai.2 = tail call <4 x float> @llvm.amdgcn.mfma.f32.4x4x1f32(float 1.0, float 1.0, <4 x float> %mai.1, i32 0, i32 0, i32 0)
br label %use		br label %use
ret void		ret void
}		}

; GCN-LABEL: {{^}}max_10_vgprs_used_9a:		; GCN-LABEL: {{^}}max_10_vgprs_used_9a:
; A2M-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD0		; A2M-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD0
; A2M-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD1		; A2M-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD1
; A2V-NOT: SCRATCH_RSRC		; A2V-NOT: SCRATCH_RSRC
; GFX908-DAG: v_accvgpr_read_b32 v[[VSPILL:[0-9]+]], a0		; GFX908-DAG: v_accvgpr_read_b32 v[[VSPILL:[0-9]+]], a0
; A2M: buffer_store_dword v[[VSPILL]], off, s[{{[0-9:]+}}], s{{[0-9]+}} offset:[[FI:[0-9]+]] ; 4-byte Folded Spill		; A2M: buffer_store_dword v[[VSPILL]], off, s[{{[0-9:]+}}], 0 offset:[[FI:[0-9]+]] ; 4-byte Folded Spill
; A2M: buffer_load_dword v[[VSPILL:[0-9]+]], off, s[{{[0-9:]+}}], s{{[0-9]+}} offset:[[FI]] ; 4-byte Folded Reload		; A2M: buffer_load_dword v[[VSPILL:[0-9]+]], off, s[{{[0-9:]+}}], 0 offset:[[FI]] ; 4-byte Folded Reload
; GFX908: v_accvgpr_write_b32 a{{[0-9]+}}, v[[VSPILL]]		; GFX908: v_accvgpr_write_b32 a{{[0-9]+}}, v[[VSPILL]]
; A2V: ScratchSize: 0		; A2V: ScratchSize: 0
define amdgpu_kernel void @max_10_vgprs_used_9a(i32 addrspace(1)* %p) #1 {		define amdgpu_kernel void @max_10_vgprs_used_9a(i32 addrspace(1)* %p) #1 {
%tid = call i32 @llvm.amdgcn.workitem.id.x()		%tid = call i32 @llvm.amdgcn.workitem.id.x()
call void asm sideeffect "", "a,a,a,a"(i32 1, i32 2, i32 3, i32 4)		call void asm sideeffect "", "a,a,a,a"(i32 1, i32 2, i32 3, i32 4)
call void asm sideeffect "", "a,a,a,a,a"(i32 5, i32 6, i32 7, i32 8, i32 9)		call void asm sideeffect "", "a,a,a,a,a"(i32 5, i32 6, i32 7, i32 8, i32 9)
ret void		ret void
}		}

; GCN-LABEL: {{^}}max_32regs_mfma32:		; GCN-LABEL: {{^}}max_32regs_mfma32:
; A2M-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD0		; A2M-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD0
; A2M-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD1		; A2M-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD1
; A2V-NOT: SCRATCH_RSRC		; A2V-NOT: SCRATCH_RSRC
; GFX908-DAG: v_accvgpr_read_b32 v[[VSPILL:[0-9]+]], a0		; GFX908-DAG: v_accvgpr_read_b32 v[[VSPILL:[0-9]+]], a0
; A2M: buffer_store_dword v[[VSPILL]], off, s[{{[0-9:]+}}], s{{[0-9]+}} offset:[[FI:[0-9]+]] ; 4-byte Folded Spill		; A2M: buffer_store_dword v[[VSPILL]], off, s[{{[0-9:]+}}], 0 offset:[[FI:[0-9]+]] ; 4-byte Folded Spill
; A2M: buffer_load_dword v[[VSPILL:[0-9]+]], off, s[{{[0-9:]+}}], s{{[0-9]+}} offset:[[FI]] ; 4-byte Folded Reload		; A2M: buffer_load_dword v[[VSPILL:[0-9]+]], off, s[{{[0-9:]+}}], 0 offset:[[FI]] ; 4-byte Folded Reload
; GFX908: v_accvgpr_write_b32 a{{[0-9]+}}, v[[VSPILL]]		; GFX908: v_accvgpr_write_b32 a{{[0-9]+}}, v[[VSPILL]]
; A2V: ScratchSize: 0		; A2V: ScratchSize: 0
define amdgpu_kernel void @max_32regs_mfma32(float addrspace(1)* %arg) #3 {		define amdgpu_kernel void @max_32regs_mfma32(float addrspace(1)* %arg) #3 {
bb:		bb:
%v = call i32 asm sideeffect "", "=a"()		%v = call i32 asm sideeffect "", "=a"()
br label %use		br label %use

use:		use:

llvm/test/CodeGen/AMDGPU/spill-before-exec.mir

# REQUIRES: asserts		# REQUIRES: asserts
# RUN: llc -mtriple=amdgcn--- -verify-machineinstrs -debug-only=regalloc -run-pass=greedy -o /dev/null %s 2>&1 | FileCheck %s		# RUN: llc -mtriple=amdgcn--- -verify-machineinstrs -debug-only=regalloc -run-pass=greedy -o /dev/null %s 2>&1 | FileCheck %s

---		---
# Check that physreg candidate is not used since cannot be spilled in a block,		# Check that physreg candidate is not used since cannot be spilled in a block,
# e.g. before exec mask preamble		# e.g. before exec mask preamble
# CHECK: , cannot spill all interferences.		# CHECK: , cannot spill all interferences.

name: foo		name: foo
tracksRegLiveness: true		tracksRegLiveness: true
machineFunctionInfo:		machineFunctionInfo:
scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3		scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
scratchWaveOffsetReg: $sgpr4
stackPtrOffsetReg: $sgpr32		stackPtrOffsetReg: $sgpr32
registers:		registers:
- { id: 0, class: sreg_64 }		- { id: 0, class: sreg_64 }
- { id: 1100, class: sgpr_128 }		- { id: 1100, class: sgpr_128 }
- { id: 1101, class: sgpr_128 }		- { id: 1101, class: sgpr_128 }
- { id: 1102, class: sgpr_128 }		- { id: 1102, class: sgpr_128 }
- { id: 1103, class: sgpr_128 }		- { id: 1103, class: sgpr_128 }
- { id: 1104, class: sgpr_128 }		- { id: 1104, class: sgpr_128 }
- { id: 1114, class: sgpr_128 }		- { id: 1114, class: sgpr_128 }
- { id: 1115, class: sgpr_128 }		- { id: 1115, class: sgpr_128 }
- { id: 1116, class: sgpr_128 }		- { id: 1116, class: sgpr_128 }
- { id: 1117, class: sgpr_128 }		- { id: 1117, class: sgpr_128 }
- { id: 1118, class: sgpr_128 }		- { id: 1118, class: sgpr_128 }
- { id: 1119, class: sgpr_128 }		- { id: 1119, class: sgpr_128 }
- { id: 1120, class: sgpr_128 }		- { id: 1120, class: sgpr_128 }
- { id: 1121, class: sgpr_128 }		- { id: 1121, class: sgpr_128 }
		- { id: 1122, class: sgpr_128 }
		- { id: 1123, class: sgpr_128 }
		- { id: 1124, class: sgpr_128 }
		- { id: 1125, class: sgpr_128 }
body: |		body: |
bb.0:		bb.0:
successors: %bb.1		successors: %bb.1
liveins: $sgpr96_sgpr97, $sgpr98_sgpr99, $sgpr100_sgpr101, $sgpr102_sgpr103		liveins: $sgpr96_sgpr97, $sgpr98_sgpr99, $sgpr100_sgpr101, $sgpr102_sgpr103
%0:sreg_64 = COPY $sgpr102_sgpr103		%0:sreg_64 = COPY $sgpr102_sgpr103
%1100 = COPY $sgpr100_sgpr101_sgpr102_sgpr103		%1100 = COPY $sgpr100_sgpr101_sgpr102_sgpr103
%1101 = COPY %1100		%1101 = COPY %1100
%1102 = COPY %1100		%1102 = COPY %1100
%1114 = COPY %1100		%1114 = COPY %1100
%1115 = COPY %1100		%1115 = COPY %1100
%1116 = COPY %1100		%1116 = COPY %1100
%1117 = COPY %1100		%1117 = COPY %1100
%1118 = COPY %1100		%1118 = COPY %1100
%1119 = COPY %1100		%1119 = COPY %1100
%1120 = COPY %1100		%1120 = COPY %1100
%1121 = COPY %1100		%1121 = COPY %1100
		%1122 = COPY %1100
		%1123 = COPY %1100
		%1124 = COPY %1100
		%1125 = COPY %1100
S_BRANCH %bb.1		S_BRANCH %bb.1

bb.1:		bb.1:
liveins: $sgpr96_sgpr97, $sgpr98_sgpr99, $sgpr102_sgpr103		liveins: $sgpr96_sgpr97, $sgpr98_sgpr99, $sgpr102_sgpr103
%0 = S_OR_SAVEEXEC_B64 $sgpr96_sgpr97, implicit-def $exec, implicit-def $scc, implicit $exec		%0 = S_OR_SAVEEXEC_B64 $sgpr96_sgpr97, implicit-def $exec, implicit-def $scc, implicit $exec
$exec = S_XOR_B64_term $exec, %0, implicit-def $scc		$exec = S_XOR_B64_term $exec, %0, implicit-def $scc
SI_MASK_BRANCH %bb.100, implicit $exec		SI_MASK_BRANCH %bb.100, implicit $exec
S_BRANCH %bb.2		S_BRANCH %bb.2
S_CMP_EQ_U64 %1106.sub0_sub1, %1107.sub2_sub3, implicit-def $scc		S_CMP_EQ_U64 %1106.sub0_sub1, %1107.sub2_sub3, implicit-def $scc
S_CMP_EQ_U64 %1108.sub0_sub1, %1109.sub2_sub3, implicit-def $scc		S_CMP_EQ_U64 %1108.sub0_sub1, %1109.sub2_sub3, implicit-def $scc
S_CMP_EQ_U64 %1110.sub0_sub1, %1111.sub2_sub3, implicit-def $scc		S_CMP_EQ_U64 %1110.sub0_sub1, %1111.sub2_sub3, implicit-def $scc
S_CMP_EQ_U64 %1112.sub0_sub1, %1113.sub2_sub3, implicit-def $scc		S_CMP_EQ_U64 %1112.sub0_sub1, %1113.sub2_sub3, implicit-def $scc
S_CMP_EQ_U64 %1114.sub0_sub1, %1115.sub2_sub3, implicit-def $scc		S_CMP_EQ_U64 %1114.sub0_sub1, %1115.sub2_sub3, implicit-def $scc
S_CMP_EQ_U64 %1116.sub0_sub1, %1117.sub2_sub3, implicit-def $scc		S_CMP_EQ_U64 %1116.sub0_sub1, %1117.sub2_sub3, implicit-def $scc
S_CMP_EQ_U64 %1118.sub0_sub1, %1119.sub2_sub3, implicit-def $scc		S_CMP_EQ_U64 %1118.sub0_sub1, %1119.sub2_sub3, implicit-def $scc
S_CMP_EQ_U64 %1120.sub0_sub1, %1121.sub2_sub3, implicit-def $scc		S_CMP_EQ_U64 %1120.sub0_sub1, %1121.sub2_sub3, implicit-def $scc
		S_CMP_EQ_U64 %1122.sub0_sub1, %1123.sub2_sub3, implicit-def $scc
		S_CMP_EQ_U64 %1124.sub0_sub1, %1125.sub2_sub3, implicit-def $scc

$vgpr0 = V_MOV_B32_e32 0, implicit $exec		$vgpr0 = V_MOV_B32_e32 0, implicit $exec
S_SETPC_B64_return undef $sgpr30_sgpr31, implicit %0, implicit $vgpr0		S_SETPC_B64_return undef $sgpr30_sgpr31, implicit %0, implicit $vgpr0

...		...

llvm/test/CodeGen/AMDGPU/spill-empty-live-interval.mir

	# CHECK-NEXT: %8:vreg_64 = SI_SPILL_V64_RESTORE %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (load 8 from %stack.0, align 4, addrspace 5)			# CHECK-NEXT: %8:vreg_64 = SI_SPILL_V64_RESTORE %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (load 8 from %stack.0, align 4, addrspace 5)
	# CHECK-NEXT: S_NOP 0, implicit %8.sub1			# CHECK-NEXT: S_NOP 0, implicit %8.sub1
	# CHECK-NEXT: S_NOP 0, implicit undef %9.sub0			# CHECK-NEXT: S_NOP 0, implicit undef %9.sub0

	name: expecting_non_empty_interval			name: expecting_non_empty_interval
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32
	body: |			body: |
	bb.0:			bb.0:
	successors: %bb.1			successors: %bb.1

	undef %0.sub1:vreg_64 = V_MAC_F32_e32 0, undef %1:vgpr_32, undef %0.sub1, implicit $exec			undef %0.sub1:vreg_64 = V_MAC_F32_e32 0, undef %1:vgpr_32, undef %0.sub1, implicit $exec
	undef %2.sub1:vreg_64 = V_MOV_B32_e32 1786773504, implicit $exec			undef %2.sub1:vreg_64 = V_MOV_B32_e32 1786773504, implicit $exec
	dead %3:vgpr_32 = V_MUL_F32_e32 0, %2.sub1, implicit $exec			dead %3:vgpr_32 = V_MUL_F32_e32 0, %2.sub1, implicit $exec
	# CHECK-NEXT: S_NOP 0, implicit %1.sub2			# CHECK-NEXT: S_NOP 0, implicit %1.sub2
	# CHECK-NEXT: S_NOP 0, implicit undef %4.sub0			# CHECK-NEXT: S_NOP 0, implicit undef %4.sub0
	# CHECK-NEXT: undef %2.sub2:vreg_128 = V_MOV_B32_e32 0, implicit $exec			# CHECK-NEXT: undef %2.sub2:vreg_128 = V_MOV_B32_e32 0, implicit $exec
	# CHECK-NEXT: S_NOP 0, implicit %2.sub2			# CHECK-NEXT: S_NOP 0, implicit %2.sub2
	name: rematerialize_empty_interval_has_reference			name: rematerialize_empty_interval_has_reference
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32
	body: |			body: |
	bb.0:			bb.0:
	successors: %bb.1			successors: %bb.1

	undef %0.sub2:vreg_128 = V_MOV_B32_e32 0, implicit $exec			undef %0.sub2:vreg_128 = V_MOV_B32_e32 0, implicit $exec
	undef %1.sub2:vreg_128 = V_MOV_B32_e32 1786773504, implicit $exec			undef %1.sub2:vreg_128 = V_MOV_B32_e32 1786773504, implicit $exec

	bb.1:			bb.1:
	S_NOP 0, implicit %1.sub2			S_NOP 0, implicit %1.sub2
	S_NOP 0, implicit undef %0.sub0			S_NOP 0, implicit undef %0.sub0
	S_NOP 0, implicit %0.sub2			S_NOP 0, implicit %0.sub2

	...			...

llvm/test/CodeGen/AMDGPU/spill-m0.ll

	; RUN: llc -O0 -amdgpu-spill-sgpr-to-vgpr=1 -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=TOVGPR -check-prefix=GCN %s			; RUN: llc -O0 -amdgpu-spill-sgpr-to-vgpr=1 -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=TOVGPR -check-prefix=GCN %s
	; RUN: llc -O0 -amdgpu-spill-sgpr-to-vgpr=1 -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=TOVGPR -check-prefix=GCN %s			; RUN: llc -O0 -amdgpu-spill-sgpr-to-vgpr=1 -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=TOVGPR -check-prefix=GCN %s
	; RUN: llc -O0 -amdgpu-spill-sgpr-to-vgpr=0 -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=TOVMEM -check-prefix=GCN %s			; RUN: llc -O0 -amdgpu-spill-sgpr-to-vgpr=0 -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=TOVMEM -check-prefix=GCN %s
	; RUN: llc -O0 -amdgpu-spill-sgpr-to-vgpr=0 -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=TOVMEM -check-prefix=GCN %s			; RUN: llc -O0 -amdgpu-spill-sgpr-to-vgpr=0 -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=TOVMEM -check-prefix=GCN %s

	; XXX - Why does it like to use vcc?			; XXX - Why does it like to use vcc?

	; GCN-LABEL: {{^}}spill_m0:			; GCN-LABEL: {{^}}spill_m0:

	; GCN-DAG: s_cmp_lg_u32			; GCN-DAG: s_cmp_lg_u32

	; TOVGPR-DAG: s_mov_b32 [[M0_COPY:s[0-9]+]], m0			; TOVGPR-DAG: s_mov_b32 [[M0_COPY:s[0-9]+]], m0
	; TOVGPR: v_writelane_b32 [[SPILL_VREG:v[0-9]+]], [[M0_COPY]], 2			; TOVGPR: v_writelane_b32 [[SPILL_VREG:v[0-9]+]], [[M0_COPY]], 2

	; TOVMEM-DAG: s_mov_b32 [[M0_COPY:s[0-9]+]], m0			; TOVMEM-DAG: s_mov_b32 [[M0_COPY:s[0-9]+]], m0
	; TOVMEM-DAG: v_mov_b32_e32 [[SPILL_VREG:v[0-9]+]], [[M0_COPY]]			; TOVMEM-DAG: v_mov_b32_e32 [[SPILL_VREG:v[0-9]+]], [[M0_COPY]]
	; TOVMEM: buffer_store_dword [[SPILL_VREG]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:12 ; 4-byte Folded Spill			; TOVMEM: buffer_store_dword [[SPILL_VREG]], off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:12 ; 4-byte Folded Spill

	; GCN: s_cbranch_scc1 [[ENDIF:BB[0-9]+_[0-9]+]]			; GCN: s_cbranch_scc1 [[ENDIF:BB[0-9]+_[0-9]+]]

	; GCN: [[ENDIF]]:			; GCN: [[ENDIF]]:
	; TOVGPR: v_readlane_b32 [[M0_RESTORE:s[0-9]+]], [[SPILL_VREG]], 2			; TOVGPR: v_readlane_b32 [[M0_RESTORE:s[0-9]+]], [[SPILL_VREG]], 2
	; TOVGPR: s_mov_b32 m0, [[M0_RESTORE]]			; TOVGPR: s_mov_b32 m0, [[M0_RESTORE]]

	; TOVMEM: buffer_load_dword [[RELOAD_VREG:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:12 ; 4-byte Folded Reload			; TOVMEM: buffer_load_dword [[RELOAD_VREG:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:12 ; 4-byte Folded Reload
	; TOVMEM: s_waitcnt vmcnt(0)			; TOVMEM: s_waitcnt vmcnt(0)
	; TOVMEM: v_readfirstlane_b32 [[M0_RESTORE:s[0-9]+]], [[RELOAD_VREG]]			; TOVMEM: v_readfirstlane_b32 [[M0_RESTORE:s[0-9]+]], [[RELOAD_VREG]]
	; TOVMEM: s_mov_b32 m0, [[M0_RESTORE]]			; TOVMEM: s_mov_b32 m0, [[M0_RESTORE]]

	; GCN: s_add_i32 s{{[0-9]+}}, m0, 1			; GCN: s_add_i32 s{{[0-9]+}}, m0, 1
	define amdgpu_kernel void @spill_m0(i32 %cond, i32 addrspace(1)* %out) #0 {			define amdgpu_kernel void @spill_m0(i32 %cond, i32 addrspace(1)* %out) #0 {
	entry:			entry:
	%m0 = call i32 asm sideeffect "s_mov_b32 m0, 0", "={m0}"() #0			%m0 = call i32 asm sideeffect "s_mov_b32 m0, 0", "={m0}"() #0
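The two RUN configurations in spill-m0.ll exercise different SGPR spill strategies for m0.

- With `-amdgpu-spill-sgpr-to-vgpr=1`, the copied value is parked in lane 2 of a VGPR with `v_writelane_b32` and recovered with `v_readlane_b32`.
- With `=0`, it is broadcast with `v_mov_b32_e32`, stored to scratch, reloaded, and turned back into a scalar with `v_readfirstlane_b32`.

Below is a simplified Python model of those two round trips. It assumes wave64, treats the scratch store/reload as a plain per-lane copy, and zeroes inactive lanes for simplicity. All function names are hypothetical and do not correspond to LLVM code.

```python
WAVE_SIZE = 64  # assumed: wave64
MASK32 = 0xFFFFFFFF


def v_writelane(vgpr, sval, lane):
    # Model of v_writelane_b32: write a scalar into a single lane of a VGPR.
    vgpr[lane] = sval & MASK32


def v_readlane(vgpr, lane):
    # Model of v_readlane_b32: read a single lane of a VGPR into a scalar.
    return vgpr[lane]


def v_readfirstlane(vgpr, exec_mask):
    # Model of v_readfirstlane_b32: read the lowest-numbered active lane.
    # This sketch assumes at least one lane is active.
    assert exec_mask != 0
    first = (exec_mask & -exec_mask).bit_length() - 1
    return vgpr[first]


def spill_m0_to_vgpr(m0, lane=2):
    # TOVGPR path: m0 copy is written into lane 2 of a spill VGPR, then read back.
    spill_vreg = [0] * WAVE_SIZE
    v_writelane(spill_vreg, m0, lane)
    return v_readlane(spill_vreg, lane)


def spill_m0_to_memory(m0, exec_mask):
    # TOVMEM path: v_mov_b32 broadcasts m0 to every active lane
    # (inactive lanes are simply left as 0 in this model) ...
    vreg = [m0 & MASK32 if (exec_mask >> i) & 1 else 0 for i in range(WAVE_SIZE)]
    # ... the buffer_store_dword / buffer_load_dword round trip is a plain copy ...
    reloaded = list(vreg)
    # ... and v_readfirstlane_b32 turns the reloaded VGPR back into a scalar.
    return v_readfirstlane(reloaded, exec_mask)
```

The memory path is why the TOVMEM checks need `s_waitcnt vmcnt(0)` before `v_readfirstlane_b32`: the scalar is only recoverable after the reload has completed.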

llvm/test/CodeGen/AMDGPU/spill-offset-calculation.ll

; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -enable-misched=0 -post-RA-scheduler=0 -stress-regalloc=8 < %s | FileCheck %s		; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -enable-misched=0 -post-RA-scheduler=0 -stress-regalloc=8 < %s | FileCheck %s

; Test that the VGPR spiller correctly switches to SGPR offsets when the		; Test that the VGPR spiller correctly switches to SGPR offsets when the
; instruction offset field would overflow, and that it accounts for memory		; instruction offset field would overflow, and that it accounts for memory
; swizzling.		; swizzling.

; CHECK-LABEL: test_inst_offset_kernel		; CHECK-LABEL: test_inst_offset_kernel
define amdgpu_kernel void @test_inst_offset_kernel() {		define amdgpu_kernel void @test_inst_offset_kernel() {
entry:		entry:
; Occupy 4092 bytes of scratch, so the offset of the spill of %a just fits in		; Occupy 4092 bytes of scratch, so the offset of the spill of %a just fits in
; the instruction offset field.		; the instruction offset field.
%alloca = alloca i8, i32 4088, align 4, addrspace(5)		%alloca = alloca i8, i32 4088, align 4, addrspace(5)
%buf = bitcast i8 addrspace(5)* %alloca to i32 addrspace(5)*		%buf = bitcast i8 addrspace(5)* %alloca to i32 addrspace(5)*

%aptr = getelementptr i32, i32 addrspace(5)* %buf, i32 1		%aptr = getelementptr i32, i32 addrspace(5)* %buf, i32 1
; CHECK: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:4092 ; 4-byte Folded Spill		; CHECK: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], 0 offset:4092 ; 4-byte Folded Spill
%a = load volatile i32, i32 addrspace(5)* %aptr		%a = load volatile i32, i32 addrspace(5)* %aptr

; Force %a to spill.		; Force %a to spill.
call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()		call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()

%outptr = getelementptr i32, i32 addrspace(5)* %buf, i32 1		%outptr = getelementptr i32, i32 addrspace(5)* %buf, i32 1
store volatile i32 %a, i32 addrspace(5)* %outptr		store volatile i32 %a, i32 addrspace(5)* %outptr

ret void		ret void
}		}

; CHECK-LABEL: test_sgpr_offset_kernel		; CHECK-LABEL: test_sgpr_offset_kernel
define amdgpu_kernel void @test_sgpr_offset_kernel() {		define amdgpu_kernel void @test_sgpr_offset_kernel() {
entry:		entry:
; Occupy 4096 bytes of scratch, so the offset of the spill of %a does not		; Occupy 4096 bytes of scratch, so the offset of the spill of %a does not
; fit in the instruction, and has to live in the SGPR offset.		; fit in the instruction, and has to live in the SGPR offset.
%alloca = alloca i8, i32 4092, align 4, addrspace(5)		%alloca = alloca i8, i32 4092, align 4, addrspace(5)
%buf = bitcast i8 addrspace(5)* %alloca to i32 addrspace(5)*		%buf = bitcast i8 addrspace(5)* %alloca to i32 addrspace(5)*

%aptr = getelementptr i32, i32 addrspace(5)* %buf, i32 1		%aptr = getelementptr i32, i32 addrspace(5)* %buf, i32 1
; 0x40000 / 64 = 4096 (for wave64)		; 0x40000 / 64 = 4096 (for wave64)
; CHECK: s_add_u32 s6, s7, 0x40000		; CHECK: s_mov_b32 s6, 0x40000
; CHECK: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s6 ; 4-byte Folded Spill		; CHECK: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s6 ; 4-byte Folded Spill
%a = load volatile i32, i32 addrspace(5)* %aptr		%a = load volatile i32, i32 addrspace(5)* %aptr

; Force %a to spill		; Force %a to spill
call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()		call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()

%outptr = getelementptr i32, i32 addrspace(5)* %buf, i32 1		%outptr = getelementptr i32, i32 addrspace(5)* %buf, i32 1
store volatile i32 %a, i32 addrspace(5)* %outptr		store volatile i32 %a, i32 addrspace(5)* %outptr

ret void		ret void
}		}
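The CHECK lines for the two kernels above encode a simple boundary:

- A spill at per-lane offset 4092 is still encoded in the instruction's offset field. On the new side of the diff the soffset operand is the literal 0.
- A spill at offset 4096 is materialized into s6 as 0x40000, which is 4096 scaled by the wave size.

Below is a minimal sketch of that decision. It assumes a 12-bit unsigned MUBUF immediate offset (0 to 4095) and wave64. `spill_addressing` is a hypothetical helper, not the code in SIRegisterInfo.cpp.

```python
MAX_MUBUF_IMM_OFFSET = 4095  # assumed: 12-bit unsigned immediate offset field
WAVE_SIZE = 64               # assumed: wave64


def spill_addressing(per_lane_offset, wave_size=WAVE_SIZE):
    """Return (soffset_value, imm_offset) for a single-dword spill slot."""
    if per_lane_offset <= MAX_MUBUF_IMM_OFFSET:
        # Fits in the instruction: soffset stays 0, the offset is encoded directly.
        return 0, per_lane_offset
    # Does not fit: the whole offset moves into an SGPR. The SGPR offset is
    # applied per wave rather than per lane (swizzled scratch), so it is
    # scaled by the wave size.
    return per_lane_offset * wave_size, 0
```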

; CHECK-LABEL: test_sgpr_offset_kernel_scavenge_fail		; FIXME: If we fail to scavenge an SGPR in a kernel we don't have a stack
define amdgpu_kernel void @test_sgpr_offset_kernel_scavenge_fail() #1 {		; pointer to temporarily update, so we just crash.
		scott.linder (Author): Is it OK for us to fail here? This is a consequence of not having a frame pointer in entry functions and not being able to e.g. restart RA after we realize we really need it in this case.
entry:
; Occupy 4096 bytes of scratch, so the offset of the spill of %a does not
; fit in the instruction, and has to live in the SGPR offset.
%alloca = alloca i8, i32 4092, align 4, addrspace(5)
%buf = bitcast i8 addrspace(5)* %alloca to i32 addrspace(5)*

%aptr = getelementptr i32, i32 addrspace(5)* %buf, i32 1

; 0x40000 / 64 = 4096 (for wave64)
%a = load volatile i32, i32 addrspace(5)* %aptr

%asm = call { i32, i32, i32, i32, i32, i32, i32, i32 } asm sideeffect "", "=s,=s,=s,=s,=s,=s,=s,=s"()
%asm0 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm, 0
%asm1 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm, 1
%asm2 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm, 2
%asm3 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm, 3
%asm4 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm, 4
%asm5 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm, 5
%asm6 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm, 6
%asm7 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm, 7

call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}"() #0

; CHECK: s_add_u32 s7, s7, 0x40000
; CHECK: buffer_load_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s7 ; 4-byte Folded Reload
; CHECK: s_sub_u32 s7, s7, 0x40000

; Force %a to spill with no free SGPRs
call void asm sideeffect "", "s,s,s,s,s,s,s,s,v"(i32 %asm0, i32 %asm1, i32 %asm2, i32 %asm3, i32 %asm4, i32 %asm5, i32 %asm6, i32 %asm7, i32 %a)
ret void
}

; CHECK-LABEL: test_sgpr_offset_function_scavenge_fail		; CHECK-LABEL: test_sgpr_offset_function_scavenge_fail
define void @test_sgpr_offset_function_scavenge_fail() #2 {		define void @test_sgpr_offset_function_scavenge_fail() #2 {
entry:		entry:
; Occupy 4096 bytes of scratch, so the offset of the spill of %a does not		; Occupy 4096 bytes of scratch, so the offset of the spill of %a does not
; fit in the instruction, and has to live in the SGPR offset.		; fit in the instruction, and has to live in the SGPR offset.
%alloca = alloca i8, i32 4096, align 4, addrspace(5)		%alloca = alloca i8, i32 4096, align 4, addrspace(5)
%buf = bitcast i8 addrspace(5)* %alloca to i32 addrspace(5)*		%buf = bitcast i8 addrspace(5)* %alloca to i32 addrspace(5)*
entry:		entry:
; Occupy 4088 bytes of scratch, so that the spill of the last subreg of %a		; Occupy 4088 bytes of scratch, so that the spill of the last subreg of %a
; still fits below offset 4096 (4088 + 8 - 4 = 4092), and can be placed in		; still fits below offset 4096 (4088 + 8 - 4 = 4092), and can be placed in
; the instruction offset field.		; the instruction offset field.
%alloca = alloca i8, i32 4084, align 4, addrspace(5)		%alloca = alloca i8, i32 4084, align 4, addrspace(5)
%bufv1 = bitcast i8 addrspace(5)* %alloca to i32 addrspace(5)*		%bufv1 = bitcast i8 addrspace(5)* %alloca to i32 addrspace(5)*
%bufv2 = bitcast i8 addrspace(5)* %alloca to <2 x i32> addrspace(5)*		%bufv2 = bitcast i8 addrspace(5)* %alloca to <2 x i32> addrspace(5)*

; CHECK: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:4088 ; 4-byte Folded Spill		; CHECK: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], 0 offset:4088 ; 4-byte Folded Spill
; CHECK: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:4092 ; 4-byte Folded Spill		; CHECK: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], 0 offset:4092 ; 4-byte Folded Spill
%aptr = getelementptr <2 x i32>, <2 x i32> addrspace(5)* %bufv2, i32 1		%aptr = getelementptr <2 x i32>, <2 x i32> addrspace(5)* %bufv2, i32 1
%a = load volatile <2 x i32>, <2 x i32> addrspace(5)* %aptr		%a = load volatile <2 x i32>, <2 x i32> addrspace(5)* %aptr

; Force %a to spill.		; Force %a to spill.
call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()		call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()

; Ensure the alloca sticks around.		; Ensure the alloca sticks around.
%bptr = getelementptr i32, i32 addrspace(5)* %bufv1, i32 1		%bptr = getelementptr i32, i32 addrspace(5)* %bufv1, i32 1
; Occupy 4092 bytes of scratch, so that the spill of the last subreg of %a		; Occupy 4092 bytes of scratch, so that the spill of the last subreg of %a
; does not fit below offset 4096 (4092 + 8 - 4 = 4096), and has to live		; does not fit below offset 4096 (4092 + 8 - 4 = 4096), and has to live
; in the SGPR offset.		; in the SGPR offset.
%alloca = alloca i8, i32 4088, align 4, addrspace(5)		%alloca = alloca i8, i32 4088, align 4, addrspace(5)
%bufv1 = bitcast i8 addrspace(5)* %alloca to i32 addrspace(5)*		%bufv1 = bitcast i8 addrspace(5)* %alloca to i32 addrspace(5)*
%bufv2 = bitcast i8 addrspace(5)* %alloca to <2 x i32> addrspace(5)*		%bufv2 = bitcast i8 addrspace(5)* %alloca to <2 x i32> addrspace(5)*

; 0x3ff00 / 64 = 4092 (for wave64)		; 0x3ff00 / 64 = 4092 (for wave64)
; CHECK: s_add_u32 s6, s7, 0x3ff00		; CHECK: s_mov_b32 s6, 0x3ff00
; CHECK: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s6 ; 4-byte Folded Spill		; CHECK: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s6 ; 4-byte Folded Spill
; CHECK: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s6 offset:4 ; 4-byte Folded Spill		; CHECK: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s6 offset:4 ; 4-byte Folded Spill
%aptr = getelementptr <2 x i32>, <2 x i32> addrspace(5)* %bufv2, i32 1		%aptr = getelementptr <2 x i32>, <2 x i32> addrspace(5)* %bufv2, i32 1
%a = load volatile <2 x i32>, <2 x i32> addrspace(5)* %aptr		%a = load volatile <2 x i32>, <2 x i32> addrspace(5)* %aptr

; Force %a to spill.		; Force %a to spill.
call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()		call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()

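For the `<2 x i32>` spills, the fit check applies to the last subregister.

- With the slot at 4088, the second dword lands at 4092, so both stores carry immediates (4088 and 4092).
- With the slot at 4092, the second dword would land at 4096. The base therefore goes into s6 as 0x3ff00 (4092 * 64), and the two stores use offsets 0 and 4.

The sketch below extends the single-dword model under the same assumptions (12-bit unsigned immediate, wave64). `spill_multi_dword` is a hypothetical helper.

```python
MAX_MUBUF_IMM_OFFSET = 4095  # assumed: 12-bit unsigned immediate offset field
WAVE_SIZE = 64               # assumed: wave64
DWORD_BYTES = 4


def spill_multi_dword(base, size_bytes, wave_size=WAVE_SIZE):
    """Return (soffset_value, [imm offset per dword]) for a multi-dword spill."""
    last_subreg = base + size_bytes - DWORD_BYTES  # offset of the final dword
    if last_subreg <= MAX_MUBUF_IMM_OFFSET:
        # Every dword fits: soffset is 0 and each store carries its absolute offset.
        return 0, [base + i for i in range(0, size_bytes, DWORD_BYTES)]
    # Otherwise the base moves into an SGPR (scaled by the wave size) and each
    # store keeps only its small offset relative to that base.
    return base * wave_size, list(range(0, size_bytes, DWORD_BYTES))
```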

llvm/test/CodeGen/AMDGPU/stack-pointer-offset-relative-frameindex.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs | FileCheck -check-prefix=GCN %s			; RUN: llc < %s -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs | FileCheck -check-prefix=GCN %s

	; An assert was hit when frame offset register was used to address FrameIndex.			; An assert was hit when frame offset register was used to address FrameIndex.
	define amdgpu_kernel void @kernel_background_evaluate(float addrspace(5)* %kg, <4 x i32> addrspace(1)* %input, <4 x float> addrspace(1)* %output, i32 %i) {			define amdgpu_kernel void @kernel_background_evaluate(float addrspace(5)* %kg, <4 x i32> addrspace(1)* %input, <4 x float> addrspace(1)* %output, i32 %i) {
	; GCN-LABEL: kernel_background_evaluate:			; GCN-LABEL: kernel_background_evaluate:
	; GCN: ; %bb.0: ; %entry			; GCN: ; %bb.0: ; %entry
	; GCN-NEXT: s_load_dword s6, s[0:1], 0x24			; GCN-NEXT: s_load_dword s6, s[0:1], 0x24
	; GCN-NEXT: s_mov_b32 s36, SCRATCH_RSRC_DWORD0			; GCN-NEXT: s_mov_b32 s36, SCRATCH_RSRC_DWORD0
	; GCN-NEXT: s_mov_b32 s37, SCRATCH_RSRC_DWORD1			; GCN-NEXT: s_mov_b32 s37, SCRATCH_RSRC_DWORD1
	; GCN-NEXT: s_mov_b32 s38, -1			; GCN-NEXT: s_mov_b32 s38, -1
	; GCN-NEXT: s_mov_b32 s39, 0x31c16000			; GCN-NEXT: s_mov_b32 s39, 0x31c16000
	; GCN-NEXT: s_mov_b32 s33, s3			; GCN-NEXT: s_add_u32 s36, s36, s3
	; GCN-NEXT: s_mov_b64 s[0:1], s[36:37]			; GCN-NEXT: s_addc_u32 s37, s37, 0
	; GCN-NEXT: v_mov_b32_e32 v1, 0x2000			; GCN-NEXT: v_mov_b32_e32 v1, 0x2000
	; GCN-NEXT: v_mov_b32_e32 v2, 0x4000			; GCN-NEXT: v_mov_b32_e32 v2, 0x4000
	; GCN-NEXT: v_mov_b32_e32 v3, 0			; GCN-NEXT: v_mov_b32_e32 v3, 0
	; GCN-NEXT: s_mov_b64 s[2:3], s[38:39]
	; GCN-NEXT: v_mov_b32_e32 v4, 0x400000			; GCN-NEXT: v_mov_b32_e32 v4, 0x400000
	; GCN-NEXT: s_add_u32 s32, s33, 0xc0000			; GCN-NEXT: s_mov_b64 s[0:1], s[36:37]
				; GCN-NEXT: s_mov_b64 s[2:3], s[38:39]
				; GCN-NEXT: s_mov_b32 s32, 0xc0000
	; GCN-NEXT: v_add_nc_u32_e64 v32, 4, 0x4000			; GCN-NEXT: v_add_nc_u32_e64 v32, 4, 0x4000
	; GCN-NEXT: ; implicit-def: $vcc_hi			; GCN-NEXT: ; implicit-def: $vcc_hi
	; GCN-NEXT: s_getpc_b64 s[4:5]			; GCN-NEXT: s_getpc_b64 s[4:5]
	; GCN-NEXT: s_add_u32 s4, s4, svm_eval_nodes@rel32@lo+4			; GCN-NEXT: s_add_u32 s4, s4, svm_eval_nodes@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s5, s5, svm_eval_nodes@rel32@hi+4			; GCN-NEXT: s_addc_u32 s5, s5, svm_eval_nodes@rel32@hi+4
	; GCN-NEXT: s_waitcnt lgkmcnt(0)			; GCN-NEXT: s_waitcnt lgkmcnt(0)
	; GCN-NEXT: v_mov_b32_e32 v0, s6			; GCN-NEXT: v_mov_b32_e32 v0, s6
	; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GCN-NEXT: v_cmp_ne_u32_e32 vcc_lo, 0, v0			; GCN-NEXT: v_cmp_ne_u32_e32 vcc_lo, 0, v0
	; GCN-NEXT: s_and_saveexec_b32 s0, vcc_lo			; GCN-NEXT: s_and_saveexec_b32 s0, vcc_lo
	; GCN-NEXT: s_cbranch_execz BB0_2			; GCN-NEXT: s_cbranch_execz BB0_2
	; GCN-NEXT: ; %bb.1: ; %if.then4.i			; GCN-NEXT: ; %bb.1: ; %if.then4.i
	; GCN-NEXT: buffer_load_dword v0, v32, s[36:39], s32 offen			; GCN-NEXT: buffer_load_dword v0, v32, s[36:39], 0 offen
	; GCN-NEXT: buffer_load_dword v1, v32, s[36:39], s32 offen offset:4			; GCN-NEXT: buffer_load_dword v1, v32, s[36:39], 0 offen offset:4
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_add_nc_u32_e32 v0, v1, v0			; GCN-NEXT: v_add_nc_u32_e32 v0, v1, v0
	; GCN-NEXT: v_mul_lo_u32 v0, 0x41c64e6d, v0			; GCN-NEXT: v_mul_lo_u32 v0, 0x41c64e6d, v0
	; GCN-NEXT: v_add_nc_u32_e32 v0, 0x3039, v0			; GCN-NEXT: v_add_nc_u32_e32 v0, 0x3039, v0
	; GCN-NEXT: buffer_store_dword v0, v0, s[36:39], s33 offen			; GCN-NEXT: buffer_store_dword v0, v0, s[36:39], 0 offen
	; GCN-NEXT: BB0_2: ; %shader_eval_surface.exit			; GCN-NEXT: BB0_2: ; %shader_eval_surface.exit
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	entry:			entry:
	%sd = alloca < 1339 x i32>, align 16, addrspace(5)			%sd = alloca < 1339 x i32>, align 16, addrspace(5)
	%state = alloca <4 x i32>, align 4, addrspace(5)			%state = alloca <4 x i32>, align 4, addrspace(5)
	%rslt = call i32 @svm_eval_nodes(float addrspace(5)* %kg, <1339 x i32> addrspace(5)* %sd, <4 x i32> addrspace(5)* %state, i32 0, i32 4194304)			%rslt = call i32 @svm_eval_nodes(float addrspace(5)* %kg, <1339 x i32> addrspace(5)* %sd, <4 x i32> addrspace(5)* %state, i32 0, i32 4194304)
	%cmp = icmp eq i32 %rslt, 0			%cmp = icmp eq i32 %rslt, 0
	br i1 %cmp, label %shader_eval_surface.exit, label %if.then4.i			br i1 %cmp, label %shader_eval_surface.exit, label %if.then4.i
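The new kernel prologue changes how the scratch wave offset passed in s3 is used.

- On the old side, s3 was copied to s33 and used as the soffset, with the stack pointer derived from s33.
- On the new side, s3 is folded into the scratch resource descriptor with `s_add_u32 s36, s36, s3` / `s_addc_u32 s37, s37, 0`, and the buffer accesses use an soffset of 0.

Below is a minimal sketch of what that instruction pair computes. It models only the 64-bit add-with-carry over the descriptor's first two dwords, not the descriptor's field layout. The helper names mirror the opcodes but are hypothetical Python functions.

```python
MASK32 = 0xFFFFFFFF


def s_add_u32(a, b):
    # 32-bit add; the carry-out is what the hardware records in SCC.
    total = (a & MASK32) + (b & MASK32)
    return total & MASK32, total >> 32


def s_addc_u32(a, b, scc):
    # 32-bit add with the carry-in taken from SCC.
    total = (a & MASK32) + (b & MASK32) + scc
    return total & MASK32, total >> 32


def fold_wave_offset(rsrc_dword0, rsrc_dword1, wave_offset):
    """Model of: s_add_u32 s36, s36, s3 ; s_addc_u32 s37, s37, 0"""
    lo, scc = s_add_u32(rsrc_dword0, wave_offset)
    hi, _ = s_addc_u32(rsrc_dword1, 0, scc)
    return lo, hi
```

Once the offset has been folded into the descriptor, scratch accesses no longer need a register soffset. This matches the updated `buffer_load_dword v0, v32, s[36:39], 0 offen` lines.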

llvm/test/CodeGen/AMDGPU/stack-realign-kernel.ll

; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=fiji < %s | FileCheck -check-prefix=VI %s		; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=fiji < %s | FileCheck -check-prefix=VI %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck -check-prefix=GFX9 %s		; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck -check-prefix=GFX9 %s

; Make sure the stack is never realigned for entry functions.		; Make sure the stack is never realigned for entry functions.

define amdgpu_kernel void @max_alignment_128() #0 {		define amdgpu_kernel void @max_alignment_128() #0 {
; VI-LABEL: max_alignment_128:		; VI-LABEL: max_alignment_128:
; VI: ; %bb.0:		; VI: ; %bb.0:
; VI-NEXT: s_add_u32 s4, s4, s7		; VI-NEXT: s_add_u32 s4, s4, s7
		; VI-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8
		; VI-NEXT: s_add_u32 s0, s0, s7
		; VI-NEXT: s_addc_u32 s1, s1, 0
; VI-NEXT: v_mov_b32_e32 v0, 9		; VI-NEXT: v_mov_b32_e32 v0, 9
; VI-NEXT: s_mov_b32 flat_scratch_lo, s5		; VI-NEXT: s_mov_b32 flat_scratch_lo, s5
; VI-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8		; VI-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:128
; VI-NEXT: buffer_store_dword v0, off, s[0:3], s7 offset:128
; VI-NEXT: s_endpgm		; VI-NEXT: s_endpgm
; VI-NEXT: .section .rodata,#alloc		; VI-NEXT: .section .rodata,#alloc
; VI-NEXT: .p2align 6		; VI-NEXT: .p2align 6
; VI-NEXT: .amdhsa_kernel max_alignment_128		; VI-NEXT: .amdhsa_kernel max_alignment_128
; VI-NEXT: .amdhsa_group_segment_fixed_size 0		; VI-NEXT: .amdhsa_group_segment_fixed_size 0
; VI-NEXT: .amdhsa_private_segment_fixed_size 256		; VI-NEXT: .amdhsa_private_segment_fixed_size 256
; VI-NEXT: .amdhsa_user_sgpr_private_segment_buffer 1		; VI-NEXT: .amdhsa_user_sgpr_private_segment_buffer 1
; VI-NEXT: .amdhsa_user_sgpr_dispatch_ptr 0		; VI-NEXT: .amdhsa_user_sgpr_dispatch_ptr 0
; VI-NEXT: .amdhsa_exception_fp_ieee_inexact 0		; VI-NEXT: .amdhsa_exception_fp_ieee_inexact 0
; VI-NEXT: .amdhsa_exception_int_div_zero 0		; VI-NEXT: .amdhsa_exception_int_div_zero 0
; VI-NEXT: .end_amdhsa_kernel		; VI-NEXT: .end_amdhsa_kernel
; VI-NEXT: .text		; VI-NEXT: .text
;		;
; GFX9-LABEL: max_alignment_128:		; GFX9-LABEL: max_alignment_128:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_add_u32 flat_scratch_lo, s4, s7		; GFX9-NEXT: s_add_u32 flat_scratch_lo, s4, s7
; GFX9-NEXT: v_mov_b32_e32 v0, 9
; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s5, 0		; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s5, 0
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s7 offset:128		; GFX9-NEXT: s_add_u32 s0, s0, s7
		; GFX9-NEXT: s_addc_u32 s1, s1, 0
		; GFX9-NEXT: v_mov_b32_e32 v0, 9
		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:128
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
; GFX9-NEXT: .section .rodata,#alloc		; GFX9-NEXT: .section .rodata,#alloc
; GFX9-NEXT: .p2align 6		; GFX9-NEXT: .p2align 6
; GFX9-NEXT: .amdhsa_kernel max_alignment_128		; GFX9-NEXT: .amdhsa_kernel max_alignment_128
; GFX9-NEXT: .amdhsa_group_segment_fixed_size 0		; GFX9-NEXT: .amdhsa_group_segment_fixed_size 0
; GFX9-NEXT: .amdhsa_private_segment_fixed_size 256		; GFX9-NEXT: .amdhsa_private_segment_fixed_size 256
; GFX9-NEXT: .amdhsa_user_sgpr_private_segment_buffer 1		; GFX9-NEXT: .amdhsa_user_sgpr_private_segment_buffer 1
; GFX9-NEXT: .amdhsa_user_sgpr_dispatch_ptr 0		; GFX9-NEXT: .amdhsa_user_sgpr_dispatch_ptr 0
		; GFX9-NEXT: .text
store volatile i32 9, i32 addrspace(5)* %alloca.align, align 128		store volatile i32 9, i32 addrspace(5)* %alloca.align, align 128
ret void		ret void
}		}

define amdgpu_kernel void @stackrealign_attr() #1 {		define amdgpu_kernel void @stackrealign_attr() #1 {
; VI-LABEL: stackrealign_attr:		; VI-LABEL: stackrealign_attr:
; VI: ; %bb.0:		; VI: ; %bb.0:
; VI-NEXT: s_add_u32 s4, s4, s7		; VI-NEXT: s_add_u32 s4, s4, s7
		; VI-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8
		; VI-NEXT: s_add_u32 s0, s0, s7
		; VI-NEXT: s_addc_u32 s1, s1, 0
; VI-NEXT: v_mov_b32_e32 v0, 9		; VI-NEXT: v_mov_b32_e32 v0, 9
; VI-NEXT: s_mov_b32 flat_scratch_lo, s5		; VI-NEXT: s_mov_b32 flat_scratch_lo, s5
; VI-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8		; VI-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:4
; VI-NEXT: buffer_store_dword v0, off, s[0:3], s7 offset:4
; VI-NEXT: s_endpgm		; VI-NEXT: s_endpgm
; VI-NEXT: .section .rodata,#alloc		; VI-NEXT: .section .rodata,#alloc
; VI-NEXT: .p2align 6		; VI-NEXT: .p2align 6
; VI-NEXT: .amdhsa_kernel stackrealign_attr		; VI-NEXT: .amdhsa_kernel stackrealign_attr
; VI-NEXT: .amdhsa_group_segment_fixed_size 0		; VI-NEXT: .amdhsa_group_segment_fixed_size 0
; VI-NEXT: .amdhsa_private_segment_fixed_size 8		; VI-NEXT: .amdhsa_private_segment_fixed_size 8
; VI-NEXT: .amdhsa_user_sgpr_private_segment_buffer 1		; VI-NEXT: .amdhsa_user_sgpr_private_segment_buffer 1
; VI-NEXT: .amdhsa_user_sgpr_dispatch_ptr 0		; VI-NEXT: .amdhsa_user_sgpr_dispatch_ptr 0
; VI-NEXT: .amdhsa_exception_fp_ieee_inexact 0		; VI-NEXT: .amdhsa_exception_fp_ieee_inexact 0
; VI-NEXT: .amdhsa_exception_int_div_zero 0		; VI-NEXT: .amdhsa_exception_int_div_zero 0
; VI-NEXT: .end_amdhsa_kernel		; VI-NEXT: .end_amdhsa_kernel
; VI-NEXT: .text		; VI-NEXT: .text
;		;
; GFX9-LABEL: stackrealign_attr:		; GFX9-LABEL: stackrealign_attr:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_add_u32 flat_scratch_lo, s4, s7		; GFX9-NEXT: s_add_u32 flat_scratch_lo, s4, s7
; GFX9-NEXT: v_mov_b32_e32 v0, 9
; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s5, 0		; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s5, 0
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s7 offset:4		; GFX9-NEXT: s_add_u32 s0, s0, s7
		; GFX9-NEXT: s_addc_u32 s1, s1, 0
		; GFX9-NEXT: v_mov_b32_e32 v0, 9
		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:4
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
; GFX9-NEXT: .section .rodata,#alloc		; GFX9-NEXT: .section .rodata,#alloc
; GFX9-NEXT: .p2align 6		; GFX9-NEXT: .p2align 6
; GFX9-NEXT: .amdhsa_kernel stackrealign_attr		; GFX9-NEXT: .amdhsa_kernel stackrealign_attr
; GFX9-NEXT: .amdhsa_group_segment_fixed_size 0		; GFX9-NEXT: .amdhsa_group_segment_fixed_size 0
; GFX9-NEXT: .amdhsa_private_segment_fixed_size 8		; GFX9-NEXT: .amdhsa_private_segment_fixed_size 8
; GFX9-NEXT: .amdhsa_user_sgpr_private_segment_buffer 1		; GFX9-NEXT: .amdhsa_user_sgpr_private_segment_buffer 1
; GFX9-NEXT: .amdhsa_user_sgpr_dispatch_ptr 0		; GFX9-NEXT: .amdhsa_user_sgpr_dispatch_ptr 0
		; GFX9-NEXT: .text
store volatile i32 9, i32 addrspace(5)* %alloca.align, align 4		store volatile i32 9, i32 addrspace(5)* %alloca.align, align 4
ret void		ret void
}		}

define amdgpu_kernel void @alignstack_attr() #2 {		define amdgpu_kernel void @alignstack_attr() #2 {
; VI-LABEL: alignstack_attr:		; VI-LABEL: alignstack_attr:
; VI: ; %bb.0:		; VI: ; %bb.0:
; VI-NEXT: s_add_u32 s4, s4, s7		; VI-NEXT: s_add_u32 s4, s4, s7
		; VI-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8
		; VI-NEXT: s_add_u32 s0, s0, s7
		; VI-NEXT: s_addc_u32 s1, s1, 0
; VI-NEXT: v_mov_b32_e32 v0, 9		; VI-NEXT: v_mov_b32_e32 v0, 9
; VI-NEXT: s_mov_b32 flat_scratch_lo, s5		; VI-NEXT: s_mov_b32 flat_scratch_lo, s5
; VI-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8		; VI-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:4
; VI-NEXT: buffer_store_dword v0, off, s[0:3], s7 offset:4
; VI-NEXT: s_endpgm		; VI-NEXT: s_endpgm
; VI-NEXT: .section .rodata,#alloc		; VI-NEXT: .section .rodata,#alloc
; VI-NEXT: .p2align 6		; VI-NEXT: .p2align 6
; VI-NEXT: .amdhsa_kernel alignstack_attr		; VI-NEXT: .amdhsa_kernel alignstack_attr
; VI-NEXT: .amdhsa_group_segment_fixed_size 0		; VI-NEXT: .amdhsa_group_segment_fixed_size 0
; VI-NEXT: .amdhsa_private_segment_fixed_size 128		; VI-NEXT: .amdhsa_private_segment_fixed_size 128
; VI-NEXT: .amdhsa_user_sgpr_private_segment_buffer 1		; VI-NEXT: .amdhsa_user_sgpr_private_segment_buffer 1
; VI-NEXT: .amdhsa_user_sgpr_dispatch_ptr 0		; VI-NEXT: .amdhsa_user_sgpr_dispatch_ptr 0
; VI-NEXT: .amdhsa_exception_fp_ieee_inexact 0		; VI-NEXT: .amdhsa_exception_fp_ieee_inexact 0
; VI-NEXT: .amdhsa_exception_int_div_zero 0		; VI-NEXT: .amdhsa_exception_int_div_zero 0
; VI-NEXT: .end_amdhsa_kernel		; VI-NEXT: .end_amdhsa_kernel
; VI-NEXT: .text		; VI-NEXT: .text
;		;
; GFX9-LABEL: alignstack_attr:		; GFX9-LABEL: alignstack_attr:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_add_u32 flat_scratch_lo, s4, s7		; GFX9-NEXT: s_add_u32 flat_scratch_lo, s4, s7
; GFX9-NEXT: v_mov_b32_e32 v0, 9
; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s5, 0		; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s5, 0
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s7 offset:4		; GFX9-NEXT: s_add_u32 s0, s0, s7
		; GFX9-NEXT: s_addc_u32 s1, s1, 0
		; GFX9-NEXT: v_mov_b32_e32 v0, 9
		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:4
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
; GFX9-NEXT: .section .rodata,#alloc		; GFX9-NEXT: .section .rodata,#alloc
; GFX9-NEXT: .p2align 6		; GFX9-NEXT: .p2align 6
; GFX9-NEXT: .amdhsa_kernel alignstack_attr		; GFX9-NEXT: .amdhsa_kernel alignstack_attr
; GFX9-NEXT: .amdhsa_group_segment_fixed_size 0		; GFX9-NEXT: .amdhsa_group_segment_fixed_size 0
; GFX9-NEXT: .amdhsa_private_segment_fixed_size 128		; GFX9-NEXT: .amdhsa_private_segment_fixed_size 128
; GFX9-NEXT: .amdhsa_user_sgpr_private_segment_buffer 1		; GFX9-NEXT: .amdhsa_user_sgpr_private_segment_buffer 1
; GFX9-NEXT: .amdhsa_user_sgpr_dispatch_ptr 0		; GFX9-NEXT: .amdhsa_user_sgpr_dispatch_ptr 0

llvm/test/CodeGen/AMDGPU/stack-realign.ll

	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=fiji -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=fiji -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

	; Check that we properly realign the stack. While 4-byte access is all			; Check that we properly realign the stack. While 4-byte access is all
	; that is ever needed, some transformations rely on the known bits from the alignment of the pointer (e.g.			; that is ever needed, some transformations rely on the known bits from the alignment of the pointer (e.g.


	; 128 byte object			; 128 byte object
	; 4 byte emergency stack slot			; 4 byte emergency stack slot
	; = 144 bytes with padding between them			; = 144 bytes with padding between them

	; GCN-LABEL: {{^}}needs_align16_default_stack_align:			; GCN-LABEL: {{^}}needs_align16_default_stack_align:
	; GCN: s_sub_u32 [[SUB:s[0-9]+]], s32, s33
	; GCN-DAG: v_lshlrev_b32_e32 [[SCALED_IDX:v[0-9]+]], 4, v0			; GCN-DAG: v_lshlrev_b32_e32 [[SCALED_IDX:v[0-9]+]], 4, v0
	; GCN-DAG: v_lshrrev_b32_e64 [[FRAMEDIFF:v[0-9]+]], 6, [[SUB]]			; GCN-DAG: v_lshrrev_b32_e64 [[FRAMEDIFF:v[0-9]+]], 6, s32
	; GCN: v_add_u32_e32 [[FI:v[0-9]+]], vcc, [[FRAMEDIFF]], [[SCALED_IDX]]			; GCN: v_add_u32_e32 [[FI:v[0-9]+]], vcc, [[FRAMEDIFF]], [[SCALED_IDX]]

	; GCN-NOT: s32			; GCN-NOT: s32

	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], s33 offen			; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen
	; GCN: v_or_b32_e32 v{{[0-9]+}}, 12			; GCN: v_or_b32_e32 v{{[0-9]+}}, 12
	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], s33 offen			; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen
	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], s33 offen			; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen
	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], s33 offen			; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen

	; GCN-NOT: s32			; GCN-NOT: s32

	; GCN: ; ScratchSize: 144			; GCN: ; ScratchSize: 144
	define void @needs_align16_default_stack_align(i32 %idx) #0 {			define void @needs_align16_default_stack_align(i32 %idx) #0 {
	%alloca.align16 = alloca [8 x <4 x i32>], align 16, addrspace(5)			%alloca.align16 = alloca [8 x <4 x i32>], align 16, addrspace(5)
	%gep0 = getelementptr inbounds [8 x <4 x i32>], [8 x <4 x i32>] addrspace(5)* %alloca.align16, i32 0, i32 %idx			%gep0 = getelementptr inbounds [8 x <4 x i32>], [8 x <4 x i32>] addrspace(5)* %alloca.align16, i32 0, i32 %idx
	store volatile <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32> addrspace(5)* %gep0, align 16			store volatile <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32> addrspace(5)* %gep0, align 16
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}needs_align16_stack_align4:			; GCN-LABEL: {{^}}needs_align16_stack_align4:
	; GCN: s_add_u32 [[SCRATCH_REG:s[0-9]+]], s32, 0x3c0{{$}}			; GCN: s_add_u32 [[SCRATCH_REG:s[0-9]+]], s32, 0x3c0{{$}}
	; GCN: s_and_b32 s34, [[SCRATCH_REG]], 0xfffffc00			; GCN: s_and_b32 s34, [[SCRATCH_REG]], 0xfffffc00
	; GCN: s_add_u32 s32, s32, 0x2800{{$}}

	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], s33 offen			; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen
	; GCN: v_or_b32_e32 v{{[0-9]+}}, 12			; GCN: v_or_b32_e32 v{{[0-9]+}}, 12
	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], s33 offen			; GCN: s_add_u32 s32, s32, 0x2800{{$}}
	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], s33 offen			; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen
	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], s33 offen			; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen
				; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen

	; GCN: s_sub_u32 s32, s32, 0x2800			; GCN: s_sub_u32 s32, s32, 0x2800

	; GCN: ; ScratchSize: 160			; GCN: ; ScratchSize: 160
	define void @needs_align16_stack_align4(i32 %idx) #2 {			define void @needs_align16_stack_align4(i32 %idx) #2 {
	%alloca.align16 = alloca [8 x <4 x i32>], align 16, addrspace(5)			%alloca.align16 = alloca [8 x <4 x i32>], align 16, addrspace(5)
	%gep0 = getelementptr inbounds [8 x <4 x i32>], [8 x <4 x i32>] addrspace(5)* %alloca.align16, i32 0, i32 %idx			%gep0 = getelementptr inbounds [8 x <4 x i32>], [8 x <4 x i32>] addrspace(5)* %alloca.align16, i32 0, i32 %idx
	store volatile <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32> addrspace(5)* %gep0, align 16			store volatile <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32> addrspace(5)* %gep0, align 16
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}needs_align32:			; GCN-LABEL: {{^}}needs_align32:
	; GCN: s_add_u32 [[SCRATCH_REG:s[0-9]+]], s32, 0x7c0{{$}}			; GCN: s_add_u32 [[SCRATCH_REG:s[0-9]+]], s32, 0x7c0{{$}}
	; GCN: s_and_b32 s34, [[SCRATCH_REG]], 0xfffff800			; GCN: s_and_b32 s34, [[SCRATCH_REG]], 0xfffff800
	; GCN: s_add_u32 s32, s32, 0x3000{{$}}

	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], s33 offen			; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen
	; GCN: v_or_b32_e32 v{{[0-9]+}}, 12			; GCN: v_or_b32_e32 v{{[0-9]+}}, 12
	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], s33 offen			; GCN: s_add_u32 s32, s32, 0x3000{{$}}
	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], s33 offen			; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen
	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], s33 offen			; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen
				; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen

	; GCN: s_sub_u32 s32, s32, 0x3000			; GCN: s_sub_u32 s32, s32, 0x3000

	; GCN: ; ScratchSize: 192			; GCN: ; ScratchSize: 192
	define void @needs_align32(i32 %idx) #0 {			define void @needs_align32(i32 %idx) #0 {
	%alloca.align16 = alloca [8 x <4 x i32>], align 32, addrspace(5)			%alloca.align16 = alloca [8 x <4 x i32>], align 32, addrspace(5)
	%gep0 = getelementptr inbounds [8 x <4 x i32>], [8 x <4 x i32>] addrspace(5)* %alloca.align16, i32 0, i32 %idx			%gep0 = getelementptr inbounds [8 x <4 x i32>], [8 x <4 x i32>] addrspace(5)* %alloca.align16, i32 0, i32 %idx
	store volatile <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32> addrspace(5)* %gep0, align 32			store volatile <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32> addrspace(5)* %gep0, align 32
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}force_realign4:			; GCN-LABEL: {{^}}force_realign4:
	; GCN: s_add_u32 [[SCRATCH_REG:s[0-9]+]], s32, 0xc0{{$}}			; GCN: s_add_u32 [[SCRATCH_REG:s[0-9]+]], s32, 0xc0{{$}}
	; GCN: s_and_b32 s34, [[SCRATCH_REG]], 0xffffff00			; GCN: s_and_b32 s34, [[SCRATCH_REG]], 0xffffff00
	; GCN: s_add_u32 s32, s32, 0xd00{{$}}			; GCN: s_add_u32 s32, s32, 0xd00{{$}}

	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], s33 offen			; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen
	; GCN: s_sub_u32 s32, s32, 0xd00			; GCN: s_sub_u32 s32, s32, 0xd00

	; GCN: ; ScratchSize: 52			; GCN: ; ScratchSize: 52
	define void @force_realign4(i32 %idx) #1 {			define void @force_realign4(i32 %idx) #1 {
	%alloca.align16 = alloca [8 x i32], align 4, addrspace(5)			%alloca.align16 = alloca [8 x i32], align 4, addrspace(5)
	%gep0 = getelementptr inbounds [8 x i32], [8 x i32] addrspace(5)* %alloca.align16, i32 0, i32 %idx			%gep0 = getelementptr inbounds [8 x i32], [8 x i32] addrspace(5)* %alloca.align16, i32 0, i32 %idx
	store volatile i32 3, i32 addrspace(5)* %gep0, align 4			store volatile i32 3, i32 addrspace(5)* %gep0, align 4
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}kernel_call_align16_from_8:			; GCN-LABEL: {{^}}kernel_call_align16_from_8:
	; GCN: s_mov_b32 s33, s7{{$}}			; GCN: s_movk_i32 s32, 0x400{{$}}
	; GCN-NEXT: s_add_u32 s32, s33, 0x400{{$}}
	; GCN-NOT: s32			; GCN-NOT: s32
	; GCN: s_swappc_b64			; GCN: s_swappc_b64
	define amdgpu_kernel void @kernel_call_align16_from_8() #0 {			define amdgpu_kernel void @kernel_call_align16_from_8() #0 {
	%alloca = alloca i32, align 4, addrspace(5)			%alloca = alloca i32, align 4, addrspace(5)
	store volatile i32 2, i32 addrspace(5)* %alloca			store volatile i32 2, i32 addrspace(5)* %alloca
	call void @needs_align16_default_stack_align(i32 1)			call void @needs_align16_default_stack_align(i32 1)
	ret void			ret void
	}			}

	; The call sequence should keep the stack on call aligned to 4			; The call sequence should keep the stack on call aligned to 4
	; GCN-LABEL: {{^}}kernel_call_align16_from_5:			; GCN-LABEL: {{^}}kernel_call_align16_from_5:
	; GCN: s_mov_b32 s33, s7{{$}}			; GCN: s_movk_i32 s32, 0x400
	; GCN-NEXT: s_add_u32 s32, s33, 0x400
	; GCN: s_swappc_b64			; GCN: s_swappc_b64
	define amdgpu_kernel void @kernel_call_align16_from_5() {			define amdgpu_kernel void @kernel_call_align16_from_5() {
	%alloca0 = alloca i8, align 1, addrspace(5)			%alloca0 = alloca i8, align 1, addrspace(5)
	store volatile i8 2, i8 addrspace(5)* %alloca0			store volatile i8 2, i8 addrspace(5)* %alloca0

	call void @needs_align16_default_stack_align(i32 1)			call void @needs_align16_default_stack_align(i32 1)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}kernel_call_align4_from_5:			; GCN-LABEL: {{^}}kernel_call_align4_from_5:
	; GCN: s_mov_b32 s33, s7{{$}}			; GCN: s_movk_i32 s32, 0x400
	; GCN: s_add_u32 s32, s33, 0x400
	; GCN: s_swappc_b64			; GCN: s_swappc_b64
	define amdgpu_kernel void @kernel_call_align4_from_5() {			define amdgpu_kernel void @kernel_call_align4_from_5() {
	%alloca0 = alloca i8, align 1, addrspace(5)			%alloca0 = alloca i8, align 1, addrspace(5)
	store volatile i8 2, i8 addrspace(5)* %alloca0			store volatile i8 2, i8 addrspace(5)* %alloca0

	call void @needs_align16_stack_align4(i32 1)			call void @needs_align16_stack_align4(i32 1)
	ret void			ret void
	}			}

llvm/test/CodeGen/AMDGPU/stack-slot-color-sgpr-vgpr-spills.mir

	# RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs -stress-regalloc=1 -start-before=greedy -stop-after=stack-slot-coloring -o - %s | FileCheck %s			# RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs -stress-regalloc=1 -start-before=greedy -stop-after=stack-slot-coloring -o - %s | FileCheck %s
	---			---

	# CHECK-LABEL: name: no_merge_sgpr_vgpr_spill_slot{{$}}			# CHECK-LABEL: name: no_merge_sgpr_vgpr_spill_slot{{$}}
	# CHECK: stack:			# CHECK: stack:
	# CHECK: - { id: 0, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,			# CHECK: - { id: 0, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,
	# CHECK-NEXT: stack-id: default,			# CHECK-NEXT: stack-id: default,

	# CHECK: - { id: 1, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,			# CHECK: - { id: 1, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,
	# CHECK-NEXT: stack-id: sgpr-spill,			# CHECK-NEXT: stack-id: sgpr-spill,

	# CHECK: SI_SPILL_V32_SAVE killed $vgpr0, %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (store 4 into %stack.0, addrspace 5)			# CHECK: SI_SPILL_V32_SAVE killed $vgpr0, %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (store 4 into %stack.0, addrspace 5)
	# CHECK: $vgpr0 = SI_SPILL_V32_RESTORE %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (load 4 from %stack.0, addrspace 5)			# CHECK: $vgpr0 = SI_SPILL_V32_RESTORE %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (load 4 from %stack.0, addrspace 5)

	# CHECK: SI_SPILL_S32_SAVE killed renamable $sgpr6, %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 4 into %stack.1, addrspace 5)			# CHECK: SI_SPILL_S32_SAVE killed renamable $sgpr5, %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 4 into %stack.1, addrspace 5)
	# CHECK: $sgpr6 = SI_SPILL_S32_RESTORE %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 4 from %stack.1, addrspace 5)			# CHECK: $sgpr5 = SI_SPILL_S32_RESTORE %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 4 from %stack.1, addrspace 5)

	name: no_merge_sgpr_vgpr_spill_slot			name: no_merge_sgpr_vgpr_spill_slot
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4			frameOffsetReg: $sgpr4
	frameOffsetReg: $sgpr5
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32
	body: |			body: |
	bb.0:			bb.0:
	%0:vgpr_32 = FLAT_LOAD_DWORD undef $vgpr0_vgpr1, 0, 0, 0, 0, implicit $flat_scr, implicit $exec			%0:vgpr_32 = FLAT_LOAD_DWORD undef $vgpr0_vgpr1, 0, 0, 0, 0, implicit $flat_scr, implicit $exec
	%2:vgpr_32 = FLAT_LOAD_DWORD undef $vgpr0_vgpr1, 0, 0, 0, 0, implicit $flat_scr, implicit $exec			%2:vgpr_32 = FLAT_LOAD_DWORD undef $vgpr0_vgpr1, 0, 0, 0, 0, implicit $flat_scr, implicit $exec
	S_NOP 0, implicit %0			S_NOP 0, implicit %0
	%1:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM undef $sgpr0_sgpr1, 0, 0, 0			%1:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM undef $sgpr0_sgpr1, 0, 0, 0
	%3:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM undef $sgpr0_sgpr1, 0, 0, 0			%3:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM undef $sgpr0_sgpr1, 0, 0, 0
	S_NOP 0, implicit %1			S_NOP 0, implicit %1
	...			...

llvm/test/CodeGen/AMDGPU/store-hi16.ll

		entry:
%gep = getelementptr inbounds i8, i8* %out, i64 -4095		%gep = getelementptr inbounds i8, i8* %out, i64 -4095
store i8 %trunc, i8* %gep		store i8 %trunc, i8* %gep
ret void		ret void
}		}

; GCN-LABEL: {{^}}store_private_hi_v2i16:		; GCN-LABEL: {{^}}store_private_hi_v2i16:
; GCN: s_waitcnt		; GCN: s_waitcnt

; GFX900-NEXT: buffer_store_short_d16_hi v1, v0, s[0:3], s33 offen{{$}}		; GFX900-NEXT: buffer_store_short_d16_hi v1, v0, s[0:3], 0 offen{{$}}

; NO-D16-HI: v_lshrrev_b32_e32 v1, 16, v1		; NO-D16-HI: v_lshrrev_b32_e32 v1, 16, v1
; NO-D16-HI: buffer_store_short v1, v0, s[0:3], s33 offen{{$}}		; NO-D16-HI: buffer_store_short v1, v0, s[0:3], 0 offen{{$}}

; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @store_private_hi_v2i16(i16 addrspace(5)* %out, i32 %arg) #0 {		define void @store_private_hi_v2i16(i16 addrspace(5)* %out, i32 %arg) #0 {
entry:		entry:
; FIXME: ABI for pre-gfx9		; FIXME: ABI for pre-gfx9
%value = bitcast i32 %arg to <2 x i16>		%value = bitcast i32 %arg to <2 x i16>
%hi = extractelement <2 x i16> %value, i32 1		%hi = extractelement <2 x i16> %value, i32 1
store i16 %hi, i16 addrspace(5)* %out		store i16 %hi, i16 addrspace(5)* %out
ret void		ret void
}		}

; GCN-LABEL: {{^}}store_private_hi_v2f16:		; GCN-LABEL: {{^}}store_private_hi_v2f16:
; GCN: s_waitcnt		; GCN: s_waitcnt

; GFX900-NEXT: buffer_store_short_d16_hi v1, v0, s[0:3], s33 offen{{$}}		; GFX900-NEXT: buffer_store_short_d16_hi v1, v0, s[0:3], 0 offen{{$}}

; NO-D16-HI: v_lshrrev_b32_e32 v1, 16, v1		; NO-D16-HI: v_lshrrev_b32_e32 v1, 16, v1
; NO-D16-HI: buffer_store_short v1, v0, s[0:3], s33 offen{{$}}		; NO-D16-HI: buffer_store_short v1, v0, s[0:3], 0 offen{{$}}

; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @store_private_hi_v2f16(half addrspace(5)* %out, i32 %arg) #0 {		define void @store_private_hi_v2f16(half addrspace(5)* %out, i32 %arg) #0 {
entry:		entry:
; FIXME: ABI for pre-gfx9		; FIXME: ABI for pre-gfx9
%value = bitcast i32 %arg to <2 x half>		%value = bitcast i32 %arg to <2 x half>
%hi = extractelement <2 x half> %value, i32 1		%hi = extractelement <2 x half> %value, i32 1
store half %hi, half addrspace(5)* %out		store half %hi, half addrspace(5)* %out
ret void		ret void
}		}

; GCN-LABEL: {{^}}store_private_hi_i32_shift:		; GCN-LABEL: {{^}}store_private_hi_i32_shift:
; GCN: s_waitcnt		; GCN: s_waitcnt

; GFX900-NEXT: buffer_store_short_d16_hi v1, v0, s[0:3], s33 offen{{$}}		; GFX900-NEXT: buffer_store_short_d16_hi v1, v0, s[0:3], 0 offen{{$}}

; NO-D16-HI-NEXT: v_lshrrev_b32_e32 v1, 16, v1		; NO-D16-HI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
; NO-D16-HI-NEXT: buffer_store_short v1, v0, s[0:3], s33 offen{{$}}		; NO-D16-HI-NEXT: buffer_store_short v1, v0, s[0:3], 0 offen{{$}}

; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @store_private_hi_i32_shift(i16 addrspace(5)* %out, i32 %value) #0 {		define void @store_private_hi_i32_shift(i16 addrspace(5)* %out, i32 %value) #0 {
entry:		entry:
%hi32 = lshr i32 %value, 16		%hi32 = lshr i32 %value, 16
%hi = trunc i32 %hi32 to i16		%hi = trunc i32 %hi32 to i16
store i16 %hi, i16 addrspace(5)* %out		store i16 %hi, i16 addrspace(5)* %out
ret void		ret void
}		}

; GCN-LABEL: {{^}}store_private_hi_v2i16_i8:		; GCN-LABEL: {{^}}store_private_hi_v2i16_i8:
; GCN: s_waitcnt		; GCN: s_waitcnt

; GFX900-NEXT: buffer_store_byte_d16_hi v1, v0, s[0:3], s33 offen{{$}}		; GFX900-NEXT: buffer_store_byte_d16_hi v1, v0, s[0:3], 0 offen{{$}}

; NO-D16-HI-NEXT: v_lshrrev_b32_e32 v1, 16, v1		; NO-D16-HI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
; NO-D16-HI-NEXT: buffer_store_byte v1, v0, s[0:3], s33 offen{{$}}		; NO-D16-HI-NEXT: buffer_store_byte v1, v0, s[0:3], 0 offen{{$}}

; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @store_private_hi_v2i16_i8(i8 addrspace(5)* %out, i32 %arg) #0 {		define void @store_private_hi_v2i16_i8(i8 addrspace(5)* %out, i32 %arg) #0 {
entry:		entry:
%value = bitcast i32 %arg to <2 x i16>		%value = bitcast i32 %arg to <2 x i16>
%hi = extractelement <2 x i16> %value, i32 1		%hi = extractelement <2 x i16> %value, i32 1
%trunc = trunc i16 %hi to i8		%trunc = trunc i16 %hi to i8
store i8 %trunc, i8 addrspace(5)* %out		store i8 %trunc, i8 addrspace(5)* %out
ret void		ret void
}		}

; GCN-LABEL: {{^}}store_private_hi_i8_shift:		; GCN-LABEL: {{^}}store_private_hi_i8_shift:
; GCN: s_waitcnt		; GCN: s_waitcnt

; GFX900-NEXT: buffer_store_byte_d16_hi v1, v0, s[0:3], s33 offen{{$}}		; GFX900-NEXT: buffer_store_byte_d16_hi v1, v0, s[0:3], 0 offen{{$}}

; NO-D16-HI-NEXT: v_lshrrev_b32_e32 v1, 16, v1		; NO-D16-HI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
; NO-D16-HI-NEXT: buffer_store_byte v1, v0, s[0:3], s33 offen{{$}}		; NO-D16-HI-NEXT: buffer_store_byte v1, v0, s[0:3], 0 offen{{$}}

; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @store_private_hi_i8_shift(i8 addrspace(5)* %out, i32 %value) #0 {		define void @store_private_hi_i8_shift(i8 addrspace(5)* %out, i32 %value) #0 {
entry:		entry:
%hi32 = lshr i32 %value, 16		%hi32 = lshr i32 %value, 16
%hi = trunc i32 %hi32 to i8		%hi = trunc i32 %hi32 to i8
store i8 %hi, i8 addrspace(5)* %out		store i8 %hi, i8 addrspace(5)* %out
		entry:
ret void		ret void
}		}



; GCN-LABEL: {{^}}store_private_hi_v2i16_nooff:		; GCN-LABEL: {{^}}store_private_hi_v2i16_nooff:
; GCN: s_waitcnt		; GCN: s_waitcnt

; GFX900-NEXT: buffer_store_short_d16_hi v0, off, s[0:3], s33{{$}}		; GFX900-NEXT: buffer_store_short_d16_hi v0, off, s[0:3], 0{{$}}

; NO-D16-HI-NEXT: v_lshrrev_b32_e32 v0, 16, v0		; NO-D16-HI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
; NO-D16-HI-NEXT: buffer_store_short v0, off, s[0:3], s33{{$}}		; NO-D16-HI-NEXT: buffer_store_short v0, off, s[0:3], 0{{$}}

; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @store_private_hi_v2i16_nooff(i32 %arg) #0 {		define void @store_private_hi_v2i16_nooff(i32 %arg) #0 {
entry:		entry:
; FIXME: ABI for pre-gfx9		; FIXME: ABI for pre-gfx9
%value = bitcast i32 %arg to <2 x i16>		%value = bitcast i32 %arg to <2 x i16>
%hi = extractelement <2 x i16> %value, i32 1		%hi = extractelement <2 x i16> %value, i32 1
store volatile i16 %hi, i16 addrspace(5)* null		store volatile i16 %hi, i16 addrspace(5)* null
ret void		ret void
}		}


; GCN-LABEL: {{^}}store_private_hi_v2i16_i8_nooff:		; GCN-LABEL: {{^}}store_private_hi_v2i16_i8_nooff:
; GCN: s_waitcnt		; GCN: s_waitcnt

; GFX900-NEXT: buffer_store_byte_d16_hi v0, off, s[0:3], s33{{$}}		; GFX900-NEXT: buffer_store_byte_d16_hi v0, off, s[0:3], 0{{$}}

; NO-D16-HI: v_lshrrev_b32_e32 v0, 16, v0		; NO-D16-HI: v_lshrrev_b32_e32 v0, 16, v0
; NO-D16-HI: buffer_store_byte v0, off, s[0:3], s33{{$}}		; NO-D16-HI: buffer_store_byte v0, off, s[0:3], 0{{$}}

; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @store_private_hi_v2i16_i8_nooff(i32 %arg) #0 {		define void @store_private_hi_v2i16_i8_nooff(i32 %arg) #0 {
entry:		entry:
%value = bitcast i32 %arg to <2 x i16>		%value = bitcast i32 %arg to <2 x i16>
%hi = extractelement <2 x i16> %value, i32 1		%hi = extractelement <2 x i16> %value, i32 1
%trunc = trunc i16 %hi to i8		%trunc = trunc i16 %hi to i8

llvm/test/CodeGen/AMDGPU/subreg-split-live-in-error.mir

	#			#
	# This test exposes this scenario which caused previously caused an assert			# This test exposes this scenario which caused previously caused an assert

	---			---
	name: _amdgpu_ps_main			name: _amdgpu_ps_main
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32
	liveins:			liveins:
	- { reg: '$vgpr2', virtual-reg: '%0' }			- { reg: '$vgpr2', virtual-reg: '%0' }
	- { reg: '$vgpr3', virtual-reg: '%1' }			- { reg: '$vgpr3', virtual-reg: '%1' }
	- { reg: '$vgpr4', virtual-reg: '%2' }			- { reg: '$vgpr4', virtual-reg: '%2' }
	body: |			body: |
	bb.0:			bb.0:
	successors: %bb.1(0x40000000), %bb.2(0x40000000)			successors: %bb.1(0x40000000), %bb.2(0x40000000)

llvm/test/CodeGen/AMDGPU/subvector-test.mir

	# RUN: llc -march=amdgcn -mcpu=gfx1010 -start-before=greedy -verify-machineinstrs -o - %s | FileCheck -check-prefix=GCN %s			# RUN: llc -march=amdgcn -mcpu=gfx1010 -start-before=greedy -verify-machineinstrs -o - %s | FileCheck -check-prefix=GCN %s
	...			...
	# GCN-LABEL: {{^}}"subvector-basic-bb"			# GCN-LABEL: {{^}}"subvector-basic-bb"
	# GCN: s_subvector_loop_begin [[RS:s[0-9]]], BB0_2			# GCN: s_subvector_loop_begin [[RS:s[0-9]]], BB0_2
	# GCN: s_subvector_loop_end [[RS]], BB0_1			# GCN: s_subvector_loop_end [[RS]], BB0_1
	name: subvector-basic-bb			name: subvector-basic-bb
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	frameOffsetReg: $sgpr5			frameOffsetReg: $sgpr5
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32
	body: |			body: |
	bb.0:			bb.0:
	liveins: $sgpr0_sgpr1			liveins: $sgpr0_sgpr1
	successors: %bb.1, %bb.2			successors: %bb.1, %bb.2

	%1:sgpr_64 = COPY $sgpr0_sgpr1			%1:sgpr_64 = COPY $sgpr0_sgpr1

llvm/test/CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot.ll

	; GCN-DAG: s_mov_b32 s[[DESC0:[0-9]+]], SCRATCH_RSRC_DWORD0			; GCN-DAG: s_mov_b32 s[[DESC0:[0-9]+]], SCRATCH_RSRC_DWORD0
	; GCN-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD1			; GCN-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD1
	; GCN-DAG: s_mov_b32 s{{[0-9]+}}, -1			; GCN-DAG: s_mov_b32 s{{[0-9]+}}, -1
	; SI-DAG: s_mov_b32 s[[DESC3:[0-9]+]], 0xe8f000			; SI-DAG: s_mov_b32 s[[DESC3:[0-9]+]], 0xe8f000
	; VI-DAG: s_mov_b32 s[[DESC3:[0-9]+]], 0xe80000			; VI-DAG: s_mov_b32 s[[DESC3:[0-9]+]], 0xe80000
	; GFX9-DAG: s_mov_b32 s[[DESC3:[0-9]+]], 0xe00000			; GFX9-DAG: s_mov_b32 s[[DESC3:[0-9]+]], 0xe00000

	; OFFREG is offset system SGPR			; OFFREG is offset system SGPR
	; GCN: buffer_store_dword {{v[0-9]+}}, off, s{{\[}}[[DESC0]]:[[DESC3]]], s12 offset:{{[0-9]+}} ; 4-byte Folded Spill			; GCN: buffer_store_dword {{v[0-9]+}}, off, s{{\[}}[[DESC0]]:[[DESC3]]], 0 offset:{{[0-9]+}} ; 4-byte Folded Spill
	; GCN: buffer_load_dword v{{[0-9]+}}, off, s{{\[}}[[DESC0]]:[[DESC3]]], s12 offset:{{[0-9]+}} ; 4-byte Folded Reload			; GCN: buffer_load_dword v{{[0-9]+}}, off, s{{\[}}[[DESC0]]:[[DESC3]]], 0 offset:{{[0-9]+}} ; 4-byte Folded Reload
	; GCN: NumVgprs: 256			; GCN: NumVgprs: 256
	; GCN: ScratchSize: 1536			; GCN: ScratchSize: 1536

	define amdgpu_vs void @main([9 x <4 x i32>] addrspace(4)* inreg %arg, [17 x <4 x i32>] addrspace(4)* inreg %arg1, [17 x <4 x i32>] addrspace(4)* inreg %arg2, [34 x <8 x i32>] addrspace(4)* inreg %arg3, [16 x <4 x i32>] addrspace(4)* inreg %arg4, i32 inreg %arg5, i32 inreg %arg6, i32 %arg7, i32 %arg8, i32 %arg9, i32 %arg10) #0 {			define amdgpu_vs void @main([9 x <4 x i32>] addrspace(4)* inreg %arg, [17 x <4 x i32>] addrspace(4)* inreg %arg1, [17 x <4 x i32>] addrspace(4)* inreg %arg2, [34 x <8 x i32>] addrspace(4)* inreg %arg3, [16 x <4 x i32>] addrspace(4)* inreg %arg4, i32 inreg %arg5, i32 inreg %arg6, i32 %arg7, i32 %arg8, i32 %arg9, i32 %arg10) #0 {
	bb:			bb:
	%tmp = getelementptr [17 x <4 x i32>], [17 x <4 x i32>] addrspace(4)* %arg1, i64 0, i64 0			%tmp = getelementptr [17 x <4 x i32>], [17 x <4 x i32>] addrspace(4)* %arg1, i64 0, i64 0
	%tmp11 = load <4 x i32>, <4 x i32> addrspace(4)* %tmp, align 16, !tbaa !0			%tmp11 = load <4 x i32>, <4 x i32> addrspace(4)* %tmp, align 16, !tbaa !0
	%tmp12 = call float @llvm.amdgcn.s.buffer.load.f32(<4 x i32> %tmp11, i32 0, i32 0)			%tmp12 = call float @llvm.amdgcn.s.buffer.load.f32(<4 x i32> %tmp11, i32 0, i32 0)

llvm/test/CodeGen/AMDGPU/virtregrewrite-undef-identity-copy.mir

	name: undef_identity_copy			name: undef_identity_copy
	tracksRegLiveness: true			tracksRegLiveness: true
	frameInfo:			frameInfo:
	maxAlignment: 4			maxAlignment: 4
	hasCalls: true			hasCalls: true
	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: true			isEntryFunction: true
	scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'			scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
	scratchWaveOffsetReg: '$sgpr95'
	frameOffsetReg: '$sgpr95'			frameOffsetReg: '$sgpr95'
	stackPtrOffsetReg: '$sgpr32'			stackPtrOffsetReg: '$sgpr32'
	body: |			body: |
	bb.0:			bb.0:
	; CHECK-LABEL: name: undef_identity_copy			; CHECK-LABEL: name: undef_identity_copy
	; CHECK: renamable $vgpr32_vgpr33_vgpr34_vgpr35 = FLAT_LOAD_DWORDX4 undef renamable $vgpr0_vgpr1, 0, 0, 0, 0, implicit $exec, implicit $flat_scr :: (load 16, addrspace 1)			; CHECK: renamable $vgpr32_vgpr33_vgpr34_vgpr35 = FLAT_LOAD_DWORDX4 undef renamable $vgpr0_vgpr1, 0, 0, 0, 0, implicit $exec, implicit $flat_scr :: (load 16, addrspace 1)
	; CHECK: renamable $sgpr6_sgpr7 = SI_PC_ADD_REL_OFFSET target-flags(amdgpu-rel32-lo) @foo + 4, target-flags(amdgpu-rel32-hi) @foo + 4, implicit-def dead $scc			; CHECK: renamable $sgpr6_sgpr7 = SI_PC_ADD_REL_OFFSET target-flags(amdgpu-rel32-lo) @foo + 4, target-flags(amdgpu-rel32-hi) @foo + 4, implicit-def dead $scc
	; CHECK: ADJCALLSTACKUP 0, 0, implicit-def $sgpr32, implicit $sgpr32, implicit $sgpr95			; CHECK: ADJCALLSTACKUP 0, 0, implicit-def $sgpr32, implicit $sgpr32, implicit $sgpr95

llvm/test/CodeGen/AMDGPU/wqm.ll

	;			;
	; CHECK-LABEL: {{^}}test_alloca:			; CHECK-LABEL: {{^}}test_alloca:
	; CHECK: s_mov_b64 [[LIVE:s\[[0-9]+:[0-9]+\]]], exec			; CHECK: s_mov_b64 [[LIVE:s\[[0-9]+:[0-9]+\]]], exec
	; CHECK: s_wqm_b64 exec, exec			; CHECK: s_wqm_b64 exec, exec

	; CHECK: s_and_b64 exec, exec, [[LIVE]]			; CHECK: s_and_b64 exec, exec, [[LIVE]]
	; CHECK: buffer_store_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, 0			; CHECK: buffer_store_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, 0
	; CHECK: s_wqm_b64 exec, exec			; CHECK: s_wqm_b64 exec, exec
	; CHECK: buffer_store_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, {{s[0-9]+}} offset:4{{$}}			; CHECK: buffer_store_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, 0 offset:4{{$}}
	; CHECK: s_and_b64 exec, exec, [[LIVE]]			; CHECK: s_and_b64 exec, exec, [[LIVE]]
	; CHECK: buffer_store_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 idxen			; CHECK: buffer_store_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 idxen
	; CHECK: s_wqm_b64 exec, exec			; CHECK: s_wqm_b64 exec, exec
	; CHECK: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, {{s[0-9]+}} offen			; CHECK: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen

	; CHECK: s_and_b64 exec, exec, [[LIVE]]			; CHECK: s_and_b64 exec, exec, [[LIVE]]
	; CHECK: image_sample			; CHECK: image_sample
	; CHECK: buffer_store_dwordx4			; CHECK: buffer_store_dwordx4
	define amdgpu_ps void @test_alloca(float %data, i32 %a, i32 %idx) nounwind {			define amdgpu_ps void @test_alloca(float %data, i32 %a, i32 %idx) nounwind {
	entry:			entry:
	%array = alloca [32 x i32], align 4, addrspace(5)			%array = alloca [32 x i32], align 4, addrspace(5)


llvm/test/CodeGen/AMDGPU/wwm-reserved.ll

entry:		entry:
%tmp100 = call <2 x float> @llvm.amdgcn.raw.buffer.load.v2f32(<4 x i32> %tmp14, i32 0, i32 0, i32 0)		%tmp100 = call <2 x float> @llvm.amdgcn.raw.buffer.load.v2f32(<4 x i32> %tmp14, i32 0, i32 0, i32 0)
%tmp101 = bitcast <2 x float> %tmp100 to <2 x i32>		%tmp101 = bitcast <2 x float> %tmp100 to <2 x i32>
%tmp102 = extractelement <2 x i32> %tmp101, i32 0		%tmp102 = extractelement <2 x i32> %tmp101, i32 0
%tmp105 = tail call i32 @llvm.amdgcn.set.inactive.i32(i32 %tmp102, i32 0)		%tmp105 = tail call i32 @llvm.amdgcn.set.inactive.i32(i32 %tmp102, i32 0)

; GFX9: v_mov_b32_dpp v[[FIRST_MOV:[0-9]+]], v{{[0-9]+}} row_bcast:31 row_mask:0xc bank_mask:0xf		; GFX9: v_mov_b32_dpp v[[FIRST_MOV:[0-9]+]], v{{[0-9]+}} row_bcast:31 row_mask:0xc bank_mask:0xf
; GFX9: v_add_u32_e32 v[[FIRST_ADD:[0-9]+]], v{{[0-9]+}}, v[[FIRST_MOV]]		; GFX9: v_add_u32_e32 v[[FIRST_ADD:[0-9]+]], v{{[0-9]+}}, v[[FIRST_MOV]]
; GFX9: v_mov_b32_e32 v[[FIRST:[0-9]+]], v[[FIRST_ADD]]		; GFX9: v_mov_b32_e32 v[[FIRST:[0-9]+]], v[[FIRST_ADD]]
; GFX9-O0: buffer_store_dword v[[FIRST]], off, s{{\[}}{{[0-9]+}}:{{[0-9]+}}{{\]}}, s[[FIRST_SGPR_OFFSET:[0-9]+]] offset:[[FIRST_IMM_OFFSET:[0-9]+]]		; GFX9-O0: buffer_store_dword v[[FIRST]], off, s{{\[}}{{[0-9]+}}:{{[0-9]+}}{{\]}}, 0 offset:[[FIRST_IMM_OFFSET:[0-9]+]]
%tmp120 = tail call i32 @llvm.amdgcn.update.dpp.i32(i32 0, i32 %tmp105, i32 323, i32 12, i32 15, i1 false)		%tmp120 = tail call i32 @llvm.amdgcn.update.dpp.i32(i32 0, i32 %tmp105, i32 323, i32 12, i32 15, i1 false)
%tmp121 = add i32 %tmp105, %tmp120		%tmp121 = add i32 %tmp105, %tmp120
%tmp122 = tail call i32 @llvm.amdgcn.wwm.i32(i32 %tmp121)		%tmp122 = tail call i32 @llvm.amdgcn.wwm.i32(i32 %tmp121)

%cond = icmp eq i32 %arg, 0		%cond = icmp eq i32 %arg, 0
br i1 %cond, label %if, label %merge		br i1 %cond, label %if, label %merge
if:		if:
%tmp103 = extractelement <2 x i32> %tmp101, i32 1		%tmp103 = extractelement <2 x i32> %tmp101, i32 1
%tmp107 = tail call i32 @llvm.amdgcn.set.inactive.i32(i32 %tmp103, i32 0)		%tmp107 = tail call i32 @llvm.amdgcn.set.inactive.i32(i32 %tmp103, i32 0)

; GFX9: v_mov_b32_dpp v[[SECOND_MOV:[0-9]+]], v{{[0-9]+}} row_bcast:31 row_mask:0xc bank_mask:0xf		; GFX9: v_mov_b32_dpp v[[SECOND_MOV:[0-9]+]], v{{[0-9]+}} row_bcast:31 row_mask:0xc bank_mask:0xf
; GFX9: v_add_u32_e32 v[[SECOND_ADD:[0-9]+]], v{{[0-9]+}}, v[[SECOND_MOV]]		; GFX9: v_add_u32_e32 v[[SECOND_ADD:[0-9]+]], v{{[0-9]+}}, v[[SECOND_MOV]]
; GFX9: v_mov_b32_e32 v[[SECOND:[0-9]+]], v[[SECOND_ADD]]		; GFX9: v_mov_b32_e32 v[[SECOND:[0-9]+]], v[[SECOND_ADD]]
; GFX9-O0: buffer_store_dword v[[SECOND]], off, s{{\[}}{{[0-9]+}}:{{[0-9]+}}{{\]}}, s[[SECOND_SGPR_OFFSET:[0-9]+]] offset:[[SECOND_IMM_OFFSET:[0-9]+]]		; GFX9-O0: buffer_store_dword v[[SECOND]], off, s{{\[}}{{[0-9]+}}:{{[0-9]+}}{{\]}}, 0 offset:[[SECOND_IMM_OFFSET:[0-9]+]]
%tmp135 = tail call i32 @llvm.amdgcn.update.dpp.i32(i32 0, i32 %tmp107, i32 323, i32 12, i32 15, i1 false)		%tmp135 = tail call i32 @llvm.amdgcn.update.dpp.i32(i32 0, i32 %tmp107, i32 323, i32 12, i32 15, i1 false)
%tmp136 = add i32 %tmp107, %tmp135		%tmp136 = add i32 %tmp107, %tmp135
%tmp137 = tail call i32 @llvm.amdgcn.wwm.i32(i32 %tmp136)		%tmp137 = tail call i32 @llvm.amdgcn.wwm.i32(i32 %tmp136)
br label %merge		br label %merge

merge:		merge:
%merge_value = phi i32 [ 0, %entry ], [%tmp137, %if ]		%merge_value = phi i32 [ 0, %entry ], [%tmp137, %if ]
; GFX9-O3: v_cmp_eq_u32_e32 vcc, v[[FIRST]], v[[SECOND]]		; GFX9-O3: v_cmp_eq_u32_e32 vcc, v[[FIRST]], v[[SECOND]]
; GFX9-O0: buffer_load_dword v[[SECOND:[0-9]+]], off, s{{\[}}{{[0-9]+}}:{{[0-9]+}}{{\]}}, s[[SECOND_SGPR_OFFSET]] offset:[[SECOND_IMM_OFFSET]]		; GFX9-O0: buffer_load_dword v[[SECOND:[0-9]+]], off, s{{\[}}{{[0-9]+}}:{{[0-9]+}}{{\]}}, 0 offset:[[SECOND_IMM_OFFSET]]
; GFX9-O0: buffer_load_dword v[[FIRST:[0-9]+]], off, s{{\[}}{{[0-9]+}}:{{[0-9]+}}{{\]}}, s[[FIRST_SGPR_OFFSET]] offset:[[FIRST_IMM_OFFSET]]		; GFX9-O0: buffer_load_dword v[[FIRST:[0-9]+]], off, s{{\[}}{{[0-9]+}}:{{[0-9]+}}{{\]}}, 0 offset:[[FIRST_IMM_OFFSET]]
; GFX9-O0: v_cmp_eq_u32_e64 s{{\[}}{{[0-9]+}}:{{[0-9]+}}{{\]}}, v[[FIRST]], v[[SECOND]]		; GFX9-O0: v_cmp_eq_u32_e64 s{{\[}}{{[0-9]+}}:{{[0-9]+}}{{\]}}, v[[FIRST]], v[[SECOND]]
%tmp138 = icmp eq i32 %tmp122, %merge_value		%tmp138 = icmp eq i32 %tmp122, %merge_value
%tmp139 = sext i1 %tmp138 to i32		%tmp139 = sext i1 %tmp138 to i32
%tmp140 = shl nsw i32 %tmp139, 1		%tmp140 = shl nsw i32 %tmp139, 1
%tmp141 = and i32 %tmp140, 2		%tmp141 = and i32 %tmp140, 2
%tmp145 = bitcast i32 %tmp141 to float		%tmp145 = bitcast i32 %tmp141 to float
call void @llvm.amdgcn.raw.buffer.store.f32(float %tmp145, <4 x i32> %tmp14, i32 4, i32 0, i32 0)		call void @llvm.amdgcn.raw.buffer.store.f32(float %tmp145, <4 x i32> %tmp14, i32 4, i32 0, i32 0)
ret void		ret void

llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-no-ir.mir

	# RUN: llc -mtriple=amdgcn-amd-amdhsa -run-pass=none -verify-machineinstrs %s -o - | FileCheck -check-prefixes=FULL,ALL %s			# RUN: llc -mtriple=amdgcn-amd-amdhsa -run-pass=none -verify-machineinstrs %s -o - | FileCheck -check-prefixes=FULL,ALL %s
	# RUN: llc -mtriple=amdgcn-amd-amdhsa -run-pass=none -simplify-mir -verify-machineinstrs %s -o - | FileCheck -check-prefixes=SIMPLE,ALL %s			# RUN: llc -mtriple=amdgcn-amd-amdhsa -run-pass=none -simplify-mir -verify-machineinstrs %s -o - | FileCheck -check-prefixes=SIMPLE,ALL %s


	---			---
	# ALL-LABEL: name: kernel0			# ALL-LABEL: name: kernel0
	# FULL: machineFunctionInfo:			# FULL: machineFunctionInfo:
	# FULL-NEXT: explicitKernArgSize: 128			# FULL-NEXT: explicitKernArgSize: 128
	# FULL-NEXT: maxKernArgAlign: 64			# FULL-NEXT: maxKernArgAlign: 64
	# FULL-NEXT: ldsSize: 2048			# FULL-NEXT: ldsSize: 2048
	# FULL-NEXT: isEntryFunction: true			# FULL-NEXT: isEntryFunction: true
	# FULL-NEXT: noSignedZerosFPMath: false			# FULL-NEXT: noSignedZerosFPMath: false
	# FULL-NEXT: memoryBound: true			# FULL-NEXT: memoryBound: true
	# FULL-NEXT: waveLimiter: true			# FULL-NEXT: waveLimiter: true
	# FULL-NEXT: scratchRSrcReg: '$sgpr8_sgpr9_sgpr10_sgpr11'			# FULL-NEXT: scratchRSrcReg: '$sgpr8_sgpr9_sgpr10_sgpr11'
	# FULL-NEXT: scratchWaveOffsetReg: '$sgpr12'
	# FULL-NEXT: frameOffsetReg: '$sgpr12'			# FULL-NEXT: frameOffsetReg: '$sgpr12'
	# FULL-NEXT: stackPtrOffsetReg: '$sgpr13'			# FULL-NEXT: stackPtrOffsetReg: '$sgpr13'
	# FULL-NEXT: argumentInfo:			# FULL-NEXT: argumentInfo:
	# FULL-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }			# FULL-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	# FULL-NEXT: kernargSegmentPtr: { reg: '$sgpr4_sgpr5' }			# FULL-NEXT: kernargSegmentPtr: { reg: '$sgpr4_sgpr5' }
	# FULL-NEXT: workGroupIDX: { reg: '$sgpr6' }			# FULL-NEXT: workGroupIDX: { reg: '$sgpr6' }
	# FULL-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr7' }			# FULL-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr7' }
	# FULL-NEXT: workItemIDX: { reg: '$vgpr0' }			# FULL-NEXT: workItemIDX: { reg: '$vgpr0' }
	# SIMPLE: machineFunctionInfo:			# SIMPLE: machineFunctionInfo:
	# SIMPLE-NEXT: explicitKernArgSize: 128			# SIMPLE-NEXT: explicitKernArgSize: 128
	# SIMPLE-NEXT: maxKernArgAlign: 64			# SIMPLE-NEXT: maxKernArgAlign: 64
	# SIMPLE-NEXT: ldsSize: 2048			# SIMPLE-NEXT: ldsSize: 2048
	# SIMPLE-NEXT: isEntryFunction: true			# SIMPLE-NEXT: isEntryFunction: true
	# SIMPLE-NEXT: memoryBound: true			# SIMPLE-NEXT: memoryBound: true
	# SIMPLE-NEXT: waveLimiter: true			# SIMPLE-NEXT: waveLimiter: true
	# SIMPLE-NEXT: scratchRSrcReg: '$sgpr8_sgpr9_sgpr10_sgpr11'			# SIMPLE-NEXT: scratchRSrcReg: '$sgpr8_sgpr9_sgpr10_sgpr11'
	# SIMPLE-NEXT: scratchWaveOffsetReg: '$sgpr12'
	# SIMPLE-NEXT: frameOffsetReg: '$sgpr12'			# SIMPLE-NEXT: frameOffsetReg: '$sgpr12'
	# SIMPLE-NEXT: stackPtrOffsetReg: '$sgpr13'			# SIMPLE-NEXT: stackPtrOffsetReg: '$sgpr13'
	# SIMPLE-NEXT: argumentInfo:			# SIMPLE-NEXT: argumentInfo:
	# SIMPLE-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }			# SIMPLE-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	# SIMPLE-NEXT: kernargSegmentPtr: { reg: '$sgpr4_sgpr5' }			# SIMPLE-NEXT: kernargSegmentPtr: { reg: '$sgpr4_sgpr5' }
	# SIMPLE-NEXT: workGroupIDX: { reg: '$sgpr6' }			# SIMPLE-NEXT: workGroupIDX: { reg: '$sgpr6' }
	# SIMPLE-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr7' }			# SIMPLE-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr7' }
	# SIMPLE-NEXT: workItemIDX: { reg: '$vgpr0' }			# SIMPLE-NEXT: workItemIDX: { reg: '$vgpr0' }
	# SIMPLE-NEXT: body:			# SIMPLE-NEXT: body:
	name: kernel0			name: kernel0
	machineFunctionInfo:			machineFunctionInfo:
	explicitKernArgSize: 128			explicitKernArgSize: 128
	maxKernArgAlign: 64			maxKernArgAlign: 64
	ldsSize: 2048			ldsSize: 2048
	isEntryFunction: true			isEntryFunction: true
	noSignedZerosFPMath: false			noSignedZerosFPMath: false
	memoryBound: true			memoryBound: true
	waveLimiter: true			waveLimiter: true
	scratchRSrcReg: '$sgpr8_sgpr9_sgpr10_sgpr11'			scratchRSrcReg: '$sgpr8_sgpr9_sgpr10_sgpr11'
	scratchWaveOffsetReg: '$sgpr12'
	frameOffsetReg: '$sgpr12'			frameOffsetReg: '$sgpr12'
	stackPtrOffsetReg: '$sgpr13'			stackPtrOffsetReg: '$sgpr13'
	argumentInfo:			argumentInfo:
	privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }			privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	kernargSegmentPtr: { reg: '$sgpr4_sgpr5' }			kernargSegmentPtr: { reg: '$sgpr4_sgpr5' }
	workGroupIDX: { reg: '$sgpr6' }			workGroupIDX: { reg: '$sgpr6' }
	privateSegmentWaveByteOffset: { reg: '$sgpr7' }			privateSegmentWaveByteOffset: { reg: '$sgpr7' }
	workItemIDX: { reg: '$vgpr0' }			workItemIDX: { reg: '$vgpr0' }
	# FULL-NEXT: explicitKernArgSize: 0			# FULL-NEXT: explicitKernArgSize: 0
	# FULL-NEXT: maxKernArgAlign: 1			# FULL-NEXT: maxKernArgAlign: 1
	# FULL-NEXT: ldsSize: 0			# FULL-NEXT: ldsSize: 0
	# FULL-NEXT: isEntryFunction: false			# FULL-NEXT: isEntryFunction: false
	# FULL-NEXT: noSignedZerosFPMath: false			# FULL-NEXT: noSignedZerosFPMath: false
	# FULL-NEXT: memoryBound: false			# FULL-NEXT: memoryBound: false
	# FULL-NEXT: waveLimiter: false			# FULL-NEXT: waveLimiter: false
	# FULL-NEXT: scratchRSrcReg: '$private_rsrc_reg'			# FULL-NEXT: scratchRSrcReg: '$private_rsrc_reg'
	# FULL-NEXT: scratchWaveOffsetReg: '$scratch_wave_offset_reg'
	# FULL-NEXT: frameOffsetReg: '$fp_reg'			# FULL-NEXT: frameOffsetReg: '$fp_reg'
	# FULL-NEXT: stackPtrOffsetReg: '$sp_reg'			# FULL-NEXT: stackPtrOffsetReg: '$sp_reg'
	# FULL-NEXT: argumentInfo:			# FULL-NEXT: argumentInfo:
	# FULL-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }			# FULL-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	# FULL-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr33' }
	# FULL-NEXT: mode:			# FULL-NEXT: mode:
	# FULL-NEXT: ieee: true			# FULL-NEXT: ieee: true
	# FULL-NEXT: dx10-clamp: true			# FULL-NEXT: dx10-clamp: true
	# FULL-NEXT: fp32-input-denormals: true			# FULL-NEXT: fp32-input-denormals: true
	# FULL-NEXT: fp32-output-denormals: true			# FULL-NEXT: fp32-output-denormals: true
	# FULL-NEXT: fp64-fp16-input-denormals: true			# FULL-NEXT: fp64-fp16-input-denormals: true
	# FULL-NEXT: fp64-fp16-output-denormals: true			# FULL-NEXT: fp64-fp16-output-denormals: true
	# FULL-NEXT: highBitsOf32BitAddress: 0			# FULL-NEXT: highBitsOf32BitAddress: 0
	# FULL-NEXT: body:			# FULL-NEXT: body:

	# SIMPLE: machineFunctionInfo:			# SIMPLE: machineFunctionInfo:
	# SIMPLE-NEXT: maxKernArgAlign: 1			# SIMPLE-NEXT: maxKernArgAlign: 1
	# SIMPLE-NEXT: argumentInfo:			# SIMPLE-NEXT: argumentInfo:
	# SIMPLE-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }			# SIMPLE-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	# SIMPLE-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr33' }
	# SIMPLE-NEXT: body:			# SIMPLE-NEXT: body:

	name: no_mfi			name: no_mfi
	body: |			body: |
	bb.0:			bb.0:
	S_ENDPGM 0			S_ENDPGM 0

	...			...

	---			---
	# ALL-LABEL: name: empty_mfi			# ALL-LABEL: name: empty_mfi
	# FULL: machineFunctionInfo:			# FULL: machineFunctionInfo:
	# FULL-NEXT: explicitKernArgSize: 0			# FULL-NEXT: explicitKernArgSize: 0
	# FULL-NEXT: maxKernArgAlign: 1			# FULL-NEXT: maxKernArgAlign: 1
	# FULL-NEXT: ldsSize: 0			# FULL-NEXT: ldsSize: 0
	# FULL-NEXT: isEntryFunction: false			# FULL-NEXT: isEntryFunction: false
	# FULL-NEXT: noSignedZerosFPMath: false			# FULL-NEXT: noSignedZerosFPMath: false
	# FULL-NEXT: memoryBound: false			# FULL-NEXT: memoryBound: false
	# FULL-NEXT: waveLimiter: false			# FULL-NEXT: waveLimiter: false
	# FULL-NEXT: scratchRSrcReg: '$private_rsrc_reg'			# FULL-NEXT: scratchRSrcReg: '$private_rsrc_reg'
	# FULL-NEXT: scratchWaveOffsetReg: '$scratch_wave_offset_reg'
	# FULL-NEXT: frameOffsetReg: '$fp_reg'			# FULL-NEXT: frameOffsetReg: '$fp_reg'
	# FULL-NEXT: stackPtrOffsetReg: '$sp_reg'			# FULL-NEXT: stackPtrOffsetReg: '$sp_reg'
	# FULL-NEXT: argumentInfo:			# FULL-NEXT: argumentInfo:
	# FULL-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }			# FULL-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	# FULL-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr33' }
	# FULL-NEXT: mode:			# FULL-NEXT: mode:
	# FULL-NEXT: ieee: true			# FULL-NEXT: ieee: true
	# FULL-NEXT: dx10-clamp: true			# FULL-NEXT: dx10-clamp: true
	# FULL-NEXT: fp32-input-denormals: true			# FULL-NEXT: fp32-input-denormals: true
	# FULL-NEXT: fp32-output-denormals: true			# FULL-NEXT: fp32-output-denormals: true
	# FULL-NEXT: fp64-fp16-input-denormals: true			# FULL-NEXT: fp64-fp16-input-denormals: true
	# FULL-NEXT: fp64-fp16-output-denormals: true			# FULL-NEXT: fp64-fp16-output-denormals: true
	# FULL-NEXT: highBitsOf32BitAddress: 0			# FULL-NEXT: highBitsOf32BitAddress: 0
	# FULL-NEXT: body:			# FULL-NEXT: body:

	# SIMPLE: machineFunctionInfo:			# SIMPLE: machineFunctionInfo:
	# SIMPLE-NEXT: maxKernArgAlign: 1			# SIMPLE-NEXT: maxKernArgAlign: 1
	# SIMPLE-NEXT: argumentInfo:			# SIMPLE-NEXT: argumentInfo:
	# SIMPLE-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }			# SIMPLE-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	# SIMPLE-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr33' }
	# SIMPLE-NEXT: body:			# SIMPLE-NEXT: body:

	name: empty_mfi			name: empty_mfi
	machineFunctionInfo:			machineFunctionInfo:
	body: |			body: |
	bb.0:			bb.0:
	S_ENDPGM 0			S_ENDPGM 0

	...			...

	---			---
	# ALL-LABEL: name: empty_mfi_entry_func			# ALL-LABEL: name: empty_mfi_entry_func
	# FULL: machineFunctionInfo:			# FULL: machineFunctionInfo:
	# FULL-NEXT: explicitKernArgSize: 0			# FULL-NEXT: explicitKernArgSize: 0
	# FULL-NEXT: maxKernArgAlign: 1			# FULL-NEXT: maxKernArgAlign: 1
	# FULL-NEXT: ldsSize: 0			# FULL-NEXT: ldsSize: 0
	# FULL-NEXT: isEntryFunction: true			# FULL-NEXT: isEntryFunction: true
	# FULL-NEXT: noSignedZerosFPMath: false			# FULL-NEXT: noSignedZerosFPMath: false
	# FULL-NEXT: memoryBound: false			# FULL-NEXT: memoryBound: false
	# FULL-NEXT: waveLimiter: false			# FULL-NEXT: waveLimiter: false
	# FULL-NEXT: scratchRSrcReg: '$private_rsrc_reg'			# FULL-NEXT: scratchRSrcReg: '$private_rsrc_reg'
	# FULL-NEXT: scratchWaveOffsetReg: '$scratch_wave_offset_reg'
	# FULL-NEXT: frameOffsetReg: '$fp_reg'			# FULL-NEXT: frameOffsetReg: '$fp_reg'
	# FULL-NEXT: stackPtrOffsetReg: '$sp_reg'			# FULL-NEXT: stackPtrOffsetReg: '$sp_reg'
	# FULL-NEXT: argumentInfo:			# FULL-NEXT: argumentInfo:
	# FULL-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }			# FULL-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	# FULL-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr33' }
	# FULL-NEXT: mode:			# FULL-NEXT: mode:
	# FULL-NEXT: ieee: true			# FULL-NEXT: ieee: true
	# FULL-NEXT: dx10-clamp: true			# FULL-NEXT: dx10-clamp: true
	# FULL-NEXT: fp32-input-denormals: true			# FULL-NEXT: fp32-input-denormals: true
	# FULL-NEXT: fp32-output-denormals: true			# FULL-NEXT: fp32-output-denormals: true
	# FULL-NEXT: fp64-fp16-input-denormals: true			# FULL-NEXT: fp64-fp16-input-denormals: true
	# FULL-NEXT: fp64-fp16-output-denormals: true			# FULL-NEXT: fp64-fp16-output-denormals: true
	# FULL-NEXT: highBitsOf32BitAddress: 0			# FULL-NEXT: highBitsOf32BitAddress: 0
	# FULL-NEXT: body:			# FULL-NEXT: body:

	# SIMPLE: machineFunctionInfo:			# SIMPLE: machineFunctionInfo:
	# SIMPLE-NEXT: maxKernArgAlign: 1			# SIMPLE-NEXT: maxKernArgAlign: 1
	# SIMPLE-NEXT: isEntryFunction: true			# SIMPLE-NEXT: isEntryFunction: true
	# SIMPLE-NEXT: argumentInfo:			# SIMPLE-NEXT: argumentInfo:
	# SIMPLE-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }			# SIMPLE-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	# SIMPLE-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr33' }
	# SIMPLE-NEXT: body:			# SIMPLE-NEXT: body:

	name: empty_mfi_entry_func			name: empty_mfi_entry_func
	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: true			isEntryFunction: true
	body: |			body: |
	bb.0:			bb.0:
	S_ENDPGM 0			S_ENDPGM 0

	...			...

	---			---
	# ALL-LABEL: name: default_regs_mfi			# ALL-LABEL: name: default_regs_mfi

	# FULL: scratchRSrcReg: '$private_rsrc_reg'			# FULL: scratchRSrcReg: '$private_rsrc_reg'
	# FULL-NEXT: scratchWaveOffsetReg: '$scratch_wave_offset_reg'
	# FULL-NEXT: frameOffsetReg: '$fp_reg'			# FULL-NEXT: frameOffsetReg: '$fp_reg'
	# FULL-NEXT: stackPtrOffsetReg: '$sp_reg'			# FULL-NEXT: stackPtrOffsetReg: '$sp_reg'

	# SIMPLE-NOT: scratchRSrcReg			# SIMPLE-NOT: scratchRSrcReg
	# SIMPLE-NOT: scratchWaveOffsetReg
	# SIMPLE-NOT:: stackPtrOffsetReg			# SIMPLE-NOT:: stackPtrOffsetReg
	name: default_regs_mfi			name: default_regs_mfi
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: '$private_rsrc_reg'			scratchRSrcReg: '$private_rsrc_reg'

	body: |			body: |
	bb.0:			bb.0:
	S_ENDPGM 0			S_ENDPGM 0

	...			...

	---			---
	# ALL-LABEL: name: fake_stack_arginfo			# ALL-LABEL: name: fake_stack_arginfo

	# FULL: argumentInfo:			# FULL: argumentInfo:
	# FULL-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }			# FULL-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	# FULL-NEXT: flatScratchInit: { offset: 4 }			# FULL-NEXT: flatScratchInit: { offset: 4 }
	# FULL-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr33' }
	# FULL-NEXT: workItemIDY: { reg: '$vgpr0', mask: 65280 }			# FULL-NEXT: workItemIDY: { reg: '$vgpr0', mask: 65280 }

	# SIMPLE: argumentInfo:			# SIMPLE: argumentInfo:
	# SIMPLE-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }			# SIMPLE-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	# SIMPLE-NEXT: flatScratchInit: { offset: 4 }			# SIMPLE-NEXT: flatScratchInit: { offset: 4 }
	# SIMPLE-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr33' }
	# SIMPLE-NEXT: workItemIDY: { reg: '$vgpr0', mask: 65280 }			# SIMPLE-NEXT: workItemIDY: { reg: '$vgpr0', mask: 65280 }
	name: fake_stack_arginfo			name: fake_stack_arginfo
	machineFunctionInfo:			machineFunctionInfo:
	argumentInfo:			argumentInfo:
	flatScratchInit: { offset: 4 }			flatScratchInit: { offset: 4 }
	workItemIDY: { reg: '$vgpr0' , mask: 0xff00 }			workItemIDY: { reg: '$vgpr0' , mask: 0xff00 }

	body: |			body: |

llvm/test/CodeGen/MIR/AMDGPU/machine-function-info.ll

This file was deleted.

	; RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=tahiti -stop-after finalize-isel -o %t.mir %s
	; RUN: llc -run-pass=none -verify-machineinstrs %t.mir -o - | FileCheck %s

	; Test that SIMachineFunctionInfo can be round trip serialized through
	; MIR.

	@lds = addrspace(3) global [512 x float] undef, align 4

	; CHECK-LABEL: {{^}}name: kernel
	; CHECK: machineFunctionInfo:
	; CHECK-NEXT: explicitKernArgSize: 128
	; CHECK-NEXT: maxKernArgAlign: 64
	; CHECK-NEXT: ldsSize: 0
	; CHECK-NEXT: isEntryFunction: true
	; CHECK-NEXT: noSignedZerosFPMath: false
	; CHECK-NEXT: memoryBound: false
	; CHECK-NEXT: waveLimiter: false
	; CHECK-NEXT: scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'
	; CHECK-NEXT: scratchWaveOffsetReg: '$sgpr101'
	; CHECK-NEXT: frameOffsetReg: '$sgpr101'
	; CHECK-NEXT: stackPtrOffsetReg: '$sgpr101'
	; CHECK-NEXT: argumentInfo:
	; CHECK-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	; CHECK-NEXT: kernargSegmentPtr: { reg: '$sgpr4_sgpr5' }
	; CHECK-NEXT: workGroupIDX: { reg: '$sgpr6' }
	; CHECK-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr7' }
	; CHECK-NEXT: workItemIDX: { reg: '$vgpr0' }
	; CHECK-NEXT: mode:
	; CHECK-NEXT: ieee: true
	; CHECK-NEXT: dx10-clamp: true
	; CHECK-NEXT: fp32-input-denormals: false
	; CHECK-NEXT: fp32-output-denormals: false
	; CHECK-NEXT: fp64-fp16-input-denormals: true
	; CHECK-NEXT: fp64-fp16-output-denormals: true
	; CHECK-NEXT: highBitsOf32BitAddress: 0
	; CHECK-NEXT: body:
	define amdgpu_kernel void @kernel(i32 %arg0, i64 %arg1, <16 x i32> %arg2) {
	%gep = getelementptr inbounds [512 x float], [512 x float] addrspace(3)* @lds, i32 0, i32 %arg0
	store float 0.0, float addrspace(3)* %gep, align 4
	ret void
	}

	; CHECK-LABEL: {{^}}name: ps_shader
	; CHECK: machineFunctionInfo:
	; CHECK-NEXT: explicitKernArgSize: 0
	; CHECK-NEXT: maxKernArgAlign: 1
	; CHECK-NEXT: ldsSize: 0
	; CHECK-NEXT: isEntryFunction: true
	; CHECK-NEXT: noSignedZerosFPMath: false
	; CHECK-NEXT: memoryBound: false
	; CHECK-NEXT: waveLimiter: false
	; CHECK-NEXT: scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'
	; CHECK-NEXT: scratchWaveOffsetReg: '$sgpr101'
	; CHECK-NEXT: frameOffsetReg: '$sgpr101'
	; CHECK-NEXT: stackPtrOffsetReg: '$sgpr101'
	; CHECK-NEXT: argumentInfo:
	; CHECK-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr3' }
	; CHECK-NEXT: implicitBufferPtr: { reg: '$sgpr0_sgpr1' }
	; CHECK-NEXT: mode:
	; CHECK-NEXT: ieee: false
	; CHECK-NEXT: dx10-clamp: true
	; CHECK-NEXT: fp32-input-denormals: false
	; CHECK-NEXT: fp32-output-denormals: false
	; CHECK-NEXT: fp64-fp16-input-denormals: true
	; CHECK-NEXT: fp64-fp16-output-denormals: true
	; CHECK-NEXT: highBitsOf32BitAddress: 0
	; CHECK-NEXT: body:
	define amdgpu_ps void @ps_shader(i32 %arg0, i32 inreg %arg1) {
	ret void
	}

	; CHECK-LABEL: {{^}}name: function
	; CHECK: machineFunctionInfo:
	; CHECK-NEXT: explicitKernArgSize: 0
	; CHECK-NEXT: maxKernArgAlign: 1
	; CHECK-NEXT: ldsSize: 0
	; CHECK-NEXT: isEntryFunction: false
	; CHECK-NEXT: noSignedZerosFPMath: false
	; CHECK-NEXT: memoryBound: false
	; CHECK-NEXT: waveLimiter: false
	; CHECK-NEXT: scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
	; CHECK-NEXT: scratchWaveOffsetReg: '$sgpr33'
	; CHECK-NEXT: frameOffsetReg: '$sgpr34'
	; CHECK-NEXT: stackPtrOffsetReg: '$sgpr32'
	; CHECK-NEXT: argumentInfo:
	; CHECK-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	; CHECK-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr33' }
	; CHECK-NEXT: mode:
	; CHECK-NEXT: ieee: true
	; CHECK-NEXT: dx10-clamp: true
	; CHECK-NEXT: fp32-input-denormals: false
	; CHECK-NEXT: fp32-output-denormals: false
	; CHECK-NEXT: fp64-fp16-input-denormals: true
	; CHECK-NEXT: fp64-fp16-output-denormals: true
	; CHECK-NEXT: highBitsOf32BitAddress: 0
	; CHECK-NEXT: body:
	define void @function() {
	ret void
	}

	; CHECK-LABEL: {{^}}name: function_nsz
	; CHECK: machineFunctionInfo:
	; CHECK-NEXT: explicitKernArgSize: 0
	; CHECK-NEXT: maxKernArgAlign: 1
	; CHECK-NEXT: ldsSize: 0
	; CHECK-NEXT: isEntryFunction: false
	; CHECK-NEXT: noSignedZerosFPMath: true
	; CHECK-NEXT: memoryBound: false
	; CHECK-NEXT: waveLimiter: false
	; CHECK-NEXT: scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
	; CHECK-NEXT: scratchWaveOffsetReg: '$sgpr33'
	; CHECK-NEXT: frameOffsetReg: '$sgpr34'
	; CHECK-NEXT: stackPtrOffsetReg: '$sgpr32'
	; CHECK-NEXT: argumentInfo:
	; CHECK-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	; CHECK-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr33' }
	; CHECK-NEXT: mode:
	; CHECK-NEXT: ieee: true
	; CHECK-NEXT: dx10-clamp: true
	; CHECK-NEXT: fp32-input-denormals: false
	; CHECK-NEXT: fp32-output-denormals: false
	; CHECK-NEXT: fp64-fp16-input-denormals: true
	; CHECK-NEXT: fp64-fp16-output-denormals: true
	; CHECK-NEXT: highBitsOf32BitAddress: 0
	; CHECK-NEXT: body:
	define void @function_nsz() #0 {
	ret void
	}

	; CHECK-LABEL: {{^}}name: function_dx10_clamp_off
	; CHECK: mode:
	; CHECK-NEXT: ieee: true
	; CHECK-NEXT: dx10-clamp: false
	; CHECK-NEXT: fp32-input-denormals: false
	; CHECK-NEXT: fp32-output-denormals: false
	; CHECK-NEXT: fp64-fp16-input-denormals: true
	; CHECK-NEXT: fp64-fp16-output-denormals: true
	define void @function_dx10_clamp_off() #1 {
	ret void
	}

	; CHECK-LABEL: {{^}}name: function_ieee_off
	; CHECK: mode:
	; CHECK-NEXT: ieee: false
	; CHECK-NEXT: dx10-clamp: true
	; CHECK-NEXT: fp32-input-denormals: false
	; CHECK-NEXT: fp32-output-denormals: false
	; CHECK-NEXT: fp64-fp16-input-denormals: true
	; CHECK-NEXT: fp64-fp16-output-denormals: true
	define void @function_ieee_off() #2 {
	ret void
	}

	; CHECK-LABEL: {{^}}name: function_ieee_off_dx10_clamp_off
	; CHECK: mode:
	; CHECK-NEXT: ieee: false
	; CHECK-NEXT: dx10-clamp: false
	; CHECK-NEXT: fp32-input-denormals: false
	; CHECK-NEXT: fp32-output-denormals: false
	; CHECK-NEXT: fp64-fp16-input-denormals: true
	; CHECK-NEXT: fp64-fp16-output-denormals: true
	define void @function_ieee_off_dx10_clamp_off() #3 {
	ret void
	}

	; CHECK-LABEL: {{^}}name: high_address_bits
	; CHECK: machineFunctionInfo:
	; CHECK: highBitsOf32BitAddress: 4294934528
	define amdgpu_ps void @high_address_bits() #4 {
	ret void
	}

	attributes #0 = { "no-signed-zeros-fp-math" = "true" }
	attributes #1 = { "amdgpu-dx10-clamp" = "false" }
	attributes #2 = { "amdgpu-ieee" = "false" }
	attributes #3 = { "amdgpu-dx10-clamp" = "false" "amdgpu-ieee" = "false" }
	attributes #4 = { "amdgpu-32bit-address-high-bits"="0xffff8000" }

llvm/test/CodeGen/MIR/AMDGPU/mfi-parse-error-scratch-wave-offset-reg.mir

This file was deleted.

	# RUN: not llc -march=amdgcn -run-pass none -o /dev/null %s 2>&1 | FileCheck %s
	# CHECK: :7:27: expected a named register
	# CHECK: scratchWaveOffsetReg: ''
	---
	name: empty_scratch_wave_offset_reg
	machineFunctionInfo:
	scratchWaveOffsetReg: ''
	body: |
	bb.0:

	S_ENDPGM
	...

llvm/test/CodeGen/MIR/AMDGPU/mfi-scratch-wave-offset-reg-class.mir

This file was deleted.

	# RUN: not llc -march=amdgcn -run-pass none -o /dev/null %s 2>&1 | FileCheck %s
	# CHECK: :8:33: incorrect register class for field
	# CHECK: scratchWaveOffsetReg: '$vgpr0'

	---
	name: wrong_reg_class_scratch_wave_offset_reg
	machineFunctionInfo:
	scratchWaveOffsetReg: '$vgpr0'
	body: |
	bb.0:

	S_ENDPGM
	...

llvm/test/CodeGen/MIR/AMDGPU/parse-order-reserved-regs.mir

	 # RUN: llc -march=amdgcn -run-pass=none -verify-machineinstrs -o - %s | FileCheck %s
	 # RUN: llc -march=amdgcn -run-pass mir-canonicalizer -verify-machineinstrs -o - %s
	 
	+# FIXME: Is this still testing anything?
	+
	 # Previously getReservedRegs was called before parsing
	 # machineFunctionInfo, but the AMDGPU implementation depends on
	 # setting register fields to reserve there. $sgpr50 would then not be
	 # reserved, resulting in a verifier error from an undefined register.
	 
	 ---
	 # CHECK: machineFunctionInfo:
	 # CHECK: isEntryFunction: true
	 # CHECK: scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
	-# CHECK: scratchWaveOffsetReg: '$sgpr50'
	-# CHECK: frameOffsetReg: '$sgpr50'
	-# CHECK: renamable $vgpr0 = BUFFER_LOAD_DWORD_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr50, 4, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)
	+# CHECK: renamable $vgpr0 = BUFFER_LOAD_DWORD_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 4, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)
	 name: reserve_correct_register
	 tracksRegLiveness: true
	 machineFunctionInfo:
	   isEntryFunction: true
	   scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
	-  scratchWaveOffsetReg: '$sgpr50'
	-  frameOffsetReg: '$sgpr50'
	+  argumentInfo:
	+    privateSegmentWaveByteOffset: { reg: '$sgpr50' }
	 stack:
	   - { id: 0, type: default, offset: 0, size: 4, alignment: 4 }
	 
	 body: |
	   bb.0:
	-    renamable $vgpr0 = BUFFER_LOAD_DWORD_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr50, 4, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)
	+    renamable $vgpr0 = BUFFER_LOAD_DWORD_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 4, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)
	     S_ENDPGM 0
	 ...
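In this test, the BUFFER_LOAD_DWORD_OFFEN soffset operand changes from $sgpr50 to 0, and $sgpr50 moves into argumentInfo as privateSegmentWaveByteOffset. This matches the revision title: the per-wave scratch offset is folded into the scratch buffer descriptor once, instead of being passed to every access. The following minimal Python sketch shows why that preserves the address. Its assumptions are:

- The 48-bit descriptor base is split across dword0 and dword1[15:0].
- The entry code performs a 32-bit add followed by an add-with-carry.
- The address is a simplified, unswizzled sum.

All function names and values are hypothetical. This is not a model of the hardware addressing rules.

```python
# Hypothetical model of folding a per-wave scratch offset into the base of a
# scratch buffer resource descriptor (SRD). Assumes the 48-bit base lives in
# dword0 (bits 31:0) and dword1 bits 15:0 (bits 47:32); the remaining dword1
# bits are left untouched unless a carry overflows bit 15, which this sketch
# does not guard against.

MASK32 = 0xFFFFFFFF


def add_wave_offset_to_srd(dword0: int, dword1: int, wave_offset: int):
    """Mimic a 32-bit add on dword0 followed by an add-with-carry of 0 into dword1."""
    total = dword0 + wave_offset
    carry = total >> 32  # 1 if the low dword overflowed, else 0
    return total & MASK32, (dword1 + carry) & MASK32


def srd_base(dword0: int, dword1: int) -> int:
    """Reassemble the 48-bit base address from the two SRD dwords."""
    return ((dword1 & 0xFFFF) << 32) | dword0


def scratch_address(base: int, soffset: int, voffset: int, imm: int) -> int:
    """Simplified, unswizzled MUBUF address: base + soffset + voffset + offset."""
    return base + soffset + voffset + imm


# Arbitrary example values: base 0x1_FFFFF000 forces a carry into dword1.
dword0, dword1 = 0xFFFFF000, 0x00000001
wave_offset, voffset, imm = 0x2000, 0x10, 4

# Before: the wave offset is passed as soffset on every access.
before = scratch_address(srd_base(dword0, dword1), wave_offset, voffset, imm)

# After: the wave offset is added to the SRD once, and soffset becomes 0.
new0, new1 = add_wave_offset_to_srd(dword0, dword1, wave_offset)
after = scratch_address(srd_base(new0, new1), 0, voffset, imm)

print(hex(before), hex(after))  # 0x200001014 0x200001014
```

The example values are chosen so that the low dword overflows. This shows why the descriptor update needs the carry into dword1.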

llvm/test/DebugInfo/AMDGPU/variable-locations.ll

	 ; CHECK-NEXT: DW_AT_type
	 ; CHECK-NEXT: DW_AT_external
	 ; CHECK-NEXT: DW_AT_decl_file
	 ; CHECK-NEXT: DW_AT_decl_line
	 ; CHECK-NEXT: DW_AT_location [DW_FORM_block1] (DW_OP_addr 0x0)
	 @GlobB = common addrspace(1) global i32 0, align 4, !dbg !6
	 
	 ; CHECK: {{.*}}DW_TAG_subprogram
	-; CHECK: DW_AT_frame_base [DW_FORM_block1] (DW_OP_reg{{.*}} SGPR9)
	+; CHECK-NOT: DW_AT_frame_base
	 
	 define amdgpu_kernel void @kernel1(
	 ; CHECK: {{.*}}DW_TAG_formal_parameter
	 ; CHECK-NEXT: DW_AT_location [DW_FORM_block1] (DW_OP_fbreg +4, DW_OP_lit1, DW_OP_swap, DW_OP_xderef)
	 ; CHECK-NEXT: DW_AT_name {{.*}}"ArgN"
	 i32 %ArgN,
	 ; CHECK: {{.*}}DW_TAG_formal_parameter
	 ; CHECK-NEXT: DW_AT_location [DW_FORM_block1] (DW_OP_fbreg +8, DW_OP_lit1, DW_OP_swap, DW_OP_xderef)
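The parameter locations checked above are small DWARF stack programs. DW_OP_swap is needed because DW_OP_xderef takes the address from the top of the stack and the address space identifier from the second entry. This is a minimal Python evaluator for just those four operations. The frame base value, memory contents and function name are made-up inputs for illustration. The sketch says nothing about how the AMDGPU backend chooses the frame base.

```python
# Minimal stack-machine evaluator for the DWARF expression in the CHECK lines:
#   DW_OP_fbreg +4, DW_OP_lit1, DW_OP_swap, DW_OP_xderef
# Only these four operations are supported. Memory is a dict keyed by
# (address_space, address); the frame base and memory contents are inputs.

def evaluate(ops, frame_base, memory):
    stack = []
    for op, *operands in ops:
        if op == "DW_OP_fbreg":
            stack.append(frame_base + operands[0])  # frame base + signed offset
        elif op == "DW_OP_lit1":
            stack.append(1)                         # literal 1
        elif op == "DW_OP_swap":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == "DW_OP_xderef":
            address = stack.pop()        # top entry: the address
            address_space = stack.pop()  # second entry: address space identifier
            stack.append(memory[(address_space, address)])
        else:
            raise ValueError("unsupported operation: " + op)
    return stack[-1]  # value left on top of the stack


arg_n = [("DW_OP_fbreg", 4), ("DW_OP_lit1",), ("DW_OP_swap",), ("DW_OP_xderef",)]
memory = {(1, 0x104): 0xDEAD}  # arbitrary contents for illustration
print(hex(evaluate(arg_n, 0x100, memory)))  # 0xdead
```

Without the swap, xderef would treat the literal 1 as the address and the frame-relative address as the address space, which is why the operation order matters.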


[WIP][AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions

Closed, Public


Diff Detail

Unit Tests: Failed


Revision Contents

Diff 248352

llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp

llvm/lib/Target/AMDGPU/SIFrameLowering.h

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp

llvm/lib/Target/AMDGPU/SIISelLowering.cpp

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp

llvm/lib/Target/AMDGPU/SIInstructions.td

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp

llvm/lib/Target/AMDGPU/SIRegisterInfo.h

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp

llvm/lib/Target/AMDGPU/SIRegisterInfo.td

llvm/test/CodeGen/AMDGPU/GlobalISel/divergent-control-flow.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-load-local.mir

llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-load-private.mir

llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-store-local.mir

llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-store-private.mir

llvm/test/CodeGen/AMDGPU/addrspacecast.ll

llvm/test/CodeGen/AMDGPU/amdgpu.private-memory.ll

llvm/test/CodeGen/AMDGPU/array-ptr-calc-i32.ll

llvm/test/CodeGen/AMDGPU/attr-amdgpu-num-sgpr.ll

llvm/test/CodeGen/AMDGPU/byval-frame-setup.ll

llvm/test/CodeGen/AMDGPU/call-argument-types.ll

llvm/test/CodeGen/AMDGPU/call-constant.ll

llvm/test/CodeGen/AMDGPU/call-preserved-registers.ll

llvm/test/CodeGen/AMDGPU/call-waitcnt.ll

llvm/test/CodeGen/AMDGPU/callee-special-input-sgprs.ll

llvm/test/CodeGen/AMDGPU/callee-special-input-vgprs.ll

llvm/test/CodeGen/AMDGPU/captured-frame-index.ll

llvm/test/CodeGen/AMDGPU/cc-update.ll

llvm/test/CodeGen/AMDGPU/cgp-addressing-modes.ll

llvm/test/CodeGen/AMDGPU/chain-hi-to-lo.ll

llvm/test/CodeGen/AMDGPU/collapse-endcf.ll

llvm/test/CodeGen/AMDGPU/control-flow-fastregalloc.ll

llvm/test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll

llvm/test/CodeGen/AMDGPU/extload-private.ll

llvm/test/CodeGen/AMDGPU/fast-unaligned-load-store.private.ll

llvm/test/CodeGen/AMDGPU/fold-fi-mubuf.mir

llvm/test/CodeGen/AMDGPU/frame-index-elimination.ll

llvm/test/CodeGen/AMDGPU/frame-lowering-entry-all-sgpr-used.mir

llvm/test/CodeGen/AMDGPU/frame-lowering-fp-adjusted.mir

llvm/test/CodeGen/AMDGPU/function-returns.ll

llvm/test/CodeGen/AMDGPU/hsa-metadata-kernel-code-props-v3.ll

llvm/test/CodeGen/AMDGPU/hsa-metadata-kernel-code-props.ll

llvm/test/CodeGen/AMDGPU/idot8s.ll

llvm/test/CodeGen/AMDGPU/idot8u.ll

llvm/test/CodeGen/AMDGPU/indirect-addressing-term.ll

llvm/test/CodeGen/AMDGPU/insert_vector_elt.ll

llvm/test/CodeGen/AMDGPU/ipra.ll

llvm/test/CodeGen/AMDGPU/large-alloca-compute.ll

llvm/test/CodeGen/AMDGPU/large-alloca-graphics.ll

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.implicit.buffer.ptr.ll

llvm/test/CodeGen/AMDGPU/load-hi16.ll

llvm/test/CodeGen/AMDGPU/load-lo16.ll

llvm/test/CodeGen/AMDGPU/memory-legalizer-load.ll

llvm/test/CodeGen/AMDGPU/memory-legalizer-store.ll

llvm/test/CodeGen/AMDGPU/memory_clause.ll

llvm/test/CodeGen/AMDGPU/mesa3d.ll

llvm/test/CodeGen/AMDGPU/mir-print-dead-csr-fi.mir

llvm/test/CodeGen/AMDGPU/misched-killflags.mir

llvm/test/CodeGen/AMDGPU/mubuf-offset-private.ll

llvm/test/CodeGen/AMDGPU/optimize-exec-masking-pre-ra.mir

llvm/test/CodeGen/AMDGPU/partial-sgpr-to-vgpr-spills.ll

llvm/test/CodeGen/AMDGPU/pei-reg-scavenger-position.mir

llvm/test/CodeGen/AMDGPU/pei-scavenge-sgpr-carry-out.mir

llvm/test/CodeGen/AMDGPU/pei-scavenge-sgpr-gfx9.mir

llvm/test/CodeGen/AMDGPU/pei-scavenge-sgpr.mir

llvm/test/CodeGen/AMDGPU/private-access-no-objects.ll
