This is an archive of the discontinued LLVM Phabricator instance.


Differential D75138

[WIP][AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions
Closed, Public

Authored by scott.linder on Feb 25 2020, 1:34 PM.

Details

Reviewers

arsenm
rampitec
cdevadas
kzhuravl
b-sumner
RamNalamothu
mareko

Commits

rG60b1967c3933: [AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions

Summary

[AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions

Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in
the entry function prologue. This frees up the preloaded scratch wave
offset register after the entry function prologue and removes the
scratch wave offset register from the calling convention ABI.

As part of this change, allow the use of an inline constant zero for the
SOffset of MUBUF instructions accessing the stack in entry functions
when a frame pointer is not requested/required. Entry functions with
calls still need to set up the calling convention ABI stack pointer
register, and reference it in order to address arguments of called
functions. The ABI stack pointer register remains unswizzled, but is now
wave-relative instead of queue-relative.

Non-entry functions also use an inline constant zero SOffset for
wave-relative scratch access, but continue to use the stack and frame
pointers as before. When the stack or frame pointer is converted to a
swizzled offset it is now scaled directly, as the scratch wave offset no
longer needs to be subtracted first.

Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling
convention.
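
As a rough illustration of the summary above, the following C++ sketch models the two pieces of arithmetic involved: folding the scratch wave offset into the low 64 bits of the scratch buffer descriptor once in the entry function prologue, and converting a wave-relative unswizzled offset to a swizzled one with a plain scale. The struct `SRsrcLow`, both function names, and the use of a right shift by log2 of the wavefront size as the scaling step are assumptions made for illustration; none of this is code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the first two dwords of the scratch buffer
// descriptor (SRSrc). Only the 64-bit add described in the review is modeled.
struct SRsrcLow {
  uint32_t dword0; // low 32 bits of the base address
  uint32_t dword1; // upper base address bits plus other descriptor fields
};

// Models an add / add-with-carry pair: the 32-bit scratch wave offset is
// added to dword0 and the carry, if any, is propagated into dword1.
inline SRsrcLow addScratchWaveOffset(SRsrcLow rsrc, uint32_t scratchWaveOffset) {
  uint64_t sum = uint64_t(rsrc.dword0) + scratchWaveOffset;
  rsrc.dword0 = uint32_t(sum);
  rsrc.dword1 += uint32_t(sum >> 32); // carry out of the low dword
  return rsrc;
}

// Assumed scaling step: with the wave offset already folded into the
// descriptor, a wave-relative unswizzled offset (such as the ABI stack
// pointer) is scaled directly, with no subtraction beforehand.
inline uint32_t toSwizzledOffset(uint32_t unswizzledWaveRelative,
                                 unsigned wavefrontSizeLog2) {
  assert(wavefrontSizeLog2 < 32 && "shift amount must be in range");
  return unswizzledWaveRelative >> wavefrontSizeLog2;
}
```

Under these assumptions, `toSwizzledOffset(sp, 6)` would be the whole conversion for a 64-lane wave, whereas the older scheme described in the summary would first have had to subtract the scratch wave offset from `sp`.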

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

scott.linder created this revision. Feb 25 2020, 1:34 PM

Herald added a project: Restricted Project. Feb 25 2020, 1:34 PM

Herald added subscribers: llvm-commits, kerbowa, hiraditya and 9 others.

Harbormaster completed remote builds in B47237: Diff 246557. Feb 25 2020, 1:34 PM

scott.linder added reviewers: arsenm, rampitec, cdevadas, kzhuravl, b-sumner. Feb 25 2020, 1:35 PM

I'm having trouble working out the best way forward on this patch, with the core issue relating to the fact that we no longer need anything equivalent to a frame pointer in the entry function when there is no stack usage. This is complicated by the fact that hasFP is broken in some of the places it is called, including reservePrivateMemoryRegs. I'm not sure I completely understand where the best place to handle this is, but without addressing it I can't avoid gratuitously initializing the SP and/or FP in many cases, including a trivial kernel with no body.

I'm also not sure if my ISA for initializing the SRSRC is optimal and wanted to get feedback. I do think that in at least some cases we will need to save a DWORD out of the SRSRC while updating it, and in those cases I'm not certain scavenging one is infallible (see the cc-update-scavenge-fail.ll test case). Is there a better approach here?

I assume this is missing a lot of test updates?

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
555–556	Do we actually need these bits? I'm fairly confident these are always 0 in the HSA resource descriptor (or at least are a known constant we can just reproduce later)
559	I think just 0xffff0000 would be clearer here
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
339	These should be switched to Register at some point
llvm/test/CodeGen/AMDGPU/cc-update-scavenge-fail.ll
5 ↗	(On Diff #246557)	I would move this to the first line, and check the error message to make sure it fails for the right reason

In D75138#1892192, @scott.linder wrote:

I'm having trouble working out the best way forward on this patch, with the core issue relating to the fact that we no longer need anything equivalent to a frame pointer in the entry function when there is no stack usage. This is complicated by the fact that hasFP is broken in some of the places it is called, including reservePrivateMemoryRegs. I'm not sure I completely understand where the best place to handle this is, but without addressing it I can't avoid gratuitously initializing the SP and/or FP in many cases, including a trivial kernel with no body.

Why is this a problem exactly? I only vaguely remember what kind of problems this would cause. hasFP has always been broken depending on what time it's called, so in some places we do have to guess if it's needed

arsenm added inline comments. Feb 25 2020, 2:13 PM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
555–556	According to this it's hardcoded: https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/master/src/core/runtime/amd_aql_queue.cpp#L1015 We just need to worry about SWIZZLE_ENABLE being set to 1. This is the high bit, so all it can do is trigger a carry on the second add. So I think that means you can get away with just doing the add, and then using s_bitset1_b32 to ensure it wasn't carried away

arsenm added inline comments. Feb 25 2020, 2:17 PM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
555–556	Actually, I don't think any add that fits in the 48-bit address space should ever touch the high bits (although I usually manage to be wrong about known bits optimizations with adds)

arsenm added inline comments. Feb 25 2020, 3:12 PM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
555–556	I think this means it's OK to just not worry about the high bits: https://rise4fun.com/Alive/i24

arsenm added inline comments. Feb 25 2020, 3:19 PM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
555–556	As long as we know bit 48 is 0, this seems fine. As this is hardcoded in the driver, this is probably OK https://rise4fun.com/Alive/KmH
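
The carry argument in the inline comments above can be checked with a small C++ sketch. It assumes, based only on what the thread says, that the base address occupies bits [47:0] of the descriptor's low 64 bits and that the remaining fields (including SWIZZLE_ENABLE in the high bit) sit in bits [63:48]. The helper names are hypothetical.

```cpp
#include <cstdint>

// Assumed layout: bits [47:0] are the base address, bits [63:48] hold other
// descriptor fields. This is an illustration, not the hardware definition.
constexpr uint64_t kBaseMask = (uint64_t{1} << 48) - 1;

// True when base + offset still fits in the 48-bit base address field.
constexpr bool fitsIn48Bits(uint64_t descLow64, uint32_t offset) {
  return (descLow64 & kBaseMask) + offset <= kBaseMask;
}

// True when a plain 64-bit add leaves bits [63:48] exactly as they were.
constexpr bool highBitsPreserved(uint64_t descLow64, uint32_t offset) {
  return ((descLow64 + offset) & ~kBaseMask) == (descLow64 & ~kBaseMask);
}

// A base just below the top of the 48-bit range: the add fits, so the
// high bits are untouched.
static_assert(fitsIn48Bits(0x8000FFFF00000000ull, 0xFFFFFFFFu), "fits");
static_assert(highBitsPreserved(0x8000FFFF00000000ull, 0xFFFFFFFFu),
              "high bits unchanged when the sum fits in 48 bits");
```

Whenever `fitsIn48Bits` holds, no carry leaves bit 47, so `highBitsPreserved` holds as well. If the sum did overflow the 48-bit field, the carry would land in bit 48 and alter the upper fields, which is the case the reviewers discuss ruling out.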

RamNalamothu added a subscriber: RamNalamothu. Feb 25 2020, 7:02 PM

Yes, there are a lot of test updates and likely more new tests needed, but I just posted some tests that exercise the bits I'm currently stuck on for now.

I will try to articulate the issue with hasFP better tomorrow morning, but currently we are making the decision about whether to have a distinct FP (i.e. S34) before we actually know if we use the stack. If we have a call, but no stack use early, and then later we need to reference the stack we end up in a situation where at PEI time we are updating the same register both for the ABI SP and for the entry function FP, which obviously isn't right.

The right thing seems to be to not have any stack or frame pointer at all, but I am not sure how to implement that and wanted to ask for some help estimating how reasonable that would be.

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
555–556	That makes sense to me, and this would simplify things a lot. I don't quite understand if we need to ensure [48:62] are 0, though? If the addc carries into bit 48 is that an issue? I.e. https://rise4fun.com/Alive/qsv At the very least, it seems like we can avoid the need to save anything and just mask in a constant, but if it is possible to avoid that too it removes a couple additional instructions from nearly every kernel prologue.

scott.linder edited reviewers, added: RamNalamothu; removed: ramana-nvr. Feb 25 2020, 7:25 PM

I'm going to go ahead with trying to eliminate the need for an FP completely in entry functions and then update the review with a more complete set of test updates. I'm sure the issue I was having with defining and using hasFP consistently between ISel and PEI could be worked around, but putting that effort into eliminating the FP entirely in entry functions seems more productive.

Update/add tests and eliminate use of FP in entry functions

Herald added subscribers: arphaman, qcolombet, MatzeB. Mar 4 2020, 4:03 PM

scott.linder added a parent revision: D75092: [AMDGPU][NFC] Refactor emitEntryFunctionPrologue. Mar 4 2020, 4:05 PM

scott.linder added a child revision: D75657: [WIP][AMDGPU] Move frame pointer from s34 to s33.

scott.linder retitled this revision from [WIP][AMDGPU] Eliminate the ScratchWaveOffset register from the calling convention to [WIP][AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions. Mar 4 2020, 4:23 PM

scott.linder edited the summary of this revision.

arsenm added inline comments. Mar 4 2020, 4:30 PM

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
626–627	Should demorgan this
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
290–292	This should not need to inspect the original IR. Why can't this just read it directly from MFI? They should be accounted there already?
293	This will be inaccurate for any struct type, this should have been computed during lowering that knows the type split

There are still a reasonable amount of FIXME/TODO in this patch, and I left some additional comments on each to highlight them and ask for feedback on them. I am not entirely comfortable with the way I went about implementing the special-casing for having no FP in the entry function. I would prefer not having all of the isEntryFunction checks everywhere, but I'm not sure how else to represent it?

I also would rather break this patch up more, but I don't think doing so will make it easier to understand or reduce the size of the test diffs. The only pieces I could break off naturally were some NFC changes in https://reviews.llvm.org/D75092 and switching to s33 for FP in https://reviews.llvm.org/D75657

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
290	@arsenm @nhaehnle I don't think I understand how `inreg` currently works relative to "preloaded" SGPRs; is/should `inreg` be recorded somewhere in the machine function info so this isn't necessary?
300	Similar question here, should there be a change in `SITargetLowering` so the preloaded count is correct?
555–556	I went the route of just always doing the 64-bit add of the scratch wave offset into the SRsrc rather than saving anything or using known constants for some of the bits. From some other discussion this should always be correct.
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
339	I haven't gotten around to this yet, but I'll do this in another NFC patch.
llvm/test/CodeGen/AMDGPU/attr-amdgpu-num-sgpr.ll
8	Can anyone help me understand what we are trying to test here? It seems likely the amount of live SGPRs and the amount of available SGPRs needs to be adjusted to have this test continue to be meaningful, but in trying to correct it I realized I wasn't sure what it was testing in the first place.
llvm/test/CodeGen/AMDGPU/scratch-simple.ll
143	@arsenm @nhaehnle Similar question as above wrt. how `inreg` should work. Is the `%swo` argument in these expected to actually be allowed to coincide with the scratch wave offset?
llvm/test/CodeGen/AMDGPU/spill-offset-calculation.ll
52	Is it OK for us to fail here? This is a consequence of not having a frame pointer in entry functions and not being able to e.g. restart RA after we realize we really need it in this case.

Harbormaster failed remote builds in B48132: Diff 248352! Mar 4 2020, 4:54 PM

arsenm added inline comments. Mar 9 2020, 1:09 PM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
290	Not directly. There shouldn't be any repeating of the calling convention logic here. Either the number of SGPR arguments should be recorded, or it should be inferred from the machine code. It might be correct to just count the number of SGPR in the function live-in list. I think live in registers can be deleted from the list if they are proven to be unused, so this might be fragile. Finding the highest live in SGPR number may also work.
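
The two alternatives suggested in the comment above (counting SGPRs in the live-in list versus taking the highest live-in SGPR number) can be compared with a toy C++ sketch. `LiveInReg` is a hypothetical stand-in for a function live-in entry; it is not an LLVM type, and the functions below do not use any LLVM API.

```cpp
#include <vector>

// Hypothetical stand-in for one entry of a function's live-in list.
struct LiveInReg {
  bool isSGPR;    // true for a scalar register, false otherwise
  unsigned index; // register number within its class
};

// Option 1: count how many live-ins are SGPRs. This undercounts if an
// unused live-in has been deleted from the list.
inline unsigned countLiveInSGPRs(const std::vector<LiveInReg> &liveIns) {
  unsigned count = 0;
  for (const LiveInReg &reg : liveIns)
    if (reg.isSGPR)
      ++count;
  return count;
}

// Option 2: find the highest live-in SGPR number, or -1 if there is none.
inline int highestLiveInSGPR(const std::vector<LiveInReg> &liveIns) {
  int highest = -1;
  for (const LiveInReg &reg : liveIns)
    if (reg.isSGPR && static_cast<int>(reg.index) > highest)
      highest = static_cast<int>(reg.index);
  return highest;
}
```

In this toy model, removing a middle entry from the list lowers the count but leaves the highest register number unchanged, which matches the concern that counting may be fragile when dead live-ins are pruned.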

Support FP in entry functions by reverting most of the changes needed
before PEI in the previous patches. Now the entry function always
allocates S32 for the SP, and optionally allocates S34 as the FP.

There are still a couple tests to be updated, but they are just due
to RA noise.

scott.linder edited the summary of this revision. Mar 10 2020, 4:53 PM

Harbormaster completed remote builds in B48759: Diff 249521. Mar 10 2020, 5:33 PM

LGTM with nits

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
362–365	Braces
382	s/unsigned/Register
383	Ditto
542	Braces
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
99	s/NoRegister/Register()

This revision is now accepted and ready to land. Mar 10 2020, 6:46 PM

I think the commit comment "The ABI stack pointer register remains unswizzled, but is now wave-relative instead of dispatch-relative." should change to "The ABI stack pointer register remains unswizzled, but is now wave-relative instead of queue-relative." since for the HSAABI the scratch base is the queue base and not per dispatch. The PALABI may use per dispatch scratch allocation.

t-tye added inline comments. Mar 10 2020, 7:59 PM

llvm/docs/AMDGPUUsage.rst
8626–8627	Should the manner in which the kernel prolog sets the scratch V# be specified? The compiler requests that the scratch V# and wave scratch offset be passed in using the kernel descriptor (reference the section). The wave scratch offset is added to the queue base address in the scratch V# and moved to SGPR0-3. Also specify how the kernel must set FLAT_SCRATCH. The compiler requests that the flat scratch and wave scratch offset be passed in using the kernel descriptor (reference the section). The wave scratch offset is added to the flat scratch base and moved to FLAT_SCRATCH. Should setup of M0 also be defined here? For GFX6-??? it is set to the LDS size, otherwise it is set to ???. Is there any other setup that has to be done in the kernel prolog?
8637–8639	"This can be done without having to perform register allocation again, which is necessary as register allocation may introduce spills." Suggest moving this to a separate bullet and rewording to make clear why this approach is taken: "- Note: this approach of using a tentative scratch SRD and shifting the register numbers if used avoids having to perform register allocation a second time if the tentative SRD is eliminated. This is more efficient and avoids the problem that the second register allocation may perform spilling, which will fail as there is no longer a scratch SRD." For consistency, should SRD be changed to V# to match the usage in the next section?
8670	"private address" -> "private address space address"
8709	Is this necessary to say, since the following bullet states all SGPRs except 4-31, which means SGPR0-3 are preserved?

scott.linder edited the summary of this revision. Mar 11 2020, 11:04 AM

In D75138#1916158, @t-tye wrote:

I think the commit comment "The ABI stack pointer register remains unswizzled, but is now wave-relative instead of dispatch-relative." should change to "The ABI stack pointer register remains unswizzled, but is now wave-relative instead of queue-relative." since for the HSAABI the scratch base is the queue base and not per dispatch. The PALABI may use per dispatch scratch allocation.

I updated the commit message, but I didn't include mention of the possibility of the PALABI differing here. Is there a more generic way to describe the old behavior for every ABI the compiler supports? As far as the compiler is concerned it is only important that the SRSRC base + the scratch wave offset gets it to the base for the scratch allocation for the wave.

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
290	In switching back to supporting an FP I no longer see the need for this manifest, but there may still be a need to update this in the future. I don't think my change is making this any more fragile so I'm leaving it as it was.

scott.linder added a parent revision: D76035: [AMDGPU][NFC] Refactor some uses of unsigned to Register. Mar 11 2020, 4:40 PM

Address feedback

scott.linder added inline comments. Mar 11 2020, 4:45 PM

llvm/docs/AMDGPUUsage.rst
8626–8627	I didn't notice originally that we have a section "Code Conventions > AMDHSA > Kernel Prolog" which already describes some of this. It seemed odd to put some of that here and some of that there, so I ended up trying to just move all the relevant bits to the Kernel Prolog section and reference it here. It ends up being a bit circular in that the Kernel Prolog section defers to the Calling Convention section for the definition of the ABI stack pointer, and the Calling Convention section defers to the Kernel Prolog section for the description of the properties of M0/FlatScratch/V# and how they are initialized. I think it is OK, but maybe you have some suggestions?

Harbormaster failed remote builds in B48909: Diff 249800! Mar 11 2020, 5:35 PM

scott.linder added a reviewer: mareko. Mar 16 2020, 9:28 AM

Finish updating remaining tests. Remove Kill from last use of scratch wave
offset in prologue, as it is used in at least some Mesa shaders.

arsenm accepted this revision. Mar 17 2020, 3:27 PM

arsenm added inline comments.

llvm/test/CodeGen/AMDGPU/scratch-simple.ll
101	Can you add a comment elaborating on what this tests

scott.linder marked an inline comment as done. Mar 17 2020, 3:58 PM

scott.linder added inline comments.

llvm/test/CodeGen/AMDGPU/scratch-simple.ll
101	From discussion with @mareko my understanding is that Mesa GS and HS shaders have the preloaded scratch wave offset SGPR fixed at SGPR5, and the inreg implementation is used to reference it in the IR. So here, the shader snippet inserted after the SI_RETURN_TO_EPILOG wants to use the scratch wave offset, and the IR passes it along by padding out the inreg arguments until it gets to where the scratch wave offset is, and then using it in the return value. I'll add something to that effect in the test.

Harbormaster failed remote builds in B49513: Diff 250925! Mar 17 2020, 4:14 PM

Closed by commit rG60b1967c3933: [AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions (authored by scott.linder). Mar 19 2020, 1:10 PM

This revision was automatically updated to reflect the committed changes.

foad mentioned this in D79073: [AMDGPU] For PAL, make sure Scratch Buffer Descriptor do not clobber GIT pointer. Apr 29 2020, 3:57 AM

critson mentioned this in D79776: [AMDGPU] Allow use of StackPtrOffsetReg when building spills. May 12 2020, 4:49 AM

critson mentioned this in rGa065a01bf715: [AMDGPU] Allow use of StackPtrOffsetReg when building spills. May 15 2020, 8:05 PM

Revision Contents

Path

Size

llvm/

docs/

AMDGPUUsage.rst

218 lines

lib/

Target/

AMDGPU/

AMDGPUCallLowering.cpp

2 lines

AMDGPUISelDAGToDAG.cpp

18 lines

AMDGPUInstructionSelector.cpp

41 lines

AMDGPUTargetMachine.cpp

6 lines

MCTargetDesc/

AMDGPUInstPrinter.cpp

1 line

17 lines

18 lines

262 lines

93 lines

SIMachineFunctionInfo.h

22 lines

SIMachineFunctionInfo.cpp

4 lines

SIRegisterInfo.h

5 lines

SIRegisterInfo.cpp

133 lines

SIRegisterInfo.td

3 lines

test/

CodeGen/

AMDGPU/

GlobalISel/

divergent-control-flow.ll

2 lines

insertelement.ll

4 lines

inst-select-load-local.mir

278 lines

inst-select-load-private.mir

111 lines

inst-select-store-local.mir

224 lines

inst-select-store-private.mir

383 lines

mul.ll

20 lines

addrspacecast.ll

4 lines

amdgpu.private-memory.ll

20 lines

amdhsa-trap-num-sgprs.ll

4 lines

array-ptr-calc-i32.ll

4 lines

attr-amdgpu-num-sgpr.ll

10 lines

byval-frame-setup.ll

350 lines

call-argument-types.ll

74 lines

call-constant.ll

6 lines

call-preserved-registers.ll

31 lines

call-waitcnt.ll

37 lines

callee-special-input-sgprs-fixed-abi.ll

3 lines

callee-special-input-sgprs.ll

40 lines

callee-special-input-vgprs.ll

34 lines

captured-frame-index.ll

36 lines

cc-update.ll

422 lines

cgp-addressing-modes.ll

12 lines

chain-hi-to-lo.ll

28 lines

collapse-endcf.ll

2 lines

control-flow-fastregalloc.ll

64 lines

cross-block-use-is-not-abi-copy.ll

14 lines

extload-private.ll

8 lines

fast-unaligned-load-store.private.ll

77 lines

fold-fi-mubuf.mir

197 lines

frame-index-elimination.ll

63 lines

frame-lowering-entry-all-sgpr-used.mir

1 line

frame-lowering-fp-adjusted.mir

3 lines

function-returns.ll

202 lines

hsa-metadata-kernel-code-props-v3.ll

8 lines

hsa-metadata-kernel-code-props.ll

6 lines

idot8s.ll

2277 lines

idot8u.ll

2572 lines

indirect-addressing-term.ll

104 lines

indirect-call.ll

14 lines

insert_vector_elt.ll

48 lines

ipra.ll

2 lines

large-alloca-compute.ll

4 lines

large-alloca-graphics.ll

42 lines

llvm.amdgcn.implicit.buffer.ptr.ll

4 lines

load-hi16.ll

20 lines

load-lo16.ll

36 lines

memory-legalizer-load.ll

8 lines

memory-legalizer-store.ll

8 lines

memory_clause.ll

93 lines

mesa3d.ll

2 lines

mir-print-dead-csr-fi.mir

1 line

misched-killflags.mir

1 line

mubuf-offset-private.ll

38 lines

optimize-exec-masking-pre-ra.mir

1 line

partial-sgpr-to-vgpr-spills.ll

341 lines

pei-reg-scavenger-position.mir

14 lines

pei-scavenge-sgpr-carry-out.mir

57 lines

pei-scavenge-sgpr-gfx9.mir

5 lines

pei-scavenge-sgpr.mir

3 lines

private-access-no-objects.ll

14 lines

private-element-size.ll

224 lines

rename-independent-subregs-mac-operands.mir

2 lines

sched-assert-dead-def-subreg-use-other-subreg.mir

1 line

sched-handleMoveUp-subreg-def-across-subreg-def.mir

1 line

scratch-buffer.ll

14 lines

scratch-simple.ll

49 lines

sgpr-spill-wrong-stack-id.mir

25 lines

shl_add_ptr.ll

12 lines

si-spill-sgpr-stack.ll

3 lines

sibling-call.ll

2 lines

sp-too-many-input-sgprs.ll

spill-agpr.ll

16 lines

spill-before-exec.mir

11 lines

spill-empty-live-interval.mir

2 lines

spill-m0.ll

4 lines

spill-offset-calculation.ll

45 lines

stack-pointer-offset-relative-frameindex.ll

21 lines

stack-realign-kernel.ll

36 lines

stack-realign.ll

42 lines

stack-slot-color-sgpr-vgpr-spills.mir

7 lines

store-hi16.ll

28 lines

subreg-split-live-in-error.mir

1 line

subvector-test.mir

1 line

vgpr-spill-emergency-stack-slot.ll

4 lines

virtregrewrite-undef-identity-copy.mir

1 line

wqm.ll

4 lines

wwm-reserved.ll

8 lines

MIR/

AMDGPU/

machine-function-info-no-ir.mir

16 lines

machine-function-info.ll

14 lines

mfi-parse-error-scratch-wave-offset-reg.mir

mfi-scratch-wave-offset-reg-class.mir

parse-order-reserved-regs.mir

2 lines

DebugInfo/

AMDGPU/

variable-locations.ll

2 lines

Diff 251459

llvm/docs/AMDGPUUsage.rst

Show First 20 Lines • Show All 347 Lines • ▼ Show 20 Lines	Generic
GFX7-GFX10. This uses two fixed ranges of virtual addresses (the private and		GFX7-GFX10. This uses two fixed ranges of virtual addresses (the private and
local apertures), that are outside the range of addressable global memory, to		local apertures), that are outside the range of addressable global memory, to
map from a flat address to a private or local address.		map from a flat address to a private or local address.

FLAT instructions can take a flat address and access global, private		FLAT instructions can take a flat address and access global, private
(scratch), and group (LDS) memory depending on if the address is within one		(scratch), and group (LDS) memory depending on if the address is within one
of the aperture ranges. Flat access to scratch requires hardware aperture		of the aperture ranges. Flat access to scratch requires hardware aperture
setup and setup in the kernel prologue (see		setup and setup in the kernel prologue (see
:ref:`amdgpu-amdhsa-flat-scratch`). Flat access to LDS requires hardware		:ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`). Flat access to LDS requires
aperture setup and M0 (GFX7-GFX8) register setup (see		hardware aperture setup and M0 (GFX7-GFX8) register setup (see
:ref:`amdgpu-amdhsa-m0`).		:ref:`amdgpu-amdhsa-kernel-prolog-m0`).

To convert between a private or group address space address (termed a segment		To convert between a private or group address space address (termed a segment
address) and a flat address the base address of the corresponding aperture		address) and a flat address the base address of the corresponding aperture
can be used. For GFX7-GFX8 these are available in the		can be used. For GFX7-GFX8 these are available in the
:ref:`amdgpu-amdhsa-hsa-aql-queue` the address of which can be obtained with		:ref:`amdgpu-amdhsa-hsa-aql-queue` the address of which can be obtained with
Queue Ptr SGPR (see :ref:`amdgpu-amdhsa-initial-kernel-execution-state`). For		Queue Ptr SGPR (see :ref:`amdgpu-amdhsa-initial-kernel-execution-state`). For
GFX9-GFX10 the aperture base addresses are directly available as inline		GFX9-GFX10 the aperture base addresses are directly available as inline
constant registers ``SRC_SHARED_BASE/LIMIT`` and ``SRC_PRIVATE_BASE/LIMIT``.		constant registers ``SRC_SHARED_BASE/LIMIT`` and ``SRC_PRIVATE_BASE/LIMIT``.
▲ Show 20 Lines • Show All 5,582 Lines • ▼ Show 20 Lines	then Scratch Wavefront Offset 1 32-bit byte offset from base
_segment_wavefront_offset) executing the kernel		_segment_wavefront_offset) executing the kernel
dispatch. Must be used as an		dispatch. Must be used as an
offset with Private		offset with Private
segment address when using		segment address when using
Scratch Segment Buffer. It		Scratch Segment Buffer. It
must be used to set up FLAT		must be used to set up FLAT
SCRATCH for flat addressing		SCRATCH for flat addressing
(see		(see
:ref:`amdgpu-amdhsa-flat-scratch`).		:ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`).
========== ========================== ====== ==============================		========== ========================== ====== ==============================

The order of the VGPR registers is defined, but the compiler can specify which		The order of the VGPR registers is defined, but the compiler can specify which
ones are actually setup in the kernel descriptor using the ``enable_vgpr*`` bit		ones are actually setup in the kernel descriptor using the ``enable_vgpr*`` bit
fields (see :ref:`amdgpu-amdhsa-kernel-descriptor`). The register numbers used		fields (see :ref:`amdgpu-amdhsa-kernel-descriptor`). The register numbers used
for enabled registers are dense starting at VGPR0: the first enabled register is		for enabled registers are dense starting at VGPR0: the first enabled register is
VGPR0, the next enabled register is VGPR1 etc.; disabled registers do not have a		VGPR0, the next enabled register is VGPR1 etc.; disabled registers do not have a
VGPR number.		VGPR number.
▲ Show 20 Lines • Show All 49 Lines • ▼ Show 20 Lines
* MTYPE set to support memory coherence that matches the runtime (such as CC for		* MTYPE set to support memory coherence that matches the runtime (such as CC for
APU and NC for dGPU).		APU and NC for dGPU).

.. _amdgpu-amdhsa-kernel-prolog:		.. _amdgpu-amdhsa-kernel-prolog:

Kernel Prolog		Kernel Prolog
~~~~~~~~~~~~~		~~~~~~~~~~~~~

.. _amdgpu-amdhsa-m0:		The compiler performs initialization in the kernel prologue depending on the
		target and information about things like stack usage in the kernel and called
		functions. Some of this initialization requires the compiler to request certain
		User and System SGPRs be present in the
		:ref:`amdgpu-amdhsa-initial-kernel-execution-state` via the
		:ref:`amdgpu-amdhsa-kernel-descriptor`.

		.. _amdgpu-amdhsa-kernel-prolog-cfi:

		CFI
		+++

		1. The CFI return address is undefined.
		2. The CFI CFA is defined using an expression which evaluates to a memory
		location description for the private segment address ``0``.

		.. _amdgpu-amdhsa-kernel-prolog-m0:

M0		M0
++		++

GFX6-GFX8		GFX6-GFX8
The M0 register must be initialized with a value at least the total LDS size		The M0 register must be initialized with a value at least the total LDS size
if the kernel may access LDS via DS or flat operations. Total LDS size is		if the kernel may access LDS via DS or flat operations. Total LDS size is
available in dispatch packet. For M0, it is also possible to use maximum		available in dispatch packet. For M0, it is also possible to use maximum
possible value of LDS for given target (0x7FFF for GFX6 and 0xFFFF for		possible value of LDS for given target (0x7FFF for GFX6 and 0xFFFF for
GFX7-GFX8).		GFX7-GFX8).
GFX9-GFX10		GFX9-GFX10
The M0 register is not used for range checking LDS accesses and so does not		The M0 register is not used for range checking LDS accesses and so does not
need to be initialized in the prolog.		need to be initialized in the prolog.

.. _amdgpu-amdhsa-flat-scratch:		.. _amdgpu-amdhsa-kernel-prolog-stack-pointer:

		Stack Pointer
		+++++++++++++

		If the kernel has function calls it must set up the ABI stack pointer described
		in :ref:`amdgpu-amdhsa-function-call-convention-non-kernel-functions` by
		setting SGPR32 to the the unswizzled scratch offset of the address past the
		last local allocation.

		.. _amdgpu-amdhsa-kernel-prolog-frame-pointer:

		Frame Pointer
		+++++++++++++

		If the kernel needs a frame pointer for the reasons defined in
		``SIFrameLowering`` then SGPR34 is used and is always set to ``0`` in the
		kernel prolog. If a frame pointer is not required then all uses of the frame
		pointer are replaced with immediate ``0`` offsets.

		.. _amdgpu-amdhsa-kernel-prolog-flat-scratch:

Flat Scratch		Flat Scratch
++++++++++++		++++++++++++

If the kernel may use flat operations to access scratch memory, the prolog code		If the kernel or any function it calls may use flat operations to access
must set up FLAT_SCRATCH register pair (FLAT_SCRATCH_LO/FLAT_SCRATCH_HI which		scratch memory, the prolog code must set up the FLAT_SCRATCH register pair
are in SGPRn-4/SGPRn-3). Initialization uses Flat Scratch Init and Scratch		(FLAT_SCRATCH_LO/FLAT_SCRATCH_HI which are in SGPRn-4/SGPRn-3). Initialization
Wavefront Offset SGPR registers (see		uses Flat Scratch Init and Scratch Wavefront Offset SGPR registers (see
:ref:`amdgpu-amdhsa-initial-kernel-execution-state`):		:ref:`amdgpu-amdhsa-initial-kernel-execution-state`):

GFX6		GFX6
Flat scratch is not supported.		Flat scratch is not supported.

GFX7-GFX8		GFX7-GFX8

1. The low word of Flat Scratch Init is 32-bit byte offset from		1. The low word of Flat Scratch Init is 32-bit byte offset from
Show All 14 Lines

GFX9-GFX10		GFX9-GFX10
The Flat Scratch Init is the 64-bit address of the base of scratch backing		The Flat Scratch Init is the 64-bit address of the base of scratch backing
memory being managed by SPI for the queue executing the kernel dispatch. The		memory being managed by SPI for the queue executing the kernel dispatch. The
prolog must add the value of Scratch Wavefront Offset and moved to the		prolog must add the value of Scratch Wavefront Offset and moved to the
FLAT_SCRATCH pair for use as the flat scratch base in flat memory		FLAT_SCRATCH pair for use as the flat scratch base in flat memory
instructions.		instructions.
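
As a rough sketch of the GFX9-GFX10 case described above, the flat scratch base is a 64-bit sum split across the FLAT_SCRATCH_LO/FLAT_SCRATCH_HI pair. The struct and function names are hypothetical, and the code models the arithmetic only, not the instructions the prolog actually emits.

```cpp
#include <cstdint>

// Hypothetical model of the FLAT_SCRATCH_LO/FLAT_SCRATCH_HI register pair.
struct FlatScratchPair {
  std::uint32_t Lo;
  std::uint32_t Hi;
};

// GFX9-GFX10: add the Scratch Wavefront Offset to the 64-bit Flat Scratch
// Init address; any carry out of the low word propagates to the high word.
inline FlatScratchPair computeFlatScratch(std::uint64_t FlatScratchInit,
                                          std::uint32_t ScratchWaveOffset) {
  std::uint64_t Base = FlatScratchInit + ScratchWaveOffset;
  return {static_cast<std::uint32_t>(Base),
          static_cast<std::uint32_t>(Base >> 32)};
}
```

The carry matters: if the low word overflows when the offset is added, the high word must be incremented, which is why the sum is treated as a single 64-bit quantity here.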

		.. _amdgpu-amdhsa-kernel-prolog-private-segment-buffer:

		Private Segment Buffer
		++++++++++++++++++++++

		A set of four SGPRs beginning at a four-aligned SGPR index is always selected
		to serve as the scratch V# for the kernel as follows:

		- If it is known during instruction selection that there is stack usage,
		SGPR0-3 is reserved for use as the scratch V#. Stack usage is assumed if
		optimisations are disabled (``-O0``), if stack objects already exist (for
		locals, etc.), or if there are any function calls.

		- Otherwise, four high numbered SGPRs beginning at a four-aligned SGPR index
		are reserved for the tentative scratch V#. These will be used if it is
		determined that spilling is needed.

		- If no use is made of the tentative scratch V#, then it is unreserved
		and the register count is determined ignoring it.
		- If use is made of the tentative scratch V#, then its register numbers
		are shifted to the first four-aligned SGPR index after the highest one
		allocated by the register allocator, and all uses are updated. The
		register count includes them in the shifted location.
		- In either case, if the processor has the SGPR allocation bug, the
		tentative allocation is not shifted or unreserved in order to ensure
		the register count is higher to work around the bug.

		.. note::

		This approach of using a tentative scratch V# and shifting the register
		numbers if used avoids having to perform register allocation a second
		time if the tentative V# is eliminated. This is more efficient and
		avoids the problem that the second register allocation may perform
		spilling which will fail as there is no longer a scratch V#.

		When the kernel prolog code is being emitted it is known whether the scratch V#
		described above is actually used. If it is, the prolog code must set it up by
		copying the Private Segment Buffer to the scratch V# registers and then adding
		the Private Segment Wavefront Offset to the queue base address in the V#. The
		result is a V# with a base address pointing to the beginning of the wavefront
		scratch backing memory.

		The Private Segment Buffer is always requested, but the Private Segment
		Wavefront Offset is only requested if it is used (see
		:ref:`amdgpu-amdhsa-initial-kernel-execution-state`).
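
The "shift to the first four-aligned SGPR index after the highest one allocated" rule can be sketched as follows. The function names are hypothetical and this is not the code in ``SIFrameLowering``; it only illustrates the index arithmetic stated in the list above.

```cpp
// First four-aligned SGPR index strictly after the highest SGPR index
// chosen by the register allocator (hypothetical helper for illustration).
inline unsigned shiftedScratchRsrcBase(unsigned HighestAllocatedSGPR) {
  return (HighestAllocatedSGPR + 1u + 3u) & ~3u;
}

// SGPR count once the four-register V# is included at its shifted location.
inline unsigned sgprCountWithScratchRsrc(unsigned HighestAllocatedSGPR) {
  return shiftedScratchRsrcBase(HighestAllocatedSGPR) + 4u;
}
```

For example, if the allocator's highest SGPR is SGPR10, the V# would land in SGPR12-15 under this rule, giving a register count of 16.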

.. _amdgpu-amdhsa-memory-model:		.. _amdgpu-amdhsa-memory-model:

Memory Model		Memory Model
~~~~~~~~~~~~		~~~~~~~~~~~~

This section describes the mapping of LLVM memory model onto AMDGPU machine code		This section describes the mapping of LLVM memory model onto AMDGPU machine code
(see :ref:`memmodel`).		(see :ref:`memmodel`).

▲ Show 20 Lines • Show All 2,424 Lines • ▼ Show 20 Lines
.. note::		.. note::

This section is currently incomplete and has inaccuracies. It is WIP that will		This section is currently incomplete and has inaccuracies. It is WIP that will
be updated as information is determined.		be updated as information is determined.

See :ref:`amdgpu-dwarf-address-space-mapping` for information on swizzled		See :ref:`amdgpu-dwarf-address-space-mapping` for information on swizzled
addresses. Unswizzled addresses are normal linear addresses.		addresses. Unswizzled addresses are normal linear addresses.

		.. _amdgpu-amdhsa-function-call-convention-kernel-functions:

Kernel Functions		Kernel Functions
++++++++++++++++		++++++++++++++++

This section describes the call convention ABI for the outer kernel function.		This section describes the call convention ABI for the outer kernel function.

See :ref:`amdgpu-amdhsa-initial-kernel-execution-state` for the kernel call		See :ref:`amdgpu-amdhsa-initial-kernel-execution-state` for the kernel call
convention.		convention.

The following is not part of the AMDGPU kernel calling convention but describes		The following is not part of the AMDGPU kernel calling convention but describes
how the AMDGPU implements function calls:		how the AMDGPU implements function calls:

1. Clang decides the kernarg layout to match the *HSA Programmer's Language		1. Clang decides the kernarg layout to match the *HSA Programmer's Language
Reference* [HSA]_.		Reference* [HSA]_.

- All structs are passed directly.		- All structs are passed directly.
- Lambda values are passed TBA.		- Lambda values are passed TBA.

.. TODO::		.. TODO::

- Does this really follow HSA rules? Or are structs >16 bytes passed		- Does this really follow HSA rules? Or are structs >16 bytes passed
by-value struct?		by-value struct?
- What is ABI for lambda values?		- What is ABI for lambda values?

2. The CFI return address is undefined.		4. The kernel performs certain setup in its prolog, as described in
3. If the kernel contains no calls then:		:ref:`amdgpu-amdhsa-kernel-prolog`.

- If using the ``amdhsa`` OS ABI (see :ref:`amdgpu-os-table`), and know
during ISel that there is stack usage SGPR0-3 is reserved for use as the
scratch SRD and SGPR33 reserved for the wave scratch offset. Stack usage
is assumed if ``-O0``, if already aware of stack objects for locals, etc.,
or if there are any function calls.
- Otherwise, five high numbered SGPRs are reserved for the tentative scratch
SRD and wave scratch offset. These will be used if determine need to do
spilling.

- If no use is made of the tentative scratch SRD or wave scratch offset,
then they are unreserved and the register count is determined ignoring
them.
- If use is made of the tenatative scratch SRD or wave scratch offset,
then the register numbers used are shifted to be after the highest one
allocated by the register allocator, and all uses updated. The register
count will include them in the shifted location. Since register
allocation may introduce spills, this shifting allows them to be
eliminated without having to perform register allocation again.
- In either case, if the processor has the SGPR allocation bug, the
tentative allocation is not shifted or unreserved inorder to ensure the
register count is higher to workaround the bug.

4. If the kernel contains function calls:

- SP is set to the wave scratch offset.

- Since SP is an unswizzled address relative to the queue scratch base, an
wave scratch offset is an unswizzle offset, this means that if SP is
used to access swizzled scratch memory, it will access the private
segment address 0.

.. note::		.. _amdgpu-amdhsa-function-call-convention-non-kernel-functions:
		t-tye: Should the manner that the kernel prolog sets the scratch V# be specified? The compiler requests that the scratch V# and wave scratch offset be passed in using the kernel descriptor (reference the section). The wave scratch offset is added to the queue base address in the scratch V# and moved to SGPR0-3. Also specify how the kernel must set the FLAT_SCRATCH. The compiler requests that the flat scratch and wave scratch offset be passed in using the kernel descriptor (reference the section). The wave scratch offset is added to the flat scratch base and moved to FLAT_SCRATCH. Should setup of M0 also be defined here? For GFX6-??? it is set to the LDS size, otherwise it is set to ???. Is there any other setup that has to be done in the kernel prolog?
		scott.linder (author): I didn't notice originally that we have a section "Code Conventions > AMDHSA > Kernel Prolog" which already describes some of this. It seemed odd to put some of that here and some of that there, so I ended up trying to just move all the relevant bits to the Kernel Prolog section and reference it here. It ends up being a bit circular in that the Kernel Prolog section defers to the Calling Convention section for the definition of the ABI stack pointer, and the Calling Convention section defers to the Kernel Prolog section for the description of the properties of M0/FlatScratch/V# and how they are initialized. I think it is OK, but maybe you have some suggestions?

This is planned to be changed to be the unswizzled base address of the
wavefront scratch backing memory.

Non-Kernel Functions		Non-Kernel Functions
++++++++++++++++++++		++++++++++++++++++++

This section describes the call convention ABI for functions other than the		This section describes the call convention ABI for functions other than the
outer kernel function.		outer kernel function.

If a kernel has function calls then scratch is always allocated and used for the		If a kernel has function calls then scratch is always allocated and used for
call stack which grows from low address to high address using the swizzled		the call stack which grows from low address to high address using the swizzled
scratch address space.		scratch address space.

On entry to a function:		On entry to a function:
		t-tye: "This can be done without having to perform register allocation again, which is necessary as register allocation may introduce spills." Suggest moving this to a separate bullet and rewording to make clear why this approach is taken: "- Note: this approach of using a tentative scratch SRD and shifting the register numbers if used, avoids having to perform register allocation a second time if the tentative SRD is eliminated. This is more efficient and avoids the problem that the second register allocation may perform spilling which will fail as there is no longer a scratch SRD." For consistency, should SRD be changed to V# to match the usage in the next section?

1. SGPR0-3 contain a V# with the following properties:		1. SGPR0-3 contain a V# with the following properties (see
		:ref:`amdgpu-amdhsa-kernel-prolog-private-segment-buffer`):
* Base address of the queue scratch backing memory.

.. note::

This is planned to be changed to be the unswizzled base address of the
wavefront scratch backing memory.

		* Base address pointing to the beginning of the wavefront scratch backing
		memory.
* Swizzled with dword element size and stride of wavefront size elements.		* Swizzled with dword element size and stride of wavefront size elements.

2. The FLAT_SCRATCH register pair is setup. See		2. The FLAT_SCRATCH register pair is setup. See
:ref:`amdgpu-amdhsa-flat-scratch`.		:ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`.
3. GFX6-8: M0 register set to the size of LDS in bytes.		3. GFX6-8: M0 register set to the size of LDS in bytes. See
		:ref:`amdgpu-amdhsa-kernel-prolog-m0`.
4. The EXEC register is set to the lanes active on entry to the function.		4. The EXEC register is set to the lanes active on entry to the function.
5. MODE register: TBD		5. MODE register: TBD
6. VGPR0-31 and SGPR4-29 are used to pass function input arguments as described		6. VGPR0-31 and SGPR4-29 are used to pass function input arguments as described
below.		below.
7. SGPR30-31 return address (RA). The code address that the function must		7. SGPR30-31 return address (RA). The code address that the function must
return to when it completes. The value is undefined if the function is *no		return to when it completes. The value is undefined if the function is *no
return*.		return*.
8. SGPR32 is used for the stack pointer (SP). It is an unswizzled		8. SGPR32 is used for the stack pointer (SP). It is an unswizzled scratch
scratch offset relative to the beginning of the queue scratch backing		offset relative to the beginning of the wavefront scratch backing memory.
memory.

The unswizzled SP can be used with buffer instructions as an unswizzled SGPR		The unswizzled SP can be used with buffer instructions as an unswizzled SGPR
offset with the scratch V# in SGPR0-3 to access the stack in a swizzled		offset with the scratch V# in SGPR0-3 to access the stack in a swizzled
manner.		manner.

		The unswizzled SP value can be converted into the swizzled SP value by:

		\| swizzled SP = unswizzled SP / wavefront size

		This may be used to obtain the private address space address of stack
		t-tye: "private address" -> "private address space address"
		objects and to convert this address to a flat address by adding the flat
		scratch aperture base address.

The swizzled SP value is always 4 bytes aligned for the ``r600``		The swizzled SP value is always 4 bytes aligned for the ``r600``
architecture and 16 byte aligned for the ``amdgcn`` architecture.		architecture and 16 byte aligned for the ``amdgcn`` architecture.

.. note::		.. note::

The ``amdgcn`` value is selected to avoid dynamic stack alignment for the		The ``amdgcn`` value is selected to avoid dynamic stack alignment for the
OpenCL language which has the largest base type defined as 16 bytes.		OpenCL language which has the largest base type defined as 16 bytes.

On entry, the swizzled SP value is the address of the first function		On entry, the swizzled SP value is the address of the first function
argument passed on the stack. Other stack passed arguments are positive		argument passed on the stack. Other stack passed arguments are positive
offsets from the entry swizzled SP value.		offsets from the entry swizzled SP value.

The function may use positive offsets beyond the last stack passed argument		The function may use positive offsets beyond the last stack passed argument
for stack allocated local variables and register spill slots. If necessary		for stack allocated local variables and register spill slots. If necessary
the function may align these to greater alignment than 16 bytes. After these		the function may align these to greater alignment than 16 bytes. After these
the function may dynamically allocate space for such things as runtime sized		the function may dynamically allocate space for such things as runtime sized
``alloca`` local allocations.		``alloca`` local allocations.

If the function calls another function, it will place any stack allocated		If the function calls another function, it will place any stack allocated
arguments after the last local allocation and adjust SGPR32 to the address		arguments after the last local allocation and adjust SGPR32 to the address
after the last local allocation.		after the last local allocation.

.. note::		9. All other registers are unspecified.
		10. Any necessary ``waitcnt`` has been performed to ensure memory is available
The SP value is planned to be changed to be the unswizzled offset relative
to the wavefront scratch backing memory.

9. SGPR33 wavefront scratch base offset. The unswizzled offset from the queue
scratch backing memory base to the base of the wavefront scratch backing
memory.

It is used to convert the unswizzled SP value to swizzled address in the
private address space by:

\| private address = (unswizzled SP - wavefront scratch base offset) /
wavefront size

This may be used to obtain the private address of stack objects and to
convert these address to a flat address by adding the flat scratch aperture
base address.

.. note::

This is planned to be eliminated when SP is changed to be the unswizzled
offset relative to the wavefront scratch backing memory. The the
conversion simplifies to:

\| private address = unswizzled SP / wavefront size

10. All other registers are unspecified.
11. Any necessary ``waitcnt`` has been performed to ensure memory is available
to the function.		to the function.
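
The SP conversion given in item 8 of the new text (swizzled SP = unswizzled SP / wavefront size, then adding the flat scratch aperture base to obtain a flat address) can be sketched as below. The function names and the example aperture base are assumptions for illustration; only the two formulas come from the text.

```cpp
#include <cstdint>

// swizzled SP = unswizzled SP / wavefront size (from the text above).
// The result is an address in the private (swizzled scratch) address space.
inline std::uint32_t swizzledSP(std::uint32_t UnswizzledSP,
                                std::uint32_t WavefrontSize) {
  return UnswizzledSP / WavefrontSize;
}

// Flat address of a stack object: the private address plus the flat scratch
// aperture base address. The aperture base is a caller-supplied assumption.
inline std::uint64_t flatAddress(std::uint64_t ApertureBase,
                                 std::uint32_t PrivateAddr) {
  return ApertureBase + PrivateAddr;
}
```

With a wavefront size of 64, an unswizzled SP of 4096 corresponds to a swizzled SP of 64, because each lane's dword elements are interleaved across the wavefront in the backing memory.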

On exit from a function:		On exit from a function:

1. VGPR0-31 and SGPR4-29 are used to pass function result arguments as		1. VGPR0-31 and SGPR4-29 are used to pass function result arguments as
described below. Any registers used are considered clobbered registers,		described below. Any registers used are considered clobbered registers.
2. The following registers are preserved and have the same value as on entry:		2. The following registers are preserved and have the same value as on entry:

* FLAT_SCRATCH		* FLAT_SCRATCH
* EXEC		* EXEC
* GFX6-8: M0		* GFX6-8: M0
* All SGPR and VGPR registers except the clobbered registers of SGPR4-31 and		* All SGPR and VGPR registers except the clobbered registers of SGPR4-31 and
		t-tye: Is this necessary to say since the following bullet states all SGPRs except 4-31, which means SGPR0-3 are preserved?
VGPR0-31.		VGPR0-31.

For the AMDGPU backend, an inter-procedural register allocation (IPRA)		For the AMDGPU backend, an inter-procedural register allocation (IPRA)
optimization may mark some of clobbered SGPR4-31 and VGPR0-31 registers as		optimization may mark some of clobbered SGPR4-31 and VGPR0-31 registers as
preserved if it can be determined that the called function does not change		preserved if it can be determined that the called function does not change
their value.		their value.

2. The PC is set to the RA provided on entry.		2. The PC is set to the RA provided on entry.
▲ Show 20 Lines • Show All 176 Lines • ▼ Show 20 Lines	..TODO::
in registers will not be decomposed and will be passed as a non-decomposed		in registers will not be decomposed and will be passed as a non-decomposed
stack value?		stack value?

The following is not part of the AMDGPU function calling convention but		The following is not part of the AMDGPU function calling convention but
describes how the AMDGPU implements function calls:		describes how the AMDGPU implements function calls:

1. SGPR34 is used as a frame pointer (FP) if necessary. Like the SP it is an		1. SGPR34 is used as a frame pointer (FP) if necessary. Like the SP it is an
unswizzled scratch address. It is only needed if runtime sized ``alloca``		unswizzled scratch address. It is only needed if runtime sized ``alloca``
are used, or for the reasons defined in ``SiFrameLowering``.		are used, or for the reasons defined in ``SIFrameLowering``.
2. Runtime stack alignment is not currently supported.		2. Runtime stack alignment is not currently supported.

.. TODO::		.. TODO::

- If runtime stack alignment is supported then will an extra argument		- If runtime stack alignment is supported then will an extra argument
pointer register be used?		pointer register be used?

2. Allocating SGPR arguments on the stack are not supported.		2. Allocating SGPR arguments on the stack are not supported.

3. No CFI is currently generated. See :ref:`amdgpu-call-frame-information`.		3. No CFI is currently generated. See :ref:`amdgpu-call-frame-information`.

..note::		..note::

Before CFI is generated, the call convention will be changed so that SP is
an unswizzled address relative to the wave scratch base.

CFI will be generated that defines the CFA as the unswizzled address		CFI will be generated that defines the CFA as the unswizzled address
relative to the wave scratch base in the unswizzled private address space		relative to the wave scratch base in the unswizzled private address space
of the lowest address stack allocated local variable.		of the lowest address stack allocated local variable.

``DW_AT_frame_base`` will be defined as the swizelled address in the		``DW_AT_frame_base`` will be defined as the swizzled address in the
swizzled private address space by dividing the CFA by the wavefront size		swizzled private address space by dividing the CFA by the wavefront size
(since CFA is always at least dword aligned which matches the scratch		(since CFA is always at least dword aligned which matches the scratch
swizzle element size).		swizzle element size).

If no dynamic stack alignment was performed, the stack allocated arguments		If no dynamic stack alignment was performed, the stack allocated arguments
are accessed as negative offsets relative to ``DW_AT_frame_base``, and the		are accessed as negative offsets relative to ``DW_AT_frame_base``, and the
local variables and register spill slots are accessed as positive offsets		local variables and register spill slots are accessed as positive offsets
relative to ``DW_AT_frame_base``.		relative to ``DW_AT_frame_base``.
▲ Show 20 Lines • Show All 1,052 Lines • Show Last 20 Lines

llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp

Show First 20 Lines • Show All 794 Lines • ▼ Show 20 Lines	if (!IsEntryFunc) {
TLI.allocateSpecialInputVGPRs(CCInfo, MF, TRI, Info);		TLI.allocateSpecialInputVGPRs(CCInfo, MF, TRI, Info);
}		}

// Start adding system SGPRs.		// Start adding system SGPRs.
if (IsEntryFunc) {		if (IsEntryFunc) {
TLI.allocateSystemSGPRs(CCInfo, MF, *Info, CC, IsShader);		TLI.allocateSystemSGPRs(CCInfo, MF, *Info, CC, IsShader);
} else {		} else {
CCInfo.AllocateReg(Info->getScratchRSrcReg());		CCInfo.AllocateReg(Info->getScratchRSrcReg());
CCInfo.AllocateReg(Info->getScratchWaveOffsetReg());
CCInfo.AllocateReg(Info->getFrameOffsetReg());
TLI.allocateSpecialInputSGPRs(CCInfo, MF, TRI, Info);		TLI.allocateSpecialInputSGPRs(CCInfo, MF, TRI, Info);
}		}

// Move back to the end of the basic block.		// Move back to the end of the basic block.
B.setMBB(MBB);		B.setMBB(MBB);

return true;		return true;
}		}

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp

Show First 20 Lines • Show All 1,468 Lines • ▼ Show 20 Lines
}		}

static bool isStackPtrRelative(const MachinePointerInfo &PtrInfo) {		static bool isStackPtrRelative(const MachinePointerInfo &PtrInfo) {
auto PSV = PtrInfo.V.dyn_cast<const PseudoSourceValue *>();		auto PSV = PtrInfo.V.dyn_cast<const PseudoSourceValue *>();
return PSV && PSV->isStack();		return PSV && PSV->isStack();
}		}

std::pair<SDValue, SDValue> AMDGPUDAGToDAGISel::foldFrameIndex(SDValue N) const {		std::pair<SDValue, SDValue> AMDGPUDAGToDAGISel::foldFrameIndex(SDValue N) const {
		SDLoc DL(N);
const MachineFunction &MF = CurDAG->getMachineFunction();		const MachineFunction &MF = CurDAG->getMachineFunction();
const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();

if (auto FI = dyn_cast<FrameIndexSDNode>(N)) {		if (auto FI = dyn_cast<FrameIndexSDNode>(N)) {
SDValue TFI = CurDAG->getTargetFrameIndex(FI->getIndex(),		SDValue TFI = CurDAG->getTargetFrameIndex(FI->getIndex(),
FI->getValueType(0));		FI->getValueType(0));

// If we can resolve this to a frame index access, this will be relative to		// If we can resolve this to a frame index access, this will be relative to
// either the stack or frame pointer SGPR.		// either the stack or frame pointer SGPR.
return std::make_pair(		return std::make_pair(
TFI, CurDAG->getRegister(Info->getStackPtrOffsetReg(), MVT::i32));		TFI, CurDAG->getRegister(Info->getStackPtrOffsetReg(), MVT::i32));
}		}

// If we don't know this private access is a local stack object, it needs to		// If we don't know this private access is a local stack object, it needs to
// be relative to the entry point's scratch wave offset register.		// be relative to the entry point's scratch wave offset.
return std::make_pair(N, CurDAG->getRegister(Info->getScratchWaveOffsetReg(),		return std::make_pair(N, CurDAG->getTargetConstant(0, DL, MVT::i32));
MVT::i32));
}		}

bool AMDGPUDAGToDAGISel::SelectMUBUFScratchOffen(SDNode *Parent,		bool AMDGPUDAGToDAGISel::SelectMUBUFScratchOffen(SDNode *Parent,
SDValue Addr, SDValue &Rsrc,		SDValue Addr, SDValue &Rsrc,
SDValue &VAddr, SDValue &SOffset,		SDValue &VAddr, SDValue &SOffset,
SDValue &ImmOffset) const {		SDValue &ImmOffset) const {

SDLoc DL(Addr);		SDLoc DL(Addr);
MachineFunction &MF = CurDAG->getMachineFunction();		MachineFunction &MF = CurDAG->getMachineFunction();
const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();

Rsrc = CurDAG->getRegister(Info->getScratchRSrcReg(), MVT::v4i32);		Rsrc = CurDAG->getRegister(Info->getScratchRSrcReg(), MVT::v4i32);

if (ConstantSDNode *CAddr = dyn_cast<ConstantSDNode>(Addr)) {		if (ConstantSDNode *CAddr = dyn_cast<ConstantSDNode>(Addr)) {
unsigned Imm = CAddr->getZExtValue();		unsigned Imm = CAddr->getZExtValue();

SDValue HighBits = CurDAG->getTargetConstant(Imm & ~4095, DL, MVT::i32);		SDValue HighBits = CurDAG->getTargetConstant(Imm & ~4095, DL, MVT::i32);
MachineSDNode *MovHighBits = CurDAG->getMachineNode(AMDGPU::V_MOV_B32_e32,		MachineSDNode *MovHighBits = CurDAG->getMachineNode(AMDGPU::V_MOV_B32_e32,
DL, MVT::i32, HighBits);		DL, MVT::i32, HighBits);
VAddr = SDValue(MovHighBits, 0);		VAddr = SDValue(MovHighBits, 0);

// In a call sequence, stores to the argument stack area are relative to the		// In a call sequence, stores to the argument stack area are relative to the
// stack pointer.		// stack pointer.
const MachinePointerInfo &PtrInfo = cast<MemSDNode>(Parent)->getPointerInfo();		const MachinePointerInfo &PtrInfo = cast<MemSDNode>(Parent)->getPointerInfo();
unsigned SOffsetReg = isStackPtrRelative(PtrInfo) ?
Info->getStackPtrOffsetReg() : Info->getScratchWaveOffsetReg();

SOffset = CurDAG->getRegister(SOffsetReg, MVT::i32);		SOffset = isStackPtrRelative(PtrInfo)
		? CurDAG->getRegister(Info->getStackPtrOffsetReg(), MVT::i32)
		: CurDAG->getTargetConstant(0, DL, MVT::i32);
ImmOffset = CurDAG->getTargetConstant(Imm & 4095, DL, MVT::i16);		ImmOffset = CurDAG->getTargetConstant(Imm & 4095, DL, MVT::i16);
return true;		return true;
}		}

if (CurDAG->isBaseWithConstantOffset(Addr)) {		if (CurDAG->isBaseWithConstantOffset(Addr)) {
// (add n0, c1)		// (add n0, c1)

SDValue N0 = Addr.getOperand(0);		SDValue N0 = Addr.getOperand(0);
▲ Show 20 Lines • Show All 41 Lines • ▼ Show 20 Lines	bool AMDGPUDAGToDAGISel::SelectMUBUFScratchOffset(SDNode *Parent,

SDLoc DL(Addr);		SDLoc DL(Addr);
MachineFunction &MF = CurDAG->getMachineFunction();		MachineFunction &MF = CurDAG->getMachineFunction();
const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();

SRsrc = CurDAG->getRegister(Info->getScratchRSrcReg(), MVT::v4i32);		SRsrc = CurDAG->getRegister(Info->getScratchRSrcReg(), MVT::v4i32);

const MachinePointerInfo &PtrInfo = cast<MemSDNode>(Parent)->getPointerInfo();		const MachinePointerInfo &PtrInfo = cast<MemSDNode>(Parent)->getPointerInfo();
unsigned SOffsetReg = isStackPtrRelative(PtrInfo) ?
Info->getStackPtrOffsetReg() : Info->getScratchWaveOffsetReg();

// FIXME: Get from MachinePointerInfo? We should only be using the frame		// FIXME: Get from MachinePointerInfo? We should only be using the frame
// offset if we know this is in a call sequence.		// offset if we know this is in a call sequence.
SOffset = CurDAG->getRegister(SOffsetReg, MVT::i32);		SOffset = isStackPtrRelative(PtrInfo)
		? CurDAG->getRegister(Info->getStackPtrOffsetReg(), MVT::i32)
		: CurDAG->getTargetConstant(0, DL, MVT::i32);

Offset = CurDAG->getTargetConstant(CAddr->getZExtValue(), DL, MVT::i16);		Offset = CurDAG->getTargetConstant(CAddr->getZExtValue(), DL, MVT::i16);
return true;		return true;
}		}

bool AMDGPUDAGToDAGISel::SelectMUBUFOffset(SDValue Addr, SDValue &SRsrc,		bool AMDGPUDAGToDAGISel::SelectMUBUFOffset(SDValue Addr, SDValue &SRsrc,
SDValue &SOffset, SDValue &Offset,		SDValue &SOffset, SDValue &Offset,
SDValue &GLC, SDValue &SLC,		SDValue &GLC, SDValue &SLC,
▲ Show 20 Lines • Show All 1,304 Lines • Show Last 20 Lines
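
For readers following the constant-address path in `SelectMUBUFScratchOffen` above: the immediate is split with `Imm & ~4095` and `Imm & 4095`, so the low 12 bits go into the instruction's immediate offset and the remaining high bits are materialized into a VGPR. A standalone sketch of that split follows; the struct and function names are hypothetical and this is not the selector code itself.

```cpp
#include <cstdint>

// Hypothetical result of splitting a constant scratch address.
struct SplitOffset {
  std::uint32_t HighBits;  // moved into a VGPR (vaddr) in the code above
  std::uint32_t ImmOffset; // low 12 bits, used as the immediate offset
};

// Mirrors the masks used in the diff: Imm & ~4095 and Imm & 4095.
inline SplitOffset splitScratchImm(std::uint32_t Imm) {
  return {Imm & ~4095u, Imm & 4095u};
}
```

The two parts always sum back to the original value, so the effective address is unchanged by the split.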

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp

Show First 20 Lines • Show All 2,682 Lines • ▼ Show 20 Lines	return {{[=](MachineInstrBuilder &MIB) { // rsrc
},		},
[=](MachineInstrBuilder &MIB) { // vaddr		[=](MachineInstrBuilder &MIB) { // vaddr
MIB.addReg(HighBits);		MIB.addReg(HighBits);
},		},
[=](MachineInstrBuilder &MIB) { // soffset		[=](MachineInstrBuilder &MIB) { // soffset
const MachineMemOperand *MMO = *MI->memoperands_begin();		const MachineMemOperand *MMO = *MI->memoperands_begin();
const MachinePointerInfo &PtrInfo = MMO->getPointerInfo();		const MachinePointerInfo &PtrInfo = MMO->getPointerInfo();

Register SOffsetReg = isStackPtrRelative(PtrInfo)		if (isStackPtrRelative(PtrInfo))
? Info->getStackPtrOffsetReg()		MIB.addReg(Info->getStackPtrOffsetReg());
: Info->getScratchWaveOffsetReg();		else
MIB.addReg(SOffsetReg);		MIB.addImm(0);
},		},
[=](MachineInstrBuilder &MIB) { // offset		[=](MachineInstrBuilder &MIB) { // offset
MIB.addImm(Offset & 4095);		MIB.addImm(Offset & 4095);
}}};		}}};
}		}

assert(Offset == 0);		assert(Offset == 0);

Show All 20 Lines	if (isBaseWithConstantOffset(Root, *MRI)) {
Offset = PossibleOffset;		Offset = PossibleOffset;
}		}
}		}
} else if (RootDef->getOpcode() == AMDGPU::G_FRAME_INDEX) {		} else if (RootDef->getOpcode() == AMDGPU::G_FRAME_INDEX) {
FI = RootDef->getOperand(1).getIndex();		FI = RootDef->getOperand(1).getIndex();
}		}
}		}

// If we don't know this private access is a local stack object, it needs to
// be relative to the entry point's scratch wave offset register.
// TODO: Should split large offsets that don't fit like above.
// TODO: Don't use scratch wave offset just because the offset didn't fit.
Register SOffset = FI.hasValue() ? Info->getStackPtrOffsetReg()
: Info->getScratchWaveOffsetReg();

return {{[=](MachineInstrBuilder &MIB) { // rsrc		return {{[=](MachineInstrBuilder &MIB) { // rsrc
MIB.addReg(Info->getScratchRSrcReg());		MIB.addReg(Info->getScratchRSrcReg());
},		},
[=](MachineInstrBuilder &MIB) { // vaddr		[=](MachineInstrBuilder &MIB) { // vaddr
if (FI.hasValue())		if (FI.hasValue())
MIB.addFrameIndex(FI.getValue());		MIB.addFrameIndex(FI.getValue());
else		else
MIB.addReg(VAddr);		MIB.addReg(VAddr);
},		},
[=](MachineInstrBuilder &MIB) { // soffset		[=](MachineInstrBuilder &MIB) { // soffset
MIB.addReg(SOffset);		// If we don't know this private access is a local stack object, it
		// needs to be relative to the entry point's scratch wave offset.
		// TODO: Should split large offsets that don't fit like above.
		// TODO: Don't use scratch wave offset just because the offset
		// didn't fit.
		if (!Info->isEntryFunction() && FI.hasValue())
		MIB.addReg(Info->getStackPtrOffsetReg());
		else
		MIB.addImm(0);
},		},
[=](MachineInstrBuilder &MIB) { // offset		[=](MachineInstrBuilder &MIB) { // offset
MIB.addImm(Offset);		MIB.addImm(Offset);
}}};		}}};
}		}

bool AMDGPUInstructionSelector::isDSOffsetLegal(Register Base,		bool AMDGPUInstructionSelector::isDSOffsetLegal(Register Base,
int64_t Offset,		int64_t Offset,
Show All 21 Lines	if (!mi_match(Root.getReg(), *MRI, m_ICst(Offset)) ||
!SIInstrInfo::isLegalMUBUFImmOffset(Offset))		!SIInstrInfo::isLegalMUBUFImmOffset(Offset))
return {};		return {};

const MachineFunction *MF = MBB->getParent();		const MachineFunction *MF = MBB->getParent();
const SIMachineFunctionInfo *Info = MF->getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *Info = MF->getInfo<SIMachineFunctionInfo>();
const MachineMemOperand *MMO = *MI->memoperands_begin();		const MachineMemOperand *MMO = *MI->memoperands_begin();
const MachinePointerInfo &PtrInfo = MMO->getPointerInfo();		const MachinePointerInfo &PtrInfo = MMO->getPointerInfo();

Register SOffsetReg = isStackPtrRelative(PtrInfo)
? Info->getStackPtrOffsetReg()
: Info->getScratchWaveOffsetReg();
return {{		return {{
[=](MachineInstrBuilder &MIB) {		[=](MachineInstrBuilder &MIB) { // rsrc
MIB.addReg(Info->getScratchRSrcReg());		MIB.addReg(Info->getScratchRSrcReg());
}, // rsrc		},
[=](MachineInstrBuilder &MIB) { MIB.addReg(SOffsetReg); }, // soffset		[=](MachineInstrBuilder &MIB) { // soffset
		if (isStackPtrRelative(PtrInfo))
		MIB.addReg(Info->getStackPtrOffsetReg());
		else
		MIB.addImm(0);
		},
[=](MachineInstrBuilder &MIB) { MIB.addImm(Offset); } // offset		[=](MachineInstrBuilder &MIB) { MIB.addImm(Offset); } // offset
}};		}};
}		}

std::pair<Register, unsigned>		std::pair<Register, unsigned>
AMDGPUInstructionSelector::selectDS1Addr1OffsetImpl(MachineOperand &Root) const {		AMDGPUInstructionSelector::selectDS1Addr1OffsetImpl(MachineOperand &Root) const {
const MachineInstr *RootDef = MRI->getVRegDef(Root.getReg());		const MachineInstr *RootDef = MRI->getVRegDef(Root.getReg());
if (!RootDef)		if (!RootDef)
return std::make_pair(Root.getReg(), 0);		return std::make_pair(Root.getReg(), 0);

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

Show First 20 Lines • Show All 1,070 Lines • ▼ Show 20 Lines	Error = SMDiagnostic(*PFS.SM, SMLoc(), Buffer.getBufferIdentifier(), 1,
RegName.Value.size(), SourceMgr::DK_Error,		RegName.Value.size(), SourceMgr::DK_Error,
"incorrect register class for field", RegName.Value,		"incorrect register class for field", RegName.Value,
None, None);		None, None);
SourceRange = RegName.SourceRange;		SourceRange = RegName.SourceRange;
return true;		return true;
};		};

if (parseRegister(YamlMFI.ScratchRSrcReg, MFI->ScratchRSrcReg) ||		if (parseRegister(YamlMFI.ScratchRSrcReg, MFI->ScratchRSrcReg) ||
parseRegister(YamlMFI.ScratchWaveOffsetReg, MFI->ScratchWaveOffsetReg) ||
parseRegister(YamlMFI.FrameOffsetReg, MFI->FrameOffsetReg) ||		parseRegister(YamlMFI.FrameOffsetReg, MFI->FrameOffsetReg) ||
parseRegister(YamlMFI.StackPtrOffsetReg, MFI->StackPtrOffsetReg))		parseRegister(YamlMFI.StackPtrOffsetReg, MFI->StackPtrOffsetReg))
return true;		return true;

if (MFI->ScratchRSrcReg != AMDGPU::PRIVATE_RSRC_REG &&		if (MFI->ScratchRSrcReg != AMDGPU::PRIVATE_RSRC_REG &&
!AMDGPU::SGPR_128RegClass.contains(MFI->ScratchRSrcReg)) {		!AMDGPU::SGPR_128RegClass.contains(MFI->ScratchRSrcReg)) {
return diagnoseRegisterClass(YamlMFI.ScratchRSrcReg);		return diagnoseRegisterClass(YamlMFI.ScratchRSrcReg);
}		}

if (MFI->ScratchWaveOffsetReg != AMDGPU::SCRATCH_WAVE_OFFSET_REG &&
!AMDGPU::SGPR_32RegClass.contains(MFI->ScratchWaveOffsetReg)) {
return diagnoseRegisterClass(YamlMFI.ScratchWaveOffsetReg);
}

if (MFI->FrameOffsetReg != AMDGPU::FP_REG &&		if (MFI->FrameOffsetReg != AMDGPU::FP_REG &&
!AMDGPU::SGPR_32RegClass.contains(MFI->FrameOffsetReg)) {		!AMDGPU::SGPR_32RegClass.contains(MFI->FrameOffsetReg)) {
return diagnoseRegisterClass(YamlMFI.FrameOffsetReg);		return diagnoseRegisterClass(YamlMFI.FrameOffsetReg);
}		}

if (MFI->StackPtrOffsetReg != AMDGPU::SP_REG &&		if (MFI->StackPtrOffsetReg != AMDGPU::SP_REG &&
!AMDGPU::SGPR_32RegClass.contains(MFI->StackPtrOffsetReg)) {		!AMDGPU::SGPR_32RegClass.contains(MFI->StackPtrOffsetReg)) {
return diagnoseRegisterClass(YamlMFI.StackPtrOffsetReg);		return diagnoseRegisterClass(YamlMFI.StackPtrOffsetReg);

llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp

	}			}

	void AMDGPUInstPrinter::printRegOperand(unsigned RegNo, raw_ostream &O,			void AMDGPUInstPrinter::printRegOperand(unsigned RegNo, raw_ostream &O,
	const MCRegisterInfo &MRI) {			const MCRegisterInfo &MRI) {
	#if !defined(NDEBUG)			#if !defined(NDEBUG)
	switch (RegNo) {			switch (RegNo) {
	case AMDGPU::FP_REG:			case AMDGPU::FP_REG:
	case AMDGPU::SP_REG:			case AMDGPU::SP_REG:
	case AMDGPU::SCRATCH_WAVE_OFFSET_REG:
	case AMDGPU::PRIVATE_RSRC_REG:			case AMDGPU::PRIVATE_RSRC_REG:
	llvm_unreachable("pseudo-register should not ever be emitted");			llvm_unreachable("pseudo-register should not ever be emitted");
	case AMDGPU::SCC:			case AMDGPU::SCC:
	llvm_unreachable("pseudo scc should not ever be emitted");			llvm_unreachable("pseudo scc should not ever be emitted");
	default:			default:
	break;			break;
	}			}
	#endif			#endif

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp

Show First 20 Lines • Show All 606 Lines • ▼ Show 20 Lines	void SIFoldOperands::foldOperand(
}		}

if (tryToFoldACImm(TII, OpToFold, UseMI, UseOpIdx, FoldList))		if (tryToFoldACImm(TII, OpToFold, UseMI, UseOpIdx, FoldList))
return;		return;

if (frameIndexMayFold(TII, *UseMI, UseOpIdx, OpToFold)) {		if (frameIndexMayFold(TII, *UseMI, UseOpIdx, OpToFold)) {
// Sanity check that this is a stack access.		// Sanity check that this is a stack access.
// FIXME: Should probably use stack pseudos before frame lowering.		// FIXME: Should probably use stack pseudos before frame lowering.
MachineOperand *SOff = TII->getNamedOperand(*UseMI, AMDGPU::OpName::soffset);
if (!SOff->isReg() || (SOff->getReg() != MFI->getScratchWaveOffsetReg() &&
SOff->getReg() != MFI->getStackPtrOffsetReg()))
return;

if (TII->getNamedOperand(*UseMI, AMDGPU::OpName::srsrc)->getReg() !=		if (TII->getNamedOperand(*UseMI, AMDGPU::OpName::srsrc)->getReg() !=
MFI->getScratchRSrcReg())		MFI->getScratchRSrcReg())
return;		return;

		// Ensure this is either relative to the current frame or the current wave.
		MachineOperand &SOff =
		*TII->getNamedOperand(*UseMI, AMDGPU::OpName::soffset);
		if ((!SOff.isReg() || SOff.getReg() != MFI->getStackPtrOffsetReg()) &&
		(!SOff.isImm() || SOff.getImm() != 0))
		return;

// A frame index will resolve to a positive constant, so it should always be		// A frame index will resolve to a positive constant, so it should always be
		arsenm: Should demorgan this
// safe to fold the addressing mode, even pre-GFX9.		// safe to fold the addressing mode, even pre-GFX9.
UseMI->getOperand(UseOpIdx).ChangeToFrameIndex(OpToFold.getIndex());		UseMI->getOperand(UseOpIdx).ChangeToFrameIndex(OpToFold.getIndex());
SOff->setReg(MFI->getStackPtrOffsetReg());
		// If this is relative to the current wave, update it to be relative to the
		// current frame.
		if (SOff.isImm())
		SOff.ChangeToRegister(MFI->getStackPtrOffsetReg(), false);
return;		return;
}		}
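
The "Should demorgan this" note refers to the early-return condition on `soffset` in the new code. As a rough sketch of what the rewritten condition could look like, the following standalone C++ compares the form in the patch with a De Morgan'd form. `SOffOperand` and both function names are hypothetical stand-ins for the few `MachineOperand` queries involved; they are not LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the MachineOperand queries the check relies on.
struct SOffOperand {
  bool IsReg;
  bool IsImm;
  unsigned Reg;
  int64_t Imm;
};

// Condition as written in the patch: reject the fold unless soffset is the
// stack pointer register or the immediate 0.
inline bool rejectsAsWritten(const SOffOperand &S, unsigned SPReg) {
  return (!S.IsReg || S.Reg != SPReg) && (!S.IsImm || S.Imm != 0);
}

// De Morgan'd form: name the two accepted cases, then negate their union.
inline bool rejectsDeMorgan(const SOffOperand &S, unsigned SPReg) {
  bool IsFrameRelative = S.IsReg && S.Reg == SPReg; // relative to current frame
  bool IsWaveRelative = S.IsImm && S.Imm == 0;      // relative to current wave
  return !(IsFrameRelative || IsWaveRelative);
}
```

The second form reads closer to the comment above the check ("either relative to the current frame or the current wave"), which is presumably the point of the request.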

bool FoldingImmLike =		bool FoldingImmLike =
OpToFold.isImm() || OpToFold.isFI() || OpToFold.isGlobal();		OpToFold.isImm() || OpToFold.isFI() || OpToFold.isGlobal();

if (FoldingImmLike && UseMI->isCopy()) {		if (FoldingImmLike && UseMI->isCopy()) {
Register DestReg = UseMI->getOperand(0).getReg();		Register DestReg = UseMI->getOperand(0).getReg();

llvm/lib/Target/AMDGPU/SIFrameLowering.h


	private:			private:
	void emitEntryFunctionFlatScratchInit(MachineFunction &MF,			void emitEntryFunctionFlatScratchInit(MachineFunction &MF,
	MachineBasicBlock &MBB,			MachineBasicBlock &MBB,
	MachineBasicBlock::iterator I,			MachineBasicBlock::iterator I,
	const DebugLoc &DL,			const DebugLoc &DL,
	Register ScratchWaveOffsetReg) const;			Register ScratchWaveOffsetReg) const;

	Register getEntryFunctionReservedScratchRsrcReg(MachineFunction &MF) const;

	Register			Register
	getEntryFunctionReservedScratchWaveOffsetReg(MachineFunction &MF) const;			getEntryFunctionReservedScratchRsrcReg(MachineFunction &MF,
				Register ScratchWaveOffsetReg) const;

	void emitEntryFunctionScratchRsrcRegSetup(MachineFunction &MF,			void emitEntryFunctionScratchRsrcRegSetup(
	MachineBasicBlock &MBB,			MachineFunction &MF, MachineBasicBlock &MBB,
	MachineBasicBlock::iterator I,			MachineBasicBlock::iterator I, const DebugLoc &DL,
	const DebugLoc &DL,			Register PreloadedPrivateBufferReg, Register ScratchRsrcReg,
	Register PreloadedPrivateBufferReg,			Register ScratchWaveOffsetReg) const;
	Register ScratchRsrcReg) const;

	public:			public:
	bool hasFP(const MachineFunction &MF) const override;			bool hasFP(const MachineFunction &MF) const override;
	};			};

	} // end namespace llvm			} // end namespace llvm

	#endif // LLVM_LIB_TARGET_AMDGPU_SIFRAMELOWERING_H			#endif // LLVM_LIB_TARGET_AMDGPU_SIFRAMELOWERING_H

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp



static ArrayRef<MCPhysReg> getAllSGPR128(const GCNSubtarget &ST,		static ArrayRef<MCPhysReg> getAllSGPR128(const GCNSubtarget &ST,
const MachineFunction &MF) {		const MachineFunction &MF) {
return makeArrayRef(AMDGPU::SGPR_128RegClass.begin(),		return makeArrayRef(AMDGPU::SGPR_128RegClass.begin(),
ST.getMaxNumSGPRs(MF) / 4);		ST.getMaxNumSGPRs(MF) / 4);
}		}

static ArrayRef<MCPhysReg> getAllSGPRs(const GCNSubtarget &ST,
const MachineFunction &MF) {
return makeArrayRef(AMDGPU::SGPR_32RegClass.begin(),
ST.getMaxNumSGPRs(MF));
}

// Find a scratch register that we can use at the start of the prologue to		// Find a scratch register that we can use at the start of the prologue to
// re-align the stack pointer. We avoid using callee-save registers since they		// re-align the stack pointer. We avoid using callee-save registers since they
// may appear to be free when this is called from canUseAsPrologue (during		// may appear to be free when this is called from canUseAsPrologue (during
// shrink wrapping), but then no longer be free when this is called from		// shrink wrapping), but then no longer be free when this is called from
// emitPrologue.		// emitPrologue.
//		//
// FIXME: This is a bit conservative, since in the above case we could use one		// FIXME: This is a bit conservative, since in the above case we could use one
// of the callee-save registers as a scratch temp to re-align the stack pointer,		// of the callee-save registers as a scratch temp to re-align the stack pointer,
▲ Show 20 Lines • Show All 211 Lines • ▼ Show 20 Lines	void SIFrameLowering::emitEntryFunctionFlatScratchInit(
// Convert offset to 256-byte units.		// Convert offset to 256-byte units.
BuildMI(MBB, I, DL, TII->get(AMDGPU::S_LSHR_B32), AMDGPU::FLAT_SCR_HI)		BuildMI(MBB, I, DL, TII->get(AMDGPU::S_LSHR_B32), AMDGPU::FLAT_SCR_HI)
.addReg(FlatScrInitLo, RegState::Kill)		.addReg(FlatScrInitLo, RegState::Kill)
.addImm(8);		.addImm(8);
}		}

// Shift down registers reserved for the scratch RSRC.		// Shift down registers reserved for the scratch RSRC.
Register SIFrameLowering::getEntryFunctionReservedScratchRsrcReg(		Register SIFrameLowering::getEntryFunctionReservedScratchRsrcReg(
MachineFunction &MF) const {		MachineFunction &MF, Register ScratchWaveOffsetReg) const {

const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();		const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
const SIInstrInfo *TII = ST.getInstrInfo();		const SIInstrInfo *TII = ST.getInstrInfo();
const SIRegisterInfo *TRI = &TII->getRegisterInfo();		const SIRegisterInfo *TRI = &TII->getRegisterInfo();
MachineRegisterInfo &MRI = MF.getRegInfo();		MachineRegisterInfo &MRI = MF.getRegInfo();
SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();		SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();

assert(MFI->isEntryFunction());		assert(MFI->isEntryFunction());
Show All 12 Lines	Register SIFrameLowering::getEntryFunctionReservedScratchRsrcReg(
// which were actually used.		// which were actually used.
//		//
// FIXME: It might be safer to use a pseudoregister before replacement.		// FIXME: It might be safer to use a pseudoregister before replacement.

// FIXME: We should be able to eliminate unused input registers. We only		// FIXME: We should be able to eliminate unused input registers. We only
// cannot do this for the resources required for scratch access. For now we		// cannot do this for the resources required for scratch access. For now we
// skip over user SGPRs and may leave unused holes.		// skip over user SGPRs and may leave unused holes.

// We find the resource first because it has an alignment requirement.

unsigned NumPreloaded = (MFI->getNumPreloadedSGPRs() + 3) / 4;		unsigned NumPreloaded = (MFI->getNumPreloadedSGPRs() + 3) / 4;
ArrayRef<MCPhysReg> AllSGPR128s = getAllSGPR128(ST, MF);		ArrayRef<MCPhysReg> AllSGPR128s = getAllSGPR128(ST, MF);
		scott.linder: @arsenm @nhaehnle I don't think I understand how `inreg` currently works relative to "preloaded" SGPRs; is/should `inreg` be recorded somewhere in the machine function info so this isn't necessary?
		arsenm: Not directly. There shouldn't be any repeating of the calling convention logic here. Either the number of SGPR arguments should be recorded, or it should be inferred from the machine code. It might be correct to just count the number of SGPR in the function live-in list. I think live in registers can be deleted from the list if they are proven to be unused, so this might be fragile. Finding the highest live in SGPR number may also work.
		scott.linder: In switching back to supporting an FP I no longer see the need for this manifest, but there may still be a need to update this in the future. I don't think my change is making this any more fragile so I'm leaving it as it was.
AllSGPR128s = AllSGPR128s.slice(std::min(static_cast<unsigned>(AllSGPR128s.size()), NumPreloaded));		AllSGPR128s = AllSGPR128s.slice(std::min(static_cast<unsigned>(AllSGPR128s.size()), NumPreloaded));

		arsenm: This should not need to inspect the original IR. Why can't this just read it directly from MFI? They should be accounted there already?
// Skip the last N reserved elements because they should have already been		// Skip the last N reserved elements because they should have already been
		arsenm: This will be inaccurate for any struct type, this should have been computed during lowering that knows the type split
// reserved for VCC etc.		// reserved for VCC etc.
for (MCPhysReg Reg : AllSGPR128s) {		for (MCPhysReg Reg : AllSGPR128s) {
// Pick the first unallocated one. Make sure we don't clobber the other		// Pick the first unallocated one. Make sure we don't clobber the other
// reserved input we needed.		// reserved input we needed.
if (!MRI.isPhysRegUsed(Reg) && MRI.isAllocatable(Reg)) {		//
		// FIXME: The preloaded SGPR count is not accurate for shaders as the
		// scratch wave offset may be in a fixed SGPR or
		scott.linder: Similar question here, should there be a change in `SITargetLowering` so the preloaded count is correct?
		// SITargetLowering::allocateSystemSGPRs may choose some free SGPR for the
		// scratch wave offset. We explicitly avoid the scratch wave offset to
		// account for this.
		if (!MRI.isPhysRegUsed(Reg) && MRI.isAllocatable(Reg) &&
		!TRI->isSubRegisterEq(Reg, ScratchWaveOffsetReg)) {
MRI.replaceRegWith(ScratchRsrcReg, Reg);		MRI.replaceRegWith(ScratchRsrcReg, Reg);
MFI->setScratchRSrcReg(Reg);		MFI->setScratchRSrcReg(Reg);
return Reg;		return Reg;
}		}
}		}

return ScratchRsrcReg;		return ScratchRsrcReg;
}		}
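
The selection loop above can be modelled outside of LLVM. The sketch below is a simplified illustration, not the patch's code: it assumes SGPR_128 tuples are 4-aligned (tuple `T` covers SGPRs `4*T` to `4*T+3`), represents "unused and allocatable" as a plain `std::vector<bool>`, and returns `-1` where the real function falls back to the original `ScratchRsrcReg`. The name `pickRsrcTuple` and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Pick the first free 4-SGPR tuple that lies past the preloaded SGPRs and does
// not contain the SGPR holding the scratch wave offset.
// TupleIsFree[T] models !isPhysRegUsed(T) && isAllocatable(T) for tuple T.
inline int pickRsrcTuple(const std::vector<bool> &TupleIsFree,
                         unsigned NumPreloadedSGPRs,
                         unsigned WaveOffsetSGPR) {
  unsigned NumTuples = static_cast<unsigned>(TupleIsFree.size());
  // Round the preloaded SGPR count up to whole tuples, clamped to the list size.
  unsigned First = std::min(NumTuples, (NumPreloadedSGPRs + 3) / 4);
  for (unsigned T = First; T < NumTuples; ++T) {
    // Assumed 4-aligned tuples: SGPR n belongs to tuple n / 4.
    bool OverlapsWaveOffset = (WaveOffsetSGPR / 4) == T;
    if (TupleIsFree[T] && !OverlapsWaveOffset)
      return static_cast<int>(T);
  }
  return -1; // No suitable tuple; the caller keeps the originally reserved one.
}
```

With 6 preloaded SGPRs the search starts at tuple 2; if the wave offset lives in SGPR 9, tuple 2 is skipped and tuple 3 is chosen, which is the case the added `isSubRegisterEq` check is meant to cover.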

// Shift down registers reserved for the scratch wave offset.
Register SIFrameLowering::getEntryFunctionReservedScratchWaveOffsetReg(
MachineFunction &MF) const {

const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
const SIInstrInfo *TII = ST.getInstrInfo();
const SIRegisterInfo *TRI = &TII->getRegisterInfo();
MachineRegisterInfo &MRI = MF.getRegInfo();
SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();

assert(MFI->isEntryFunction());

Register ScratchWaveOffsetReg = MFI->getScratchWaveOffsetReg();

if (ScratchWaveOffsetReg == AMDGPU::NoRegister ||
(!MRI.isPhysRegUsed(ScratchWaveOffsetReg) && !hasFP(MF) &&
!MFI->hasFlatScratchInit())) {
assert(!hasFP(MF) && !MFI->hasFlatScratchInit());
return AMDGPU::NoRegister;
}

if (ST.hasSGPRInitBug() ||
ScratchWaveOffsetReg != TRI->reservedPrivateSegmentWaveByteOffsetReg(MF))
return ScratchWaveOffsetReg;

unsigned NumPreloaded = MFI->getNumPreloadedSGPRs();

ArrayRef<MCPhysReg> AllSGPRs = getAllSGPRs(ST, MF);
if (NumPreloaded > AllSGPRs.size())
return ScratchWaveOffsetReg;

AllSGPRs = AllSGPRs.slice(NumPreloaded);

// We need to drop register from the end of the list that we cannot use
// for the scratch wave offset.
// + 2 s102 and s103 do not exist on VI.
// + 2 for vcc
// + 2 for xnack_mask
// + 2 for flat_scratch
// + 4 for registers reserved for scratch resource register
// + 1 for register reserved for scratch wave offset. (By excluding this
// register from the list to consider, it means that when this
// register is being used for the scratch wave offset and there
// are no other free SGPRs, then the value will stay in this register.)
// + 1 if stack pointer is used.
// ----
// 13 (+1)
unsigned ReservedRegCount = 13;

if (AllSGPRs.size() < ReservedRegCount)
return ScratchWaveOffsetReg;

for (MCPhysReg Reg : AllSGPRs.drop_back(ReservedRegCount)) {
// Pick the first unallocated SGPR. Be careful not to pick an alias of the
// scratch descriptor, since we haven’t added its uses yet.
if (!MRI.isPhysRegUsed(Reg) && MRI.isAllocatable(Reg)) {
MRI.replaceRegWith(ScratchWaveOffsetReg, Reg);
if (MFI->getScratchWaveOffsetReg() == MFI->getStackPtrOffsetReg()) {
assert(!hasFP(MF));
MFI->setStackPtrOffsetReg(Reg);
}
MFI->setScratchWaveOffsetReg(Reg);
MFI->setFrameOffsetReg(Reg);
return Reg;
}
}

return ScratchWaveOffsetReg;
}

void SIFrameLowering::emitEntryFunctionPrologue(MachineFunction &MF,		void SIFrameLowering::emitEntryFunctionPrologue(MachineFunction &MF,
MachineBasicBlock &MBB) const {		MachineBasicBlock &MBB) const {
assert(&MF.front() == &MBB && "Shrink-wrapping not yet supported");		assert(&MF.front() == &MBB && "Shrink-wrapping not yet supported");

// FIXME: If we only have SGPR spills, we won't actually be using scratch		// FIXME: If we only have SGPR spills, we won't actually be using scratch
// memory since these spill to VGPRs. We should be cleaning up these unused		// memory since these spill to VGPRs. We should be cleaning up these unused
// SGPR spill frame indices somewhere.		// SGPR spill frame indices somewhere.

// FIXME: We still have implicit uses on SGPR spill instructions in case they		// FIXME: We still have implicit uses on SGPR spill instructions in case they
// need to spill to vector memory. It's likely that will not happen, but at		// need to spill to vector memory. It's likely that will not happen, but at
// this point it appears we need the setup. This part of the prolog should be		// this point it appears we need the setup. This part of the prolog should be
// emitted after frame indices are eliminated.		// emitted after frame indices are eliminated.

// FIXME: Remove all of the isPhysRegUsed checks		// FIXME: Remove all of the isPhysRegUsed checks

SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();		SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();		const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
const SIInstrInfo *TII = ST.getInstrInfo();		const SIInstrInfo *TII = ST.getInstrInfo();
const SIRegisterInfo *TRI = &TII->getRegisterInfo();
MachineRegisterInfo &MRI = MF.getRegInfo();		MachineRegisterInfo &MRI = MF.getRegInfo();
const Function &F = MF.getFunction();		const Function &F = MF.getFunction();

assert(MFI->isEntryFunction());		assert(MFI->isEntryFunction());

// We need to do the replacement of the private segment buffer and wave offset		Register ScratchWaveOffsetReg = MFI->getPreloadedReg(
// register even if there are no stack objects. There could be stores to undef		AMDGPUFunctionArgInfo::PRIVATE_SEGMENT_WAVE_BYTE_OFFSET);
// or a constant without an associated object.		// FIXME: Hack to not crash in situations which emitted an error.
		if (ScratchWaveOffsetReg == AMDGPU::NoRegister)
		return;

		// We need to do the replacement of the private segment buffer register even
		// if there are no stack objects. There could be stores to undef or a
		// constant without an associated object.
//		//
// These calls will return `AMDGPU::NoRegister` in cases where there are no		// This will return `AMDGPU::NoRegister` in cases where there are no actual
// actual uses of the respective registers.		// uses of the SRSRC.
Register ScratchRsrcReg = getEntryFunctionReservedScratchRsrcReg(MF);		Register ScratchRsrcReg =
Register ScratchWaveOffsetReg =		getEntryFunctionReservedScratchRsrcReg(MF, ScratchWaveOffsetReg);
getEntryFunctionReservedScratchWaveOffsetReg(MF);

// Make the selected registers live throughout the function.		// Make the selected register live throughout the function.
		if (ScratchRsrcReg != AMDGPU::NoRegister) {
for (MachineBasicBlock &OtherBB : MF) {		for (MachineBasicBlock &OtherBB : MF) {
if (&OtherBB == &MBB)		if (&OtherBB != &MBB) {
continue;

if (ScratchWaveOffsetReg != AMDGPU::NoRegister)
OtherBB.addLiveIn(ScratchWaveOffsetReg);

if (ScratchRsrcReg != AMDGPU::NoRegister)
OtherBB.addLiveIn(ScratchRsrcReg);		OtherBB.addLiveIn(ScratchRsrcReg);
}		}
		}
		}

// Now that we have fixed the reserved registers we need to locate the		// Now that we have fixed the reserved SRSRC we need to locate the
// (potentially) preloaded registers. We should always have a preloaded		// (potentially) preloaded SRSRC.
// scratch wave offset register, but we only have a preloaded scratch rsrc
// register for HSA.
Register PreloadedScratchWaveOffsetReg = MFI->getPreloadedReg(
AMDGPUFunctionArgInfo::PRIVATE_SEGMENT_WAVE_BYTE_OFFSET);
// FIXME: Hack to not crash in situations which emitted an error.
if (PreloadedScratchWaveOffsetReg == AMDGPU::NoRegister)
return;

// We added live-ins during argument lowering, but since they were not used
// they were deleted. We're adding the uses now, so add them back.
MRI.addLiveIn(PreloadedScratchWaveOffsetReg);
MBB.addLiveIn(PreloadedScratchWaveOffsetReg);

Register PreloadedScratchRsrcReg = AMDGPU::NoRegister;		Register PreloadedScratchRsrcReg = AMDGPU::NoRegister;
if (ST.isAmdHsaOrMesa(F)) {		if (ST.isAmdHsaOrMesa(F)) {
		arsenm: Braces
PreloadedScratchRsrcReg =		PreloadedScratchRsrcReg =
MFI->getPreloadedReg(AMDGPUFunctionArgInfo::PRIVATE_SEGMENT_BUFFER);		MFI->getPreloadedReg(AMDGPUFunctionArgInfo::PRIVATE_SEGMENT_BUFFER);
if (ScratchRsrcReg != AMDGPU::NoRegister &&		if (ScratchRsrcReg != AMDGPU::NoRegister &&
PreloadedScratchRsrcReg != AMDGPU::NoRegister) {		PreloadedScratchRsrcReg != AMDGPU::NoRegister) {
		// We added live-ins during argument lowering, but since they were not
		// used they were deleted. We're adding the uses now, so add them back.
MRI.addLiveIn(PreloadedScratchRsrcReg);		MRI.addLiveIn(PreloadedScratchRsrcReg);
MBB.addLiveIn(PreloadedScratchRsrcReg);		MBB.addLiveIn(PreloadedScratchRsrcReg);
}		}
}		}

		// Debug location must be unknown since the first debug location is used to
		// determine the end of the prologue.
DebugLoc DL;		DebugLoc DL;
MachineBasicBlock::iterator I = MBB.begin();		MachineBasicBlock::iterator I = MBB.begin();

const bool HasFP = hasFP(MF);		if (MF.getFrameInfo().hasCalls()) {
		arsenm: s/unsigned/Register
		Register SPReg = MFI->getStackPtrOffsetReg();
		arsenm: Ditto
// If we are not HSA or we happened to reserve the original input registers,		assert(SPReg != AMDGPU::SP_REG);
// we don't need to copy to the reserved registers.		BuildMI(MBB, I, DL, TII->get(AMDGPU::S_MOV_B32), SPReg)
const bool CopyBuffer = ST.isAmdHsaOrMesa(F) &&		.addImm(MF.getFrameInfo().getStackSize() * ST.getWavefrontSize());
ScratchRsrcReg != AMDGPU::NoRegister &&
PreloadedScratchRsrcReg != AMDGPU::NoRegister &&
ScratchRsrcReg != PreloadedScratchRsrcReg;

// This needs to be careful of the copying order to avoid overwriting one of
// the input registers before it's been copied to its final
// destination. Usually the offset should be copied first.
const bool CopyBufferFirst =
TRI->isSubRegisterEq(PreloadedScratchRsrcReg, ScratchWaveOffsetReg);

if (CopyBuffer && CopyBufferFirst) {
BuildMI(MBB, I, DL, TII->get(AMDGPU::COPY), ScratchRsrcReg)
.addReg(PreloadedScratchRsrcReg, RegState::Kill);
}

if (ScratchWaveOffsetReg != AMDGPU::NoRegister) {
BuildMI(MBB, I, DL, TII->get(AMDGPU::COPY), ScratchWaveOffsetReg)
.addReg(PreloadedScratchWaveOffsetReg, HasFP ? RegState::Kill : 0);
}		}

if (CopyBuffer && !CopyBufferFirst) {		if (hasFP(MF)) {
BuildMI(MBB, I, DL, TII->get(AMDGPU::COPY), ScratchRsrcReg)		Register FPReg = MFI->getFrameOffsetReg();
.addReg(PreloadedScratchRsrcReg, RegState::Kill);		assert(FPReg != AMDGPU::FP_REG);
		BuildMI(MBB, I, DL, TII->get(AMDGPU::S_MOV_B32), FPReg).addImm(0);
}		}

// FIXME: This should also implement the setup path for HSA.		if (MFI->hasFlatScratchInit() || ScratchRsrcReg != AMDGPU::NoRegister) {
if (ScratchRsrcReg != AMDGPU::NoRegister) {		MRI.addLiveIn(ScratchWaveOffsetReg);
emitEntryFunctionScratchRsrcRegSetup(		MBB.addLiveIn(ScratchWaveOffsetReg);
MF, MBB, I, DL, PreloadedScratchRsrcReg, ScratchRsrcReg);
}		}

if (HasFP) {		if (MFI->hasFlatScratchInit()) {
const MachineFrameInfo &FrameInfo = MF.getFrameInfo();		emitEntryFunctionFlatScratchInit(MF, MBB, I, DL, ScratchWaveOffsetReg);
int64_t StackSize = FrameInfo.getStackSize();

Register SPReg = MFI->getStackPtrOffsetReg();
assert(SPReg != AMDGPU::SP_REG);

// On kernel entry, the private scratch wave offset is the SP value.
if (StackSize == 0) {
BuildMI(MBB, I, DL, TII->get(AMDGPU::COPY), SPReg)
.addReg(MFI->getScratchWaveOffsetReg());
} else {
BuildMI(MBB, I, DL, TII->get(AMDGPU::S_ADD_U32), SPReg)
.addReg(MFI->getScratchWaveOffsetReg())
.addImm(StackSize * ST.getWavefrontSize());
}
}		}

if (MFI->hasFlatScratchInit()) {		if (ScratchRsrcReg != AMDGPU::NoRegister) {
emitEntryFunctionFlatScratchInit(MF, MBB, I, DL,		emitEntryFunctionScratchRsrcRegSetup(MF, MBB, I, DL,
MFI->getScratchWaveOffsetReg());		PreloadedScratchRsrcReg,
		ScratchRsrcReg, ScratchWaveOffsetReg);
}		}
}		}

// Emit scratch RSRC setup code, assuming `ScratchRsrcReg != AMDGPU::NoRegister`		// Emit scratch RSRC setup code, assuming `ScratchRsrcReg != AMDGPU::NoReg`
void SIFrameLowering::emitEntryFunctionScratchRsrcRegSetup(		void SIFrameLowering::emitEntryFunctionScratchRsrcRegSetup(
MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator I,		MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator I,
const DebugLoc &DL, Register PreloadedScratchRsrcReg,		const DebugLoc &DL, Register PreloadedScratchRsrcReg,
Register ScratchRsrcReg) const {		Register ScratchRsrcReg, Register ScratchWaveOffsetReg) const {

const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();		const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
const SIInstrInfo *TII = ST.getInstrInfo();		const SIInstrInfo *TII = ST.getInstrInfo();
const SIRegisterInfo *TRI = &TII->getRegisterInfo();		const SIRegisterInfo *TRI = &TII->getRegisterInfo();
const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
const Function &Fn = MF.getFunction();		const Function &Fn = MF.getFunction();

if (ST.isAmdPalOS()) {		if (ST.isAmdPalOS()) {
▲ Show 20 Lines • Show All 107 Lines • ▼ Show 20 Lines	if (ST.isAmdPalOS()) {

BuildMI(MBB, I, DL, SMovB32, Rsrc2)		BuildMI(MBB, I, DL, SMovB32, Rsrc2)
.addImm(Rsrc23 & 0xffffffff)		.addImm(Rsrc23 & 0xffffffff)
.addReg(ScratchRsrcReg, RegState::ImplicitDefine);		.addReg(ScratchRsrcReg, RegState::ImplicitDefine);

BuildMI(MBB, I, DL, SMovB32, Rsrc3)		BuildMI(MBB, I, DL, SMovB32, Rsrc3)
.addImm(Rsrc23 >> 32)		.addImm(Rsrc23 >> 32)
.addReg(ScratchRsrcReg, RegState::ImplicitDefine);		.addReg(ScratchRsrcReg, RegState::ImplicitDefine);
		} else if (ST.isAmdHsaOrMesa(Fn)) {
		assert(PreloadedScratchRsrcReg != AMDGPU::NoRegister);

		if (ScratchRsrcReg != PreloadedScratchRsrcReg) {
		arsenm: Braces
		BuildMI(MBB, I, DL, TII->get(AMDGPU::COPY), ScratchRsrcReg)
		.addReg(PreloadedScratchRsrcReg, RegState::Kill);
}		}
}		}

		// Add the scratch wave offset into the scratch RSRC.
		//
		// We only want to update the first 48 bits, which is the base address
		// pointer, without touching the adjacent 16 bits of flags. We know this add
		// cannot carry-out from bit 47, otherwise the scratch allocation would be
		// impossible to fit in the 48-bit global address space.
		//
		// TODO: Evaluate if it is better to just construct an SRD using the flat
		// scratch init and some constants rather than update the one we are passed.
		arsenm: Do we actually need these bits? I'm fairly confident these are always 0 in the HSA resource descriptor (or at least are a known constant we can just reproduce later)
		arsenm: According to this it's hardcoded: https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/master/src/core/runtime/amd_aql_queue.cpp#L1015 We just need to worry about SWIZZLE_ENABLE being set to 1. This is the high bit, so all it can do is trigger a carry on the second add. So I think that means you can get away with just doing the add, and then using s_bitset1_b32 to ensure it wasn't carried away
		arsenm: Actually, I don't think any add that fits in the 48-bit address space should ever touch the high bits (although I usually manage to be wrong about known bits optimizations with adds)
		arsenm: I think this means it's OK to just not worry about the high bits: https://rise4fun.com/Alive/i24
		arsenm: As long as we know bit 48 is 0, this seems fine. As this is hardcoded in the driver, this is probably OK https://rise4fun.com/Alive/KmH
		scott.linder: That makes sense to me, and this would simplify things a lot. I don't quite understand if we need to ensure [48:62] are 0, though? If the addc carries into bit 48 is that an issue? I.e. https://rise4fun.com/Alive/qsv At the very least, it seems like we can avoid the need to save anything and just mask in a constant, but if it is possible to avoid that too it removes a couple additional instructions from nearly every kernel prologue.
		scott.linder: I went the route of just always doing the 64-bit add of the scratch wave offset into the SRsrc rather than saving anything or using known constants for some of the bits. From some other discussion this should always be correct.
		Register ScratchRsrcSub0 = TRI->getSubReg(ScratchRsrcReg, AMDGPU::sub0);
		Register ScratchRsrcSub1 = TRI->getSubReg(ScratchRsrcReg, AMDGPU::sub1);

		arsenm: I think just 0xffff0000 would be clearer here.
		// We cannot Kill ScratchWaveOffsetReg here because we allow it to be used in
		// the kernel body via inreg arguments.
		BuildMI(MBB, I, DL, TII->get(AMDGPU::S_ADD_U32), ScratchRsrcSub0)
		.addReg(ScratchRsrcSub0)
		.addReg(ScratchWaveOffsetReg)
		.addReg(ScratchRsrcReg, RegState::ImplicitDefine);
		BuildMI(MBB, I, DL, TII->get(AMDGPU::S_ADDC_U32), ScratchRsrcSub1)
		.addReg(ScratchRsrcSub1)
		.addImm(0)
		.addReg(ScratchRsrcReg, RegState::ImplicitDefine);
		}
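
The inline discussion above settles on always doing the 64-bit add, on the grounds that an add whose result still fits in the 48-bit base address cannot disturb the descriptor bits above it. A minimal standalone sketch of that reasoning follows. It is not LLVM code: `addWaveOffset`, `highBits`, `kBaseMask`, and the descriptor values are hypothetical names and numbers chosen purely for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Mimics the prologue's two-instruction add on the low 64 bits of the SRD:
// S_ADD_U32 on the low dword, then S_ADDC_U32 (add carry plus 0) on the high dword.
constexpr uint64_t addWaveOffset(uint64_t Rsrc01, uint32_t WaveOffset) {
  uint32_t Lo = static_cast<uint32_t>(Rsrc01);
  uint32_t Hi = static_cast<uint32_t>(Rsrc01 >> 32);
  uint32_t NewLo = Lo + WaveOffset;        // wraps modulo 2^32
  uint32_t Carry = NewLo < Lo ? 1u : 0u;   // carry-out of the low add
  uint32_t NewHi = Hi + Carry;
  return (static_cast<uint64_t>(NewHi) << 32) | NewLo;
}

// Bits [47:0] hold the base address; everything above is other descriptor state.
constexpr uint64_t kBaseMask = (uint64_t{1} << 48) - 1;
constexpr uint64_t highBits(uint64_t V) { return V & ~kBaseMask; }

// Hypothetical descriptor: nonzero bits above bit 47, base just below a
// dword boundary so the low add really does carry into the high dword.
constexpr uint64_t kRsrc = 0x80407FFFFFFFF000ull;

// The carry lands in bit 47 at most, so the bits above the base are intact.
static_assert(highBits(addWaveOffset(kRsrc, 0x1000)) == highBits(kRsrc),
              "sum fits in 48 bits, so high bits are preserved");

// Counterexample: a base that overflows 48 bits does corrupt bit 48.
static_assert(highBits(addWaveOffset(0x0000FFFFFFFFF000ull, 0x1000)) != 0,
              "carry out of bit 47 reaches the descriptor bits");
```

The second assertion is the case the code comment rules out: it can only happen if the scratch allocation itself does not fit in the 48-bit address space.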

bool SIFrameLowering::isSupportedStackID(TargetStackID::Value ID) const {		bool SIFrameLowering::isSupportedStackID(TargetStackID::Value ID) const {
switch (ID) {		switch (ID) {
case TargetStackID::Default:		case TargetStackID::Default:
case TargetStackID::NoAlloc:		case TargetStackID::NoAlloc:
case TargetStackID::SGPRSpill:		case TargetStackID::SGPRSpill:
return true;		return true;
case TargetStackID::SVEVector:		case TargetStackID::SVEVector:
return false;		return false;
[... 448 lines not shown ...]	if (!hasReservedCallFrame(MF)) {
llvm_unreachable("is this used?");		llvm_unreachable("is this used?");
}		}

return MBB.erase(I);		return MBB.erase(I);
}		}

bool SIFrameLowering::hasFP(const MachineFunction &MF) const {		bool SIFrameLowering::hasFP(const MachineFunction &MF) const {
const MachineFrameInfo &MFI = MF.getFrameInfo();		const MachineFrameInfo &MFI = MF.getFrameInfo();
if (MFI.hasCalls()) {
		// For entry functions we can use an immediate offset in most cases, so the
		// presence of calls doesn't imply we need a distinct frame pointer.
		if (MFI.hasCalls() &&
		!MF.getInfo<SIMachineFunctionInfo>()->isEntryFunction()) {
// All offsets are unsigned, so need to be addressed in the same direction		// All offsets are unsigned, so need to be addressed in the same direction
// as stack growth.		// as stack growth.

// FIXME: This function is pretty broken, since it can be called before the		// FIXME: This function is pretty broken, since it can be called before the
// frame layout is determined or CSR spills are inserted.		// frame layout is determined or CSR spills are inserted.
if (MFI.getStackSize() != 0)		return MFI.getStackSize() != 0;
return true;

// For the entry point, the input wave scratch offset must be copied to the
// API SP if there are calls.
if (MF.getInfo<SIMachineFunctionInfo>()->isEntryFunction())
return true;
}		}

return MFI.hasVarSizedObjects() \|\| MFI.isFrameAddressTaken() \|\|		return MFI.hasVarSizedObjects() \|\| MFI.isFrameAddressTaken() \|\|
MFI.hasStackMap() \|\| MFI.hasPatchPoint() \|\|		MFI.hasStackMap() \|\| MFI.hasPatchPoint() \|\|
MF.getSubtarget<GCNSubtarget>().getRegisterInfo()->needsStackRealignment(MF) \|\|		MF.getSubtarget<GCNSubtarget>().getRegisterInfo()->needsStackRealignment(MF) \|\|
MF.getTarget().Options.DisableFramePointerElim(MF);		MF.getTarget().Options.DisableFramePointerElim(MF);
}		}
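
The decision made by the new (right-hand) version of `hasFP` can be sketched as a small pure function. This is an illustrative model only: `FrameFacts` and `hasFPSketch` are hypothetical names, and the single `NeedsFPForOtherReasons` flag stands in for the disjunction of variable-sized objects, frame address taken, stack maps, patch points, stack realignment, and disabled frame pointer elimination.

```cpp
#include <cassert>

// Hypothetical summary of the queries hasFP makes on the function.
struct FrameFacts {
  bool IsEntryFunction;
  bool HasCalls;
  unsigned StackSize;
  bool NeedsFPForOtherReasons; // var-sized objects, realignment, etc.
};

// Models the new hasFP: calls only force a frame pointer in non-entry
// functions, and only when the function has a nonzero stack size.
constexpr bool hasFPSketch(const FrameFacts &F) {
  if (F.HasCalls && !F.IsEntryFunction)
    return F.StackSize != 0;
  return F.NeedsFPForOtherReasons;
}

// Entry function with calls and a stack: no distinct FP is implied.
static_assert(!hasFPSketch({true, true, 64, false}), "");
// Non-entry function with calls and a stack: FP needed.
static_assert(hasFPSketch({false, true, 64, false}), "");
// Non-entry function with calls but an empty frame: no FP.
static_assert(!hasFPSketch({false, true, 0, false}), "");
// Entry function that needs an FP for another reason still gets one.
static_assert(hasFPSketch({true, true, 64, true}), "");
```

Compared with the old (left-hand) version, the entry-function case no longer returns true merely because calls are present.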

llvm/lib/Target/AMDGPU/SIISelLowering.cpp


[... 1,906 lines not shown ...]	if (RequiresStackAccess && ST.isAmdHsaOrMesa(MF.getFunction())) {
// argument to these reserved registers.		// argument to these reserved registers.

// Without HSA, relocations are used for the scratch pointer and the		// Without HSA, relocations are used for the scratch pointer and the
// buffer resource setup is always inserted in the prologue. Scratch wave		// buffer resource setup is always inserted in the prologue. Scratch wave
// offset is still in an input SGPR.		// offset is still in an input SGPR.
Info.setScratchRSrcReg(ReservedBufferReg);		Info.setScratchRSrcReg(ReservedBufferReg);
}		}

// hasFP should be accurate for kernels even before the frame is finalized.
if (ST.getFrameLowering()->hasFP(MF)) {
MachineRegisterInfo &MRI = MF.getRegInfo();		MachineRegisterInfo &MRI = MF.getRegInfo();

		// For entry functions we have to set up the stack pointer if we use it,
		// whereas non-entry functions get this "for free". This means there is no
		// intrinsic advantage to using S32 over S34 in cases where we do not have
		// calls but do need a frame pointer (i.e. if we are requested to have one
		// because frame pointer elimination is disabled). To keep things simple we
		// only ever use S32 as the call ABI stack pointer, and so using it does not
		// imply we need a separate frame pointer.
		//
// Try to use s32 as the SP, but move it if it would interfere with input		// Try to use s32 as the SP, but move it if it would interfere with input
// arguments. This won't work with calls though.		// arguments. This won't work with calls though.
//		//
// FIXME: Move SP to avoid any possible inputs, or find a way to spill input		// FIXME: Move SP to avoid any possible inputs, or find a way to spill input
// registers.		// registers.
if (!MRI.isLiveIn(AMDGPU::SGPR32)) {		if (!MRI.isLiveIn(AMDGPU::SGPR32)) {
Info.setStackPtrOffsetReg(AMDGPU::SGPR32);		Info.setStackPtrOffsetReg(AMDGPU::SGPR32);
} else {		} else {
assert(AMDGPU::isShader(MF.getFunction().getCallingConv()));		assert(AMDGPU::isShader(MF.getFunction().getCallingConv()));

if (MFI.hasCalls())		if (MFI.hasCalls())
report_fatal_error("call in graphics shader with too many input SGPRs");		report_fatal_error("call in graphics shader with too many input SGPRs");

for (unsigned Reg : AMDGPU::SGPR_32RegClass) {		for (unsigned Reg : AMDGPU::SGPR_32RegClass) {
if (!MRI.isLiveIn(Reg)) {		if (!MRI.isLiveIn(Reg)) {
Info.setStackPtrOffsetReg(Reg);		Info.setStackPtrOffsetReg(Reg);
break;		break;
}		}
}		}

if (Info.getStackPtrOffsetReg() == AMDGPU::SP_REG)		if (Info.getStackPtrOffsetReg() == AMDGPU::SP_REG)
report_fatal_error("failed to find register for SP");		report_fatal_error("failed to find register for SP");
}		}

if (MFI.hasCalls()) {		// hasFP should be accurate for entry functions even before the frame is
Info.setScratchWaveOffsetReg(AMDGPU::SGPR33);		// finalized, because it does not rely on the known stack size, only
Info.setFrameOffsetReg(AMDGPU::SGPR33);		// properties like whether variable sized objects are present.
} else {		if (ST.getFrameLowering()->hasFP(MF)) {
unsigned ReservedOffsetReg =		Info.setFrameOffsetReg(AMDGPU::SGPR34);
TRI.reservedPrivateSegmentWaveByteOffsetReg(MF);
Info.setScratchWaveOffsetReg(ReservedOffsetReg);
Info.setFrameOffsetReg(ReservedOffsetReg);
}
} else if (RequiresStackAccess) {
assert(!MFI.hasCalls());
// We know there are accesses and they will be done relative to SP, so just
// pin it to the input.
//
// FIXME: Should not do this if inline asm is reading/writing these
// registers.
Register PreloadedSP = Info.getPreloadedReg(
AMDGPUFunctionArgInfo::PRIVATE_SEGMENT_WAVE_BYTE_OFFSET);

Info.setStackPtrOffsetReg(PreloadedSP);
Info.setScratchWaveOffsetReg(PreloadedSP);
Info.setFrameOffsetReg(PreloadedSP);
} else {
assert(!MFI.hasCalls());

// There may not be stack access at all. There may still be spills, or
// access of a constant pointer (in which cases an extra copy will be
// emitted in the prolog).
unsigned ReservedOffsetReg
= TRI.reservedPrivateSegmentWaveByteOffsetReg(MF);
Info.setStackPtrOffsetReg(ReservedOffsetReg);
Info.setScratchWaveOffsetReg(ReservedOffsetReg);
Info.setFrameOffsetReg(ReservedOffsetReg);
}		}
}		}

bool SITargetLowering::supportSplitCSR(MachineFunction *MF) const {		bool SITargetLowering::supportSplitCSR(MachineFunction *MF) const {
const SIMachineFunctionInfo *Info = MF->getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *Info = MF->getInfo<SIMachineFunctionInfo>();
return !Info->isEntryFunction();		return !Info->isEntryFunction();
}		}

[... 242 lines not shown ...]	if (!IsEntryFunc && !AMDGPUTargetMachine::EnableFixedFunctionABI) {
allocateSpecialInputVGPRs(CCInfo, MF, TRI, Info);		allocateSpecialInputVGPRs(CCInfo, MF, TRI, Info);
}		}

// Start adding system SGPRs.		// Start adding system SGPRs.
if (IsEntryFunc) {		if (IsEntryFunc) {
allocateSystemSGPRs(CCInfo, MF, *Info, CallConv, IsShader);		allocateSystemSGPRs(CCInfo, MF, *Info, CallConv, IsShader);
} else {		} else {
CCInfo.AllocateReg(Info->getScratchRSrcReg());		CCInfo.AllocateReg(Info->getScratchRSrcReg());
CCInfo.AllocateReg(Info->getScratchWaveOffsetReg());
CCInfo.AllocateReg(Info->getFrameOffsetReg());
allocateSpecialInputSGPRs(CCInfo, MF, TRI, Info);		allocateSpecialInputSGPRs(CCInfo, MF, TRI, Info);
}		}

auto &ArgUsageInfo =		auto &ArgUsageInfo =
DAG.getPass()->getAnalysis<AMDGPUArgumentUsageInfo>();		DAG.getPass()->getAnalysis<AMDGPUArgumentUsageInfo>();
ArgUsageInfo.setFuncArgInfo(Fn, Info->getArgInfo());		ArgUsageInfo.setFuncArgInfo(Fn, Info->getArgInfo());

unsigned StackArgSize = CCInfo.getNextStackOffset();		unsigned StackArgSize = CCInfo.getNextStackOffset();
[... 8,404 lines not shown ...]	void SITargetLowering::finalizeLowering(MachineFunction &MF) const {
// We need to worry about replacing the default register with itself in case		// We need to worry about replacing the default register with itself in case
// of MIR testcases missing the MFI.		// of MIR testcases missing the MFI.
if (Info->getScratchRSrcReg() != AMDGPU::PRIVATE_RSRC_REG)		if (Info->getScratchRSrcReg() != AMDGPU::PRIVATE_RSRC_REG)
MRI.replaceRegWith(AMDGPU::PRIVATE_RSRC_REG, Info->getScratchRSrcReg());		MRI.replaceRegWith(AMDGPU::PRIVATE_RSRC_REG, Info->getScratchRSrcReg());

if (Info->getFrameOffsetReg() != AMDGPU::FP_REG)		if (Info->getFrameOffsetReg() != AMDGPU::FP_REG)
MRI.replaceRegWith(AMDGPU::FP_REG, Info->getFrameOffsetReg());		MRI.replaceRegWith(AMDGPU::FP_REG, Info->getFrameOffsetReg());

if (Info->getScratchWaveOffsetReg() != AMDGPU::SCRATCH_WAVE_OFFSET_REG) {
MRI.replaceRegWith(AMDGPU::SCRATCH_WAVE_OFFSET_REG,
Info->getScratchWaveOffsetReg());
}

Info->limitOccupancy(MF);		Info->limitOccupancy(MF);

if (ST.isWave32() && !MF.empty()) {		if (ST.isWave32() && !MF.empty()) {
// Add VCC_HI def because many instructions marked as imp-use VCC where		// Add VCC_HI def because many instructions marked as imp-use VCC where
// we may only define VCC_LO. If nothing defines VCC_HI we may end up		// we may only define VCC_LO. If nothing defines VCC_HI we may end up
// having a use of undef.		// having a use of undef.

const SIInstrInfo *TII = ST.getInstrInfo();		const SIInstrInfo *TII = ST.getInstrInfo();
[... 347 lines not shown ...]

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h

[... 278 lines not shown ...]	struct SIMachineFunctionInfo final : public yaml::MachineFunctionInfo {
unsigned LDSSize = 0;		unsigned LDSSize = 0;
bool IsEntryFunction = false;		bool IsEntryFunction = false;
bool NoSignedZerosFPMath = false;		bool NoSignedZerosFPMath = false;
bool MemoryBound = false;		bool MemoryBound = false;
bool WaveLimiter = false;		bool WaveLimiter = false;
uint32_t HighBitsOf32BitAddress = 0;		uint32_t HighBitsOf32BitAddress = 0;

StringValue ScratchRSrcReg = "$private_rsrc_reg";		StringValue ScratchRSrcReg = "$private_rsrc_reg";
StringValue ScratchWaveOffsetReg = "$scratch_wave_offset_reg";
StringValue FrameOffsetReg = "$fp_reg";		StringValue FrameOffsetReg = "$fp_reg";
StringValue StackPtrOffsetReg = "$sp_reg";		StringValue StackPtrOffsetReg = "$sp_reg";

Optional<SIArgumentInfo> ArgInfo;		Optional<SIArgumentInfo> ArgInfo;
SIMode Mode;		SIMode Mode;

SIMachineFunctionInfo() = default;		SIMachineFunctionInfo() = default;
SIMachineFunctionInfo(const llvm::SIMachineFunctionInfo &,		SIMachineFunctionInfo(const llvm::SIMachineFunctionInfo &,
[... 10 lines not shown ...]	static void mapping(IO &YamlIO, SIMachineFunctionInfo &MFI) {
YamlIO.mapOptional("maxKernArgAlign", MFI.MaxKernArgAlign, 0u);		YamlIO.mapOptional("maxKernArgAlign", MFI.MaxKernArgAlign, 0u);
YamlIO.mapOptional("ldsSize", MFI.LDSSize, 0u);		YamlIO.mapOptional("ldsSize", MFI.LDSSize, 0u);
YamlIO.mapOptional("isEntryFunction", MFI.IsEntryFunction, false);		YamlIO.mapOptional("isEntryFunction", MFI.IsEntryFunction, false);
YamlIO.mapOptional("noSignedZerosFPMath", MFI.NoSignedZerosFPMath, false);		YamlIO.mapOptional("noSignedZerosFPMath", MFI.NoSignedZerosFPMath, false);
YamlIO.mapOptional("memoryBound", MFI.MemoryBound, false);		YamlIO.mapOptional("memoryBound", MFI.MemoryBound, false);
YamlIO.mapOptional("waveLimiter", MFI.WaveLimiter, false);		YamlIO.mapOptional("waveLimiter", MFI.WaveLimiter, false);
YamlIO.mapOptional("scratchRSrcReg", MFI.ScratchRSrcReg,		YamlIO.mapOptional("scratchRSrcReg", MFI.ScratchRSrcReg,
StringValue("$private_rsrc_reg"));		StringValue("$private_rsrc_reg"));
YamlIO.mapOptional("scratchWaveOffsetReg", MFI.ScratchWaveOffsetReg,
StringValue("$scratch_wave_offset_reg"));
YamlIO.mapOptional("frameOffsetReg", MFI.FrameOffsetReg,		YamlIO.mapOptional("frameOffsetReg", MFI.FrameOffsetReg,
StringValue("$fp_reg"));		StringValue("$fp_reg"));
YamlIO.mapOptional("stackPtrOffsetReg", MFI.StackPtrOffsetReg,		YamlIO.mapOptional("stackPtrOffsetReg", MFI.StackPtrOffsetReg,
StringValue("$sp_reg"));		StringValue("$sp_reg"));
YamlIO.mapOptional("argumentInfo", MFI.ArgInfo);		YamlIO.mapOptional("argumentInfo", MFI.ArgInfo);
YamlIO.mapOptional("mode", MFI.Mode, SIMode());		YamlIO.mapOptional("mode", MFI.Mode, SIMode());
YamlIO.mapOptional("highBitsOf32BitAddress",		YamlIO.mapOptional("highBitsOf32BitAddress",
MFI.HighBitsOf32BitAddress, 0u);		MFI.HighBitsOf32BitAddress, 0u);
}		}
};		};

} // end namespace yaml		} // end namespace yaml

/// This class keeps track of the SPI_SP_INPUT_ADDR config register, which		/// This class keeps track of the SPI_SP_INPUT_ADDR config register, which
/// tells the hardware which interpolation parameters to load.		/// tells the hardware which interpolation parameters to load.
class SIMachineFunctionInfo final : public AMDGPUMachineFunction {		class SIMachineFunctionInfo final : public AMDGPUMachineFunction {
friend class GCNTargetMachine;		friend class GCNTargetMachine;

Register TIDReg = AMDGPU::NoRegister;		Register TIDReg = AMDGPU::NoRegister;

// Registers that may be reserved for spilling purposes. These may be the same		// Registers that may be reserved for spilling purposes. These may be the same
// as the input registers.		// as the input registers.
Register ScratchRSrcReg = AMDGPU::PRIVATE_RSRC_REG;		Register ScratchRSrcReg = AMDGPU::PRIVATE_RSRC_REG;
Register ScratchWaveOffsetReg = AMDGPU::SCRATCH_WAVE_OFFSET_REG;

// This is the current function's incremented size from the kernel's scratch		// This is the unswizzled offset from the current dispatch's scratch wave
// wave offset register. For an entry function, this is exactly the same as		// base to the beginning of the current function's frame.
// the ScratchWaveOffsetReg.
Register FrameOffsetReg = AMDGPU::FP_REG;		Register FrameOffsetReg = AMDGPU::FP_REG;
		arsenm: These should be switched to Register at some point.
		scott.linder (author): I haven't gotten around to this yet, but I'll do this in another NFC patch.

// Top of the stack SGPR offset derived from the ScratchWaveOffsetReg.		// This is an ABI register used in the non-entry calling convention to
		// communicate the unswizzled offset from the current dispatch's scratch wave
		// base to the beginning of the new function's frame.
Register StackPtrOffsetReg = AMDGPU::SP_REG;		Register StackPtrOffsetReg = AMDGPU::SP_REG;

AMDGPUFunctionArgInfo ArgInfo;		AMDGPUFunctionArgInfo ArgInfo;

// Graphics info.		// Graphics info.
unsigned PSInputAddr = 0;		unsigned PSInputAddr = 0;
unsigned PSInputEnable = 0;		unsigned PSInputEnable = 0;

[... 350 lines not shown ...]	Register getScratchRSrcReg() const {
return ScratchRSrcReg;		return ScratchRSrcReg;
}		}

void setScratchRSrcReg(Register Reg) {		void setScratchRSrcReg(Register Reg) {
assert(Reg != 0 && "Should never be unset");		assert(Reg != 0 && "Should never be unset");
ScratchRSrcReg = Reg;		ScratchRSrcReg = Reg;
}		}

Register getScratchWaveOffsetReg() const {
return ScratchWaveOffsetReg;
}

Register getFrameOffsetReg() const {		Register getFrameOffsetReg() const {
return FrameOffsetReg;		return FrameOffsetReg;
}		}

void setFrameOffsetReg(Register Reg) {		void setFrameOffsetReg(Register Reg) {
assert(Reg != 0 && "Should never be unset");		assert(Reg != 0 && "Should never be unset");
FrameOffsetReg = Reg;		FrameOffsetReg = Reg;
}		}

void setStackPtrOffsetReg(Register Reg) {		void setStackPtrOffsetReg(Register Reg) {
assert(Reg != 0 && "Should never be unset");		assert(Reg != 0 && "Should never be unset");
StackPtrOffsetReg = Reg;		StackPtrOffsetReg = Reg;
}		}

// Note the unset value for this is AMDGPU::SP_REG rather than		// Note the unset value for this is AMDGPU::SP_REG rather than
// NoRegister. This is mostly a workaround for MIR tests where state that		// NoRegister. This is mostly a workaround for MIR tests where state that
// can't be directly computed from the function is not preserved in serialized		// can't be directly computed from the function is not preserved in serialized
// MIR.		// MIR.
Register getStackPtrOffsetReg() const {		Register getStackPtrOffsetReg() const {
return StackPtrOffsetReg;		return StackPtrOffsetReg;
}		}

void setScratchWaveOffsetReg(Register Reg) {
assert(Reg != 0 && "Should never be unset");
ScratchWaveOffsetReg = Reg;
}

Register getQueuePtrUserSGPR() const {		Register getQueuePtrUserSGPR() const {
return ArgInfo.QueuePtr.getRegister();		return ArgInfo.QueuePtr.getRegister();
}		}

Register getImplicitBufferPtrUserSGPR() const {		Register getImplicitBufferPtrUserSGPR() const {
return ArgInfo.ImplicitBufferPtr.getRegister();		return ArgInfo.ImplicitBufferPtr.getRegister();
}		}

[... 182 lines not shown ...]

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp

[... 73 lines not shown ...]	SIMachineFunctionInfo::SIMachineFunctionInfo(const MachineFunction &MF)
} else if (CC == CallingConv::AMDGPU_PS) {		} else if (CC == CallingConv::AMDGPU_PS) {
PSInputAddr = AMDGPU::getInitialPSInputAddr(F);		PSInputAddr = AMDGPU::getInitialPSInputAddr(F);
}		}

if (!isEntryFunction()) {		if (!isEntryFunction()) {
// Non-entry functions have no special inputs for now, other registers		// Non-entry functions have no special inputs for now, other registers
// required for scratch access.		// required for scratch access.
ScratchRSrcReg = AMDGPU::SGPR0_SGPR1_SGPR2_SGPR3;		ScratchRSrcReg = AMDGPU::SGPR0_SGPR1_SGPR2_SGPR3;
ScratchWaveOffsetReg = AMDGPU::SGPR33;

// TODO: Pick a high register, and shift down, similar to a kernel.		// TODO: Pick a high register, and shift down, similar to a kernel.
FrameOffsetReg = AMDGPU::SGPR34;		FrameOffsetReg = AMDGPU::SGPR34;
StackPtrOffsetReg = AMDGPU::SGPR32;		StackPtrOffsetReg = AMDGPU::SGPR32;

ArgInfo.PrivateSegmentBuffer =		ArgInfo.PrivateSegmentBuffer =
ArgDescriptor::createRegister(ScratchRSrcReg);		ArgDescriptor::createRegister(ScratchRSrcReg);
ArgInfo.PrivateSegmentWaveByteOffset =
ArgDescriptor::createRegister(ScratchWaveOffsetReg);

if (F.hasFnAttribute("amdgpu-implicitarg-ptr"))		if (F.hasFnAttribute("amdgpu-implicitarg-ptr"))
ImplicitArgPtr = true;		ImplicitArgPtr = true;
} else {		} else {
if (F.hasFnAttribute("amdgpu-implicitarg-ptr")) {		if (F.hasFnAttribute("amdgpu-implicitarg-ptr")) {
KernargSegmentPtr = true;		KernargSegmentPtr = true;
MaxKernArgAlign = std::max(ST.getAlignmentForImplicitArgPtr(),		MaxKernArgAlign = std::max(ST.getAlignmentForImplicitArgPtr(),
MaxKernArgAlign);		MaxKernArgAlign);
[... 410 lines not shown ...]	: ExplicitKernArgSize(MFI.getExplicitKernArgSize()),
MaxKernArgAlign(MFI.getMaxKernArgAlign()),		MaxKernArgAlign(MFI.getMaxKernArgAlign()),
LDSSize(MFI.getLDSSize()),		LDSSize(MFI.getLDSSize()),
IsEntryFunction(MFI.isEntryFunction()),		IsEntryFunction(MFI.isEntryFunction()),
NoSignedZerosFPMath(MFI.hasNoSignedZerosFPMath()),		NoSignedZerosFPMath(MFI.hasNoSignedZerosFPMath()),
MemoryBound(MFI.isMemoryBound()),		MemoryBound(MFI.isMemoryBound()),
WaveLimiter(MFI.needsWaveLimiter()),		WaveLimiter(MFI.needsWaveLimiter()),
HighBitsOf32BitAddress(MFI.get32BitAddressHighBits()),		HighBitsOf32BitAddress(MFI.get32BitAddressHighBits()),
ScratchRSrcReg(regToString(MFI.getScratchRSrcReg(), TRI)),		ScratchRSrcReg(regToString(MFI.getScratchRSrcReg(), TRI)),
ScratchWaveOffsetReg(regToString(MFI.getScratchWaveOffsetReg(), TRI)),
FrameOffsetReg(regToString(MFI.getFrameOffsetReg(), TRI)),		FrameOffsetReg(regToString(MFI.getFrameOffsetReg(), TRI)),
StackPtrOffsetReg(regToString(MFI.getStackPtrOffsetReg(), TRI)),		StackPtrOffsetReg(regToString(MFI.getStackPtrOffsetReg(), TRI)),
ArgInfo(convertArgumentInfo(MFI.getArgInfo(), TRI)),		ArgInfo(convertArgumentInfo(MFI.getArgInfo(), TRI)),
Mode(MFI.getMode()) {}		Mode(MFI.getMode()) {}

void yaml::SIMachineFunctionInfo::mappingImpl(yaml::IO &YamlIO) {		void yaml::SIMachineFunctionInfo::mappingImpl(yaml::IO &YamlIO) {
MappingTraits<SIMachineFunctionInfo>::mapping(YamlIO, *this);		MappingTraits<SIMachineFunctionInfo>::mapping(YamlIO, *this);
}		}
[... 13 lines not shown ...]

llvm/lib/Target/AMDGPU/SIRegisterInfo.h

[... 44 lines not shown ...]	public:
bool spillSGPRToVGPR() const {		bool spillSGPRToVGPR() const {
return SpillSGPRToVGPR;		return SpillSGPRToVGPR;
}		}

/// Return the end register initially reserved for the scratch buffer in case		/// Return the end register initially reserved for the scratch buffer in case
/// spilling is needed.		/// spilling is needed.
unsigned reservedPrivateSegmentBufferReg(const MachineFunction &MF) const;		unsigned reservedPrivateSegmentBufferReg(const MachineFunction &MF) const;

/// Return the end register initially reserved for the scratch wave offset in
/// case spilling is needed.
unsigned reservedPrivateSegmentWaveByteOffsetReg(
const MachineFunction &MF) const;

BitVector getReservedRegs(const MachineFunction &MF) const override;		BitVector getReservedRegs(const MachineFunction &MF) const override;

const MCPhysReg getCalleeSavedRegs(const MachineFunction MF) const override;		const MCPhysReg getCalleeSavedRegs(const MachineFunction MF) const override;
const MCPhysReg getCalleeSavedRegsViaCopy(const MachineFunction MF) const;		const MCPhysReg getCalleeSavedRegsViaCopy(const MachineFunction MF) const;
const uint32_t *getCallPreservedMask(const MachineFunction &MF,		const uint32_t *getCallPreservedMask(const MachineFunction &MF,
CallingConv::ID) const override;		CallingConv::ID) const override;

// Stack access is very expensive. CSRs are also the high registers, and we		// Stack access is very expensive. CSRs are also the high registers, and we
[... 240 lines not shown ...]

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp

[... 85 lines not shown ...]	default:
return nullptr;		return nullptr;
}		}
}		}

Register SIRegisterInfo::getFrameRegister(const MachineFunction &MF) const {		Register SIRegisterInfo::getFrameRegister(const MachineFunction &MF) const {
const SIFrameLowering *TFI =		const SIFrameLowering *TFI =
MF.getSubtarget<GCNSubtarget>().getFrameLowering();		MF.getSubtarget<GCNSubtarget>().getFrameLowering();
const SIMachineFunctionInfo *FuncInfo = MF.getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *FuncInfo = MF.getInfo<SIMachineFunctionInfo>();
		// During ISel lowering we always reserve the stack pointer in entry
		// functions, but never actually want to reference it when accessing our own
		// frame. If we need a frame pointer we use it, but otherwise we can just use
		// an immediate "0" which we represent by returning NoRegister.
		if (FuncInfo->isEntryFunction()) {
		return TFI->hasFP(MF) ? FuncInfo->getFrameOffsetReg() : Register();
		arsenm: s/NoRegister/Register()
		}
return TFI->hasFP(MF) ? FuncInfo->getFrameOffsetReg()		return TFI->hasFP(MF) ? FuncInfo->getFrameOffsetReg()
: FuncInfo->getStackPtrOffsetReg();		: FuncInfo->getStackPtrOffsetReg();
}		}
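
The register choice in the new `getFrameRegister` reduces to a two-input table. A small sketch, assuming nothing beyond what the diff shows: `FrameBase` and `pickFrameBase` are hypothetical names, and `FrameBase::None` models the empty `Register()` that stands for an immediate 0 offset.

```cpp
#include <cassert>

// Which register (if any) frame accesses are made relative to.
enum class FrameBase { None, FrameOffsetReg, StackPtrOffsetReg };

// Models the new getFrameRegister: entry functions without a frame pointer
// address their frame with an immediate 0 instead of the stack pointer.
constexpr FrameBase pickFrameBase(bool IsEntryFunction, bool HasFP) {
  if (IsEntryFunction)
    return HasFP ? FrameBase::FrameOffsetReg : FrameBase::None;
  return HasFP ? FrameBase::FrameOffsetReg : FrameBase::StackPtrOffsetReg;
}

static_assert(pickFrameBase(true, false) == FrameBase::None, "");
static_assert(pickFrameBase(true, true) == FrameBase::FrameOffsetReg, "");
static_assert(pickFrameBase(false, false) == FrameBase::StackPtrOffsetReg, "");
static_assert(pickFrameBase(false, true) == FrameBase::FrameOffsetReg, "");
```

Only the entry-function row without a frame pointer differs from the old behavior, which always fell back to the stack pointer register.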

const uint32_t *SIRegisterInfo::getAllVGPRRegMask() const {		const uint32_t *SIRegisterInfo::getAllVGPRRegMask() const {
return CSR_AMDGPU_AllVGPRs_RegMask;		return CSR_AMDGPU_AllVGPRs_RegMask;
}		}

[... 70 lines not shown ...]

unsigned SIRegisterInfo::reservedPrivateSegmentBufferReg(		unsigned SIRegisterInfo::reservedPrivateSegmentBufferReg(
const MachineFunction &MF) const {		const MachineFunction &MF) const {
unsigned BaseIdx = alignDown(ST.getMaxNumSGPRs(MF), 4) - 4;		unsigned BaseIdx = alignDown(ST.getMaxNumSGPRs(MF), 4) - 4;
unsigned BaseReg(AMDGPU::SGPR_32RegClass.getRegister(BaseIdx));		unsigned BaseReg(AMDGPU::SGPR_32RegClass.getRegister(BaseIdx));
return getMatchingSuperReg(BaseReg, AMDGPU::sub0, &AMDGPU::SGPR_128RegClass);		return getMatchingSuperReg(BaseReg, AMDGPU::sub0, &AMDGPU::SGPR_128RegClass);
}		}

static unsigned findPrivateSegmentWaveByteOffsetRegIndex(unsigned RegCount) {
unsigned Reg;

// Try to place it in a hole after PrivateSegmentBufferReg.
if (RegCount & 3) {
// We cannot put the segment buffer in (Idx - 4) ... (Idx - 1) due to
// alignment constraints, so we have a hole where can put the wave offset.
Reg = RegCount - 1;
} else {
// We can put the segment buffer in (Idx - 4) ... (Idx - 1) and put the
// wave offset before it.
Reg = RegCount - 5;
}

return Reg;
}

unsigned SIRegisterInfo::reservedPrivateSegmentWaveByteOffsetReg(
const MachineFunction &MF) const {
unsigned Reg = findPrivateSegmentWaveByteOffsetRegIndex(ST.getMaxNumSGPRs(MF));
return AMDGPU::SGPR_32RegClass.getRegister(Reg);
}
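
For readers following what the patch removes here: the deleted helper placed the wave offset next to the 4-aligned scratch buffer tuple computed by `reservedPrivateSegmentBufferReg`. The following standalone sketch reproduces that index arithmetic outside LLVM. `alignDown4`, `bufferBaseIndex`, and `waveOffsetIndex` are hypothetical names, and the SGPR counts 102 and 104 are example inputs, not values taken from the patch.

```cpp
#include <cassert>

// Round down to a multiple of 4, as alignDown(x, 4) does for the SGPR_128 tuple.
constexpr unsigned alignDown4(unsigned V) { return V & ~3u; }

// First SGPR index of the 4-register scratch buffer descriptor.
constexpr unsigned bufferBaseIndex(unsigned RegCount) {
  return alignDown4(RegCount) - 4;
}

// SGPR index the removed helper chose for the scratch wave offset.
constexpr unsigned waveOffsetIndex(unsigned RegCount) {
  // Unaligned count: alignment leaves a hole above the buffer, use the last
  // register. Aligned count: the buffer takes the top four, so go just below.
  return (RegCount & 3) ? RegCount - 1 : RegCount - 5;
}

// 102 SGPRs: buffer in s96..s99, wave offset in the hole at s101.
static_assert(bufferBaseIndex(102) == 96 && waveOffsetIndex(102) == 101, "");
// 104 SGPRs: buffer in s100..s103, wave offset just below at s99.
static_assert(bufferBaseIndex(104) == 100 && waveOffsetIndex(104) == 99, "");
```

In both cases the chosen index falls outside the four buffer registers, which is the property the removed `assert(!isSubRegister(ScratchRSrcReg, ScratchWaveOffsetReg))` checked.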

BitVector SIRegisterInfo::getReservedRegs(const MachineFunction &MF) const {		BitVector SIRegisterInfo::getReservedRegs(const MachineFunction &MF) const {
BitVector Reserved(getNumRegs());		BitVector Reserved(getNumRegs());

// EXEC_LO and EXEC_HI could be allocated and used as regular register, but		// EXEC_LO and EXEC_HI could be allocated and used as regular register, but
// this seems likely to result in bugs, so I'm marking them as reserved.		// this seems likely to result in bugs, so I'm marking them as reserved.
reserveRegisterTuples(Reserved, AMDGPU::EXEC);		reserveRegisterTuples(Reserved, AMDGPU::EXEC);
reserveRegisterTuples(Reserved, AMDGPU::FLAT_SCR);		reserveRegisterTuples(Reserved, AMDGPU::FLAT_SCR);

[... 63 lines not shown ...]	if (!ST.hasMAIInsts()) {
for (unsigned i = 0; i < MaxNumVGPRs; ++i) {		for (unsigned i = 0; i < MaxNumVGPRs; ++i) {
unsigned Reg = AMDGPU::AGPR_32RegClass.getRegister(i);		unsigned Reg = AMDGPU::AGPR_32RegClass.getRegister(i);
reserveRegisterTuples(Reserved, Reg);		reserveRegisterTuples(Reserved, Reg);
}		}
}		}

const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();

unsigned ScratchWaveOffsetReg = MFI->getScratchWaveOffsetReg();
if (ScratchWaveOffsetReg != AMDGPU::NoRegister) {
// Reserve 1 SGPR for scratch wave offset in case we need to spill.
reserveRegisterTuples(Reserved, ScratchWaveOffsetReg);
}

unsigned ScratchRSrcReg = MFI->getScratchRSrcReg();		unsigned ScratchRSrcReg = MFI->getScratchRSrcReg();
if (ScratchRSrcReg != AMDGPU::NoRegister) {		if (ScratchRSrcReg != AMDGPU::NoRegister) {
// Reserve 4 SGPRs for the scratch buffer resource descriptor in case we need		// Reserve 4 SGPRs for the scratch buffer resource descriptor in case we need
// to spill.		// to spill.
// TODO: May need to reserve a VGPR if doing LDS spilling.		// TODO: May need to reserve a VGPR if doing LDS spilling.
reserveRegisterTuples(Reserved, ScratchRSrcReg);		reserveRegisterTuples(Reserved, ScratchRSrcReg);
assert(!isSubRegister(ScratchRSrcReg, ScratchWaveOffsetReg));
}		}

// We have to assume the SP is needed in case there are calls in the function,		// We have to assume the SP is needed in case there are calls in the function,
// which is detected after the function is lowered. If we aren't really going		// which is detected after the function is lowered. If we aren't really going
// to need SP, don't bother reserving it.		// to need SP, don't bother reserving it.
unsigned StackPtrReg = MFI->getStackPtrOffsetReg();		unsigned StackPtrReg = MFI->getStackPtrOffsetReg();

if (StackPtrReg != AMDGPU::NoRegister) {		if (StackPtrReg != AMDGPU::NoRegister) {
[... 414 lines not shown ...]	if (!isUInt<12>(Offset + Size - EltSize)) {
Offset *= ST.getWavefrontSize();		Offset *= ST.getWavefrontSize();

// We don't have access to the register scavenger if this function is called		// We don't have access to the register scavenger if this function is called
// during PEI::scavengeFrameVirtualRegs().		// during PEI::scavengeFrameVirtualRegs().
if (RS)		if (RS)
SOffset = RS->scavengeRegister(&AMDGPU::SGPR_32RegClass, MI, 0, false);		SOffset = RS->scavengeRegister(&AMDGPU::SGPR_32RegClass, MI, 0, false);

if (SOffset == AMDGPU::NoRegister) {		if (SOffset == AMDGPU::NoRegister) {
		if (ScratchOffsetReg == AMDGPU::NoRegister) {
		report_fatal_error("could not scavenge SGPR to spill in entry function");
		}
// There are no free SGPRs, and since we are in the process of spilling		// There are no free SGPRs, and since we are in the process of spilling
// VGPRs too. Since we need a VGPR in order to spill SGPRs (this is true		// VGPRs too. Since we need a VGPR in order to spill SGPRs (this is true
// on SI/CI and on VI it is true until we implement spilling using scalar		// on SI/CI and on VI it is true until we implement spilling using scalar
// stores), we have no way to free up an SGPR. Our solution here is to		// stores), we have no way to free up an SGPR. Our solution here is to
// add the offset directly to the ScratchOffset register, and then		// add the offset directly to the ScratchOffset register, and then
// subtract the offset after the spill to return ScratchOffset to its		// subtract the offset after the spill to return ScratchOffset to its
// original value.		// original value.
SOffset = ScratchOffsetReg;		SOffset = ScratchOffsetReg;
ScratchOffsetRegDelta = Offset;		ScratchOffsetRegDelta = Offset;
} else {		} else {
Scavenged = true;		Scavenged = true;
}		}

		if (ScratchOffsetReg == AMDGPU::NoRegister) {
		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_MOV_B32), SOffset)
		.addImm(Offset);
		} else {
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_ADD_U32), SOffset)		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_ADD_U32), SOffset)
.addReg(ScratchOffsetReg)		.addReg(ScratchOffsetReg)
.addImm(Offset);		.addImm(Offset);
		}

Offset = 0;		Offset = 0;
}		}

for (unsigned i = 0, e = NumSubRegs; i != e; ++i, Offset += EltSize) {		for (unsigned i = 0, e = NumSubRegs; i != e; ++i, Offset += EltSize) {
Register SubReg = NumSubRegs == 1		Register SubReg = NumSubRegs == 1
? Register(ValueReg)		? Register(ValueReg)
: getSubReg(ValueReg, getSubRegFromChannel(i));		: getSubReg(ValueReg, getSubRegFromChannel(i));
// (18 lines not shown)	if (!MIB.getInstr()) {
}		}

MachinePointerInfo PInfo = BasePtrInfo.getWithOffset(EltSize * i);		MachinePointerInfo PInfo = BasePtrInfo.getWithOffset(EltSize * i);
MachineMemOperand *NewMMO		MachineMemOperand *NewMMO
= MF->getMachineMemOperand(PInfo, MMO->getFlags(),		= MF->getMachineMemOperand(PInfo, MMO->getFlags(),
EltSize, MinAlign(Align, EltSize * i));		EltSize, MinAlign(Align, EltSize * i));

MIB = BuildMI(*MBB, MI, DL, Desc)		MIB = BuildMI(*MBB, MI, DL, Desc)
.addReg(SubReg, getDefRegState(!IsStore) | getKillRegState(IsKill))		.addReg(SubReg,
.addReg(ScratchRsrcReg)		getDefRegState(!IsStore) | getKillRegState(IsKill))
.addReg(SOffset, SOffsetRegState)		.addReg(ScratchRsrcReg);
.addImm(Offset)		if (SOffset == AMDGPU::NoRegister) {
		MIB.addImm(0);
		} else {
		MIB.addReg(SOffset, SOffsetRegState);
		}
		MIB.addImm(Offset)
.addImm(0) // glc		.addImm(0) // glc
.addImm(0) // slc		.addImm(0) // slc
.addImm(0) // tfe		.addImm(0) // tfe
.addImm(0) // dlc		.addImm(0) // dlc
.addImm(0) // swz		.addImm(0) // swz
.addMemOperand(NewMMO);		.addMemOperand(NewMMO);

if (!IsStore && TmpReg != AMDGPU::NoRegister)		if (!IsStore && TmpReg != AMDGPU::NoRegister)
MIB = BuildMI(*MBB, MI, DL, TII->get(AMDGPU::V_ACCVGPR_WRITE_B32),		MIB = BuildMI(*MBB, MI, DL, TII->get(AMDGPU::V_ACCVGPR_WRITE_B32),
FinalReg)		FinalReg)
.addReg(TmpReg, RegState::Kill);		.addReg(TmpReg, RegState::Kill);
}		}

if (NumSubRegs > 1)		if (NumSubRegs > 1)
// (27 lines not shown)	bool SIRegisterInfo::spillSGPR(MachineBasicBlock::iterator MI,

Register SuperReg = MI->getOperand(0).getReg();		Register SuperReg = MI->getOperand(0).getReg();
bool IsKill = MI->getOperand(0).isKill();		bool IsKill = MI->getOperand(0).isKill();
const DebugLoc &DL = MI->getDebugLoc();		const DebugLoc &DL = MI->getDebugLoc();

MachineFrameInfo &FrameInfo = MF->getFrameInfo();		MachineFrameInfo &FrameInfo = MF->getFrameInfo();

assert(SpillToVGPR || (SuperReg != MFI->getStackPtrOffsetReg() &&		assert(SpillToVGPR || (SuperReg != MFI->getStackPtrOffsetReg() &&
SuperReg != MFI->getFrameOffsetReg() &&		SuperReg != MFI->getFrameOffsetReg()));
SuperReg != MFI->getScratchWaveOffsetReg()));

assert(SuperReg != AMDGPU::M0 && "m0 should never spill");		assert(SuperReg != AMDGPU::M0 && "m0 should never spill");

unsigned EltSize = 4;		unsigned EltSize = 4;
const TargetRegisterClass *RC = getPhysRegClass(SuperReg);		const TargetRegisterClass *RC = getPhysRegClass(SuperReg);

ArrayRef<int16_t> SplitParts = getRegSplitParts(RC, EltSize);		ArrayRef<int16_t> SplitParts = getRegSplitParts(RC, EltSize);
unsigned NumSubRegs = SplitParts.empty() ? 1 : SplitParts.size();		unsigned NumSubRegs = SplitParts.empty() ? 1 : SplitParts.size();
// (292 lines not shown)	case AMDGPU::SI_SPILL_A1024_RESTORE: {
break;		break;
}		}

default: {		default: {
const DebugLoc &DL = MI->getDebugLoc();		const DebugLoc &DL = MI->getDebugLoc();
bool IsMUBUF = TII->isMUBUF(*MI);		bool IsMUBUF = TII->isMUBUF(*MI);

if (!IsMUBUF && !MFI->isEntryFunction()) {		if (!IsMUBUF && !MFI->isEntryFunction()) {
// Convert to an absolute stack address by finding the offset from the		// Convert to a swizzled stack address by scaling by the wave size.
// scratch wave base and scaling by the wave size.
//		//
// In an entry function/kernel the offset is already the absolute		// In an entry function/kernel the offset is already swizzled.
// address relative to the frame register.

Register TmpDiffReg =
RS->scavengeRegister(&AMDGPU::SReg_32_XM0RegClass, MI, 0, false);

// If there's no free SGPR, in-place modify the FP
Register DiffReg = TmpDiffReg.isValid() ? TmpDiffReg : FrameReg;

bool IsCopy = MI->getOpcode() == AMDGPU::V_MOV_B32_e32;		bool IsCopy = MI->getOpcode() == AMDGPU::V_MOV_B32_e32;
Register ResultReg = IsCopy ?		Register ResultReg =
MI->getOperand(0).getReg() :		IsCopy ? MI->getOperand(0).getReg()
RS->scavengeRegister(&AMDGPU::VGPR_32RegClass, MI, 0);		: RS->scavengeRegister(&AMDGPU::VGPR_32RegClass, MI, 0);

BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_SUB_U32), DiffReg)
.addReg(FrameReg)
.addReg(MFI->getScratchWaveOffsetReg());

int64_t Offset = FrameInfo.getObjectOffset(Index);		int64_t Offset = FrameInfo.getObjectOffset(Index);
if (Offset == 0) {		if (Offset == 0) {
// XXX - This never happens because of emergency scavenging slot at 0?		// XXX - This never happens because of emergency scavenging slot at 0?
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::V_LSHRREV_B32_e64), ResultReg)		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::V_LSHRREV_B32_e64), ResultReg)
.addImm(ST.getWavefrontSizeLog2())		.addImm(ST.getWavefrontSizeLog2())
.addReg(DiffReg);		.addReg(FrameReg);
} else {		} else {
if (auto MIB = TII->getAddNoCarry(MBB, MI, DL, ResultReg, RS)) {		if (auto MIB = TII->getAddNoCarry(MBB, MI, DL, ResultReg, RS)) {
Register ScaledReg =		Register ScaledReg =
RS->scavengeRegister(&AMDGPU::VGPR_32RegClass, MIB, 0);		RS->scavengeRegister(&AMDGPU::VGPR_32RegClass, MIB, 0);

BuildMI(MBB, MIB, DL, TII->get(AMDGPU::V_LSHRREV_B32_e64),		BuildMI(MBB, MIB, DL, TII->get(AMDGPU::V_LSHRREV_B32_e64),
ScaledReg)		ScaledReg)
.addImm(ST.getWavefrontSizeLog2())		.addImm(ST.getWavefrontSizeLog2())
.addReg(DiffReg, RegState::Kill);		.addReg(FrameReg);

const bool IsVOP2 = MIB->getOpcode() == AMDGPU::V_ADD_U32_e32;		const bool IsVOP2 = MIB->getOpcode() == AMDGPU::V_ADD_U32_e32;

// TODO: Fold if use instruction is another add of a constant.		// TODO: Fold if use instruction is another add of a constant.
if (IsVOP2 || AMDGPU::isInlinableLiteral32(Offset, ST.hasInv2PiInlineImm())) {		if (IsVOP2 || AMDGPU::isInlinableLiteral32(Offset, ST.hasInv2PiInlineImm())) {
// FIXME: This can fail		// FIXME: This can fail
MIB.addImm(Offset);		MIB.addImm(Offset);
MIB.addReg(ScaledReg, RegState::Kill);		MIB.addReg(ScaledReg, RegState::Kill);
// (20 lines not shown)	default: {
// We have to produce a carry out, and there isn't a free SGPR pair		// We have to produce a carry out, and there isn't a free SGPR pair
// for it. We can keep the whole computation on the SALU to avoid		// for it. We can keep the whole computation on the SALU to avoid
// clobbering an additional register at the cost of an extra mov.		// clobbering an additional register at the cost of an extra mov.

// We may have 1 free scratch SGPR even though a carry out is		// We may have 1 free scratch SGPR even though a carry out is
// unavailable. Only one additional mov is needed.		// unavailable. Only one additional mov is needed.
Register TmpScaledReg =		Register TmpScaledReg =
RS->scavengeRegister(&AMDGPU::SReg_32_XM0RegClass, MI, 0, false);		RS->scavengeRegister(&AMDGPU::SReg_32_XM0RegClass, MI, 0, false);
Register ScaledReg = TmpScaledReg.isValid() ? TmpScaledReg : DiffReg;		Register ScaledReg = TmpScaledReg.isValid() ? TmpScaledReg : FrameReg;

BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_LSHR_B32), ScaledReg)		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_LSHR_B32), ScaledReg)
.addReg(DiffReg, RegState::Kill)		.addReg(FrameReg)
.addImm(ST.getWavefrontSizeLog2());		.addImm(ST.getWavefrontSizeLog2());
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_ADD_U32), ScaledReg)		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_ADD_U32), ScaledReg)
.addReg(ScaledReg, RegState::Kill)		.addReg(ScaledReg, RegState::Kill)
.addImm(Offset);		.addImm(Offset);
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::COPY), ResultReg)		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::COPY), ResultReg)
.addReg(ScaledReg, RegState::Kill);		.addReg(ScaledReg, RegState::Kill);

// If there were truly no free SGPRs, we need to undo everything.		// If there were truly no free SGPRs, we need to undo everything.
if (!TmpScaledReg.isValid()) {		if (!TmpScaledReg.isValid()) {
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_SUB_U32), ScaledReg)		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_SUB_U32), ScaledReg)
.addReg(ScaledReg, RegState::Kill)		.addReg(ScaledReg, RegState::Kill)
.addImm(Offset);		.addImm(Offset);
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_LSHL_B32), ScaledReg)		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_LSHL_B32), ScaledReg)
.addReg(DiffReg, RegState::Kill)		.addReg(FrameReg)
.addImm(ST.getWavefrontSizeLog2());		.addImm(ST.getWavefrontSizeLog2());
}		}
}		}
}		}

if (!TmpDiffReg.isValid()) {
// Restore the FP.
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_ADD_U32), FrameReg)
.addReg(FrameReg)
.addReg(MFI->getScratchWaveOffsetReg());
}

// Don't introduce an extra copy if we're just materializing in a mov.		// Don't introduce an extra copy if we're just materializing in a mov.
if (IsCopy)		if (IsCopy)
MI->eraseFromParent();		MI->eraseFromParent();
else		else
FIOp.ChangeToRegister(ResultReg, false, false, true);		FIOp.ChangeToRegister(ResultReg, false, false, true);
return;		return;
}		}

if (IsMUBUF) {		if (IsMUBUF) {
// Disable offen so we don't need a 0 vgpr base.		// Disable offen so we don't need a 0 vgpr base.
assert(static_cast<int>(FIOperandNum) ==		assert(static_cast<int>(FIOperandNum) ==
AMDGPU::getNamedOperandIdx(MI->getOpcode(),		AMDGPU::getNamedOperandIdx(MI->getOpcode(),
AMDGPU::OpName::vaddr));		AMDGPU::OpName::vaddr));

assert(TII->getNamedOperand(*MI, AMDGPU::OpName::soffset)->getReg() ==		auto &SOffset = TII->getNamedOperand(MI, AMDGPU::OpName::soffset);
MFI->getStackPtrOffsetReg());		assert((SOffset.isReg() &&
		SOffset.getReg() == MFI->getStackPtrOffsetReg()) ||
TII->getNamedOperand(*MI, AMDGPU::OpName::soffset)->setReg(FrameReg);		(SOffset.isImm() && SOffset.getImm() == 0));
		if (SOffset.isReg()) {
		if (FrameReg == AMDGPU::NoRegister) {
		SOffset.ChangeToImmediate(0);
		} else {
		SOffset.setReg(FrameReg);
		}
		}

int64_t Offset = FrameInfo.getObjectOffset(Index);		int64_t Offset = FrameInfo.getObjectOffset(Index);
int64_t OldImm		int64_t OldImm
= TII->getNamedOperand(*MI, AMDGPU::OpName::offset)->getImm();		= TII->getNamedOperand(*MI, AMDGPU::OpName::offset)->getImm();
int64_t NewOffset = OldImm + Offset;		int64_t NewOffset = OldImm + Offset;

if (isUInt<12>(NewOffset) &&		if (isUInt<12>(NewOffset) &&
buildMUBUFOffsetLoadStore(ST, FrameInfo, MI, Index, NewOffset)) {		buildMUBUFOffsetLoadStore(ST, FrameInfo, MI, Index, NewOffset)) {

llvm/lib/Target/AMDGPU/SIRegisterInfo.td

	def VCC_LO : SIReg<"vcc_lo", 106>;			def VCC_LO : SIReg<"vcc_lo", 106>;
	def VCC_HI : SIReg<"vcc_hi", 107>;			def VCC_HI : SIReg<"vcc_hi", 107>;

	// Pseudo-registers: Used as placeholders during isel and immediately			// Pseudo-registers: Used as placeholders during isel and immediately
	// replaced, never seeing the verifier.			// replaced, never seeing the verifier.
	def PRIVATE_RSRC_REG : SIReg<"private_rsrc", 0>;			def PRIVATE_RSRC_REG : SIReg<"private_rsrc", 0>;
	def FP_REG : SIReg<"fp", 0>;			def FP_REG : SIReg<"fp", 0>;
	def SP_REG : SIReg<"sp", 0>;			def SP_REG : SIReg<"sp", 0>;
	def SCRATCH_WAVE_OFFSET_REG : SIReg<"scratch_wave_offset", 0>;

	// Pseudo-register to represent the program-counter DWARF register.			// Pseudo-register to represent the program-counter DWARF register.
	def PC_REG : SIReg<"pc", 0>, DwarfRegNum<[16]> {			def PC_REG : SIReg<"pc", 0>, DwarfRegNum<[16]> {
	// There is no physical register corresponding to a "program counter", but			// There is no physical register corresponding to a "program counter", but
	// we need to encode the concept in debug information in order to represent			// we need to encode the concept in debug information in order to represent
	// things like the return value in unwind information.			// things like the return value in unwind information.
	let isArtificial = 1;			let isArtificial = 1;
	}			}
	// (317 lines not shown)
	// AGPR 1024-bit registers			// AGPR 1024-bit registers
	def AGPR_1024 : SIRegisterTuples<getSubRegs<32>.ret, AGPR_32, 255, 1, 32, "a">;			def AGPR_1024 : SIRegisterTuples<getSubRegs<32>.ret, AGPR_32, 255, 1, 32, "a">;

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Register classes used as source and destination			// Register classes used as source and destination
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	def Pseudo_SReg_32 : RegisterClass<"AMDGPU", [i32, f32, i16, f16, v2i16, v2f16], 32,			def Pseudo_SReg_32 : RegisterClass<"AMDGPU", [i32, f32, i16, f16, v2i16, v2f16], 32,
	(add FP_REG, SP_REG, SCRATCH_WAVE_OFFSET_REG)> {			(add FP_REG, SP_REG)> {
	let isAllocatable = 0;			let isAllocatable = 0;
	let CopyCost = -1;			let CopyCost = -1;
	}			}

	def Pseudo_SReg_128 : RegisterClass<"AMDGPU", [v4i32, v2i64, v2f64], 32,			def Pseudo_SReg_128 : RegisterClass<"AMDGPU", [v4i32, v2i64, v2f64], 32,
	(add PRIVATE_RSRC_REG)> {			(add PRIVATE_RSRC_REG)> {
	let isAllocatable = 0;			let isAllocatable = 0;
	let CopyCost = -1;			let CopyCost = -1;

llvm/test/CodeGen/AMDGPU/GlobalISel/divergent-control-flow.ll

	; CHECK-NEXT: s_mov_b32 s4, 0			; CHECK-NEXT: s_mov_b32 s4, 0
	; CHECK-NEXT: ; %bb.3: ; %bb8			; CHECK-NEXT: ; %bb.3: ; %bb8
	; CHECK-NEXT: s_or_b64 exec, exec, s[6:7]			; CHECK-NEXT: s_or_b64 exec, exec, s[6:7]
	; CHECK-NEXT: v_cmp_eq_u32_e64 s[6:7], s4, 0			; CHECK-NEXT: v_cmp_eq_u32_e64 s[6:7], s4, 0
	; CHECK-NEXT: s_and_saveexec_b64 s[4:5], s[6:7]			; CHECK-NEXT: s_and_saveexec_b64 s[4:5], s[6:7]
	; CHECK-NEXT: s_cbranch_execz BB4_5			; CHECK-NEXT: s_cbranch_execz BB4_5
	; CHECK-NEXT: ; %bb.4: ; %bb11			; CHECK-NEXT: ; %bb.4: ; %bb11
	; CHECK-NEXT: v_mov_b32_e32 v0, 4.0			; CHECK-NEXT: v_mov_b32_e32 v0, 4.0
	; CHECK-NEXT: buffer_store_dword v0, v0, s[0:3], s33 offen			; CHECK-NEXT: buffer_store_dword v0, v0, s[0:3], 0 offen
	; CHECK-NEXT: BB4_5: ; %Flow			; CHECK-NEXT: BB4_5: ; %Flow
	; CHECK-NEXT: s_or_b64 exec, exec, s[4:5]			; CHECK-NEXT: s_or_b64 exec, exec, s[4:5]
	; CHECK-NEXT: BB4_6: ; %bb12			; CHECK-NEXT: BB4_6: ; %bb12
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	bb:			bb:
	%tmp = load i32, i32 addrspace(4)* @external_constant			%tmp = load i32, i32 addrspace(4)* @external_constant
	%ptr = load float, float addrspace(4)* @const.ptr			%ptr = load float, float addrspace(4)* @const.ptr

llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.ll

	; GPRIDX-NEXT: s_mov_b32 s22, s24			; GPRIDX-NEXT: s_mov_b32 s22, s24
	; GPRIDX-NEXT: s_mov_b32 s23, s25			; GPRIDX-NEXT: s_mov_b32 s23, s25
	; GPRIDX-NEXT: s_mov_b32 s24, s26			; GPRIDX-NEXT: s_mov_b32 s24, s26
	; GPRIDX-NEXT: s_mov_b32 s25, s27			; GPRIDX-NEXT: s_mov_b32 s25, s27
	; GPRIDX-NEXT: s_mov_b32 s26, s28			; GPRIDX-NEXT: s_mov_b32 s26, s28
	; GPRIDX-NEXT: s_mov_b32 s27, s29			; GPRIDX-NEXT: s_mov_b32 s27, s29
	; GPRIDX-NEXT: s_mov_b32 s28, s30			; GPRIDX-NEXT: s_mov_b32 s28, s30
	; GPRIDX-NEXT: s_mov_b32 s29, s31			; GPRIDX-NEXT: s_mov_b32 s29, s31
	; GPRIDX-NEXT: s_mov_b32 s30, s32
	; GPRIDX-NEXT: s_mov_b32 s31, s33			; GPRIDX-NEXT: s_mov_b32 s31, s33
				; GPRIDX-NEXT: s_mov_b32 s30, s32
	; GPRIDX-NEXT: s_mov_b32 m0, s35			; GPRIDX-NEXT: s_mov_b32 m0, s35
	; GPRIDX-NEXT: s_nop 0			; GPRIDX-NEXT: s_nop 0
	; GPRIDX-NEXT: s_movreld_b32 s0, s34			; GPRIDX-NEXT: s_movreld_b32 s0, s34
	; GPRIDX-NEXT: ; return to shader part epilog			; GPRIDX-NEXT: ; return to shader part epilog
	;			;
	; MOVREL-LABEL: dyn_insertelement_v32i32_s_s_s:			; MOVREL-LABEL: dyn_insertelement_v32i32_s_s_s:
	; MOVREL: ; %bb.0: ; %entry			; MOVREL: ; %bb.0: ; %entry
	; MOVREL-NEXT: s_mov_b32 s0, s2			; MOVREL-NEXT: s_mov_b32 s0, s2
	; (22 lines not shown)
	; MOVREL-NEXT: s_mov_b32 s22, s24			; MOVREL-NEXT: s_mov_b32 s22, s24
	; MOVREL-NEXT: s_mov_b32 s23, s25			; MOVREL-NEXT: s_mov_b32 s23, s25
	; MOVREL-NEXT: s_mov_b32 s24, s26			; MOVREL-NEXT: s_mov_b32 s24, s26
	; MOVREL-NEXT: s_mov_b32 s25, s27			; MOVREL-NEXT: s_mov_b32 s25, s27
	; MOVREL-NEXT: s_mov_b32 s26, s28			; MOVREL-NEXT: s_mov_b32 s26, s28
	; MOVREL-NEXT: s_mov_b32 s27, s29			; MOVREL-NEXT: s_mov_b32 s27, s29
	; MOVREL-NEXT: s_mov_b32 s28, s30			; MOVREL-NEXT: s_mov_b32 s28, s30
	; MOVREL-NEXT: s_mov_b32 s29, s31			; MOVREL-NEXT: s_mov_b32 s29, s31
	; MOVREL-NEXT: s_mov_b32 s30, s32
	; MOVREL-NEXT: s_mov_b32 s31, s33			; MOVREL-NEXT: s_mov_b32 s31, s33
				; MOVREL-NEXT: s_mov_b32 s30, s32
	; MOVREL-NEXT: s_movreld_b32 s0, s34			; MOVREL-NEXT: s_movreld_b32 s0, s34
	; MOVREL-NEXT: ; implicit-def: $vcc_hi			; MOVREL-NEXT: ; implicit-def: $vcc_hi
	; MOVREL-NEXT: ; return to shader part epilog			; MOVREL-NEXT: ; return to shader part epilog
	entry:			entry:
	%insert = insertelement <32 x i32> %vec, i32 %val, i32 %idx			%insert = insertelement <32 x i32> %vec, i32 %val, i32 %idx
	ret <32 x i32> %insert			ret <32 x i32> %insert
	}			}


llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-load-local.mir

	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_s32_from_4
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[DS_READ_B32_:%[0-9]+]]:vgpr_32 = DS_READ_B32 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 4, addrspace 3)
	; GFX6: $vgpr0 = COPY [[DS_READ_B32_]]
	; GFX7-LABEL: name: load_local_s32_from_4			; GFX7-LABEL: name: load_local_s32_from_4
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ_B32_:%[0-9]+]]:vgpr_32 = DS_READ_B32 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 4, addrspace 3)			; GFX7: [[DS_READ_B32_:%[0-9]+]]:vgpr_32 = DS_READ_B32 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 4, addrspace 3)
	; GFX7: $vgpr0 = COPY [[DS_READ_B32_]]			; GFX7: $vgpr0 = COPY [[DS_READ_B32_]]
	; GFX9-LABEL: name: load_local_s32_from_4			; GFX9-LABEL: name: load_local_s32_from_4
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[DS_READ_B32_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_B32_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 4, addrspace 3)			; GFX9: [[DS_READ_B32_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_B32_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 4, addrspace 3)
	; GFX9: $vgpr0 = COPY [[DS_READ_B32_gfx9_]]			; GFX9: $vgpr0 = COPY [[DS_READ_B32_gfx9_]]
				; GFX6-LABEL: name: load_local_s32_from_4
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[DS_READ_B32_:%[0-9]+]]:vgpr_32 = DS_READ_B32 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 4, addrspace 3)
				; GFX6: $vgpr0 = COPY [[DS_READ_B32_]]
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(s32) = G_LOAD %0 :: (load 4, align 4, addrspace 3)			%1:vgpr(s32) = G_LOAD %0 :: (load 4, align 4, addrspace 3)
	$vgpr0 = COPY %1			$vgpr0 = COPY %1

	...			...

	---			---

	name: load_local_s32_from_2			name: load_local_s32_from_2
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_s32_from_2
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[DS_READ_U16_:%[0-9]+]]:vgpr_32 = DS_READ_U16 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 2, addrspace 3)
	; GFX6: $vgpr0 = COPY [[DS_READ_U16_]]
	; GFX7-LABEL: name: load_local_s32_from_2			; GFX7-LABEL: name: load_local_s32_from_2
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ_U16_:%[0-9]+]]:vgpr_32 = DS_READ_U16 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 2, addrspace 3)			; GFX7: [[DS_READ_U16_:%[0-9]+]]:vgpr_32 = DS_READ_U16 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 2, addrspace 3)
	; GFX7: $vgpr0 = COPY [[DS_READ_U16_]]			; GFX7: $vgpr0 = COPY [[DS_READ_U16_]]
	; GFX9-LABEL: name: load_local_s32_from_2			; GFX9-LABEL: name: load_local_s32_from_2
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[DS_READ_U16_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_U16_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 2, addrspace 3)			; GFX9: [[DS_READ_U16_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_U16_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 2, addrspace 3)
	; GFX9: $vgpr0 = COPY [[DS_READ_U16_gfx9_]]			; GFX9: $vgpr0 = COPY [[DS_READ_U16_gfx9_]]
				; GFX6-LABEL: name: load_local_s32_from_2
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[DS_READ_U16_:%[0-9]+]]:vgpr_32 = DS_READ_U16 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 2, addrspace 3)
				; GFX6: $vgpr0 = COPY [[DS_READ_U16_]]
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(s32) = G_LOAD %0 :: (load 2, align 2, addrspace 3)			%1:vgpr(s32) = G_LOAD %0 :: (load 2, align 2, addrspace 3)
	$vgpr0 = COPY %1			$vgpr0 = COPY %1

	...			...

	---			---

	name: load_local_s32_from_1			name: load_local_s32_from_1
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_s32_from_1
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)
	; GFX6: $vgpr0 = COPY [[DS_READ_U8_]]
	; GFX7-LABEL: name: load_local_s32_from_1			; GFX7-LABEL: name: load_local_s32_from_1
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)			; GFX7: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)
	; GFX7: $vgpr0 = COPY [[DS_READ_U8_]]			; GFX7: $vgpr0 = COPY [[DS_READ_U8_]]
	; GFX9-LABEL: name: load_local_s32_from_1			; GFX9-LABEL: name: load_local_s32_from_1
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[DS_READ_U8_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_U8_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 1, addrspace 3)			; GFX9: [[DS_READ_U8_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_U8_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 1, addrspace 3)
	; GFX9: $vgpr0 = COPY [[DS_READ_U8_gfx9_]]			; GFX9: $vgpr0 = COPY [[DS_READ_U8_gfx9_]]
				; GFX6-LABEL: name: load_local_s32_from_1
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)
				; GFX6: $vgpr0 = COPY [[DS_READ_U8_]]
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(s32) = G_LOAD %0 :: (load 1, align 1, addrspace 3)			%1:vgpr(s32) = G_LOAD %0 :: (load 1, align 1, addrspace 3)
	$vgpr0 = COPY %1			$vgpr0 = COPY %1

	...			...

	---			---

	name: load_local_v2s32			name: load_local_v2s32
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_v2s32
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[DS_READ_B64_:%[0-9]+]]:vreg_64 = DS_READ_B64 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 8, addrspace 3)
	; GFX6: $vgpr0_vgpr1 = COPY [[DS_READ_B64_]]
	; GFX7-LABEL: name: load_local_v2s32			; GFX7-LABEL: name: load_local_v2s32
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ_B64_:%[0-9]+]]:vreg_64 = DS_READ_B64 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 8, addrspace 3)			; GFX7: [[DS_READ_B64_:%[0-9]+]]:vreg_64 = DS_READ_B64 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 8, addrspace 3)
	; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ_B64_]]			; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ_B64_]]
	; GFX9-LABEL: name: load_local_v2s32			; GFX9-LABEL: name: load_local_v2s32
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[DS_READ_B64_gfx9_:%[0-9]+]]:vreg_64 = DS_READ_B64_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 8, addrspace 3)			; GFX9: [[DS_READ_B64_gfx9_:%[0-9]+]]:vreg_64 = DS_READ_B64_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 8, addrspace 3)
	; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ_B64_gfx9_]]			; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ_B64_gfx9_]]
				; GFX6-LABEL: name: load_local_v2s32
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[DS_READ_B64_:%[0-9]+]]:vreg_64 = DS_READ_B64 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 8, addrspace 3)
				; GFX6: $vgpr0_vgpr1 = COPY [[DS_READ_B64_]]
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(<2 x s32>) = G_LOAD %0 :: (load 8, align 8, addrspace 3)			%1:vgpr(<2 x s32>) = G_LOAD %0 :: (load 8, align 8, addrspace 3)
	$vgpr0_vgpr1 = COPY %1			$vgpr0_vgpr1 = COPY %1

	...			...

	---			---

	name: load_local_v2s32_align4			name: load_local_v2s32_align4
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_v2s32_align4
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[LOAD:%[0-9]+]]:vreg_64(<2 x s32>) = G_LOAD [[COPY]](p3) :: (load 8, align 4, addrspace 3)
	; GFX6: $vgpr0_vgpr1 = COPY [[LOAD]](<2 x s32>)
	; GFX7-LABEL: name: load_local_v2s32_align4			; GFX7-LABEL: name: load_local_v2s32_align4
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ2_B32_:%[0-9]+]]:vreg_64 = DS_READ2_B32 [[COPY]], 0, 1, 0, implicit $m0, implicit $exec :: (load 8, align 4, addrspace 3)			; GFX7: [[DS_READ2_B32_:%[0-9]+]]:vreg_64 = DS_READ2_B32 [[COPY]], 0, 1, 0, implicit $m0, implicit $exec :: (load 8, align 4, addrspace 3)
	; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_]]			; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_]]
	; GFX9-LABEL: name: load_local_v2s32_align4			; GFX9-LABEL: name: load_local_v2s32_align4
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[DS_READ2_B32_gfx9_:%[0-9]+]]:vreg_64 = DS_READ2_B32_gfx9 [[COPY]], 0, 1, 0, implicit $exec :: (load 8, align 4, addrspace 3)			; GFX9: [[DS_READ2_B32_gfx9_:%[0-9]+]]:vreg_64 = DS_READ2_B32_gfx9 [[COPY]], 0, 1, 0, implicit $exec :: (load 8, align 4, addrspace 3)
	; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_gfx9_]]			; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_gfx9_]]
				; GFX6-LABEL: name: load_local_v2s32_align4
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[LOAD:%[0-9]+]]:vreg_64(<2 x s32>) = G_LOAD [[COPY]](p3) :: (load 8, align 4, addrspace 3)
				; GFX6: $vgpr0_vgpr1 = COPY [[LOAD]](<2 x s32>)
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(<2 x s32>) = G_LOAD %0 :: (load 8, align 4, addrspace 3)			%1:vgpr(<2 x s32>) = G_LOAD %0 :: (load 8, align 4, addrspace 3)
	$vgpr0_vgpr1 = COPY %1			$vgpr0_vgpr1 = COPY %1

	...			...

	---			---

	name: load_local_s64			name: load_local_s64
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_s64
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[DS_READ_B64_:%[0-9]+]]:vreg_64 = DS_READ_B64 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 8, addrspace 3)
	; GFX6: $vgpr0_vgpr1 = COPY [[DS_READ_B64_]]
	; GFX7-LABEL: name: load_local_s64			; GFX7-LABEL: name: load_local_s64
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ_B64_:%[0-9]+]]:vreg_64 = DS_READ_B64 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 8, addrspace 3)			; GFX7: [[DS_READ_B64_:%[0-9]+]]:vreg_64 = DS_READ_B64 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 8, addrspace 3)
	; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ_B64_]]			; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ_B64_]]
	; GFX9-LABEL: name: load_local_s64			; GFX9-LABEL: name: load_local_s64
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[DS_READ_B64_gfx9_:%[0-9]+]]:vreg_64 = DS_READ_B64_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 8, addrspace 3)			; GFX9: [[DS_READ_B64_gfx9_:%[0-9]+]]:vreg_64 = DS_READ_B64_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 8, addrspace 3)
	; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ_B64_gfx9_]]			; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ_B64_gfx9_]]
				; GFX6-LABEL: name: load_local_s64
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[DS_READ_B64_:%[0-9]+]]:vreg_64 = DS_READ_B64 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 8, addrspace 3)
				; GFX6: $vgpr0_vgpr1 = COPY [[DS_READ_B64_]]
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(s64) = G_LOAD %0 :: (load 8, align 8, addrspace 3)			%1:vgpr(s64) = G_LOAD %0 :: (load 8, align 8, addrspace 3)
	$vgpr0_vgpr1 = COPY %1			$vgpr0_vgpr1 = COPY %1

	...			...

	---			---

	name: load_local_s64_align4			name: load_local_s64_align4
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: \|			body: \|
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_s64_align4
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[LOAD:%[0-9]+]]:vreg_64(s64) = G_LOAD [[COPY]](p3) :: (load 8, align 4, addrspace 3)
	; GFX6: $vgpr0_vgpr1 = COPY [[LOAD]](s64)
	; GFX7-LABEL: name: load_local_s64_align4			; GFX7-LABEL: name: load_local_s64_align4
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ2_B32_:%[0-9]+]]:vreg_64 = DS_READ2_B32 [[COPY]], 0, 1, 0, implicit $m0, implicit $exec :: (load 8, align 4, addrspace 3)			; GFX7: [[DS_READ2_B32_:%[0-9]+]]:vreg_64 = DS_READ2_B32 [[COPY]], 0, 1, 0, implicit $m0, implicit $exec :: (load 8, align 4, addrspace 3)
	; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_]]			; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_]]
	; GFX9-LABEL: name: load_local_s64_align4			; GFX9-LABEL: name: load_local_s64_align4
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[DS_READ2_B32_gfx9_:%[0-9]+]]:vreg_64 = DS_READ2_B32_gfx9 [[COPY]], 0, 1, 0, implicit $exec :: (load 8, align 4, addrspace 3)			; GFX9: [[DS_READ2_B32_gfx9_:%[0-9]+]]:vreg_64 = DS_READ2_B32_gfx9 [[COPY]], 0, 1, 0, implicit $exec :: (load 8, align 4, addrspace 3)
	; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_gfx9_]]			; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_gfx9_]]
				; GFX6-LABEL: name: load_local_s64_align4
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[LOAD:%[0-9]+]]:vreg_64(s64) = G_LOAD [[COPY]](p3) :: (load 8, align 4, addrspace 3)
				; GFX6: $vgpr0_vgpr1 = COPY [[LOAD]](s64)
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(s64) = G_LOAD %0 :: (load 8, align 4, addrspace 3)			%1:vgpr(s64) = G_LOAD %0 :: (load 8, align 4, addrspace 3)
	$vgpr0_vgpr1 = COPY %1			$vgpr0_vgpr1 = COPY %1

	...			...

	---			---

	name: load_local_p3_from_4			name: load_local_p3_from_4
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: \|			body: \|
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_p3_from_4
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[DS_READ_B32_:%[0-9]+]]:vgpr_32 = DS_READ_B32 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 4, addrspace 3)
	; GFX6: $vgpr0 = COPY [[DS_READ_B32_]]
	; GFX7-LABEL: name: load_local_p3_from_4			; GFX7-LABEL: name: load_local_p3_from_4
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ_B32_:%[0-9]+]]:vgpr_32 = DS_READ_B32 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 4, addrspace 3)			; GFX7: [[DS_READ_B32_:%[0-9]+]]:vgpr_32 = DS_READ_B32 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 4, addrspace 3)
	; GFX7: $vgpr0 = COPY [[DS_READ_B32_]]			; GFX7: $vgpr0 = COPY [[DS_READ_B32_]]
	; GFX9-LABEL: name: load_local_p3_from_4			; GFX9-LABEL: name: load_local_p3_from_4
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[DS_READ_B32_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_B32_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 4, addrspace 3)			; GFX9: [[DS_READ_B32_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_B32_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 4, addrspace 3)
	; GFX9: $vgpr0 = COPY [[DS_READ_B32_gfx9_]]			; GFX9: $vgpr0 = COPY [[DS_READ_B32_gfx9_]]
				; GFX6-LABEL: name: load_local_p3_from_4
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[DS_READ_B32_:%[0-9]+]]:vgpr_32 = DS_READ_B32 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 4, addrspace 3)
				; GFX6: $vgpr0 = COPY [[DS_READ_B32_]]
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(p3) = G_LOAD %0 :: (load 4, align 4, addrspace 3)			%1:vgpr(p3) = G_LOAD %0 :: (load 4, align 4, addrspace 3)
	$vgpr0 = COPY %1			$vgpr0 = COPY %1

	...			...

	---			---

	name: load_local_p5_from_4			name: load_local_p5_from_4
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: \|			body: \|
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_p5_from_4
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[DS_READ_B32_:%[0-9]+]]:vgpr_32 = DS_READ_B32 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 4, addrspace 3)
	; GFX6: $vgpr0 = COPY [[DS_READ_B32_]]
	; GFX7-LABEL: name: load_local_p5_from_4			; GFX7-LABEL: name: load_local_p5_from_4
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ_B32_:%[0-9]+]]:vgpr_32 = DS_READ_B32 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 4, addrspace 3)			; GFX7: [[DS_READ_B32_:%[0-9]+]]:vgpr_32 = DS_READ_B32 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 4, addrspace 3)
	; GFX7: $vgpr0 = COPY [[DS_READ_B32_]]			; GFX7: $vgpr0 = COPY [[DS_READ_B32_]]
	; GFX9-LABEL: name: load_local_p5_from_4			; GFX9-LABEL: name: load_local_p5_from_4
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[DS_READ_B32_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_B32_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 4, addrspace 3)			; GFX9: [[DS_READ_B32_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_B32_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 4, addrspace 3)
	; GFX9: $vgpr0 = COPY [[DS_READ_B32_gfx9_]]			; GFX9: $vgpr0 = COPY [[DS_READ_B32_gfx9_]]
				; GFX6-LABEL: name: load_local_p5_from_4
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[DS_READ_B32_:%[0-9]+]]:vgpr_32 = DS_READ_B32 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 4, addrspace 3)
				; GFX6: $vgpr0 = COPY [[DS_READ_B32_]]
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(p3) = G_LOAD %0 :: (load 4, align 4, addrspace 3)			%1:vgpr(p3) = G_LOAD %0 :: (load 4, align 4, addrspace 3)
	$vgpr0 = COPY %1			$vgpr0 = COPY %1

	...			...

	---			---

	name: load_local_p1_align8			name: load_local_p1_align8
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: \|			body: \|
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_p1_align8
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[DS_READ_B64_:%[0-9]+]]:vreg_64 = DS_READ_B64 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 8, addrspace 3)
	; GFX6: $vgpr0_vgpr1 = COPY [[DS_READ_B64_]]
	; GFX7-LABEL: name: load_local_p1_align8			; GFX7-LABEL: name: load_local_p1_align8
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ_B64_:%[0-9]+]]:vreg_64 = DS_READ_B64 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 8, addrspace 3)			; GFX7: [[DS_READ_B64_:%[0-9]+]]:vreg_64 = DS_READ_B64 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 8, addrspace 3)
	; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ_B64_]]			; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ_B64_]]
	; GFX9-LABEL: name: load_local_p1_align8			; GFX9-LABEL: name: load_local_p1_align8
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[DS_READ_B64_gfx9_:%[0-9]+]]:vreg_64 = DS_READ_B64_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 8, addrspace 3)			; GFX9: [[DS_READ_B64_gfx9_:%[0-9]+]]:vreg_64 = DS_READ_B64_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 8, addrspace 3)
	; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ_B64_gfx9_]]			; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ_B64_gfx9_]]
				; GFX6-LABEL: name: load_local_p1_align8
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[DS_READ_B64_:%[0-9]+]]:vreg_64 = DS_READ_B64 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 8, addrspace 3)
				; GFX6: $vgpr0_vgpr1 = COPY [[DS_READ_B64_]]
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(p1) = G_LOAD %0 :: (load 8, align 8, addrspace 3)			%1:vgpr(p1) = G_LOAD %0 :: (load 8, align 8, addrspace 3)
	$vgpr0_vgpr1 = COPY %1			$vgpr0_vgpr1 = COPY %1

	...			...

	---			---

	name: load_local_p1_align4			name: load_local_p1_align4
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: \|			body: \|
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_p1_align4
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[LOAD:%[0-9]+]]:vreg_64(p1) = G_LOAD [[COPY]](p3) :: (load 8, align 4, addrspace 3)
	; GFX6: $vgpr0_vgpr1 = COPY [[LOAD]](p1)
	; GFX7-LABEL: name: load_local_p1_align4			; GFX7-LABEL: name: load_local_p1_align4
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ2_B32_:%[0-9]+]]:vreg_64 = DS_READ2_B32 [[COPY]], 0, 1, 0, implicit $m0, implicit $exec :: (load 8, align 4, addrspace 3)			; GFX7: [[DS_READ2_B32_:%[0-9]+]]:vreg_64 = DS_READ2_B32 [[COPY]], 0, 1, 0, implicit $m0, implicit $exec :: (load 8, align 4, addrspace 3)
	; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_]]			; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_]]
	; GFX9-LABEL: name: load_local_p1_align4			; GFX9-LABEL: name: load_local_p1_align4
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[DS_READ2_B32_gfx9_:%[0-9]+]]:vreg_64 = DS_READ2_B32_gfx9 [[COPY]], 0, 1, 0, implicit $exec :: (load 8, align 4, addrspace 3)			; GFX9: [[DS_READ2_B32_gfx9_:%[0-9]+]]:vreg_64 = DS_READ2_B32_gfx9 [[COPY]], 0, 1, 0, implicit $exec :: (load 8, align 4, addrspace 3)
	; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_gfx9_]]			; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_gfx9_]]
				; GFX6-LABEL: name: load_local_p1_align4
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[LOAD:%[0-9]+]]:vreg_64(p1) = G_LOAD [[COPY]](p3) :: (load 8, align 4, addrspace 3)
				; GFX6: $vgpr0_vgpr1 = COPY [[LOAD]](p1)
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(p1) = G_LOAD %0 :: (load 8, align 4, addrspace 3)			%1:vgpr(p1) = G_LOAD %0 :: (load 8, align 4, addrspace 3)
	$vgpr0_vgpr1 = COPY %1			$vgpr0_vgpr1 = COPY %1

	...			...

	---			---

	name: load_local_p999_from_8			name: load_local_p999_from_8
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: \|			body: \|
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_p999_from_8
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[LOAD:%[0-9]+]]:vreg_64(p999) = G_LOAD [[COPY]](p3) :: (load 8, addrspace 3)
	; GFX6: $vgpr0_vgpr1 = COPY [[LOAD]](p999)
	; GFX7-LABEL: name: load_local_p999_from_8			; GFX7-LABEL: name: load_local_p999_from_8
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[LOAD:%[0-9]+]]:vreg_64(p999) = G_LOAD [[COPY]](p3) :: (load 8, addrspace 3)			; GFX7: [[LOAD:%[0-9]+]]:vreg_64(p999) = G_LOAD [[COPY]](p3) :: (load 8, addrspace 3)
	; GFX7: $vgpr0_vgpr1 = COPY [[LOAD]](p999)			; GFX7: $vgpr0_vgpr1 = COPY [[LOAD]](p999)
	; GFX9-LABEL: name: load_local_p999_from_8			; GFX9-LABEL: name: load_local_p999_from_8
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
	; GFX9: [[LOAD:%[0-9]+]]:vreg_64(p999) = G_LOAD [[COPY]](p3) :: (load 8, addrspace 3)			; GFX9: [[LOAD:%[0-9]+]]:vreg_64(p999) = G_LOAD [[COPY]](p3) :: (load 8, addrspace 3)
	; GFX9: $vgpr0_vgpr1 = COPY [[LOAD]](p999)			; GFX9: $vgpr0_vgpr1 = COPY [[LOAD]](p999)
				; GFX6-LABEL: name: load_local_p999_from_8
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[LOAD:%[0-9]+]]:vreg_64(p999) = G_LOAD [[COPY]](p3) :: (load 8, addrspace 3)
				; GFX6: $vgpr0_vgpr1 = COPY [[LOAD]](p999)
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(p999) = G_LOAD %0 :: (load 8, align 8, addrspace 3)			%1:vgpr(p999) = G_LOAD %0 :: (load 8, align 8, addrspace 3)
	$vgpr0_vgpr1 = COPY %1			$vgpr0_vgpr1 = COPY %1

	...			...

	---			---

	name: load_local_v2p3			name: load_local_v2p3
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: \|			body: \|
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_v2p3
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[LOAD:%[0-9]+]]:vreg_64(<2 x p3>) = G_LOAD [[COPY]](p3) :: (load 8, addrspace 3)
	; GFX6: $vgpr0_vgpr1 = COPY [[LOAD]](<2 x p3>)
	; GFX7-LABEL: name: load_local_v2p3			; GFX7-LABEL: name: load_local_v2p3
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[LOAD:%[0-9]+]]:vreg_64(<2 x p3>) = G_LOAD [[COPY]](p3) :: (load 8, addrspace 3)			; GFX7: [[LOAD:%[0-9]+]]:vreg_64(<2 x p3>) = G_LOAD [[COPY]](p3) :: (load 8, addrspace 3)
	; GFX7: $vgpr0_vgpr1 = COPY [[LOAD]](<2 x p3>)			; GFX7: $vgpr0_vgpr1 = COPY [[LOAD]](<2 x p3>)
	; GFX9-LABEL: name: load_local_v2p3			; GFX9-LABEL: name: load_local_v2p3
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
	; GFX9: [[LOAD:%[0-9]+]]:vreg_64(<2 x p3>) = G_LOAD [[COPY]](p3) :: (load 8, addrspace 3)			; GFX9: [[LOAD:%[0-9]+]]:vreg_64(<2 x p3>) = G_LOAD [[COPY]](p3) :: (load 8, addrspace 3)
	; GFX9: $vgpr0_vgpr1 = COPY [[LOAD]](<2 x p3>)			; GFX9: $vgpr0_vgpr1 = COPY [[LOAD]](<2 x p3>)
				; GFX6-LABEL: name: load_local_v2p3
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[LOAD:%[0-9]+]]:vreg_64(<2 x p3>) = G_LOAD [[COPY]](p3) :: (load 8, addrspace 3)
				; GFX6: $vgpr0_vgpr1 = COPY [[LOAD]](<2 x p3>)
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(<2 x p3>) = G_LOAD %0 :: (load 8, align 8, addrspace 3)			%1:vgpr(<2 x p3>) = G_LOAD %0 :: (load 8, align 8, addrspace 3)
	$vgpr0_vgpr1 = COPY %1			$vgpr0_vgpr1 = COPY %1

	...			...

	---			---

	name: load_local_v2s16			name: load_local_v2s16
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: \|			body: \|
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_v2s16
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[DS_READ_B32_:%[0-9]+]]:vgpr_32 = DS_READ_B32 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 4, addrspace 3)
	; GFX6: $vgpr0 = COPY [[DS_READ_B32_]]
	; GFX7-LABEL: name: load_local_v2s16			; GFX7-LABEL: name: load_local_v2s16
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ_B32_:%[0-9]+]]:vgpr_32 = DS_READ_B32 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 4, addrspace 3)			; GFX7: [[DS_READ_B32_:%[0-9]+]]:vgpr_32 = DS_READ_B32 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 4, addrspace 3)
	; GFX7: $vgpr0 = COPY [[DS_READ_B32_]]			; GFX7: $vgpr0 = COPY [[DS_READ_B32_]]
	; GFX9-LABEL: name: load_local_v2s16			; GFX9-LABEL: name: load_local_v2s16
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[DS_READ_B32_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_B32_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 4, addrspace 3)			; GFX9: [[DS_READ_B32_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_B32_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 4, addrspace 3)
	; GFX9: $vgpr0 = COPY [[DS_READ_B32_gfx9_]]			; GFX9: $vgpr0 = COPY [[DS_READ_B32_gfx9_]]
				; GFX6-LABEL: name: load_local_v2s16
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[DS_READ_B32_:%[0-9]+]]:vgpr_32 = DS_READ_B32 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 4, addrspace 3)
				; GFX6: $vgpr0 = COPY [[DS_READ_B32_]]
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(<2 x s16>) = G_LOAD %0 :: (load 4, align 4, addrspace 3)			%1:vgpr(<2 x s16>) = G_LOAD %0 :: (load 4, align 4, addrspace 3)
	$vgpr0 = COPY %1			$vgpr0 = COPY %1

	...			...

	---			---

	name: load_local_v4s16			name: load_local_v4s16
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: \|			body: \|
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_v4s16
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[DS_READ_B64_:%[0-9]+]]:vreg_64 = DS_READ_B64 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 8, addrspace 3)
	; GFX6: $vgpr0_vgpr1 = COPY [[DS_READ_B64_]]
	; GFX7-LABEL: name: load_local_v4s16			; GFX7-LABEL: name: load_local_v4s16
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ_B64_:%[0-9]+]]:vreg_64 = DS_READ_B64 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 8, addrspace 3)			; GFX7: [[DS_READ_B64_:%[0-9]+]]:vreg_64 = DS_READ_B64 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 8, addrspace 3)
	; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ_B64_]]			; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ_B64_]]
	; GFX9-LABEL: name: load_local_v4s16			; GFX9-LABEL: name: load_local_v4s16
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[DS_READ_B64_gfx9_:%[0-9]+]]:vreg_64 = DS_READ_B64_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 8, addrspace 3)			; GFX9: [[DS_READ_B64_gfx9_:%[0-9]+]]:vreg_64 = DS_READ_B64_gfx9 [[COPY]], 0, 0, implicit $exec :: (load 8, addrspace 3)
	; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ_B64_gfx9_]]			; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ_B64_gfx9_]]
				; GFX6-LABEL: name: load_local_v4s16
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[DS_READ_B64_:%[0-9]+]]:vreg_64 = DS_READ_B64 [[COPY]], 0, 0, implicit $m0, implicit $exec :: (load 8, addrspace 3)
				; GFX6: $vgpr0_vgpr1 = COPY [[DS_READ_B64_]]
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(<4 x s16>) = G_LOAD %0 :: (load 8, align 8, addrspace 3)			%1:vgpr(<4 x s16>) = G_LOAD %0 :: (load 8, align 8, addrspace 3)
	$vgpr0_vgpr1 = COPY %1			$vgpr0_vgpr1 = COPY %1

	...			...

	# ---			# ---

	# name: load_local_v6s16			# name: load_local_v6s16
	# legalized: true			# legalized: true
	# regBankSelected: true			# regBankSelected: true
	# tracksRegLiveness: true			# tracksRegLiveness: true
	# machineFunctionInfo:			# machineFunctionInfo:
	# scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			# scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	# scratchWaveOffsetReg: $sgpr4
	# stackPtrOffsetReg: $sgpr32			# stackPtrOffsetReg: $sgpr32

	# body: \|			# body: \|
	# bb.0:			# bb.0:
	# liveins: $vgpr0			# liveins: $vgpr0

	# %0:vgpr(p3) = COPY $vgpr0			# %0:vgpr(p3) = COPY $vgpr0
	# %1:vgpr(<6 x s16>) = G_LOAD %0 :: (load 12, align 4, addrspace 3)			# %1:vgpr(<6 x s16>) = G_LOAD %0 :: (load 12, align 4, addrspace 3)
	(11 lines not shown)
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: \|			body: \|
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_s32_from_1_gep_65535
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 65535, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 %2, 0, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)
	; GFX6: $vgpr0 = COPY [[DS_READ_U8_]]
	; GFX7-LABEL: name: load_local_s32_from_1_gep_65535			; GFX7-LABEL: name: load_local_s32_from_1_gep_65535
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 [[COPY]], 65535, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)			; GFX7: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 [[COPY]], 65535, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)
	; GFX7: $vgpr0 = COPY [[DS_READ_U8_]]			; GFX7: $vgpr0 = COPY [[DS_READ_U8_]]
	; GFX9-LABEL: name: load_local_s32_from_1_gep_65535			; GFX9-LABEL: name: load_local_s32_from_1_gep_65535
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[DS_READ_U8_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_U8_gfx9 [[COPY]], 65535, 0, implicit $exec :: (load 1, addrspace 3)			; GFX9: [[DS_READ_U8_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_U8_gfx9 [[COPY]], 65535, 0, implicit $exec :: (load 1, addrspace 3)
	; GFX9: $vgpr0 = COPY [[DS_READ_U8_gfx9_]]			; GFX9: $vgpr0 = COPY [[DS_READ_U8_gfx9_]]
				; GFX6-LABEL: name: load_local_s32_from_1_gep_65535
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 65535, implicit $exec
				; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 %2, 0, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)
				; GFX6: $vgpr0 = COPY [[DS_READ_U8_]]
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 65535			%1:vgpr(s32) = G_CONSTANT i32 65535
	%2:vgpr(p3) = G_PTR_ADD %0, %1			%2:vgpr(p3) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 3)			%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 3)
	$vgpr0 = COPY %3			$vgpr0 = COPY %3

	...			...

	---			---

	name: load_local_s32_from_1_gep_65535_known_bits_base_address			name: load_local_s32_from_1_gep_65535_known_bits_base_address
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: \|			body: \|
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_s32_from_1_gep_65535_known_bits_base_address
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2147483647, implicit $exec
	; GFX6: [[V_AND_B32_e64_:%[0-9]+]]:vgpr_32 = V_AND_B32_e64 [[COPY]], [[V_MOV_B32_e32_]], implicit $exec
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 [[V_AND_B32_e64_]], 65535, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)
	; GFX6: $vgpr0 = COPY [[DS_READ_U8_]]
	; GFX7-LABEL: name: load_local_s32_from_1_gep_65535_known_bits_base_address			; GFX7-LABEL: name: load_local_s32_from_1_gep_65535_known_bits_base_address
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2147483647, implicit $exec			; GFX7: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2147483647, implicit $exec
	; GFX7: [[V_AND_B32_e64_:%[0-9]+]]:vgpr_32 = V_AND_B32_e64 [[COPY]], [[V_MOV_B32_e32_]], implicit $exec			; GFX7: [[V_AND_B32_e64_:%[0-9]+]]:vgpr_32 = V_AND_B32_e64 [[COPY]], [[V_MOV_B32_e32_]], implicit $exec
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 [[V_AND_B32_e64_]], 65535, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)			; GFX7: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 [[V_AND_B32_e64_]], 65535, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)
	; GFX7: $vgpr0 = COPY [[DS_READ_U8_]]			; GFX7: $vgpr0 = COPY [[DS_READ_U8_]]
	; GFX9-LABEL: name: load_local_s32_from_1_gep_65535_known_bits_base_address			; GFX9-LABEL: name: load_local_s32_from_1_gep_65535_known_bits_base_address
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2147483647, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2147483647, implicit $exec
	; GFX9: [[V_AND_B32_e64_:%[0-9]+]]:vgpr_32 = V_AND_B32_e64 [[COPY]], [[V_MOV_B32_e32_]], implicit $exec			; GFX9: [[V_AND_B32_e64_:%[0-9]+]]:vgpr_32 = V_AND_B32_e64 [[COPY]], [[V_MOV_B32_e32_]], implicit $exec
	; GFX9: [[DS_READ_U8_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_U8_gfx9 [[V_AND_B32_e64_]], 65535, 0, implicit $exec :: (load 1, addrspace 3)			; GFX9: [[DS_READ_U8_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_U8_gfx9 [[V_AND_B32_e64_]], 65535, 0, implicit $exec :: (load 1, addrspace 3)
	; GFX9: $vgpr0 = COPY [[DS_READ_U8_gfx9_]]			; GFX9: $vgpr0 = COPY [[DS_READ_U8_gfx9_]]
				; GFX6-LABEL: name: load_local_s32_from_1_gep_65535_known_bits_base_address
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2147483647, implicit $exec
				; GFX6: [[V_AND_B32_e64_:%[0-9]+]]:vgpr_32 = V_AND_B32_e64 [[COPY]], [[V_MOV_B32_e32_]], implicit $exec
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 [[V_AND_B32_e64_]], 65535, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)
				; GFX6: $vgpr0 = COPY [[DS_READ_U8_]]
	%0:vgpr(s32) = COPY $vgpr0			%0:vgpr(s32) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 2147483647			%1:vgpr(s32) = G_CONSTANT i32 2147483647
	%2:vgpr(s32) = G_AND %0, %1			%2:vgpr(s32) = G_AND %0, %1
	%3:vgpr(p3) = G_INTTOPTR %2			%3:vgpr(p3) = G_INTTOPTR %2
	%4:vgpr(s32) = G_CONSTANT i32 65535			%4:vgpr(s32) = G_CONSTANT i32 65535
	%5:vgpr(p3) = G_PTR_ADD %3, %4			%5:vgpr(p3) = G_PTR_ADD %3, %4
	%6:vgpr(s32) = G_LOAD %5 :: (load 1, align 1, addrspace 3)			%6:vgpr(s32) = G_LOAD %5 :: (load 1, align 1, addrspace 3)
	$vgpr0 = COPY %6			$vgpr0 = COPY %6

	...			...

	---			---

	name: load_local_s32_from_1_gep_65536			name: load_local_s32_from_1_gep_65536
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: \|			body: \|
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_s32_from_1_gep_65536
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 65536, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 %2, 0, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)
	; GFX6: $vgpr0 = COPY [[DS_READ_U8_]]
	; GFX7-LABEL: name: load_local_s32_from_1_gep_65536			; GFX7-LABEL: name: load_local_s32_from_1_gep_65536
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 65536, implicit $exec			; GFX7: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 65536, implicit $exec
	; GFX7: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX7: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 %2, 0, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)			; GFX7: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 %2, 0, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)
	; GFX7: $vgpr0 = COPY [[DS_READ_U8_]]			; GFX7: $vgpr0 = COPY [[DS_READ_U8_]]
	; GFX9-LABEL: name: load_local_s32_from_1_gep_65536			; GFX9-LABEL: name: load_local_s32_from_1_gep_65536
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 65536, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 65536, implicit $exec
	; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX9: [[DS_READ_U8_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_U8_gfx9 [[V_ADD_U32_e64_]], 0, 0, implicit $exec :: (load 1, addrspace 3)			; GFX9: [[DS_READ_U8_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_U8_gfx9 [[V_ADD_U32_e64_]], 0, 0, implicit $exec :: (load 1, addrspace 3)
	; GFX9: $vgpr0 = COPY [[DS_READ_U8_gfx9_]]			; GFX9: $vgpr0 = COPY [[DS_READ_U8_gfx9_]]
				; GFX6-LABEL: name: load_local_s32_from_1_gep_65536
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 65536, implicit $exec
				; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 %2, 0, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)
				; GFX6: $vgpr0 = COPY [[DS_READ_U8_]]
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 65536			%1:vgpr(s32) = G_CONSTANT i32 65536
	%2:vgpr(p3) = G_PTR_ADD %0, %1			%2:vgpr(p3) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 3)			%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 3)
	$vgpr0 = COPY %3			$vgpr0 = COPY %3

	...			...

	---			---

	name: load_local_s32_from_1_gep_m1			name: load_local_s32_from_1_gep_m1
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_local_s32_from_1_gep_m1
	; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294967295, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 %2, 0, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)
	; GFX6: $vgpr0 = COPY [[DS_READ_U8_]]
	; GFX7-LABEL: name: load_local_s32_from_1_gep_m1			; GFX7-LABEL: name: load_local_s32_from_1_gep_m1
	; GFX7: liveins: $vgpr0			; GFX7: liveins: $vgpr0
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294967295, implicit $exec			; GFX7: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294967295, implicit $exec
	; GFX7: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX7: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 %2, 0, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)			; GFX7: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 %2, 0, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)
	; GFX7: $vgpr0 = COPY [[DS_READ_U8_]]			; GFX7: $vgpr0 = COPY [[DS_READ_U8_]]
	; GFX9-LABEL: name: load_local_s32_from_1_gep_m1			; GFX9-LABEL: name: load_local_s32_from_1_gep_m1
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294967295, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294967295, implicit $exec
	; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX9: [[DS_READ_U8_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_U8_gfx9 [[V_ADD_U32_e64_]], 0, 0, implicit $exec :: (load 1, addrspace 3)			; GFX9: [[DS_READ_U8_gfx9_:%[0-9]+]]:vgpr_32 = DS_READ_U8_gfx9 [[V_ADD_U32_e64_]], 0, 0, implicit $exec :: (load 1, addrspace 3)
	; GFX9: $vgpr0 = COPY [[DS_READ_U8_gfx9_]]			; GFX9: $vgpr0 = COPY [[DS_READ_U8_gfx9_]]
				; GFX6-LABEL: name: load_local_s32_from_1_gep_m1
				; GFX6: liveins: $vgpr0
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294967295, implicit $exec
				; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[DS_READ_U8_:%[0-9]+]]:vgpr_32 = DS_READ_U8 %2, 0, 0, implicit $m0, implicit $exec :: (load 1, addrspace 3)
				; GFX6: $vgpr0 = COPY [[DS_READ_U8_]]
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 -1			%1:vgpr(s32) = G_CONSTANT i32 -1
	%2:vgpr(p3) = G_PTR_ADD %0, %1			%2:vgpr(p3) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 3)			%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 3)
	$vgpr0 = COPY %3			$vgpr0 = COPY %3

	...			...

	---			---

	name: load_local_s64_align4_from_1_gep_1016			name: load_local_s64_align4_from_1_gep_1016
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0_vgpr1			liveins: $vgpr0_vgpr1

	; GFX6-LABEL: name: load_local_s64_align4_from_1_gep_1016
	; GFX6: liveins: $vgpr0_vgpr1
	; GFX6: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
	; GFX6: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 1016
	; GFX6: [[PTR_ADD:%[0-9]+]]:vgpr(p3) = G_PTR_ADD [[COPY]], [[C]](s32)
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[LOAD:%[0-9]+]]:vreg_64(s64) = G_LOAD [[PTR_ADD]](p3) :: (load 8, align 4, addrspace 3)
	; GFX6: $vgpr0_vgpr1 = COPY [[LOAD]](s64)
	; GFX7-LABEL: name: load_local_s64_align4_from_1_gep_1016			; GFX7-LABEL: name: load_local_s64_align4_from_1_gep_1016
	; GFX7: liveins: $vgpr0_vgpr1			; GFX7: liveins: $vgpr0_vgpr1
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ2_B32_:%[0-9]+]]:vreg_64 = DS_READ2_B32 [[COPY]], 254, 255, 0, implicit $m0, implicit $exec :: (load 8, align 4, addrspace 3)			; GFX7: [[DS_READ2_B32_:%[0-9]+]]:vreg_64 = DS_READ2_B32 [[COPY]], 254, 255, 0, implicit $m0, implicit $exec :: (load 8, align 4, addrspace 3)
	; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_]]			; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_]]
	; GFX9-LABEL: name: load_local_s64_align4_from_1_gep_1016			; GFX9-LABEL: name: load_local_s64_align4_from_1_gep_1016
	; GFX9: liveins: $vgpr0_vgpr1			; GFX9: liveins: $vgpr0_vgpr1
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[DS_READ2_B32_gfx9_:%[0-9]+]]:vreg_64 = DS_READ2_B32_gfx9 [[COPY]], 254, 255, 0, implicit $exec :: (load 8, align 4, addrspace 3)			; GFX9: [[DS_READ2_B32_gfx9_:%[0-9]+]]:vreg_64 = DS_READ2_B32_gfx9 [[COPY]], 254, 255, 0, implicit $exec :: (load 8, align 4, addrspace 3)
	; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_gfx9_]]			; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_gfx9_]]
				; GFX6-LABEL: name: load_local_s64_align4_from_1_gep_1016
				; GFX6: liveins: $vgpr0_vgpr1
				; GFX6: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
				; GFX6: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 1016
				; GFX6: [[PTR_ADD:%[0-9]+]]:vgpr(p3) = G_PTR_ADD [[COPY]], [[C]](s32)
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[LOAD:%[0-9]+]]:vreg_64(s64) = G_LOAD [[PTR_ADD]](p3) :: (load 8, align 4, addrspace 3)
				; GFX6: $vgpr0_vgpr1 = COPY [[LOAD]](s64)
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 1016			%1:vgpr(s32) = G_CONSTANT i32 1016
	%2:vgpr(p3) = G_PTR_ADD %0, %1			%2:vgpr(p3) = G_PTR_ADD %0, %1
	%3:vgpr(s64) = G_LOAD %2 :: (load 8, align 4, addrspace 3)			%3:vgpr(s64) = G_LOAD %2 :: (load 8, align 4, addrspace 3)
	$vgpr0_vgpr1 = COPY %3			$vgpr0_vgpr1 = COPY %3

	...			...

	---			---

	name: load_local_s64_align4_from_1_gep_1020			name: load_local_s64_align4_from_1_gep_1020
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0_vgpr1			liveins: $vgpr0_vgpr1

	; GFX6-LABEL: name: load_local_s64_align4_from_1_gep_1020
	; GFX6: liveins: $vgpr0_vgpr1
	; GFX6: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
	; GFX6: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 1020
	; GFX6: [[PTR_ADD:%[0-9]+]]:vgpr(p3) = G_PTR_ADD [[COPY]], [[C]](s32)
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: [[LOAD:%[0-9]+]]:vreg_64(s64) = G_LOAD [[PTR_ADD]](p3) :: (load 8, align 4, addrspace 3)
	; GFX6: $vgpr0_vgpr1 = COPY [[LOAD]](s64)
	; GFX7-LABEL: name: load_local_s64_align4_from_1_gep_1020			; GFX7-LABEL: name: load_local_s64_align4_from_1_gep_1020
	; GFX7: liveins: $vgpr0_vgpr1			; GFX7: liveins: $vgpr0_vgpr1
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1020, implicit $exec			; GFX7: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1020, implicit $exec
	; GFX7: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX7: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[DS_READ2_B32_:%[0-9]+]]:vreg_64 = DS_READ2_B32 %2, 0, 1, 0, implicit $m0, implicit $exec :: (load 8, align 4, addrspace 3)			; GFX7: [[DS_READ2_B32_:%[0-9]+]]:vreg_64 = DS_READ2_B32 %2, 0, 1, 0, implicit $m0, implicit $exec :: (load 8, align 4, addrspace 3)
	; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_]]			; GFX7: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_]]
	; GFX9-LABEL: name: load_local_s64_align4_from_1_gep_1020			; GFX9-LABEL: name: load_local_s64_align4_from_1_gep_1020
	; GFX9: liveins: $vgpr0_vgpr1			; GFX9: liveins: $vgpr0_vgpr1
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1020, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1020, implicit $exec
	; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX9: [[DS_READ2_B32_gfx9_:%[0-9]+]]:vreg_64 = DS_READ2_B32_gfx9 [[V_ADD_U32_e64_]], 0, 1, 0, implicit $exec :: (load 8, align 4, addrspace 3)			; GFX9: [[DS_READ2_B32_gfx9_:%[0-9]+]]:vreg_64 = DS_READ2_B32_gfx9 [[V_ADD_U32_e64_]], 0, 1, 0, implicit $exec :: (load 8, align 4, addrspace 3)
	; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_gfx9_]]			; GFX9: $vgpr0_vgpr1 = COPY [[DS_READ2_B32_gfx9_]]
				; GFX6-LABEL: name: load_local_s64_align4_from_1_gep_1020
				; GFX6: liveins: $vgpr0_vgpr1
				; GFX6: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
				; GFX6: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 1020
				; GFX6: [[PTR_ADD:%[0-9]+]]:vgpr(p3) = G_PTR_ADD [[COPY]], [[C]](s32)
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: [[LOAD:%[0-9]+]]:vreg_64(s64) = G_LOAD [[PTR_ADD]](p3) :: (load 8, align 4, addrspace 3)
				; GFX6: $vgpr0_vgpr1 = COPY [[LOAD]](s64)
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 1020			%1:vgpr(s32) = G_CONSTANT i32 1020
	%2:vgpr(p3) = G_PTR_ADD %0, %1			%2:vgpr(p3) = G_PTR_ADD %0, %1
	%3:vgpr(s64) = G_LOAD %2 :: (load 8, align 4, addrspace 3)			%3:vgpr(s64) = G_LOAD %2 :: (load 8, align 4, addrspace 3)
	$vgpr0_vgpr1 = COPY %3			$vgpr0_vgpr1 = COPY %3

	...			...

llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-load-private.mir

	# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	# RUN: llc -march=amdgcn -mcpu=tahiti -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX6 %s			# RUN: llc -march=amdgcn -mcpu=tahiti -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX6 %s
	# RUN: llc -march=amdgcn -mcpu=gfx900 -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX9 %s			# RUN: llc -march=amdgcn -mcpu=gfx900 -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX9 %s

	---			---

	name: load_private_s32_from_4			name: load_private_s32_from_4
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_s32_from_4			; GFX6-LABEL: name: load_private_s32_from_4
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)			; GFX6: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_4			; GFX9-LABEL: name: load_private_s32_from_4
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)			; GFX9: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFEN]]
	%0:vgpr(p5) = COPY $vgpr0			%0:vgpr(p5) = COPY $vgpr0
	%1:vgpr(s32) = G_LOAD %0 :: (load 4, align 4, addrspace 5)			%1:vgpr(s32) = G_LOAD %0 :: (load 4, align 4, addrspace 5)
	$vgpr0 = COPY %1			$vgpr0 = COPY %1

	...			...

	---			---

	name: load_private_s32_from_2			name: load_private_s32_from_2
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_s32_from_2			; GFX6-LABEL: name: load_private_s32_from_2
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[BUFFER_LOAD_USHORT_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_USHORT_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 2, addrspace 5)			; GFX6: [[BUFFER_LOAD_USHORT_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_USHORT_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 2, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_USHORT_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_USHORT_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_2			; GFX9-LABEL: name: load_private_s32_from_2
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[BUFFER_LOAD_USHORT_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_USHORT_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 2, addrspace 5)			; GFX9: [[BUFFER_LOAD_USHORT_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_USHORT_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 2, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_USHORT_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_USHORT_OFFEN]]
	%0:vgpr(p5) = COPY $vgpr0			%0:vgpr(p5) = COPY $vgpr0
	%1:vgpr(s32) = G_LOAD %0 :: (load 2, align 2, addrspace 5)			%1:vgpr(s32) = G_LOAD %0 :: (load 2, align 2, addrspace 5)
	$vgpr0 = COPY %1			$vgpr0 = COPY %1

	...			...

	---			---

	name: load_private_s32_from_1			name: load_private_s32_from_1
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_s32_from_1			; GFX6-LABEL: name: load_private_s32_from_1
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_1			; GFX9-LABEL: name: load_private_s32_from_1
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	%0:vgpr(p5) = COPY $vgpr0			%0:vgpr(p5) = COPY $vgpr0
	%1:vgpr(s32) = G_LOAD %0 :: (load 1, align 1, addrspace 5)			%1:vgpr(s32) = G_LOAD %0 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %1			$vgpr0 = COPY %1

	...			...

	---			---
	---			---

	name: load_private_p5_from_4			name: load_private_p5_from_4
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_p5_from_4			; GFX6-LABEL: name: load_private_p5_from_4
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	---			---

	name: load_private_v2s16			name: load_private_v2s16
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_v2s16			; GFX6-LABEL: name: load_private_v2s16
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	---			---

	name: load_private_s32_from_1_gep_2047			name: load_private_s32_from_1_gep_2047
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_s32_from_1_gep_2047			; GFX6-LABEL: name: load_private_s32_from_1_gep_2047
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2047, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2047, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_1_gep_2047			; GFX9-LABEL: name: load_private_s32_from_1_gep_2047
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 2047, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 2047, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	%0:vgpr(p5) = COPY $vgpr0			%0:vgpr(p5) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 2047			%1:vgpr(s32) = G_CONSTANT i32 2047
	%2:vgpr(p5) = G_PTR_ADD %0, %1			%2:vgpr(p5) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)			%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %3			$vgpr0 = COPY %3

	...			...

	---			---

	name: load_private_s32_from_1_gep_2047_known_bits			name: load_private_s32_from_1_gep_2047_known_bits
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_s32_from_1_gep_2047_known_bits			; GFX6-LABEL: name: load_private_s32_from_1_gep_2047_known_bits
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2147483647, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2147483647, implicit $exec
	; GFX6: [[V_AND_B32_e64_:%[0-9]+]]:vgpr_32 = V_AND_B32_e64 [[COPY]], [[V_MOV_B32_e32_]], implicit $exec			; GFX6: [[V_AND_B32_e64_:%[0-9]+]]:vgpr_32 = V_AND_B32_e64 [[COPY]], [[V_MOV_B32_e32_]], implicit $exec
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_AND_B32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 2047, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_AND_B32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 2047, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_1_gep_2047_known_bits			; GFX9-LABEL: name: load_private_s32_from_1_gep_2047_known_bits
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2147483647, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2147483647, implicit $exec
	; GFX9: [[V_AND_B32_e64_:%[0-9]+]]:vgpr_32 = V_AND_B32_e64 [[COPY]], [[V_MOV_B32_e32_]], implicit $exec			; GFX9: [[V_AND_B32_e64_:%[0-9]+]]:vgpr_32 = V_AND_B32_e64 [[COPY]], [[V_MOV_B32_e32_]], implicit $exec
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_AND_B32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 2047, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_AND_B32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 2047, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	%0:vgpr(s32) = COPY $vgpr0			%0:vgpr(s32) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 2147483647			%1:vgpr(s32) = G_CONSTANT i32 2147483647
	%2:vgpr(s32) = G_AND %0, %1			%2:vgpr(s32) = G_AND %0, %1
	%3:vgpr(p5) = G_INTTOPTR %2			%3:vgpr(p5) = G_INTTOPTR %2
	%4:vgpr(s32) = G_CONSTANT i32 2047			%4:vgpr(s32) = G_CONSTANT i32 2047
	%5:vgpr(p5) = G_PTR_ADD %3, %4			%5:vgpr(p5) = G_PTR_ADD %3, %4
	%6:vgpr(s32) = G_LOAD %5 :: (load 1, align 1, addrspace 5)			%6:vgpr(s32) = G_LOAD %5 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %6			$vgpr0 = COPY %6

	...			...

	---			---

	name: load_private_s32_from_1_gep_2048			name: load_private_s32_from_1_gep_2048
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_s32_from_1_gep_2048			; GFX6-LABEL: name: load_private_s32_from_1_gep_2048
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2048, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2048, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_1_gep_2048			; GFX9-LABEL: name: load_private_s32_from_1_gep_2048
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 2048, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 2048, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	%0:vgpr(p5) = COPY $vgpr0			%0:vgpr(p5) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 2048			%1:vgpr(s32) = G_CONSTANT i32 2048
	%2:vgpr(p5) = G_PTR_ADD %0, %1			%2:vgpr(p5) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)			%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %3			$vgpr0 = COPY %3

	...			...

	---			---

	name: load_private_s32_from_1_gep_m2047			name: load_private_s32_from_1_gep_m2047
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_s32_from_1_gep_m2047			; GFX6-LABEL: name: load_private_s32_from_1_gep_m2047
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294965249, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294965249, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_1_gep_m2047			; GFX9-LABEL: name: load_private_s32_from_1_gep_m2047
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294965249, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294965249, implicit $exec
	; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	%0:vgpr(p5) = COPY $vgpr0			%0:vgpr(p5) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 -2047			%1:vgpr(s32) = G_CONSTANT i32 -2047
	%2:vgpr(p5) = G_PTR_ADD %0, %1			%2:vgpr(p5) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)			%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %3			$vgpr0 = COPY %3

	...			...

	---			---

	name: load_private_s32_from_1_gep_m2048			name: load_private_s32_from_1_gep_m2048
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_s32_from_1_gep_m2048			; GFX6-LABEL: name: load_private_s32_from_1_gep_m2048
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294965248, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294965248, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_1_gep_m2048			; GFX9-LABEL: name: load_private_s32_from_1_gep_m2048
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294965248, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294965248, implicit $exec
	; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	%0:vgpr(p5) = COPY $vgpr0			%0:vgpr(p5) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 -2048			%1:vgpr(s32) = G_CONSTANT i32 -2048
	%2:vgpr(p5) = G_PTR_ADD %0, %1			%2:vgpr(p5) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)			%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %3			$vgpr0 = COPY %3

	...			...

	---			---

	name: load_private_s32_from_1_gep_4095			name: load_private_s32_from_1_gep_4095
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_s32_from_1_gep_4095			; GFX6-LABEL: name: load_private_s32_from_1_gep_4095
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4095, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4095, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_1_gep_4095			; GFX9-LABEL: name: load_private_s32_from_1_gep_4095
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 4095, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 4095, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	%0:vgpr(p5) = COPY $vgpr0			%0:vgpr(p5) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 4095			%1:vgpr(s32) = G_CONSTANT i32 4095
	%2:vgpr(p5) = G_PTR_ADD %0, %1			%2:vgpr(p5) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)			%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %3			$vgpr0 = COPY %3

	...			...

	---			---

	name: load_private_s32_from_1_gep_4096			name: load_private_s32_from_1_gep_4096
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_s32_from_1_gep_4096			; GFX6-LABEL: name: load_private_s32_from_1_gep_4096
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_1_gep_4096			; GFX9-LABEL: name: load_private_s32_from_1_gep_4096
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec
	; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	%0:vgpr(p5) = COPY $vgpr0			%0:vgpr(p5) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 4096			%1:vgpr(s32) = G_CONSTANT i32 4096
	%2:vgpr(p5) = G_PTR_ADD %0, %1			%2:vgpr(p5) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)			%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %3			$vgpr0 = COPY %3

	...			...

	---			---

	name: load_private_s32_from_1_gep_m4095			name: load_private_s32_from_1_gep_m4095
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_s32_from_1_gep_m4095			; GFX6-LABEL: name: load_private_s32_from_1_gep_m4095
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294963201, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294963201, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_1_gep_m4095			; GFX9-LABEL: name: load_private_s32_from_1_gep_m4095
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294963201, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294963201, implicit $exec
	; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	%0:vgpr(p5) = COPY $vgpr0			%0:vgpr(p5) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 -4095			%1:vgpr(s32) = G_CONSTANT i32 -4095
	%2:vgpr(p5) = G_PTR_ADD %0, %1			%2:vgpr(p5) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)			%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %3			$vgpr0 = COPY %3

	...			...

	---			---

	name: load_private_s32_from_1_gep_m4096			name: load_private_s32_from_1_gep_m4096
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_s32_from_1_gep_m4096			; GFX6-LABEL: name: load_private_s32_from_1_gep_m4096
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294963200, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294963200, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_1_gep_m4096			; GFX9-LABEL: name: load_private_s32_from_1_gep_m4096
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294963200, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294963200, implicit $exec
	; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	%0:vgpr(p5) = COPY $vgpr0			%0:vgpr(p5) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 -4096			%1:vgpr(s32) = G_CONSTANT i32 -4096
	%2:vgpr(p5) = G_PTR_ADD %0, %1			%2:vgpr(p5) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)			%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %3			$vgpr0 = COPY %3

	...			...

	---			---

	name: load_private_s32_from_1_gep_8191			name: load_private_s32_from_1_gep_8191
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_s32_from_1_gep_8191			; GFX6-LABEL: name: load_private_s32_from_1_gep_8191
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 8191, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 8191, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_1_gep_8191			; GFX9-LABEL: name: load_private_s32_from_1_gep_8191
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 8191, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 8191, implicit $exec
	; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	%0:vgpr(p5) = COPY $vgpr0			%0:vgpr(p5) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 8191			%1:vgpr(s32) = G_CONSTANT i32 8191
	%2:vgpr(p5) = G_PTR_ADD %0, %1			%2:vgpr(p5) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)			%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %3			$vgpr0 = COPY %3

	...			...

	---			---

	name: load_private_s32_from_1_gep_8192			name: load_private_s32_from_1_gep_8192
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_s32_from_1_gep_8192			; GFX6-LABEL: name: load_private_s32_from_1_gep_8192
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 8192, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 8192, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_1_gep_8192			; GFX9-LABEL: name: load_private_s32_from_1_gep_8192
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 8192, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 8192, implicit $exec
	; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	%0:vgpr(p5) = COPY $vgpr0			%0:vgpr(p5) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 8192			%1:vgpr(s32) = G_CONSTANT i32 8192
	%2:vgpr(p5) = G_PTR_ADD %0, %1			%2:vgpr(p5) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)			%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %3			$vgpr0 = COPY %3

	...			...

	---			---

	name: load_private_s32_from_1_gep_m8191			name: load_private_s32_from_1_gep_m8191
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_s32_from_1_gep_m8191			; GFX6-LABEL: name: load_private_s32_from_1_gep_m8191
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294959105, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294959105, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_1_gep_m8191			; GFX9-LABEL: name: load_private_s32_from_1_gep_m8191
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294959105, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294959105, implicit $exec
	; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	%0:vgpr(p5) = COPY $vgpr0			%0:vgpr(p5) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 -8191			%1:vgpr(s32) = G_CONSTANT i32 -8191
	%2:vgpr(p5) = G_PTR_ADD %0, %1			%2:vgpr(p5) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)			%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %3			$vgpr0 = COPY %3

	...			...

	---			---

	name: load_private_s32_from_1_gep_m8192			name: load_private_s32_from_1_gep_m8192
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0

	; GFX6-LABEL: name: load_private_s32_from_1_gep_m8192			; GFX6-LABEL: name: load_private_s32_from_1_gep_m8192
	; GFX6: liveins: $vgpr0			; GFX6: liveins: $vgpr0
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294959104, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294959104, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_1_gep_m8192			; GFX9-LABEL: name: load_private_s32_from_1_gep_m8192
	; GFX9: liveins: $vgpr0			; GFX9: liveins: $vgpr0
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294959104, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4294959104, implicit $exec
	; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	%0:vgpr(p5) = COPY $vgpr0			%0:vgpr(p5) = COPY $vgpr0
	%1:vgpr(s32) = G_CONSTANT i32 -8192			%1:vgpr(s32) = G_CONSTANT i32 -8192
	%2:vgpr(p5) = G_PTR_ADD %0, %1			%2:vgpr(p5) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)			%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %3			$vgpr0 = COPY %3

	...			...

	---			---

	name: load_private_s32_from_4_constant_0			name: load_private_s32_from_4_constant_0
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:

	; GFX6-LABEL: name: load_private_s32_from_4_constant_0			; GFX6-LABEL: name: load_private_s32_from_4_constant_0
	; GFX6: [[BUFFER_LOAD_DWORD_OFFSET:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)			; GFX6: [[BUFFER_LOAD_DWORD_OFFSET:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFSET]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFSET]]
	; GFX9-LABEL: name: load_private_s32_from_4_constant_0			; GFX9-LABEL: name: load_private_s32_from_4_constant_0
	; GFX9: [[BUFFER_LOAD_DWORD_OFFSET:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)			; GFX9: [[BUFFER_LOAD_DWORD_OFFSET:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFSET]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFSET]]
	%0:vgpr(p5) = G_CONSTANT i32 0			%0:vgpr(p5) = G_CONSTANT i32 0
	%1:vgpr(s32) = G_LOAD %0 :: (load 4, align 4, addrspace 5)			%1:vgpr(s32) = G_LOAD %0 :: (load 4, align 4, addrspace 5)
	$vgpr0 = COPY %1			$vgpr0 = COPY %1

	...			...

	---			---

	name: load_private_s32_from_4_constant_sgpr_16			name: load_private_s32_from_4_constant_sgpr_16
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:

	; GFX6-LABEL: name: load_private_s32_from_4_constant_sgpr_16			; GFX6-LABEL: name: load_private_s32_from_4_constant_sgpr_16
	; GFX6: [[BUFFER_LOAD_DWORD_OFFSET:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 16, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)			; GFX6: [[BUFFER_LOAD_DWORD_OFFSET:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 16, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFSET]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFSET]]
	; GFX9-LABEL: name: load_private_s32_from_4_constant_sgpr_16			; GFX9-LABEL: name: load_private_s32_from_4_constant_sgpr_16
	; GFX9: [[BUFFER_LOAD_DWORD_OFFSET:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 16, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)			; GFX9: [[BUFFER_LOAD_DWORD_OFFSET:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 16, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFSET]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFSET]]
	%0:sgpr(p5) = G_CONSTANT i32 16			%0:sgpr(p5) = G_CONSTANT i32 16
	%1:vgpr(s32) = G_LOAD %0 :: (load 4, align 4, addrspace 5)			%1:vgpr(s32) = G_LOAD %0 :: (load 4, align 4, addrspace 5)
	$vgpr0 = COPY %1			$vgpr0 = COPY %1

	...			...

	---			---

	name: load_private_s32_from_1_constant_4095			name: load_private_s32_from_1_constant_4095
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:

	; GFX6-LABEL: name: load_private_s32_from_1_constant_4095			; GFX6-LABEL: name: load_private_s32_from_1_constant_4095
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFSET:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 4095, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFSET:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 4095, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFSET]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFSET]]
	; GFX9-LABEL: name: load_private_s32_from_1_constant_4095			; GFX9-LABEL: name: load_private_s32_from_1_constant_4095
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFSET:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 4095, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFSET:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 4095, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFSET]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFSET]]
	%0:vgpr(p5) = G_CONSTANT i32 4095			%0:vgpr(p5) = G_CONSTANT i32 4095
	%1:vgpr(s32) = G_LOAD %0 :: (load 1, align 1, addrspace 5)			%1:vgpr(s32) = G_LOAD %0 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %1			$vgpr0 = COPY %1

	...			...

	---			---

	name: load_private_s32_from_1_constant_4096			name: load_private_s32_from_1_constant_4096
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:

	; GFX6-LABEL: name: load_private_s32_from_1_constant_4096			; GFX6-LABEL: name: load_private_s32_from_1_constant_4096
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_MOV_B32_e32_]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_MOV_B32_e32_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_1_constant_4096			; GFX9-LABEL: name: load_private_s32_from_1_constant_4096
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_MOV_B32_e32_]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_MOV_B32_e32_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	%0:vgpr(p5) = G_CONSTANT i32 4096			%0:vgpr(p5) = G_CONSTANT i32 4096
	%1:vgpr(s32) = G_LOAD %0 :: (load 1, align 1, addrspace 5)			%1:vgpr(s32) = G_LOAD %0 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %1			$vgpr0 = COPY %1

	...			...

	---			---

	name: load_private_s32_from_fi			name: load_private_s32_from_fi
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32
	stack:			stack:
	- { id: 0, size: 4, alignment: 4 }			- { id: 0, size: 4, alignment: 4 }

	body: |			body: |
	bb.0:			bb.0:

	; GFX6-LABEL: name: load_private_s32_from_fi			; GFX6-LABEL: name: load_private_s32_from_fi
	---			---

	name: load_private_s32_from_1_fi_offset_4095			name: load_private_s32_from_1_fi_offset_4095
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32
	stack:			stack:
	- { id: 0, size: 4096, alignment: 4 }			- { id: 0, size: 4096, alignment: 4 }

	body: |			body: |
	bb.0:			bb.0:

	; GFX6-LABEL: name: load_private_s32_from_1_fi_offset_4095			; GFX6-LABEL: name: load_private_s32_from_1_fi_offset_4095
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
	; GFX6: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4095, implicit $exec			; GFX6: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4095, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, implicit $exec			; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, implicit $exec
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_1_fi_offset_4095			; GFX9-LABEL: name: load_private_s32_from_1_fi_offset_4095
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4095, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4095, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	%0:vgpr(p5) = G_FRAME_INDEX %stack.0			%0:vgpr(p5) = G_FRAME_INDEX %stack.0
	%1:vgpr(s32) = G_CONSTANT i32 4095			%1:vgpr(s32) = G_CONSTANT i32 4095
	%2:vgpr(p5) = G_PTR_ADD %0, %1			%2:vgpr(p5) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)			%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %3			$vgpr0 = COPY %3

	...			...

	---			---

	name: load_private_s32_from_1_fi_offset_4096			name: load_private_s32_from_1_fi_offset_4096
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32
	stack:			stack:
	- { id: 0, size: 8192, alignment: 4 }			- { id: 0, size: 8192, alignment: 4 }

	body: |			body: |
	bb.0:			bb.0:

	; GFX6-LABEL: name: load_private_s32_from_1_fi_offset_4096			; GFX6-LABEL: name: load_private_s32_from_1_fi_offset_4096
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
	; GFX6: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec			; GFX6: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, implicit $exec			; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, implicit $exec
	; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %2, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	; GFX9-LABEL: name: load_private_s32_from_1_fi_offset_4096			; GFX9-LABEL: name: load_private_s32_from_1_fi_offset_4096
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
	; GFX9: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec			; GFX9: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec
	; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, implicit $exec			; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, implicit $exec
	; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)			; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN [[V_ADD_U32_e64_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
	; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]			; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
	%0:vgpr(p5) = G_FRAME_INDEX %stack.0			%0:vgpr(p5) = G_FRAME_INDEX %stack.0
	%1:vgpr(s32) = G_CONSTANT i32 4096			%1:vgpr(s32) = G_CONSTANT i32 4096
	%2:vgpr(p5) = G_PTR_ADD %0, %1			%2:vgpr(p5) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)			%3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)
	$vgpr0 = COPY %3			$vgpr0 = COPY %3

	...			...

llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-store-local.mir

	# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	# RUN: llc -march=amdgcn -mcpu=tahiti -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX6 %s			# RUN: llc -march=amdgcn -mcpu=tahiti -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX6 %s
	# RUN: llc -march=amdgcn -mcpu=hawaii -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX7 %s			# RUN: llc -march=amdgcn -mcpu=hawaii -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX7 %s
	# RUN: llc -march=amdgcn -mcpu=fiji -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX7 %s			# RUN: llc -march=amdgcn -mcpu=fiji -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX7 %s
	# RUN: llc -march=amdgcn -mcpu=gfx900 -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX9 %s			# RUN: llc -march=amdgcn -mcpu=gfx900 -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX9 %s
	# RUN: llc -march=amdgcn -mcpu=gfx1010 -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX9 %s			# RUN: llc -march=amdgcn -mcpu=gfx1010 -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX9 %s

	---			---

	name: store_local_s32_to_4			name: store_local_s32_to_4
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0, $vgpr1			liveins: $vgpr0, $vgpr1

	; GFX6-LABEL: name: store_local_s32_to_4
	; GFX6: liveins: $vgpr0, $vgpr1
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: DS_WRITE_B32 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 4, addrspace 3)
	; GFX7-LABEL: name: store_local_s32_to_4			; GFX7-LABEL: name: store_local_s32_to_4
	; GFX7: liveins: $vgpr0, $vgpr1			; GFX7: liveins: $vgpr0, $vgpr1
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: DS_WRITE_B32 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 4, addrspace 3)			; GFX7: DS_WRITE_B32 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 4, addrspace 3)
	; GFX9-LABEL: name: store_local_s32_to_4			; GFX9-LABEL: name: store_local_s32_to_4
	; GFX9: liveins: $vgpr0, $vgpr1			; GFX9: liveins: $vgpr0, $vgpr1
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX9: DS_WRITE_B32_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 4, addrspace 3)			; GFX9: DS_WRITE_B32_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 4, addrspace 3)
				; GFX6-LABEL: name: store_local_s32_to_4
				; GFX6: liveins: $vgpr0, $vgpr1
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: DS_WRITE_B32 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 4, addrspace 3)
	%0:vgpr(s32) = COPY $vgpr0			%0:vgpr(s32) = COPY $vgpr0
	%1:vgpr(p3) = COPY $vgpr1			%1:vgpr(p3) = COPY $vgpr1
	G_STORE %0, %1 :: (store 4, align 4, addrspace 3)			G_STORE %0, %1 :: (store 4, align 4, addrspace 3)

	...			...

	---			---

	name: store_local_s32_to_2			name: store_local_s32_to_2
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0, $vgpr1			liveins: $vgpr0, $vgpr1

	; GFX6-LABEL: name: store_local_s32_to_2
	; GFX6: liveins: $vgpr0, $vgpr1
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: DS_WRITE_B16 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 2, addrspace 3)
	; GFX7-LABEL: name: store_local_s32_to_2			; GFX7-LABEL: name: store_local_s32_to_2
	; GFX7: liveins: $vgpr0, $vgpr1			; GFX7: liveins: $vgpr0, $vgpr1
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: DS_WRITE_B16 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 2, addrspace 3)			; GFX7: DS_WRITE_B16 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 2, addrspace 3)
	; GFX9-LABEL: name: store_local_s32_to_2			; GFX9-LABEL: name: store_local_s32_to_2
	; GFX9: liveins: $vgpr0, $vgpr1			; GFX9: liveins: $vgpr0, $vgpr1
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX9: DS_WRITE_B16_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 2, addrspace 3)			; GFX9: DS_WRITE_B16_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 2, addrspace 3)
				; GFX6-LABEL: name: store_local_s32_to_2
				; GFX6: liveins: $vgpr0, $vgpr1
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: DS_WRITE_B16 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 2, addrspace 3)
	%0:vgpr(s32) = COPY $vgpr0			%0:vgpr(s32) = COPY $vgpr0
	%1:vgpr(p3) = COPY $vgpr1			%1:vgpr(p3) = COPY $vgpr1
	G_STORE %0, %1 :: (store 2, align 2, addrspace 3)			G_STORE %0, %1 :: (store 2, align 2, addrspace 3)

	...			...

	---			---

	name: store_local_s32_to_1			name: store_local_s32_to_1
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0, $vgpr1			liveins: $vgpr0, $vgpr1

	; GFX6-LABEL: name: store_local_s32_to_1
	; GFX6: liveins: $vgpr0, $vgpr1
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: DS_WRITE_B8 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 1, addrspace 3)
	; GFX7-LABEL: name: store_local_s32_to_1			; GFX7-LABEL: name: store_local_s32_to_1
	; GFX7: liveins: $vgpr0, $vgpr1			; GFX7: liveins: $vgpr0, $vgpr1
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: DS_WRITE_B8 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 1, addrspace 3)			; GFX7: DS_WRITE_B8 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 1, addrspace 3)
	; GFX9-LABEL: name: store_local_s32_to_1			; GFX9-LABEL: name: store_local_s32_to_1
	; GFX9: liveins: $vgpr0, $vgpr1			; GFX9: liveins: $vgpr0, $vgpr1
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX9: DS_WRITE_B8_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 1, addrspace 3)			; GFX9: DS_WRITE_B8_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 1, addrspace 3)
				; GFX6-LABEL: name: store_local_s32_to_1
				; GFX6: liveins: $vgpr0, $vgpr1
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: DS_WRITE_B8 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 1, addrspace 3)
	%0:vgpr(s32) = COPY $vgpr0			%0:vgpr(s32) = COPY $vgpr0
	%1:vgpr(p3) = COPY $vgpr1			%1:vgpr(p3) = COPY $vgpr1
	G_STORE %0, %1 :: (store 1, align 1, addrspace 3)			G_STORE %0, %1 :: (store 1, align 1, addrspace 3)

	...			...

	---			---

	name: store_local_v2s16			name: store_local_v2s16
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0, $vgpr1			liveins: $vgpr0, $vgpr1

	; GFX6-LABEL: name: store_local_v2s16
	; GFX6: liveins: $vgpr0, $vgpr1
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: DS_WRITE_B32 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 4, addrspace 3)
	; GFX7-LABEL: name: store_local_v2s16			; GFX7-LABEL: name: store_local_v2s16
	; GFX7: liveins: $vgpr0, $vgpr1			; GFX7: liveins: $vgpr0, $vgpr1
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: DS_WRITE_B32 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 4, addrspace 3)			; GFX7: DS_WRITE_B32 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 4, addrspace 3)
	; GFX9-LABEL: name: store_local_v2s16			; GFX9-LABEL: name: store_local_v2s16
	; GFX9: liveins: $vgpr0, $vgpr1			; GFX9: liveins: $vgpr0, $vgpr1
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX9: DS_WRITE_B32_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 4, addrspace 3)			; GFX9: DS_WRITE_B32_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 4, addrspace 3)
				; GFX6-LABEL: name: store_local_v2s16
				; GFX6: liveins: $vgpr0, $vgpr1
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: DS_WRITE_B32 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 4, addrspace 3)
	%0:vgpr(<2 x s16>) = COPY $vgpr0			%0:vgpr(<2 x s16>) = COPY $vgpr0
	%1:vgpr(p3) = COPY $vgpr1			%1:vgpr(p3) = COPY $vgpr1
	G_STORE %0, %1 :: (store 4, align 4, addrspace 3)			G_STORE %0, %1 :: (store 4, align 4, addrspace 3)

	...			...

	---			---

	name: store_local_p3			name: store_local_p3
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0, $vgpr1			liveins: $vgpr0, $vgpr1

	; GFX6-LABEL: name: store_local_p3
	; GFX6: liveins: $vgpr0, $vgpr1
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: DS_WRITE_B32 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 4, addrspace 3)
	; GFX7-LABEL: name: store_local_p3			; GFX7-LABEL: name: store_local_p3
	; GFX7: liveins: $vgpr0, $vgpr1			; GFX7: liveins: $vgpr0, $vgpr1
	; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX7: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: DS_WRITE_B32 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 4, addrspace 3)			; GFX7: DS_WRITE_B32 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 4, addrspace 3)
	; GFX9-LABEL: name: store_local_p3			; GFX9-LABEL: name: store_local_p3
	; GFX9: liveins: $vgpr0, $vgpr1			; GFX9: liveins: $vgpr0, $vgpr1
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX9: DS_WRITE_B32_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 4, addrspace 3)			; GFX9: DS_WRITE_B32_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 4, addrspace 3)
				; GFX6-LABEL: name: store_local_p3
				; GFX6: liveins: $vgpr0, $vgpr1
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: DS_WRITE_B32 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 4, addrspace 3)
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(p3) = COPY $vgpr1			%1:vgpr(p3) = COPY $vgpr1
	G_STORE %0, %1 :: (store 4, align 4, addrspace 3)			G_STORE %0, %1 :: (store 4, align 4, addrspace 3)

	...			...

	---			---

	name: store_local_s32_to_1_constant_4095			name: store_local_s32_to_1_constant_4095
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:

	; GFX6-LABEL: name: store_local_s32_to_1_constant_4095
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4095, implicit $exec
	; GFX6: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: DS_WRITE_B8 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, 0, implicit $m0, implicit $exec :: (store 1, addrspace 3)
	; GFX7-LABEL: name: store_local_s32_to_1_constant_4095			; GFX7-LABEL: name: store_local_s32_to_1_constant_4095
	; GFX7: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4095, implicit $exec			; GFX7: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4095, implicit $exec
	; GFX7: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec			; GFX7: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: DS_WRITE_B8 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, 0, implicit $m0, implicit $exec :: (store 1, addrspace 3)			; GFX7: DS_WRITE_B8 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, 0, implicit $m0, implicit $exec :: (store 1, addrspace 3)
	; GFX9-LABEL: name: store_local_s32_to_1_constant_4095			; GFX9-LABEL: name: store_local_s32_to_1_constant_4095
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4095, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4095, implicit $exec
	; GFX9: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec			; GFX9: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
	; GFX9: DS_WRITE_B8_gfx9 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, 0, implicit $exec :: (store 1, addrspace 3)			; GFX9: DS_WRITE_B8_gfx9 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, 0, implicit $exec :: (store 1, addrspace 3)
				; GFX6-LABEL: name: store_local_s32_to_1_constant_4095
				; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4095, implicit $exec
				; GFX6: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: DS_WRITE_B8 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, 0, implicit $m0, implicit $exec :: (store 1, addrspace 3)
	%0:vgpr(p3) = G_CONSTANT i32 4095			%0:vgpr(p3) = G_CONSTANT i32 4095
	%1:vgpr(s32) = G_CONSTANT i32 0			%1:vgpr(s32) = G_CONSTANT i32 0
	G_STORE %1, %0 :: (store 1, align 1, addrspace 3)			G_STORE %1, %0 :: (store 1, align 1, addrspace 3)

	...			...

	---			---

	name: store_local_s32_to_1_constant_4096			name: store_local_s32_to_1_constant_4096
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32
	stack:			stack:
	- { id: 0, size: 4096, alignment: 4 }			- { id: 0, size: 4096, alignment: 4 }

	body: |			body: |
	bb.0:			bb.0:

	; GFX6-LABEL: name: store_local_s32_to_1_constant_4096
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec
	; GFX6: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: DS_WRITE_B8 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, 0, implicit $m0, implicit $exec :: (store 1, addrspace 3)
	; GFX7-LABEL: name: store_local_s32_to_1_constant_4096			; GFX7-LABEL: name: store_local_s32_to_1_constant_4096
	; GFX7: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec			; GFX7: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec
	; GFX7: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec			; GFX7: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: DS_WRITE_B8 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, 0, implicit $m0, implicit $exec :: (store 1, addrspace 3)			; GFX7: DS_WRITE_B8 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, 0, implicit $m0, implicit $exec :: (store 1, addrspace 3)
	; GFX9-LABEL: name: store_local_s32_to_1_constant_4096			; GFX9-LABEL: name: store_local_s32_to_1_constant_4096
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec
	; GFX9: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec			; GFX9: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
	; GFX9: DS_WRITE_B8_gfx9 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, 0, implicit $exec :: (store 1, addrspace 3)			; GFX9: DS_WRITE_B8_gfx9 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, 0, implicit $exec :: (store 1, addrspace 3)
				; GFX6-LABEL: name: store_local_s32_to_1_constant_4096
				; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec
				; GFX6: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: DS_WRITE_B8 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, 0, implicit $m0, implicit $exec :: (store 1, addrspace 3)
	%0:vgpr(p3) = G_CONSTANT i32 4096			%0:vgpr(p3) = G_CONSTANT i32 4096
	%1:vgpr(s32) = G_CONSTANT i32 0			%1:vgpr(s32) = G_CONSTANT i32 0
	G_STORE %1, %0 :: (store 1, align 1, addrspace 3)			G_STORE %1, %0 :: (store 1, align 1, addrspace 3)

	...			...

	---			---

	name: store_local_s64_align4			name: store_local_s64_align4
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0_vgpr1, $vgpr2			liveins: $vgpr0_vgpr1, $vgpr2

	; GFX6-LABEL: name: store_local_s64_align4
	; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX6: [[COPY:%[0-9]+]]:vgpr(s64) = COPY $vgpr0_vgpr1
	; GFX6: [[COPY1:%[0-9]+]]:vgpr(p3) = COPY $vgpr2
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: G_STORE [[COPY]](s64), [[COPY1]](p3) :: (store 8, align 4, addrspace 3)
	; GFX7-LABEL: name: store_local_s64_align4			; GFX7-LABEL: name: store_local_s64_align4
	; GFX7: liveins: $vgpr0_vgpr1, $vgpr2			; GFX7: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1			; GFX7: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1
	; GFX7: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0			; GFX7: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0
	; GFX7: DS_WRITE2_B32 [[COPY1]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $m0, implicit $exec :: (store 8, align 4, addrspace 3)			; GFX7: DS_WRITE2_B32 [[COPY1]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $m0, implicit $exec :: (store 8, align 4, addrspace 3)
	; GFX9-LABEL: name: store_local_s64_align4			; GFX9-LABEL: name: store_local_s64_align4
	; GFX9: liveins: $vgpr0_vgpr1, $vgpr2			; GFX9: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX9: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1			; GFX9: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1
	; GFX9: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0			; GFX9: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0
	; GFX9: DS_WRITE2_B32_gfx9 [[COPY1]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $exec :: (store 8, align 4, addrspace 3)			; GFX9: DS_WRITE2_B32_gfx9 [[COPY1]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $exec :: (store 8, align 4, addrspace 3)
				; GFX6-LABEL: name: store_local_s64_align4
				; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
				; GFX6: [[COPY:%[0-9]+]]:vgpr(s64) = COPY $vgpr0_vgpr1
				; GFX6: [[COPY1:%[0-9]+]]:vgpr(p3) = COPY $vgpr2
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: G_STORE [[COPY]](s64), [[COPY1]](p3) :: (store 8, align 4, addrspace 3)
	%0:vgpr(s64) = COPY $vgpr0_vgpr1			%0:vgpr(s64) = COPY $vgpr0_vgpr1
	%1:vgpr(p3) = COPY $vgpr2			%1:vgpr(p3) = COPY $vgpr2
	G_STORE %0, %1 :: (store 8, align 4, addrspace 3)			G_STORE %0, %1 :: (store 8, align 4, addrspace 3)

	...			...

	---			---

	name: store_local_p1_align4			name: store_local_p1_align4
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0_vgpr1, $vgpr2			liveins: $vgpr0_vgpr1, $vgpr2

	; GFX6-LABEL: name: store_local_p1_align4
	; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX6: [[COPY:%[0-9]+]]:vgpr(p1) = COPY $vgpr0_vgpr1
	; GFX6: [[COPY1:%[0-9]+]]:vgpr(p3) = COPY $vgpr2
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: G_STORE [[COPY]](p1), [[COPY1]](p3) :: (store 8, align 4, addrspace 3)
	; GFX7-LABEL: name: store_local_p1_align4			; GFX7-LABEL: name: store_local_p1_align4
	; GFX7: liveins: $vgpr0_vgpr1, $vgpr2			; GFX7: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1			; GFX7: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1
	; GFX7: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0			; GFX7: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0
	; GFX7: DS_WRITE2_B32 [[COPY1]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $m0, implicit $exec :: (store 8, align 4, addrspace 3)			; GFX7: DS_WRITE2_B32 [[COPY1]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $m0, implicit $exec :: (store 8, align 4, addrspace 3)
	; GFX9-LABEL: name: store_local_p1_align4			; GFX9-LABEL: name: store_local_p1_align4
	; GFX9: liveins: $vgpr0_vgpr1, $vgpr2			; GFX9: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX9: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1			; GFX9: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1
	; GFX9: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0			; GFX9: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0
	; GFX9: DS_WRITE2_B32_gfx9 [[COPY1]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $exec :: (store 8, align 4, addrspace 3)			; GFX9: DS_WRITE2_B32_gfx9 [[COPY1]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $exec :: (store 8, align 4, addrspace 3)
				; GFX6-LABEL: name: store_local_p1_align4
				; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
				; GFX6: [[COPY:%[0-9]+]]:vgpr(p1) = COPY $vgpr0_vgpr1
				; GFX6: [[COPY1:%[0-9]+]]:vgpr(p3) = COPY $vgpr2
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: G_STORE [[COPY]](p1), [[COPY1]](p3) :: (store 8, align 4, addrspace 3)
	%0:vgpr(p1) = COPY $vgpr0_vgpr1			%0:vgpr(p1) = COPY $vgpr0_vgpr1
	%1:vgpr(p3) = COPY $vgpr2			%1:vgpr(p3) = COPY $vgpr2
	G_STORE %0, %1 :: (store 8, align 4, addrspace 3)			G_STORE %0, %1 :: (store 8, align 4, addrspace 3)

	...			...

	---			---

	name: store_local_v2s32_align4			name: store_local_v2s32_align4
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0_vgpr1, $vgpr2			liveins: $vgpr0_vgpr1, $vgpr2

	; GFX6-LABEL: name: store_local_v2s32_align4
	; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX6: [[COPY:%[0-9]+]]:vgpr(<2 x s32>) = COPY $vgpr0_vgpr1
	; GFX6: [[COPY1:%[0-9]+]]:vgpr(p3) = COPY $vgpr2
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: G_STORE [[COPY]](<2 x s32>), [[COPY1]](p3) :: (store 8, align 4, addrspace 3)
	; GFX7-LABEL: name: store_local_v2s32_align4			; GFX7-LABEL: name: store_local_v2s32_align4
	; GFX7: liveins: $vgpr0_vgpr1, $vgpr2			; GFX7: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1			; GFX7: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1
	; GFX7: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0			; GFX7: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0
	; GFX7: DS_WRITE2_B32 [[COPY1]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $m0, implicit $exec :: (store 8, align 4, addrspace 3)			; GFX7: DS_WRITE2_B32 [[COPY1]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $m0, implicit $exec :: (store 8, align 4, addrspace 3)
	; GFX9-LABEL: name: store_local_v2s32_align4			; GFX9-LABEL: name: store_local_v2s32_align4
	; GFX9: liveins: $vgpr0_vgpr1, $vgpr2			; GFX9: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX9: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1			; GFX9: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1
	; GFX9: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0			; GFX9: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0
	; GFX9: DS_WRITE2_B32_gfx9 [[COPY1]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $exec :: (store 8, align 4, addrspace 3)			; GFX9: DS_WRITE2_B32_gfx9 [[COPY1]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $exec :: (store 8, align 4, addrspace 3)
				; GFX6-LABEL: name: store_local_v2s32_align4
				; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
				; GFX6: [[COPY:%[0-9]+]]:vgpr(<2 x s32>) = COPY $vgpr0_vgpr1
				; GFX6: [[COPY1:%[0-9]+]]:vgpr(p3) = COPY $vgpr2
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: G_STORE [[COPY]](<2 x s32>), [[COPY1]](p3) :: (store 8, align 4, addrspace 3)
	%0:vgpr(<2 x s32>) = COPY $vgpr0_vgpr1			%0:vgpr(<2 x s32>) = COPY $vgpr0_vgpr1
	%1:vgpr(p3) = COPY $vgpr2			%1:vgpr(p3) = COPY $vgpr2
	G_STORE %0, %1 :: (store 8, align 4, addrspace 3)			G_STORE %0, %1 :: (store 8, align 4, addrspace 3)

	...			...

	---			---

	name: store_local_v4s16_align4			name: store_local_v4s16_align4
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0_vgpr1, $vgpr2			liveins: $vgpr0_vgpr1, $vgpr2

	; GFX6-LABEL: name: store_local_v4s16_align4
	; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX6: [[COPY:%[0-9]+]]:vgpr(<4 x s16>) = COPY $vgpr0_vgpr1
	; GFX6: [[COPY1:%[0-9]+]]:vgpr(p3) = COPY $vgpr2
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: G_STORE [[COPY]](<4 x s16>), [[COPY1]](p3) :: (store 8, align 4, addrspace 3)
	; GFX7-LABEL: name: store_local_v4s16_align4			; GFX7-LABEL: name: store_local_v4s16_align4
	; GFX7: liveins: $vgpr0_vgpr1, $vgpr2			; GFX7: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1			; GFX7: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1
	; GFX7: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0			; GFX7: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0
	; GFX7: DS_WRITE2_B32 [[COPY1]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $m0, implicit $exec :: (store 8, align 4, addrspace 3)			; GFX7: DS_WRITE2_B32 [[COPY1]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $m0, implicit $exec :: (store 8, align 4, addrspace 3)
	; GFX9-LABEL: name: store_local_v4s16_align4			; GFX9-LABEL: name: store_local_v4s16_align4
	; GFX9: liveins: $vgpr0_vgpr1, $vgpr2			; GFX9: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX9: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1			; GFX9: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1
	; GFX9: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0			; GFX9: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0
	; GFX9: DS_WRITE2_B32_gfx9 [[COPY1]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $exec :: (store 8, align 4, addrspace 3)			; GFX9: DS_WRITE2_B32_gfx9 [[COPY1]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $exec :: (store 8, align 4, addrspace 3)
				; GFX6-LABEL: name: store_local_v4s16_align4
				; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
				; GFX6: [[COPY:%[0-9]+]]:vgpr(<4 x s16>) = COPY $vgpr0_vgpr1
				; GFX6: [[COPY1:%[0-9]+]]:vgpr(p3) = COPY $vgpr2
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: G_STORE [[COPY]](<4 x s16>), [[COPY1]](p3) :: (store 8, align 4, addrspace 3)
	%0:vgpr(<4 x s16>) = COPY $vgpr0_vgpr1			%0:vgpr(<4 x s16>) = COPY $vgpr0_vgpr1
	%1:vgpr(p3) = COPY $vgpr2			%1:vgpr(p3) = COPY $vgpr2
	G_STORE %0, %1 :: (store 8, align 4, addrspace 3)			G_STORE %0, %1 :: (store 8, align 4, addrspace 3)

	...			...

	---			---

	name: store_local_s64_align8			name: store_local_s64_align8
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0_vgpr1, $vgpr2			liveins: $vgpr0_vgpr1, $vgpr2

	; GFX6-LABEL: name: store_local_s64_align8
	; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX6: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: DS_WRITE_B64 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 8, addrspace 3)
	; GFX7-LABEL: name: store_local_s64_align8			; GFX7-LABEL: name: store_local_s64_align8
	; GFX7: liveins: $vgpr0_vgpr1, $vgpr2			; GFX7: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: DS_WRITE_B64 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 8, addrspace 3)			; GFX7: DS_WRITE_B64 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 8, addrspace 3)
	; GFX9-LABEL: name: store_local_s64_align8			; GFX9-LABEL: name: store_local_s64_align8
	; GFX9: liveins: $vgpr0_vgpr1, $vgpr2			; GFX9: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX9: DS_WRITE_B64_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 8, addrspace 3)			; GFX9: DS_WRITE_B64_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 8, addrspace 3)
				; GFX6-LABEL: name: store_local_s64_align8
				; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
				; GFX6: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
				; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: DS_WRITE_B64 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 8, addrspace 3)
	%0:vgpr(s64) = COPY $vgpr0_vgpr1			%0:vgpr(s64) = COPY $vgpr0_vgpr1
	%1:vgpr(p3) = COPY $vgpr2			%1:vgpr(p3) = COPY $vgpr2
	G_STORE %0, %1 :: (store 8, align 8, addrspace 3)			G_STORE %0, %1 :: (store 8, align 8, addrspace 3)

	...			...

	---			---

	name: store_local_p1_align8			name: store_local_p1_align8
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0_vgpr1, $vgpr2			liveins: $vgpr0_vgpr1, $vgpr2

	; GFX6-LABEL: name: store_local_p1_align8
	; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX6: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: DS_WRITE_B64 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 8, addrspace 3)
	; GFX7-LABEL: name: store_local_p1_align8			; GFX7-LABEL: name: store_local_p1_align8
	; GFX7: liveins: $vgpr0_vgpr1, $vgpr2			; GFX7: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: DS_WRITE_B64 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 8, addrspace 3)			; GFX7: DS_WRITE_B64 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 8, addrspace 3)
	; GFX9-LABEL: name: store_local_p1_align8			; GFX9-LABEL: name: store_local_p1_align8
	; GFX9: liveins: $vgpr0_vgpr1, $vgpr2			; GFX9: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX9: DS_WRITE_B64_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 8, addrspace 3)			; GFX9: DS_WRITE_B64_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 8, addrspace 3)
				; GFX6-LABEL: name: store_local_p1_align8
				; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
				; GFX6: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
				; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: DS_WRITE_B64 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 8, addrspace 3)
	%0:vgpr(p1) = COPY $vgpr0_vgpr1			%0:vgpr(p1) = COPY $vgpr0_vgpr1
	%1:vgpr(p3) = COPY $vgpr2			%1:vgpr(p3) = COPY $vgpr2
	G_STORE %0, %1 :: (store 8, align 8, addrspace 3)			G_STORE %0, %1 :: (store 8, align 8, addrspace 3)

	...			...

	---			---

	name: store_local_v2s32_align8			name: store_local_v2s32_align8
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0_vgpr1, $vgpr2			liveins: $vgpr0_vgpr1, $vgpr2

	; GFX6-LABEL: name: store_local_v2s32_align8
	; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX6: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: DS_WRITE_B64 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 8, addrspace 3)
	; GFX7-LABEL: name: store_local_v2s32_align8			; GFX7-LABEL: name: store_local_v2s32_align8
	; GFX7: liveins: $vgpr0_vgpr1, $vgpr2			; GFX7: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: DS_WRITE_B64 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 8, addrspace 3)			; GFX7: DS_WRITE_B64 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 8, addrspace 3)
	; GFX9-LABEL: name: store_local_v2s32_align8			; GFX9-LABEL: name: store_local_v2s32_align8
	; GFX9: liveins: $vgpr0_vgpr1, $vgpr2			; GFX9: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX9: DS_WRITE_B64_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 8, addrspace 3)			; GFX9: DS_WRITE_B64_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 8, addrspace 3)
				; GFX6-LABEL: name: store_local_v2s32_align8
				; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
				; GFX6: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
				; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: DS_WRITE_B64 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 8, addrspace 3)
	%0:vgpr(<2 x s32>) = COPY $vgpr0_vgpr1			%0:vgpr(<2 x s32>) = COPY $vgpr0_vgpr1
	%1:vgpr(p3) = COPY $vgpr2			%1:vgpr(p3) = COPY $vgpr2
	G_STORE %0, %1 :: (store 8, align 8, addrspace 3)			G_STORE %0, %1 :: (store 8, align 8, addrspace 3)

	...			...

	---			---

	name: store_local_v4s16_align8			name: store_local_v4s16_align8
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0_vgpr1, $vgpr2			liveins: $vgpr0_vgpr1, $vgpr2

	; GFX6-LABEL: name: store_local_v4s16_align8
	; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX6: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: DS_WRITE_B64 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 8, addrspace 3)
	; GFX7-LABEL: name: store_local_v4s16_align8			; GFX7-LABEL: name: store_local_v4s16_align8
	; GFX7: liveins: $vgpr0_vgpr1, $vgpr2			; GFX7: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: DS_WRITE_B64 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 8, addrspace 3)			; GFX7: DS_WRITE_B64 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 8, addrspace 3)
	; GFX9-LABEL: name: store_local_v4s16_align8			; GFX9-LABEL: name: store_local_v4s16_align8
	; GFX9: liveins: $vgpr0_vgpr1, $vgpr2			; GFX9: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX9: DS_WRITE_B64_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 8, addrspace 3)			; GFX9: DS_WRITE_B64_gfx9 [[COPY1]], [[COPY]], 0, 0, implicit $exec :: (store 8, addrspace 3)
				; GFX6-LABEL: name: store_local_v4s16_align8
				; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
				; GFX6: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
				; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: DS_WRITE_B64 [[COPY1]], [[COPY]], 0, 0, implicit $m0, implicit $exec :: (store 8, addrspace 3)
	%0:vgpr(<4 x s16>) = COPY $vgpr0_vgpr1			%0:vgpr(<4 x s16>) = COPY $vgpr0_vgpr1
	%1:vgpr(p3) = COPY $vgpr2			%1:vgpr(p3) = COPY $vgpr2
	G_STORE %0, %1 :: (store 8, align 8, addrspace 3)			G_STORE %0, %1 :: (store 8, align 8, addrspace 3)

	...			...

	---			---

	name: store_local_s64_align4_from_1_gep_1016			name: store_local_s64_align4_from_1_gep_1016
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0_vgpr1, $vgpr2			liveins: $vgpr0_vgpr1, $vgpr2

	; GFX6-LABEL: name: store_local_s64_align4_from_1_gep_1016
	; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX6: [[COPY:%[0-9]+]]:vgpr(s64) = COPY $vgpr0_vgpr1
	; GFX6: [[COPY1:%[0-9]+]]:vgpr(p3) = COPY $vgpr2
	; GFX6: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 1016
	; GFX6: [[PTR_ADD:%[0-9]+]]:vgpr(p3) = G_PTR_ADD [[COPY1]], [[C]](s32)
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: G_STORE [[COPY]](s64), [[PTR_ADD]](p3) :: (store 8, align 4, addrspace 3)
	; GFX7-LABEL: name: store_local_s64_align4_from_1_gep_1016			; GFX7-LABEL: name: store_local_s64_align4_from_1_gep_1016
	; GFX7: liveins: $vgpr0_vgpr1, $vgpr2			; GFX7: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1			; GFX7: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1
	; GFX7: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0			; GFX7: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0
	; GFX7: DS_WRITE2_B32 [[COPY1]], [[COPY3]], [[COPY2]], 254, 255, 0, implicit $m0, implicit $exec :: (store 8, align 4, addrspace 3)			; GFX7: DS_WRITE2_B32 [[COPY1]], [[COPY3]], [[COPY2]], 254, 255, 0, implicit $m0, implicit $exec :: (store 8, align 4, addrspace 3)
	; GFX9-LABEL: name: store_local_s64_align4_from_1_gep_1016			; GFX9-LABEL: name: store_local_s64_align4_from_1_gep_1016
	; GFX9: liveins: $vgpr0_vgpr1, $vgpr2			; GFX9: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX9: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1			; GFX9: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1
	; GFX9: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0			; GFX9: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0
	; GFX9: DS_WRITE2_B32_gfx9 [[COPY1]], [[COPY3]], [[COPY2]], 254, 255, 0, implicit $exec :: (store 8, align 4, addrspace 3)			; GFX9: DS_WRITE2_B32_gfx9 [[COPY1]], [[COPY3]], [[COPY2]], 254, 255, 0, implicit $exec :: (store 8, align 4, addrspace 3)
				; GFX6-LABEL: name: store_local_s64_align4_from_1_gep_1016
				; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
				; GFX6: [[COPY:%[0-9]+]]:vgpr(s64) = COPY $vgpr0_vgpr1
				; GFX6: [[COPY1:%[0-9]+]]:vgpr(p3) = COPY $vgpr2
				; GFX6: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 1016
				; GFX6: [[PTR_ADD:%[0-9]+]]:vgpr(p3) = G_PTR_ADD [[COPY1]], [[C]](s32)
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: G_STORE [[COPY]](s64), [[PTR_ADD]](p3) :: (store 8, align 4, addrspace 3)
	%0:vgpr(s64) = COPY $vgpr0_vgpr1			%0:vgpr(s64) = COPY $vgpr0_vgpr1
	%1:vgpr(p3) = COPY $vgpr2			%1:vgpr(p3) = COPY $vgpr2
	%2:vgpr(s32) = G_CONSTANT i32 1016			%2:vgpr(s32) = G_CONSTANT i32 1016
	%3:vgpr(p3) = G_PTR_ADD %1, %2			%3:vgpr(p3) = G_PTR_ADD %1, %2
	G_STORE %0, %3 :: (store 8, align 4, addrspace 3)			G_STORE %0, %3 :: (store 8, align 4, addrspace 3)

	...			...

	---			---

	name: store_local_s64_align4_from_1_gep_1020			name: store_local_s64_align4_from_1_gep_1020
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0_vgpr1, $vgpr2			liveins: $vgpr0_vgpr1, $vgpr2

	; GFX6-LABEL: name: store_local_s64_align4_from_1_gep_1020
	; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX6: [[COPY:%[0-9]+]]:vgpr(s64) = COPY $vgpr0_vgpr1
	; GFX6: [[COPY1:%[0-9]+]]:vgpr(p3) = COPY $vgpr2
	; GFX6: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 1020
	; GFX6: [[PTR_ADD:%[0-9]+]]:vgpr(p3) = G_PTR_ADD [[COPY1]], [[C]](s32)
	; GFX6: $m0 = S_MOV_B32 -1
	; GFX6: G_STORE [[COPY]](s64), [[PTR_ADD]](p3) :: (store 8, align 4, addrspace 3)
	; GFX7-LABEL: name: store_local_s64_align4_from_1_gep_1020			; GFX7-LABEL: name: store_local_s64_align4_from_1_gep_1020
	; GFX7: liveins: $vgpr0_vgpr1, $vgpr2			; GFX7: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX7: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX7: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX7: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1020, implicit $exec			; GFX7: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1020, implicit $exec
	; GFX7: %3:vgpr_32, dead %6:sreg_64_xexec = V_ADD_I32_e64 [[COPY1]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX7: %3:vgpr_32, dead %6:sreg_64_xexec = V_ADD_I32_e64 [[COPY1]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX7: $m0 = S_MOV_B32 -1			; GFX7: $m0 = S_MOV_B32 -1
	; GFX7: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1			; GFX7: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1
	; GFX7: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0			; GFX7: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0
	; GFX7: DS_WRITE2_B32 %3, [[COPY3]], [[COPY2]], 0, 1, 0, implicit $m0, implicit $exec :: (store 8, align 4, addrspace 3)			; GFX7: DS_WRITE2_B32 %3, [[COPY3]], [[COPY2]], 0, 1, 0, implicit $m0, implicit $exec :: (store 8, align 4, addrspace 3)
	; GFX9-LABEL: name: store_local_s64_align4_from_1_gep_1020			; GFX9-LABEL: name: store_local_s64_align4_from_1_gep_1020
	; GFX9: liveins: $vgpr0_vgpr1, $vgpr2			; GFX9: liveins: $vgpr0_vgpr1, $vgpr2
	; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1			; GFX9: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr2
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1020, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1020, implicit $exec
	; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY1]], [[V_MOV_B32_e32_]], 0, implicit $exec			; GFX9: [[V_ADD_U32_e64_:%[0-9]+]]:vgpr_32 = V_ADD_U32_e64 [[COPY1]], [[V_MOV_B32_e32_]], 0, implicit $exec
	; GFX9: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1			; GFX9: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1
	; GFX9: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0			; GFX9: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub0
	; GFX9: DS_WRITE2_B32_gfx9 [[V_ADD_U32_e64_]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $exec :: (store 8, align 4, addrspace 3)			; GFX9: DS_WRITE2_B32_gfx9 [[V_ADD_U32_e64_]], [[COPY3]], [[COPY2]], 0, 1, 0, implicit $exec :: (store 8, align 4, addrspace 3)
				; GFX6-LABEL: name: store_local_s64_align4_from_1_gep_1020
				; GFX6: liveins: $vgpr0_vgpr1, $vgpr2
				; GFX6: [[COPY:%[0-9]+]]:vgpr(s64) = COPY $vgpr0_vgpr1
				; GFX6: [[COPY1:%[0-9]+]]:vgpr(p3) = COPY $vgpr2
				; GFX6: [[C:%[0-9]+]]:vgpr(s32) = G_CONSTANT i32 1020
				; GFX6: [[PTR_ADD:%[0-9]+]]:vgpr(p3) = G_PTR_ADD [[COPY1]], [[C]](s32)
				; GFX6: $m0 = S_MOV_B32 -1
				; GFX6: G_STORE [[COPY]](s64), [[PTR_ADD]](p3) :: (store 8, align 4, addrspace 3)
	%0:vgpr(s64) = COPY $vgpr0_vgpr1			%0:vgpr(s64) = COPY $vgpr0_vgpr1
	%1:vgpr(p3) = COPY $vgpr2			%1:vgpr(p3) = COPY $vgpr2
	%2:vgpr(s32) = G_CONSTANT i32 1020			%2:vgpr(s32) = G_CONSTANT i32 1020
	%3:vgpr(p3) = G_PTR_ADD %1, %2			%3:vgpr(p3) = G_PTR_ADD %1, %2
	G_STORE %0, %3 :: (store 8, align 4, addrspace 3)			G_STORE %0, %3 :: (store 8, align 4, addrspace 3)

	...			...
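
The two `gep` tests above differ only in the constant offset, yet 1016 folds into the `DS_WRITE2_B32` offset fields as 254, 255, while 1020 falls back to an explicit add with offsets 0, 1. A minimal sketch of arithmetic that would explain this, assuming the two offset fields are unsigned 8-bit counts of 4-byte elements (an assumption inferred from the expected output shown here, not taken from the selector source); `write2_offsets` is a hypothetical helper name:

```python
def write2_offsets(byte_offset, elt_size=4, field_bits=8):
    """Return (offset0, offset1) for a two-element DS write, or None.

    Assumption: each offset field holds an unsigned `field_bits`-bit count
    of `elt_size`-byte elements. None means the constant cannot be folded
    and the address has to be computed with a separate add.
    """
    if byte_offset < 0 or byte_offset % elt_size != 0:
        return None
    offset0 = byte_offset // elt_size
    offset1 = offset0 + 1  # the second dword sits right after the first
    if offset1 >= (1 << field_bits):
        return None  # offset1 does not fit in the field
    return (offset0, offset1)


print(write2_offsets(1016))  # both offsets fit in 8 bits
print(write2_offsets(1020))  # offset1 would be 256, which does not fit
```

Under that assumption, 1016 / 4 = 254 leaves room for the second element at 255, whereas 1020 / 4 = 255 would put the second element at 256, which matches the fallback seen in the `_gep_1020` checks.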

llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-store-private.mir

	# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	# RUN: llc -march=amdgcn -mcpu=tahiti -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX6 %s			# RUN: llc -march=amdgcn -mcpu=tahiti -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX6 %s
	# RUN: llc -march=amdgcn -mcpu=gfx900 -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX9 %s			# RUN: llc -march=amdgcn -mcpu=gfx900 -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=0 -o - %s | FileCheck -check-prefix=GFX9 %s

	---			---

	name: store_private_s32_to_4			name: function_store_private_s32_to_4
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
				isEntryFunction: false
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0, $vgpr1			liveins: $vgpr0, $vgpr1

	; GFX6-LABEL: name: store_private_s32_to_4			; GFX6-LABEL: name: function_store_private_s32_to_4
	; GFX6: liveins: $vgpr0, $vgpr1			; GFX6: liveins: $vgpr0, $vgpr1
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX6: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)			; GFX6: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)
	; GFX9-LABEL: name: store_private_s32_to_4			; GFX9-LABEL: name: function_store_private_s32_to_4
	; GFX9: liveins: $vgpr0, $vgpr1			; GFX9: liveins: $vgpr0, $vgpr1
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX9: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)			; GFX9: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)
	%0:vgpr(s32) = COPY $vgpr0			%0:vgpr(s32) = COPY $vgpr0
	%1:vgpr(p5) = COPY $vgpr1			%1:vgpr(p5) = COPY $vgpr1
	G_STORE %0, %1 :: (store 4, align 4, addrspace 5)			G_STORE %0, %1 :: (store 4, align 4, addrspace 5)

	...			...

	---			---

	name: store_private_s32_to_2			name: function_store_private_s32_to_2
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
				isEntryFunction: false
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0, $vgpr1			liveins: $vgpr0, $vgpr1

	; GFX6-LABEL: name: store_private_s32_to_2			; GFX6-LABEL: name: function_store_private_s32_to_2
	; GFX6: liveins: $vgpr0, $vgpr1			; GFX6: liveins: $vgpr0, $vgpr1
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX6: BUFFER_STORE_SHORT_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 2, addrspace 5)			; GFX6: BUFFER_STORE_SHORT_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 2, addrspace 5)
	; GFX9-LABEL: name: store_private_s32_to_2			; GFX9-LABEL: name: function_store_private_s32_to_2
	; GFX9: liveins: $vgpr0, $vgpr1			; GFX9: liveins: $vgpr0, $vgpr1
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX9: BUFFER_STORE_SHORT_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 2, addrspace 5)			; GFX9: BUFFER_STORE_SHORT_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 2, addrspace 5)
	%0:vgpr(s32) = COPY $vgpr0			%0:vgpr(s32) = COPY $vgpr0
	%1:vgpr(p5) = COPY $vgpr1			%1:vgpr(p5) = COPY $vgpr1
	G_STORE %0, %1 :: (store 2, align 2, addrspace 5)			G_STORE %0, %1 :: (store 2, align 2, addrspace 5)

	...			...

	---			---

	name: store_private_s32_to_1			name: function_store_private_s32_to_1
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
				isEntryFunction: false
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0, $vgpr1			liveins: $vgpr0, $vgpr1

	; GFX6-LABEL: name: store_private_s32_to_1			; GFX6-LABEL: name: function_store_private_s32_to_1
	; GFX6: liveins: $vgpr0, $vgpr1			; GFX6: liveins: $vgpr0, $vgpr1
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX6: BUFFER_STORE_BYTE_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)			; GFX6: BUFFER_STORE_BYTE_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
	; GFX9-LABEL: name: store_private_s32_to_1			; GFX9-LABEL: name: function_store_private_s32_to_1
	; GFX9: liveins: $vgpr0, $vgpr1			; GFX9: liveins: $vgpr0, $vgpr1
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX9: BUFFER_STORE_BYTE_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)			; GFX9: BUFFER_STORE_BYTE_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
	%0:vgpr(s32) = COPY $vgpr0			%0:vgpr(s32) = COPY $vgpr0
	%1:vgpr(p5) = COPY $vgpr1			%1:vgpr(p5) = COPY $vgpr1
	G_STORE %0, %1 :: (store 1, align 1, addrspace 5)			G_STORE %0, %1 :: (store 1, align 1, addrspace 5)

	...			...

	---			---

	name: store_private_v2s16			name: function_store_private_v2s16
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
				isEntryFunction: false
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0, $vgpr1			liveins: $vgpr0, $vgpr1

	; GFX6-LABEL: name: store_private_v2s16			; GFX6-LABEL: name: function_store_private_v2s16
	; GFX6: liveins: $vgpr0, $vgpr1			; GFX6: liveins: $vgpr0, $vgpr1
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX6: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)			; GFX6: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)
	; GFX9-LABEL: name: store_private_v2s16			; GFX9-LABEL: name: function_store_private_v2s16
	; GFX9: liveins: $vgpr0, $vgpr1			; GFX9: liveins: $vgpr0, $vgpr1
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX9: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)			; GFX9: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)
	%0:vgpr(<2 x s16>) = COPY $vgpr0			%0:vgpr(<2 x s16>) = COPY $vgpr0
	%1:vgpr(p5) = COPY $vgpr1			%1:vgpr(p5) = COPY $vgpr1
	G_STORE %0, %1 :: (store 4, align 4, addrspace 5)			G_STORE %0, %1 :: (store 4, align 4, addrspace 5)

	...			...

	---			---

	name: store_private_p3			name: function_store_private_p3
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
				isEntryFunction: false
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0, $vgpr1			liveins: $vgpr0, $vgpr1

	; GFX6-LABEL: name: store_private_p3			; GFX6-LABEL: name: function_store_private_p3
	; GFX6: liveins: $vgpr0, $vgpr1			; GFX6: liveins: $vgpr0, $vgpr1
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX6: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)			; GFX6: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)
	; GFX9-LABEL: name: store_private_p3			; GFX9-LABEL: name: function_store_private_p3
	; GFX9: liveins: $vgpr0, $vgpr1			; GFX9: liveins: $vgpr0, $vgpr1
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX9: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)			; GFX9: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)
	%0:vgpr(p3) = COPY $vgpr0			%0:vgpr(p3) = COPY $vgpr0
	%1:vgpr(p5) = COPY $vgpr1			%1:vgpr(p5) = COPY $vgpr1
	G_STORE %0, %1 :: (store 4, align 4, addrspace 5)			G_STORE %0, %1 :: (store 4, align 4, addrspace 5)

	...			...

	---			---

	name: store_private_p5			name: function_store_private_p5
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
				isEntryFunction: false
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0, $vgpr1			liveins: $vgpr0, $vgpr1

	; GFX6-LABEL: name: store_private_p5			; GFX6-LABEL: name: function_store_private_p5
	; GFX6: liveins: $vgpr0, $vgpr1			; GFX6: liveins: $vgpr0, $vgpr1
	; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX6: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)			; GFX6: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)
	; GFX9-LABEL: name: store_private_p5			; GFX9-LABEL: name: function_store_private_p5
	; GFX9: liveins: $vgpr0, $vgpr1			; GFX9: liveins: $vgpr0, $vgpr1
	; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1			; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
	; GFX9: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)			; GFX9: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)
	%0:vgpr(p5) = COPY $vgpr0			%0:vgpr(p5) = COPY $vgpr0
	%1:vgpr(p5) = COPY $vgpr1			%1:vgpr(p5) = COPY $vgpr1
	G_STORE %0, %1 :: (store 4, align 4, addrspace 5)			G_STORE %0, %1 :: (store 4, align 4, addrspace 5)

	...			...

	---			---

	name: store_private_s32_to_1_fi_offset_4095			name: function_store_private_s32_to_1_fi_offset_4095
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
				isEntryFunction: false
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32
	stack:			stack:
	- { id: 0, size: 4096, alignment: 4 }			- { id: 0, size: 4096, alignment: 4 }

	body: |			body: |
	bb.0:			bb.0:

	; GFX6-LABEL: name: store_private_s32_to_1_fi_offset_4095			; GFX6-LABEL: name: function_store_private_s32_to_1_fi_offset_4095
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
	; GFX6: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4095, implicit $exec			; GFX6: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4095, implicit $exec
	; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, implicit $exec			; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, implicit $exec
	; GFX6: [[V_MOV_B32_e32_2:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec			; GFX6: [[V_MOV_B32_e32_2:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
	; GFX6: BUFFER_STORE_BYTE_OFFEN [[V_MOV_B32_e32_2]], %2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)			; GFX6: BUFFER_STORE_BYTE_OFFEN [[V_MOV_B32_e32_2]], %2, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
	; GFX9-LABEL: name: store_private_s32_to_1_fi_offset_4095			; GFX9-LABEL: name: function_store_private_s32_to_1_fi_offset_4095
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
	; GFX9: BUFFER_STORE_BYTE_OFFEN [[V_MOV_B32_e32_]], %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4095, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)			; GFX9: BUFFER_STORE_BYTE_OFFEN [[V_MOV_B32_e32_]], %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4095, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
	%0:vgpr(p5) = G_FRAME_INDEX %stack.0			%0:vgpr(p5) = G_FRAME_INDEX %stack.0
	%1:vgpr(s32) = G_CONSTANT i32 4095			%1:vgpr(s32) = G_CONSTANT i32 4095
	%2:vgpr(p5) = G_PTR_ADD %0, %1			%2:vgpr(p5) = G_PTR_ADD %0, %1
	%3:vgpr(s32) = G_CONSTANT i32 0			%3:vgpr(s32) = G_CONSTANT i32 0
	G_STORE %3, %2 :: (store 1, align 1, addrspace 5)			G_STORE %3, %2 :: (store 1, align 1, addrspace 5)

	...			...

	---			---

	name: store_private_s32_to_1_constant_4095			name: function_store_private_s32_to_1_constant_4095
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
				isEntryFunction: false
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32
	stack:			stack:
	- { id: 0, size: 4096, alignment: 4 }			- { id: 0, size: 4096, alignment: 4 }

	body: |			body: |
	bb.0:			bb.0:

	; GFX6-LABEL: name: store_private_s32_to_1_constant_4095			; GFX6-LABEL: name: function_store_private_s32_to_1_constant_4095
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
	; GFX6: BUFFER_STORE_BYTE_OFFSET [[V_MOV_B32_e32_]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 4095, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)			; GFX6: BUFFER_STORE_BYTE_OFFSET [[V_MOV_B32_e32_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 4095, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
	; GFX9-LABEL: name: store_private_s32_to_1_constant_4095			; GFX9-LABEL: name: function_store_private_s32_to_1_constant_4095
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
	; GFX9: BUFFER_STORE_BYTE_OFFSET [[V_MOV_B32_e32_]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 4095, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)			; GFX9: BUFFER_STORE_BYTE_OFFSET [[V_MOV_B32_e32_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 4095, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
	%0:vgpr(p5) = G_CONSTANT i32 4095			%0:vgpr(p5) = G_CONSTANT i32 4095
	%1:vgpr(s32) = G_CONSTANT i32 0			%1:vgpr(s32) = G_CONSTANT i32 0
	G_STORE %1, %0 :: (store 1, align 1, addrspace 5)			G_STORE %1, %0 :: (store 1, align 1, addrspace 5)

	...			...

	---			---

	name: store_private_s32_to_1_constant_4096			name: function_store_private_s32_to_1_constant_4096
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
				isEntryFunction: false
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32
	stack:			stack:
	- { id: 0, size: 4096, alignment: 4 }			- { id: 0, size: 4096, alignment: 4 }

	body: |			body: |
	bb.0:			bb.0:

	; GFX6-LABEL: name: store_private_s32_to_1_constant_4096			; GFX6-LABEL: name: function_store_private_s32_to_1_constant_4096
	; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec			; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
	; GFX6: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec			; GFX6: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec
	; GFX6: BUFFER_STORE_BYTE_OFFEN [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)			; GFX6: BUFFER_STORE_BYTE_OFFEN [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
	; GFX9-LABEL: name: store_private_s32_to_1_constant_4096			; GFX9-LABEL: name: function_store_private_s32_to_1_constant_4096
	; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec			; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
	; GFX9: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec			; GFX9: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec
	; GFX9: BUFFER_STORE_BYTE_OFFEN [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)			; GFX9: BUFFER_STORE_BYTE_OFFEN [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
				%0:vgpr(p5) = G_CONSTANT i32 4096
				%1:vgpr(s32) = G_CONSTANT i32 0
				G_STORE %1, %0 :: (store 1, align 1, addrspace 5)

				...

				---

				name: kernel_store_private_s32_to_4
				legalized: true
				regBankSelected: true
				tracksRegLiveness: true
				machineFunctionInfo:
				isEntryFunction: true
				scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3

				body: |
				bb.0:
				liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3

				; GFX6-LABEL: name: kernel_store_private_s32_to_4
				; GFX6: liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; GFX6: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)
				; GFX9-LABEL: name: kernel_store_private_s32_to_4
				; GFX9: liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; GFX9: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)
				%0:vgpr(s32) = COPY $vgpr0
				%1:vgpr(p5) = COPY $vgpr1
				G_STORE %0, %1 :: (store 4, align 4, addrspace 5)

				...

				---

				name: kernel_store_private_s32_to_2
				legalized: true
				regBankSelected: true
				tracksRegLiveness: true
				machineFunctionInfo:
				isEntryFunction: true
				scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3

				body: |
				bb.0:
				liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3

				; GFX6-LABEL: name: kernel_store_private_s32_to_2
				; GFX6: liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; GFX6: BUFFER_STORE_SHORT_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 2, addrspace 5)
				; GFX9-LABEL: name: kernel_store_private_s32_to_2
				; GFX9: liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; GFX9: BUFFER_STORE_SHORT_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 2, addrspace 5)
				%0:vgpr(s32) = COPY $vgpr0
				%1:vgpr(p5) = COPY $vgpr1
				G_STORE %0, %1 :: (store 2, align 2, addrspace 5)

				...

				---

				name: kernel_store_private_s32_to_1
				legalized: true
				regBankSelected: true
				tracksRegLiveness: true
				machineFunctionInfo:
				isEntryFunction: true
				scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3

				body: |
				bb.0:
				liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3

				; GFX6-LABEL: name: kernel_store_private_s32_to_1
				; GFX6: liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; GFX6: BUFFER_STORE_BYTE_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
				; GFX9-LABEL: name: kernel_store_private_s32_to_1
				; GFX9: liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; GFX9: BUFFER_STORE_BYTE_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
				%0:vgpr(s32) = COPY $vgpr0
				%1:vgpr(p5) = COPY $vgpr1
				G_STORE %0, %1 :: (store 1, align 1, addrspace 5)

				...

				---

				name: kernel_store_private_v2s16
				legalized: true
				regBankSelected: true
				tracksRegLiveness: true
				machineFunctionInfo:
				isEntryFunction: true
				scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3

				body: |
				bb.0:
				liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3

				; GFX6-LABEL: name: kernel_store_private_v2s16
				; GFX6: liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; GFX6: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)
				; GFX9-LABEL: name: kernel_store_private_v2s16
				; GFX9: liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; GFX9: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)
				%0:vgpr(<2 x s16>) = COPY $vgpr0
				%1:vgpr(p5) = COPY $vgpr1
				G_STORE %0, %1 :: (store 4, align 4, addrspace 5)

				...

				---

				name: kernel_store_private_p3
				legalized: true
				regBankSelected: true
				tracksRegLiveness: true
				machineFunctionInfo:
				isEntryFunction: true
				scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3

				body: |
				bb.0:
				liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3

				; GFX6-LABEL: name: kernel_store_private_p3
				; GFX6: liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; GFX6: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)
				; GFX9-LABEL: name: kernel_store_private_p3
				; GFX9: liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; GFX9: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)
				%0:vgpr(p3) = COPY $vgpr0
				%1:vgpr(p5) = COPY $vgpr1
				G_STORE %0, %1 :: (store 4, align 4, addrspace 5)

				...

				---

				name: kernel_store_private_p5
				legalized: true
				regBankSelected: true
				tracksRegLiveness: true
				machineFunctionInfo:
				isEntryFunction: true
				scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3

				body: |
				bb.0:
				liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3

				; GFX6-LABEL: name: kernel_store_private_p5
				; GFX6: liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX6: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; GFX6: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)
				; GFX9-LABEL: name: kernel_store_private_p5
				; GFX9: liveins: $vgpr0, $vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
				; GFX9: [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr1
				; GFX9: BUFFER_STORE_DWORD_OFFEN [[COPY]], [[COPY1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4, addrspace 5)
				%0:vgpr(p5) = COPY $vgpr0
				%1:vgpr(p5) = COPY $vgpr1
				G_STORE %0, %1 :: (store 4, align 4, addrspace 5)

				...

				---

				name: kernel_store_private_s32_to_1_fi_offset_4095
				legalized: true
				regBankSelected: true
				tracksRegLiveness: true
				machineFunctionInfo:
				isEntryFunction: true
				scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
				stack:
				- { id: 0, size: 4096, alignment: 4 }

				body: |
				bb.0:
				liveins: $sgpr0_sgpr1_sgpr2_sgpr3

				; GFX6-LABEL: name: kernel_store_private_s32_to_1_fi_offset_4095
				; GFX6: liveins: $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
				; GFX6: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4095, implicit $exec
				; GFX6: %2:vgpr_32, dead %4:sreg_64_xexec = V_ADD_I32_e64 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], 0, implicit $exec
				; GFX6: [[V_MOV_B32_e32_2:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
				; GFX6: BUFFER_STORE_BYTE_OFFEN [[V_MOV_B32_e32_2]], %2, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
				; GFX9-LABEL: name: kernel_store_private_s32_to_1_fi_offset_4095
				; GFX9: liveins: $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
				; GFX9: BUFFER_STORE_BYTE_OFFEN [[V_MOV_B32_e32_]], %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 4095, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
				%0:vgpr(p5) = G_FRAME_INDEX %stack.0
				%1:vgpr(s32) = G_CONSTANT i32 4095
				%2:vgpr(p5) = G_PTR_ADD %0, %1
				%3:vgpr(s32) = G_CONSTANT i32 0
				G_STORE %3, %2 :: (store 1, align 1, addrspace 5)

				...

				---

				name: kernel_store_private_s32_to_1_constant_4095
				legalized: true
				regBankSelected: true
				tracksRegLiveness: true
				machineFunctionInfo:
				isEntryFunction: true
				scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
				stack:
				- { id: 0, size: 4096, alignment: 4 }

				body: |
				bb.0:
				liveins: $sgpr0_sgpr1_sgpr2_sgpr3

				; GFX6-LABEL: name: kernel_store_private_s32_to_1_constant_4095
				; GFX6: liveins: $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
				; GFX6: BUFFER_STORE_BYTE_OFFSET [[V_MOV_B32_e32_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 4095, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
				; GFX9-LABEL: name: kernel_store_private_s32_to_1_constant_4095
				; GFX9: liveins: $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
				; GFX9: BUFFER_STORE_BYTE_OFFSET [[V_MOV_B32_e32_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 4095, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
				%0:vgpr(p5) = G_CONSTANT i32 4095
				%1:vgpr(s32) = G_CONSTANT i32 0
				G_STORE %1, %0 :: (store 1, align 1, addrspace 5)

				...

				---

				name: kernel_store_private_s32_to_1_constant_4096
				legalized: true
				regBankSelected: true
				tracksRegLiveness: true
				machineFunctionInfo:
				isEntryFunction: true
				scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
				stack:
				- { id: 0, size: 4096, alignment: 4 }

				body: |
				bb.0:
				liveins: $sgpr0_sgpr1_sgpr2_sgpr3

				; GFX6-LABEL: name: kernel_store_private_s32_to_1_constant_4096
				; GFX6: liveins: $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
				; GFX6: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec
				; GFX6: BUFFER_STORE_BYTE_OFFEN [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
				; GFX9-LABEL: name: kernel_store_private_s32_to_1_constant_4096
				; GFX9: liveins: $sgpr0_sgpr1_sgpr2_sgpr3
				; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
				; GFX9: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 4096, implicit $exec
				; GFX9: BUFFER_STORE_BYTE_OFFEN [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
	%0:vgpr(p5) = G_CONSTANT i32 4096			%0:vgpr(p5) = G_CONSTANT i32 4096
	%1:vgpr(s32) = G_CONSTANT i32 0			%1:vgpr(s32) = G_CONSTANT i32 0
	G_STORE %1, %0 :: (store 1, align 1, addrspace 5)			G_STORE %1, %0 :: (store 1, align 1, addrspace 5)

	...			...

llvm/test/CodeGen/AMDGPU/GlobalISel/mul.ll

	(1,453 lines not shown)
	; GFX9-NEXT: s_add_u32 s21, s21, s30			; GFX9-NEXT: s_add_u32 s21, s21, s30
	; GFX9-NEXT: s_cselect_b32 s24, 1, 0			; GFX9-NEXT: s_cselect_b32 s24, 1, 0
	; GFX9-NEXT: s_and_b32 s24, s24, 1			; GFX9-NEXT: s_and_b32 s24, s24, 1
	; GFX9-NEXT: s_mul_hi_u32 s31, s1, s11			; GFX9-NEXT: s_mul_hi_u32 s31, s1, s11
	; GFX9-NEXT: s_add_i32 s23, s23, s24			; GFX9-NEXT: s_add_i32 s23, s23, s24
	; GFX9-NEXT: s_add_u32 s21, s21, s31			; GFX9-NEXT: s_add_u32 s21, s21, s31
	; GFX9-NEXT: s_cselect_b32 s24, 1, 0			; GFX9-NEXT: s_cselect_b32 s24, 1, 0
	; GFX9-NEXT: s_and_b32 s24, s24, 1			; GFX9-NEXT: s_and_b32 s24, s24, 1
	; GFX9-NEXT: s_mul_hi_u32 s32, s0, s12			; GFX9-NEXT: s_mul_hi_u32 s33, s0, s12
	; GFX9-NEXT: s_add_i32 s23, s23, s24			; GFX9-NEXT: s_add_i32 s23, s23, s24
	; GFX9-NEXT: s_add_u32 s21, s21, s32			; GFX9-NEXT: s_add_u32 s21, s21, s33
	; GFX9-NEXT: s_cselect_b32 s24, 1, 0			; GFX9-NEXT: s_cselect_b32 s24, 1, 0
	; GFX9-NEXT: s_and_b32 s24, s24, 1			; GFX9-NEXT: s_and_b32 s24, s24, 1
	; GFX9-NEXT: s_add_i32 s23, s23, s24			; GFX9-NEXT: s_add_i32 s23, s23, s24
	; GFX9-NEXT: s_add_u32 s21, s21, s22			; GFX9-NEXT: s_add_u32 s21, s21, s22
	; GFX9-NEXT: s_cselect_b32 s22, 1, 0			; GFX9-NEXT: s_cselect_b32 s22, 1, 0
	; GFX9-NEXT: s_and_b32 s22, s22, 1			; GFX9-NEXT: s_and_b32 s22, s22, 1
	; GFX9-NEXT: s_add_i32 s23, s23, s22			; GFX9-NEXT: s_add_i32 s23, s23, s22
	; GFX9-NEXT: s_mul_i32 s22, s6, s8			; GFX9-NEXT: s_mul_i32 s22, s6, s8
	(30 lines not shown)
	; GFX9-NEXT: s_add_u32 s22, s22, s30			; GFX9-NEXT: s_add_u32 s22, s22, s30
	; GFX9-NEXT: s_cselect_b32 s25, 1, 0			; GFX9-NEXT: s_cselect_b32 s25, 1, 0
	; GFX9-NEXT: s_and_b32 s25, s25, 1			; GFX9-NEXT: s_and_b32 s25, s25, 1
	; GFX9-NEXT: s_mul_hi_u32 s31, s4, s9			; GFX9-NEXT: s_mul_hi_u32 s31, s4, s9
	; GFX9-NEXT: s_add_i32 s24, s24, s25			; GFX9-NEXT: s_add_i32 s24, s24, s25
	; GFX9-NEXT: s_add_u32 s22, s22, s31			; GFX9-NEXT: s_add_u32 s22, s22, s31
	; GFX9-NEXT: s_cselect_b32 s25, 1, 0			; GFX9-NEXT: s_cselect_b32 s25, 1, 0
	; GFX9-NEXT: s_and_b32 s25, s25, 1			; GFX9-NEXT: s_and_b32 s25, s25, 1
	; GFX9-NEXT: s_mul_hi_u32 s32, s3, s10			; GFX9-NEXT: s_mul_hi_u32 s33, s3, s10
	; GFX9-NEXT: s_add_i32 s24, s24, s25
	; GFX9-NEXT: s_add_u32 s22, s22, s32
	; GFX9-NEXT: s_cselect_b32 s25, 1, 0
	; GFX9-NEXT: s_and_b32 s25, s25, 1
	; GFX9-NEXT: s_mul_hi_u32 s33, s2, s11
	; GFX9-NEXT: s_add_i32 s24, s24, s25			; GFX9-NEXT: s_add_i32 s24, s24, s25
	; GFX9-NEXT: s_add_u32 s22, s22, s33			; GFX9-NEXT: s_add_u32 s22, s22, s33
	; GFX9-NEXT: s_cselect_b32 s25, 1, 0			; GFX9-NEXT: s_cselect_b32 s25, 1, 0
	; GFX9-NEXT: s_and_b32 s25, s25, 1			; GFX9-NEXT: s_and_b32 s25, s25, 1
	; GFX9-NEXT: s_mul_hi_u32 s34, s1, s12			; GFX9-NEXT: s_mul_hi_u32 s34, s2, s11
	; GFX9-NEXT: s_add_i32 s24, s24, s25			; GFX9-NEXT: s_add_i32 s24, s24, s25
	; GFX9-NEXT: s_add_u32 s22, s22, s34			; GFX9-NEXT: s_add_u32 s22, s22, s34
	; GFX9-NEXT: s_cselect_b32 s25, 1, 0			; GFX9-NEXT: s_cselect_b32 s25, 1, 0
	; GFX9-NEXT: s_and_b32 s25, s25, 1			; GFX9-NEXT: s_and_b32 s25, s25, 1
	; GFX9-NEXT: s_mul_hi_u32 s35, s0, s13			; GFX9-NEXT: s_mul_hi_u32 s35, s1, s12
	; GFX9-NEXT: s_add_i32 s24, s24, s25			; GFX9-NEXT: s_add_i32 s24, s24, s25
	; GFX9-NEXT: s_add_u32 s22, s22, s35			; GFX9-NEXT: s_add_u32 s22, s22, s35
	; GFX9-NEXT: s_cselect_b32 s25, 1, 0			; GFX9-NEXT: s_cselect_b32 s25, 1, 0
	; GFX9-NEXT: s_and_b32 s25, s25, 1			; GFX9-NEXT: s_and_b32 s25, s25, 1
				; GFX9-NEXT: s_mul_hi_u32 s36, s0, s13
				; GFX9-NEXT: s_add_i32 s24, s24, s25
				; GFX9-NEXT: s_add_u32 s22, s22, s36
				; GFX9-NEXT: s_cselect_b32 s25, 1, 0
				; GFX9-NEXT: s_and_b32 s25, s25, 1
	; GFX9-NEXT: s_add_i32 s24, s24, s25			; GFX9-NEXT: s_add_i32 s24, s24, s25
	; GFX9-NEXT: s_add_u32 s22, s22, s23			; GFX9-NEXT: s_add_u32 s22, s22, s23
	; GFX9-NEXT: s_cselect_b32 s23, 1, 0			; GFX9-NEXT: s_cselect_b32 s23, 1, 0
	; GFX9-NEXT: s_and_b32 s23, s23, 1			; GFX9-NEXT: s_and_b32 s23, s23, 1
	; GFX9-NEXT: s_add_i32 s24, s24, s23			; GFX9-NEXT: s_add_i32 s24, s24, s23
	; GFX9-NEXT: s_mul_i32 s23, s6, s9			; GFX9-NEXT: s_mul_i32 s23, s6, s9
	; GFX9-NEXT: s_mul_i32 s7, s7, s8			; GFX9-NEXT: s_mul_i32 s7, s7, s8
	; GFX9-NEXT: s_mul_i32 s25, s5, s10			; GFX9-NEXT: s_mul_i32 s25, s5, s10
	(695 lines not shown)

llvm/test/CodeGen/AMDGPU/addrspacecast.ll

	(127 lines not shown)
	; HSA: enable_sgpr_dispatch_ptr = 0			; HSA: enable_sgpr_dispatch_ptr = 0
	; HSA: enable_sgpr_queue_ptr = 0			; HSA: enable_sgpr_queue_ptr = 0

	; HSA: s_load_dwordx2 s{{\[}}[[PTR_LO:[0-9]+]]:[[PTR_HI:[0-9]+]]{{\]}}			; HSA: s_load_dwordx2 s{{\[}}[[PTR_LO:[0-9]+]]:[[PTR_HI:[0-9]+]]{{\]}}
	; HSA-DAG: v_cmp_ne_u64_e64 vcc, s{{\[}}[[PTR_LO]]:[[PTR_HI]]{{\]}}, 0{{$}}			; HSA-DAG: v_cmp_ne_u64_e64 vcc, s{{\[}}[[PTR_LO]]:[[PTR_HI]]{{\]}}, 0{{$}}
	; HSA-DAG: v_mov_b32_e32 v[[VPTR_LO:[0-9]+]], s[[PTR_LO]]			; HSA-DAG: v_mov_b32_e32 v[[VPTR_LO:[0-9]+]], s[[PTR_LO]]
	; HSA-DAG: v_cndmask_b32_e32 [[CASTPTR:v[0-9]+]], 0, v[[VPTR_LO]]			; HSA-DAG: v_cndmask_b32_e32 [[CASTPTR:v[0-9]+]], 0, v[[VPTR_LO]]
	; HSA-DAG: v_mov_b32_e32 v[[K:[0-9]+]], 0{{$}}			; HSA-DAG: v_mov_b32_e32 v[[K:[0-9]+]], 0{{$}}
	; HSA: buffer_store_dword v[[K]], [[CASTPTR]], s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offen{{$}}			; HSA: buffer_store_dword v[[K]], [[CASTPTR]], s{{\[[0-9]+:[0-9]+\]}}, 0 offen{{$}}
	define amdgpu_kernel void @use_flat_to_private_addrspacecast(i32* %ptr) #0 {			define amdgpu_kernel void @use_flat_to_private_addrspacecast(i32* %ptr) #0 {
	%ftos = addrspacecast i32* %ptr to i32 addrspace(5)*			%ftos = addrspacecast i32* %ptr to i32 addrspace(5)*
	store volatile i32 0, i32 addrspace(5)* %ftos			store volatile i32 0, i32 addrspace(5)* %ftos
	ret void			ret void
	}			}

	; HSA-LABEL: {{^}}use_flat_to_global_addrspacecast:			; HSA-LABEL: {{^}}use_flat_to_global_addrspacecast:
	; HSA: enable_sgpr_queue_ptr = 0			; HSA: enable_sgpr_queue_ptr = 0
	define amdgpu_kernel void @cast_0_private_to_flat_addrspacecast() #0 {			define amdgpu_kernel void @cast_0_private_to_flat_addrspacecast() #0 {
	%cast = addrspacecast i32 addrspace(5)* null to i32*			%cast = addrspacecast i32 addrspace(5)* null to i32*
	store volatile i32 7, i32* %cast			store volatile i32 7, i32* %cast
	ret void			ret void
	}			}

	; HSA-LABEL: {{^}}cast_0_flat_to_private_addrspacecast:			; HSA-LABEL: {{^}}cast_0_flat_to_private_addrspacecast:
	; HSA: v_mov_b32_e32 [[K:v[0-9]+]], 7{{$}}			; HSA: v_mov_b32_e32 [[K:v[0-9]+]], 7{{$}}
	; HSA: buffer_store_dword [[K]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+$}}			; HSA: buffer_store_dword [[K]], off, s{{\[[0-9]+:[0-9]+\]}}, 0
	define amdgpu_kernel void @cast_0_flat_to_private_addrspacecast() #0 {			define amdgpu_kernel void @cast_0_flat_to_private_addrspacecast() #0 {
	%cast = addrspacecast i32* null to i32 addrspace(5)*			%cast = addrspacecast i32* null to i32 addrspace(5)*
	store volatile i32 7, i32 addrspace(5)* %cast			store volatile i32 7, i32 addrspace(5)* %cast
	ret void			ret void
	}			}

	; Disable optimizations in case there are optimizations added that			; Disable optimizations in case there are optimizations added that
	; specialize away generic pointer accesses.			; specialize away generic pointer accesses.

llvm/test/CodeGen/AMDGPU/amdgpu.private-memory.ll

; by 4 bytes.		; by 4 bytes.
; HSA-ALLOCA: workitem_private_segment_byte_size = 24		; HSA-ALLOCA: workitem_private_segment_byte_size = 24
; HSA-ALLOCA: .end_amd_kernel_code_t		; HSA-ALLOCA: .end_amd_kernel_code_t

; HSA-ALLOCA: s_mov_b32 flat_scratch_lo, s7		; HSA-ALLOCA: s_mov_b32 flat_scratch_lo, s7
; HSA-ALLOCA: s_add_u32 s6, s6, s9		; HSA-ALLOCA: s_add_u32 s6, s6, s9
; HSA-ALLOCA: s_lshr_b32 flat_scratch_hi, s6, 8		; HSA-ALLOCA: s_lshr_b32 flat_scratch_hi, s6, 8

; SI-ALLOCA: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offen ; encoding: [0x00,0x10,0x70,0xe0		; SI-ALLOCA: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+:[0-9]+}}], 0 offen ; encoding: [0x00,0x10,0x70,0xe0
; SI-ALLOCA: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offen ; encoding: [0x00,0x10,0x70,0xe0		; SI-ALLOCA: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+:[0-9]+}}], 0 offen ; encoding: [0x00,0x10,0x70,0xe0


; HSAOPT: [[DISPATCH_PTR:%[0-9]+]] = call noalias nonnull dereferenceable(64) i8 addrspace(4)* @llvm.amdgcn.dispatch.ptr()		; HSAOPT: [[DISPATCH_PTR:%[0-9]+]] = call noalias nonnull dereferenceable(64) i8 addrspace(4)* @llvm.amdgcn.dispatch.ptr()
; HSAOPT: [[CAST_DISPATCH_PTR:%[0-9]+]] = bitcast i8 addrspace(4)* [[DISPATCH_PTR]] to i32 addrspace(4)*		; HSAOPT: [[CAST_DISPATCH_PTR:%[0-9]+]] = bitcast i8 addrspace(4)* [[DISPATCH_PTR]] to i32 addrspace(4)*
; HSAOPT: [[GEP0:%[0-9]+]] = getelementptr inbounds i32, i32 addrspace(4)* [[CAST_DISPATCH_PTR]], i64 1		; HSAOPT: [[GEP0:%[0-9]+]] = getelementptr inbounds i32, i32 addrspace(4)* [[CAST_DISPATCH_PTR]], i64 1
; HSAOPT: [[LDXY:%[0-9]+]] = load i32, i32 addrspace(4)* [[GEP0]], align 4, !invariant.load !0		; HSAOPT: [[LDXY:%[0-9]+]] = load i32, i32 addrspace(4)* [[GEP0]], align 4, !invariant.load !0
; HSAOPT: [[GEP1:%[0-9]+]] = getelementptr inbounds i32, i32 addrspace(4)* [[CAST_DISPATCH_PTR]], i64 2		; HSAOPT: [[GEP1:%[0-9]+]] = getelementptr inbounds i32, i32 addrspace(4)* [[CAST_DISPATCH_PTR]], i64 2
; HSAOPT: [[LDZU:%[0-9]+]] = load i32, i32 addrspace(4)* [[GEP1]], align 4, !range !1, !invariant.load !0		; HSAOPT: [[LDZU:%[0-9]+]] = load i32, i32 addrspace(4)* [[GEP1]], align 4, !range !1, !invariant.load !0
for.end:
store i32 %value, i32 addrspace(1)* %out		store i32 %value, i32 addrspace(1)* %out
ret void		ret void
}		}

; FUNC-LABEL: {{^}}short_array:		; FUNC-LABEL: {{^}}short_array:

; R600-VECT: MOVA_INT		; R600-VECT: MOVA_INT

; SI-ALLOCA-DAG: buffer_store_short v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:6 ; encoding: [0x06,0x00,0x68,0xe0		; SI-ALLOCA-DAG: buffer_store_short v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], 0 offset:6 ; encoding: [0x06,0x00,0x68,0xe0
; SI-ALLOCA-DAG: buffer_store_short v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:4 ; encoding: [0x04,0x00,0x68,0xe0		; SI-ALLOCA-DAG: buffer_store_short v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], 0 offset:4 ; encoding: [0x04,0x00,0x68,0xe0
; Loaded value is 0 or 1, so sext will become zext, so we get buffer_load_ushort instead of buffer_load_sshort.		; Loaded value is 0 or 1, so sext will become zext, so we get buffer_load_ushort instead of buffer_load_sshort.
; SI-ALLOCA: buffer_load_sshort v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}}		; SI-ALLOCA: buffer_load_sshort v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+:[0-9]+}}], 0

; SI-PROMOTE-VECT: s_load_dword [[IDX:s[0-9]+]]		; SI-PROMOTE-VECT: s_load_dword [[IDX:s[0-9]+]]
; SI-PROMOTE-VECT: s_mov_b32 [[SREG:s[0-9]+]], 0x10000		; SI-PROMOTE-VECT: s_mov_b32 [[SREG:s[0-9]+]], 0x10000
; SI-PROMOTE-VECT: s_lshl_b32 [[SCALED_IDX:s[0-9]+]], [[IDX]], 4		; SI-PROMOTE-VECT: s_lshl_b32 [[SCALED_IDX:s[0-9]+]], [[IDX]], 4
; SI-PROMOTE-VECT: v_mov_b32_e32 [[VREG:v[0-9]+]], [[SCALED_IDX]]		; SI-PROMOTE-VECT: v_mov_b32_e32 [[VREG:v[0-9]+]], [[SCALED_IDX]]
; SI-PROMOTE-VECT: v_bfe_u32 v{{[0-9]+}}, [[SREG]], [[VREG]], 16		; SI-PROMOTE-VECT: v_bfe_u32 v{{[0-9]+}}, [[SREG]], [[VREG]], 16
define amdgpu_kernel void @short_array(i32 addrspace(1)* %out, i32 %index) #0 {		define amdgpu_kernel void @short_array(i32 addrspace(1)* %out, i32 %index) #0 {
entry:		entry:

; FUNC-LABEL: {{^}}char_array:		; FUNC-LABEL: {{^}}char_array:

; R600-VECT: MOVA_INT		; R600-VECT: MOVA_INT

; SI-PROMOTE-VECT-DAG: s_lshl_b32		; SI-PROMOTE-VECT-DAG: s_lshl_b32
; SI-PROMOTE-VECT-DAG: v_lshrrev		; SI-PROMOTE-VECT-DAG: v_lshrrev

; SI-ALLOCA-DAG: buffer_store_byte v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:4 ; encoding: [0x04,0x00,0x60,0xe0		; SI-ALLOCA-DAG: buffer_store_byte v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], 0 offset:4 ; encoding: [0x04,0x00,0x60,0xe0
; SI-ALLOCA-DAG: buffer_store_byte v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:5 ; encoding: [0x05,0x00,0x60,0xe0		; SI-ALLOCA-DAG: buffer_store_byte v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], 0 offset:5 ; encoding: [0x05,0x00,0x60,0xe0
define amdgpu_kernel void @char_array(i32 addrspace(1)* %out, i32 %index) #0 {		define amdgpu_kernel void @char_array(i32 addrspace(1)* %out, i32 %index) #0 {
entry:		entry:
%0 = alloca [2 x i8], addrspace(5)		%0 = alloca [2 x i8], addrspace(5)
%1 = getelementptr inbounds [2 x i8], [2 x i8] addrspace(5)* %0, i32 0, i32 0		%1 = getelementptr inbounds [2 x i8], [2 x i8] addrspace(5)* %0, i32 0, i32 0
%2 = getelementptr inbounds [2 x i8], [2 x i8] addrspace(5)* %0, i32 0, i32 1		%2 = getelementptr inbounds [2 x i8], [2 x i8] addrspace(5)* %0, i32 0, i32 1
store i8 0, i8 addrspace(5)* %1		store i8 0, i8 addrspace(5)* %1
store i8 1, i8 addrspace(5)* %2		store i8 1, i8 addrspace(5)* %2
%3 = getelementptr inbounds [2 x i8], [2 x i8] addrspace(5)* %0, i32 0, i32 %index		%3 = getelementptr inbounds [2 x i8], [2 x i8] addrspace(5)* %0, i32 0, i32 %index
%4 = load i8, i8 addrspace(5)* %3		%4 = load i8, i8 addrspace(5)* %3
%5 = sext i8 %4 to i32		%5 = sext i8 %4 to i32
store i32 %5, i32 addrspace(1)* %out		store i32 %5, i32 addrspace(1)* %out
ret void		ret void
}		}

; Test that two stack objects are not stored in the same register		; Test that two stack objects are not stored in the same register
; The second stack object should be in T3.X		; The second stack object should be in T3.X
; FUNC-LABEL: {{^}}no_overlap:		; FUNC-LABEL: {{^}}no_overlap:
; R600-CHECK: MOV		; R600-CHECK: MOV
; R600-CHECK: [[CHAN:[XYZW]]]+		; R600-CHECK: [[CHAN:[XYZW]]]+
; R600-NOT: [[CHAN]]+		; R600-NOT: [[CHAN]]+
;		;
; A total of 5 bytes should be allocated and used.		; A total of 5 bytes should be allocated and used.
; SI: buffer_store_byte v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:4 ;		; SI: buffer_store_byte v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], 0 offset:4 ;
define amdgpu_kernel void @no_overlap(i32 addrspace(1)* %out, i32 %in) #0 {		define amdgpu_kernel void @no_overlap(i32 addrspace(1)* %out, i32 %in) #0 {
entry:		entry:
%0 = alloca [3 x i8], align 1, addrspace(5)		%0 = alloca [3 x i8], align 1, addrspace(5)
%1 = alloca [2 x i8], align 1, addrspace(5)		%1 = alloca [2 x i8], align 1, addrspace(5)
%2 = getelementptr [3 x i8], [3 x i8] addrspace(5)* %0, i32 0, i32 0		%2 = getelementptr [3 x i8], [3 x i8] addrspace(5)* %0, i32 0, i32 0
%3 = getelementptr [3 x i8], [3 x i8] addrspace(5)* %0, i32 0, i32 1		%3 = getelementptr [3 x i8], [3 x i8] addrspace(5)* %0, i32 0, i32 1
%4 = getelementptr [3 x i8], [3 x i8] addrspace(5)* %0, i32 0, i32 2		%4 = getelementptr [3 x i8], [3 x i8] addrspace(5)* %0, i32 0, i32 2
%5 = getelementptr [2 x i8], [2 x i8] addrspace(5)* %1, i32 0, i32 0		%5 = getelementptr [2 x i8], [2 x i8] addrspace(5)* %1, i32 0, i32 0
entry:
ret void		ret void
}		}

; AMDGPUPromoteAlloca does not know how to handle ptrtoint. When it		; AMDGPUPromoteAlloca does not know how to handle ptrtoint. When it
; finds one, it should stop trying to promote.		; finds one, it should stop trying to promote.

; FUNC-LABEL: ptrtoint:		; FUNC-LABEL: ptrtoint:
; SI-NOT: ds_write		; SI-NOT: ds_write
; SI: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offen		; SI: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+:[0-9]+}}], 0 offen
; SI: v_add_{{[iu]}}32_e32 [[ADD_OFFSET:v[0-9]+]], vcc, 5,		; SI: v_add_{{[iu]}}32_e32 [[ADD_OFFSET:v[0-9]+]], vcc, 5,
; SI: buffer_load_dword v{{[0-9]+}}, [[ADD_OFFSET:v[0-9]+]], s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offen ;		; SI: buffer_load_dword v{{[0-9]+}}, [[ADD_OFFSET:v[0-9]+]], s[{{[0-9]+:[0-9]+}}], 0 offen ;
define amdgpu_kernel void @ptrtoint(i32 addrspace(1)* %out, i32 %a, i32 %b) #0 {		define amdgpu_kernel void @ptrtoint(i32 addrspace(1)* %out, i32 %a, i32 %b) #0 {
%alloca = alloca [16 x i32], addrspace(5)		%alloca = alloca [16 x i32], addrspace(5)
%tmp0 = getelementptr [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 0, i32 %a		%tmp0 = getelementptr [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 0, i32 %a
store i32 5, i32 addrspace(5)* %tmp0		store i32 5, i32 addrspace(5)* %tmp0
%tmp1 = ptrtoint [16 x i32] addrspace(5)* %alloca to i32		%tmp1 = ptrtoint [16 x i32] addrspace(5)* %alloca to i32
%tmp2 = add i32 %tmp1, 5		%tmp2 = add i32 %tmp1, 5
%tmp3 = inttoptr i32 %tmp2 to i32 addrspace(5)*		%tmp3 = inttoptr i32 %tmp2 to i32 addrspace(5)*
%tmp4 = getelementptr i32, i32 addrspace(5)* %tmp3, i32 %b		%tmp4 = getelementptr i32, i32 addrspace(5)* %tmp3, i32 %b

llvm/test/CodeGen/AMDGPU/amdhsa-trap-num-sgprs.ll

	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx803 -mattr=+trap-handler < %s \| FileCheck %s --check-prefixes=GCN,TRAP-HANDLER-ENABLE			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx803 -mattr=+trap-handler < %s \| FileCheck %s --check-prefixes=GCN,TRAP-HANDLER-ENABLE
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx803 -mattr=-trap-handler < %s \| FileCheck %s --check-prefixes=GCN,TRAP-HANDLER-DISABLE			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx803 -mattr=-trap-handler < %s \| FileCheck %s --check-prefixes=GCN,TRAP-HANDLER-DISABLE

	; GCN-LABEL: {{^}}amdhsa_trap_num_sgprs			; GCN-LABEL: {{^}}amdhsa_trap_num_sgprs
	; TRAP-HANDLER-ENABLE: NumSgprs: 60			; TRAP-HANDLER-ENABLE: NumSgprs: 61
	; TRAP-HANDLER-DISABLE: NumSgprs: 78			; TRAP-HANDLER-DISABLE: NumSgprs: 79
	define amdgpu_kernel void @amdhsa_trap_num_sgprs(			define amdgpu_kernel void @amdhsa_trap_num_sgprs(
	i32 addrspace(1)* %out0, i32 %in0,			i32 addrspace(1)* %out0, i32 %in0,
	i32 addrspace(1)* %out1, i32 %in1,			i32 addrspace(1)* %out1, i32 %in1,
	i32 addrspace(1)* %out2, i32 %in2,			i32 addrspace(1)* %out2, i32 %in2,
	i32 addrspace(1)* %out3, i32 %in3,			i32 addrspace(1)* %out3, i32 %in3,
	i32 addrspace(1)* %out4, i32 %in4,			i32 addrspace(1)* %out4, i32 %in4,
	i32 addrspace(1)* %out5, i32 %in5,			i32 addrspace(1)* %out5, i32 %in5,
	i32 addrspace(1)* %out6, i32 %in6,			i32 addrspace(1)* %out6, i32 %in6,

llvm/test/CodeGen/AMDGPU/array-ptr-calc-i32.ll

	; RUN: llc -verify-machineinstrs -march=amdgcn -mcpu=tahiti -mattr=-promote-alloca < %s \| FileCheck -check-prefix=SI-ALLOCA -check-prefix=SI %s			; RUN: llc -verify-machineinstrs -march=amdgcn -mcpu=tahiti -mattr=-promote-alloca < %s \| FileCheck -check-prefix=SI-ALLOCA -check-prefix=SI %s
	; RUN: llc -verify-machineinstrs -march=amdgcn -mcpu=tahiti -mattr=+promote-alloca < %s \| FileCheck -check-prefix=SI-PROMOTE -check-prefix=SI %s			; RUN: llc -verify-machineinstrs -march=amdgcn -mcpu=tahiti -mattr=+promote-alloca < %s \| FileCheck -check-prefix=SI-PROMOTE -check-prefix=SI %s

	declare i32 @llvm.amdgcn.mbcnt.lo(i32, i32) #1			declare i32 @llvm.amdgcn.mbcnt.lo(i32, i32) #1
	declare i32 @llvm.amdgcn.mbcnt.hi(i32, i32) #1			declare i32 @llvm.amdgcn.mbcnt.hi(i32, i32) #1
	declare void @llvm.amdgcn.s.barrier() #2			declare void @llvm.amdgcn.s.barrier() #2

	; The required pointer calculations for the alloca'd actually requires			; The required pointer calculations for the alloca'd actually requires
	; an add and won't be folded into the addressing, which fails with a			; an add and won't be folded into the addressing, which fails with a
	; 64-bit pointer add. This should work since private pointers should			; 64-bit pointer add. This should work since private pointers should
	; be 32-bits.			; be 32-bits.

	; SI-LABEL: {{^}}test_private_array_ptr_calc:			; SI-LABEL: {{^}}test_private_array_ptr_calc:

	; SI-ALLOCA: v_add_i32_e32 [[PTRREG:v[0-9]+]], vcc, 16, v{{[0-9]+}}			; SI-ALLOCA: v_add_i32_e32 [[PTRREG:v[0-9]+]], vcc, 16, v{{[0-9]+}}
	; SI-ALLOCA: buffer_store_dword {{v[0-9]+}}, [[PTRREG]], s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offen offset:64			; SI-ALLOCA: buffer_store_dword {{v[0-9]+}}, [[PTRREG]], s[{{[0-9]+:[0-9]+}}], 0 offen offset:64
	; SI-ALLOCA: s_barrier			; SI-ALLOCA: s_barrier
	; SI-ALLOCA: buffer_load_dword {{v[0-9]+}}, [[PTRREG]], s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offen offset:64			; SI-ALLOCA: buffer_load_dword {{v[0-9]+}}, [[PTRREG]], s[{{[0-9]+:[0-9]+}}], 0 offen offset:64
	;			;
	; FIXME: The AMDGPUPromoteAlloca pass should be able to convert this			; FIXME: The AMDGPUPromoteAlloca pass should be able to convert this
	; alloca to a vector. It currently fails because it does not know how			; alloca to a vector. It currently fails because it does not know how
	; to interpret:			; to interpret:
	; getelementptr inbounds [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 1, i32 %b			; getelementptr inbounds [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 1, i32 %b

	; SI-PROMOTE: v_add_i32_e32 [[PTRREG:v[0-9]+]], vcc, 64			; SI-PROMOTE: v_add_i32_e32 [[PTRREG:v[0-9]+]], vcc, 64
	; SI-PROMOTE: ds_write_b32 [[PTRREG]]			; SI-PROMOTE: ds_write_b32 [[PTRREG]]

llvm/test/CodeGen/AMDGPU/attr-amdgpu-num-sgpr.ll

	; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=fiji -verify-machineinstrs < %s \| FileCheck -check-prefix=TOSGPR -check-prefix=ALL %s			; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=fiji -verify-machineinstrs < %s \| FileCheck -check-prefix=TOSGPR -check-prefix=ALL %s

	; FIXME: Vectorization can increase required SGPR count beyond limit.			; FIXME: Vectorization can increase required SGPR count beyond limit.

	; ALL-LABEL: {{^}}max_9_sgprs:			; ALL-LABEL: {{^}}max_10_sgprs:

	; ALL: SGPRBlocks: 1			; ALL: SGPRBlocks: 1
	; ALL: NumSGPRsForWavesPerEU: 9			; ALL: NumSGPRsForWavesPerEU: 10
				scott.linder (Author): Can anyone help me understand what we are trying to test here? It seems likely that the number of live SGPRs and the number of available SGPRs need to be adjusted for this test to remain meaningful, but in trying to correct it I realized I wasn't sure what it was testing in the first place.
	define amdgpu_kernel void @max_9_sgprs() #0 {			define amdgpu_kernel void @max_10_sgprs() #0 {
	%one = load volatile i32, i32 addrspace(4)* undef			%one = load volatile i32, i32 addrspace(4)* undef
	%two = load volatile i32, i32 addrspace(4)* undef			%two = load volatile i32, i32 addrspace(4)* undef
	%three = load volatile i32, i32 addrspace(4)* undef			%three = load volatile i32, i32 addrspace(4)* undef
	%four = load volatile i32, i32 addrspace(4)* undef			%four = load volatile i32, i32 addrspace(4)* undef
	%five = load volatile i32, i32 addrspace(4)* undef			%five = load volatile i32, i32 addrspace(4)* undef
	%six = load volatile i32, i32 addrspace(4)* undef			%six = load volatile i32, i32 addrspace(4)* undef
	%seven = load volatile i32, i32 addrspace(4)* undef			%seven = load volatile i32, i32 addrspace(4)* undef
	%eight = load volatile i32, i32 addrspace(4)* undef			%eight = load volatile i32, i32 addrspace(4)* undef
	%nine = load volatile i32, i32 addrspace(4)* undef			%nine = load volatile i32, i32 addrspace(4)* undef
	%ten = load volatile i32, i32 addrspace(4)* undef			%ten = load volatile i32, i32 addrspace(4)* undef
	call void asm sideeffect "", "s,s,s,s,s,s,s,s,s"(i32 %one, i32 %two, i32 %three, i32 %four, i32 %five, i32 %six, i32 %seven, i32 %eight, i32 %nine)			%eleven = load volatile i32, i32 addrspace(4)* undef
				call void asm sideeffect "", "s,s,s,s,s,s,s,s,s,s"(i32 %one, i32 %two, i32 %three, i32 %four, i32 %five, i32 %six, i32 %seven, i32 %eight, i32 %nine, i32 %ten)
	store volatile i32 %one, i32 addrspace(1)* undef			store volatile i32 %one, i32 addrspace(1)* undef
	store volatile i32 %two, i32 addrspace(1)* undef			store volatile i32 %two, i32 addrspace(1)* undef
	store volatile i32 %three, i32 addrspace(1)* undef			store volatile i32 %three, i32 addrspace(1)* undef
	store volatile i32 %four, i32 addrspace(1)* undef			store volatile i32 %four, i32 addrspace(1)* undef
	store volatile i32 %five, i32 addrspace(1)* undef			store volatile i32 %five, i32 addrspace(1)* undef
	store volatile i32 %six, i32 addrspace(1)* undef			store volatile i32 %six, i32 addrspace(1)* undef
	store volatile i32 %seven, i32 addrspace(1)* undef			store volatile i32 %seven, i32 addrspace(1)* undef
	store volatile i32 %eight, i32 addrspace(1)* undef			store volatile i32 %eight, i32 addrspace(1)* undef
	store volatile i32 %nine, i32 addrspace(1)* undef			store volatile i32 %nine, i32 addrspace(1)* undef
	store volatile i32 %ten, i32 addrspace(1)* undef			store volatile i32 %ten, i32 addrspace(1)* undef
				store volatile i32 %eleven, i32 addrspace(1)* undef
	ret void			ret void
	}			}

	; private resource: 4			; private resource: 4
	; scratch wave offset: 1			; scratch wave offset: 1
	; workgroup ids: 3			; workgroup ids: 3
	; dispatch id: 2			; dispatch id: 2
	; queue ptr: 2			; queue ptr: 2

llvm/test/CodeGen/AMDGPU/byval-frame-setup.ll

; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=fiji -enable-ipra=0 -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefixes=GCN,VI %s		; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=fiji -enable-ipra=0 -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefixes=GCN,VI %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -enable-ipra=0 -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefixes=GCN,CI %s		; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -enable-ipra=0 -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefixes=GCN,CI %s

%struct.ByValStruct = type { [4 x i32] }		%struct.ByValStruct = type { [4 x i32] }

; GCN-LABEL: {{^}}void_func_byval_struct:
; GCN: buffer_load_dword [[LOAD0:v[0-9]+]], off, s[0:3], s32{{$}}
; GCN-NOT: s32
; GCN: buffer_store_dword [[LOAD0]], off, s[0:3], s32{{$}}
; GCN-NOT: s32

; GCN: buffer_load_dword [[LOAD1:v[0-9]+]], off, s[0:3], s32 offset:16{{$}}
; GCN-NOT: s32
; GCN: buffer_store_dword [[LOAD1]], off, s[0:3], s32 offset:16{{$}}
; GCN-NOT: s32
define hidden void @void_func_byval_struct(%struct.ByValStruct addrspace(5)* byval noalias nocapture align 4 %arg0, %struct.ByValStruct addrspace(5)* byval noalias nocapture align 4 %arg1) #1 {
entry:
%arrayidx = getelementptr inbounds %struct.ByValStruct, %struct.ByValStruct addrspace(5)* %arg0, i32 0, i32 0, i32 0
%tmp = load volatile i32, i32 addrspace(5)* %arrayidx, align 4
%add = add nsw i32 %tmp, 1
store volatile i32 %add, i32 addrspace(5)* %arrayidx, align 4
%arrayidx2 = getelementptr inbounds %struct.ByValStruct, %struct.ByValStruct addrspace(5)* %arg1, i32 0, i32 0, i32 0
%tmp1 = load volatile i32, i32 addrspace(5)* %arrayidx2, align 4
%add3 = add nsw i32 %tmp1, 2
store volatile i32 %add3, i32 addrspace(5)* %arrayidx2, align 4
store volatile i32 9, i32 addrspace(1)* null, align 4
ret void
}

; Make sure the offset is folded and function's frame register is used		; Make sure the offset is folded and function's frame register is used
; rather than the global scratch wave offset.		; rather than the global scratch wave offset.
; GCN-LABEL: {{^}}void_func_byval_struct_use_outside_entry_block:		; GCN-LABEL: {{^}}void_func_byval_struct_use_outside_entry_block:
; GCN-NOT: v_lshrrev_b32		; GCN-NOT: v_lshrrev_b32
; GCN-NOT: s_sub_u32		; GCN-NOT: s_sub_u32

; GCN: s_and_saveexec_b64		; GCN: s_and_saveexec_b64
; GCN: s_cbranch_execz [[BB1:BB[0-9]+_[0-9]+]]		; GCN: s_cbranch_execz [[BB1:BB[0-9]+_[0-9]+]]
bb0:
%add3 = add nsw i32 %tmp1, 2		%add3 = add nsw i32 %tmp1, 2
store volatile i32 %add3, i32 addrspace(5)* %arrayidx2, align 4		store volatile i32 %add3, i32 addrspace(5)* %arrayidx2, align 4
store volatile i32 9, i32 addrspace(1)* null, align 4		store volatile i32 9, i32 addrspace(1)* null, align 4
br label %bb1		br label %bb1

bb1:		bb1:
ret void		ret void
}		}

; GCN-LABEL: {{^}}void_func_byval_struct_non_leaf:
; GCN: buffer_store_dword v33, off, s[0:3], s32 offset:36
; GCN-DAG: v_writelane_b32 v33, s34,
; GCN: s_mov_b32 s34, s32
; GCN-DAG: buffer_load_dword [[LOAD0:v[0-9]+]], off, s[0:3], s34{{$}}
; GCN-DAG: s_add_u32 s32, s32, 0xc00{{$}}
; GCN-DAG: buffer_store_dword v32, off, s[0:3], s34 offset:32
; GCN-NOT: v_writelane_b32 v{{[0-9]+}}, s32

; GCN-DAG: v_add_{{[iu]}}32_e32 [[ADD0:v[0-9]+]], vcc, 1, [[LOAD0]]
; GCN: buffer_store_dword [[ADD0]], off, s[0:3], s34{{$}}

; GCN-DAG: buffer_load_dword [[LOAD1:v[0-9]+]], off, s[0:3], s34 offset:16{{$}}
; GCN-DAG: v_add_{{[iu]}}32_e32 [[ADD1:v[0-9]+]], vcc, 2, [[LOAD1]]

; GCN: s_swappc_b64

; GCN: buffer_store_dword [[ADD1]], off, s[0:3], s34 offset:16{{$}}

; GCN: v_readlane_b32
; GCN-NOT: v_readlane_b32 s32
; GCN-DAG: buffer_load_dword v32, off, s[0:3], s34 offset:32
; GCN: s_sub_u32 s32, s32, 0xc00{{$}}
; GCN: v_readlane_b32 s34, v33,
; GCN-DAG: buffer_load_dword v33, off, s[0:3], s32 offset:36 ; 4-byte Folded Reload
; GCN: s_setpc_b64
define void @void_func_byval_struct_non_leaf(%struct.ByValStruct addrspace(5)* byval noalias nocapture align 4 %arg0, %struct.ByValStruct addrspace(5)* byval noalias nocapture align 4 %arg1) #1 {
entry:
%arrayidx = getelementptr inbounds %struct.ByValStruct, %struct.ByValStruct addrspace(5)* %arg0, i32 0, i32 0, i32 0
%tmp = load volatile i32, i32 addrspace(5)* %arrayidx, align 4
%add = add nsw i32 %tmp, 1
store volatile i32 %add, i32 addrspace(5)* %arrayidx, align 4
%arrayidx2 = getelementptr inbounds %struct.ByValStruct, %struct.ByValStruct addrspace(5)* %arg1, i32 0, i32 0, i32 0
%tmp1 = load volatile i32, i32 addrspace(5)* %arrayidx2, align 4
%add3 = add nsw i32 %tmp1, 2
call void @external_void_func_void()
store volatile i32 %add3, i32 addrspace(5)* %arrayidx2, align 4
store volatile i32 9, i32 addrspace(1)* null, align 4
ret void
}

; GCN-LABEL: {{^}}call_void_func_byval_struct_func:
; GCN: s_mov_b32 s34, s32
; GCN-DAG: s_add_u32 s32, s32, 0xc00{{$}}
; GCN-DAG: v_writelane_b32

; GCN-DAG: v_mov_b32_e32 [[NINE:v[0-9]+]], 9
; GCN-DAG: v_mov_b32_e32 [[THIRTEEN:v[0-9]+]], 13

; GCN-DAG: buffer_store_dword [[NINE]], off, s[0:3], s34{{$}}
; GCN-DAG: buffer_store_dword [[THIRTEEN]], off, s[0:3], s34 offset:16

; GCN-DAG: buffer_load_dword [[LOAD0:v[0-9]+]], off, s[0:3], s34{{$}}
; GCN-DAG: buffer_load_dword [[LOAD1:v[0-9]+]], off, s[0:3], s34 offset:4
; GCN-DAG: buffer_load_dword [[LOAD2:v[0-9]+]], off, s[0:3], s34 offset:8
; GCN-DAG: buffer_load_dword [[LOAD3:v[0-9]+]], off, s[0:3], s34 offset:12

; GCN-NOT: s_add_u32 s32, s32, 0x800


; GCN-DAG: buffer_store_dword [[LOAD0]], off, s[0:3], s32{{$}}
; GCN-DAG: buffer_store_dword [[LOAD1]], off, s[0:3], s32 offset:4
; GCN-DAG: buffer_store_dword [[LOAD2]], off, s[0:3], s32 offset:8
; GCN-DAG: buffer_store_dword [[LOAD3]], off, s[0:3], s32 offset:12

; GCN: buffer_load_dword [[LOAD4:v[0-9]+]], off, s[0:3], s34 offset:16
; GCN: buffer_load_dword [[LOAD5:v[0-9]+]], off, s[0:3], s34 offset:20
; GCN: buffer_load_dword [[LOAD6:v[0-9]+]], off, s[0:3], s34 offset:24
; GCN: buffer_load_dword [[LOAD7:v[0-9]+]], off, s[0:3], s34 offset:28

; GCN-DAG: buffer_store_dword [[LOAD4]], off, s[0:3], s32 offset:16
; GCN-DAG: buffer_store_dword [[LOAD5]], off, s[0:3], s32 offset:20
; GCN-DAG: buffer_store_dword [[LOAD6]], off, s[0:3], s32 offset:24
; GCN-DAG: buffer_store_dword [[LOAD7]], off, s[0:3], s32 offset:28

; GCN: s_swappc_b64
; GCN-NOT: v_readlane_b32 s32
; GCN: v_readlane_b32
; GCN-NOT: v_readlane_b32 s32

; GCN-NOT: s_sub_u32 s32, s32, 0x800

; GCN: s_sub_u32 s32, s32, 0xc00{{$}}
; GCN: v_readlane_b32 s34, v
; GCN: s_waitcnt
; GCN: s_setpc_b64
define void @call_void_func_byval_struct_func() #1 {
entry:
%arg0 = alloca %struct.ByValStruct, align 4, addrspace(5)
%arg1 = alloca %struct.ByValStruct, align 4, addrspace(5)
%tmp = bitcast %struct.ByValStruct addrspace(5)* %arg0 to i8 addrspace(5)*
call void @llvm.lifetime.start.p5i8(i64 32, i8 addrspace(5)* %tmp)
%tmp1 = bitcast %struct.ByValStruct addrspace(5)* %arg1 to i8 addrspace(5)*
call void @llvm.lifetime.start.p5i8(i64 32, i8 addrspace(5)* %tmp1)
%arrayidx = getelementptr inbounds %struct.ByValStruct, %struct.ByValStruct addrspace(5)* %arg0, i32 0, i32 0, i32 0
store volatile i32 9, i32 addrspace(5)* %arrayidx, align 4
%arrayidx2 = getelementptr inbounds %struct.ByValStruct, %struct.ByValStruct addrspace(5)* %arg1, i32 0, i32 0, i32 0
store volatile i32 13, i32 addrspace(5)* %arrayidx2, align 4
call void @void_func_byval_struct(%struct.ByValStruct addrspace(5)* byval nonnull align 4 %arg0, %struct.ByValStruct addrspace(5)* byval nonnull align 4 %arg1)
call void @llvm.lifetime.end.p5i8(i64 32, i8 addrspace(5)* %tmp1)
call void @llvm.lifetime.end.p5i8(i64 32, i8 addrspace(5)* %tmp)
ret void
}

; GCN-LABEL: {{^}}call_void_func_byval_struct_kernel:
; GCN: s_mov_b32 s33, s7
; GCN-NOT: s_add_u32 s32, s32, 0x800

; GCN: v_mov_b32_e32 [[NINE:v[0-9]+]], 9
; GCN: buffer_store_dword [[NINE]], off, s[0:3], s33 offset:8
; GCN: v_mov_b32_e32 [[THIRTEEN:v[0-9]+]], 13
; GCN: buffer_store_dword [[THIRTEEN]], off, s[0:3], s33 offset:24

; GCN-NOT: s_add_u32 s32, s32, 0x800
; GCN-DAG: buffer_load_dword [[LOAD0:v[0-9]+]], off, s[0:3], s33 offset:8
; GCN-DAG: buffer_load_dword [[LOAD1:v[0-9]+]], off, s[0:3], s33 offset:12
; GCN-DAG: s_add_u32 s32, s33, 0xc00{{$}}
; GCN-DAG: buffer_load_dword [[LOAD2:v[0-9]+]], off, s[0:3], s33 offset:16
; GCN-DAG: buffer_load_dword [[LOAD3:v[0-9]+]], off, s[0:3], s33 offset:20

; GCN: s_getpc_b64

; GCN-DAG: buffer_store_dword [[LOAD0]], off, s[0:3], s32{{$}}
; GCN-DAG: buffer_store_dword [[LOAD1]], off, s[0:3], s32 offset:4
; GCN-DAG: buffer_store_dword [[LOAD2]], off, s[0:3], s32 offset:8
; GCN-DAG: buffer_store_dword [[LOAD3]], off, s[0:3], s32 offset:12

; GCN-DAG: buffer_load_dword [[LOAD4:v[0-9]+]], off, s[0:3], s33 offset:24
; GCN-DAG: buffer_load_dword [[LOAD5:v[0-9]+]], off, s[0:3], s33 offset:28
; GCN-DAG: buffer_load_dword [[LOAD6:v[0-9]+]], off, s[0:3], s33 offset:32
; GCN-DAG: buffer_load_dword [[LOAD7:v[0-9]+]], off, s[0:3], s33 offset:36

; GCN-DAG: buffer_store_dword [[LOAD4]], off, s[0:3], s32 offset:16
; GCN-DAG: buffer_store_dword [[LOAD5]], off, s[0:3], s32 offset:20
; GCN-DAG: buffer_store_dword [[LOAD6]], off, s[0:3], s32 offset:24
; GCN-DAG: buffer_store_dword [[LOAD7]], off, s[0:3], s32 offset:28


; GCN: s_swappc_b64
; GCN-NOT: s_sub_u32 s32
; GCN: s_endpgm
define amdgpu_kernel void @call_void_func_byval_struct_kernel() #1 {
entry:
%arg0 = alloca %struct.ByValStruct, align 4, addrspace(5)
%arg1 = alloca %struct.ByValStruct, align 4, addrspace(5)
%tmp = bitcast %struct.ByValStruct addrspace(5)* %arg0 to i8 addrspace(5)*
call void @llvm.lifetime.start.p5i8(i64 32, i8 addrspace(5)* %tmp)
%tmp1 = bitcast %struct.ByValStruct addrspace(5)* %arg1 to i8 addrspace(5)*
call void @llvm.lifetime.start.p5i8(i64 32, i8 addrspace(5)* %tmp1)
%arrayidx = getelementptr inbounds %struct.ByValStruct, %struct.ByValStruct addrspace(5)* %arg0, i32 0, i32 0, i32 0
store volatile i32 9, i32 addrspace(5)* %arrayidx, align 4
%arrayidx2 = getelementptr inbounds %struct.ByValStruct, %struct.ByValStruct addrspace(5)* %arg1, i32 0, i32 0, i32 0
store volatile i32 13, i32 addrspace(5)* %arrayidx2, align 4
call void @void_func_byval_struct(%struct.ByValStruct addrspace(5)* byval nonnull align 4 %arg0, %struct.ByValStruct addrspace(5)* byval nonnull align 4 %arg1)
call void @llvm.lifetime.end.p5i8(i64 32, i8 addrspace(5)* %tmp1)
call void @llvm.lifetime.end.p5i8(i64 32, i8 addrspace(5)* %tmp)
ret void
}

; GCN-LABEL: {{^}}void_func_byval_struct_align8:
; GCN: buffer_load_dword [[LOAD0:v[0-9]+]], off, s[0:3], s32{{$}}
; GCN-NOT: s32
; GCN: buffer_store_dword [[LOAD0]], off, s[0:3], s32{{$}}
; GCN-NOT: s32

; GCN: buffer_load_dword [[LOAD1:v[0-9]+]], off, s[0:3], s32 offset:16{{$}}
; GCN-NOT: s32
; GCN: buffer_store_dword [[LOAD1]], off, s[0:3], s32 offset:16{{$}}
; GCN-NOT: s32
define hidden void @void_func_byval_struct_align8(%struct.ByValStruct addrspace(5)* byval noalias nocapture align 8 %arg0, %struct.ByValStruct addrspace(5)* byval noalias nocapture align 8 %arg1) #1 {
entry:
%arrayidx = getelementptr inbounds %struct.ByValStruct, %struct.ByValStruct addrspace(5)* %arg0, i32 0, i32 0, i32 0
%tmp = load volatile i32, i32 addrspace(5)* %arrayidx, align 8
%add = add nsw i32 %tmp, 1
store volatile i32 %add, i32 addrspace(5)* %arrayidx, align 8
%arrayidx2 = getelementptr inbounds %struct.ByValStruct, %struct.ByValStruct addrspace(5)* %arg1, i32 0, i32 0, i32 0
%tmp1 = load volatile i32, i32 addrspace(5)* %arrayidx2, align 8
%add3 = add nsw i32 %tmp1, 2
store volatile i32 %add3, i32 addrspace(5)* %arrayidx2, align 8
store volatile i32 9, i32 addrspace(1)* null, align 4
ret void
}

; Make sure the byval alignment is respected in the call frame setup
; GCN-LABEL: {{^}}call_void_func_byval_struct_align8_kernel:
; GCN: s_mov_b32 s33, s7
; GCN-NOT: s_add_u32 s32, s32, 0x800

; GCN: v_mov_b32_e32 [[NINE:v[0-9]+]], 9
; GCN: buffer_store_dword [[NINE]], off, s[0:3], s33 offset:8
; GCN: v_mov_b32_e32 [[THIRTEEN:v[0-9]+]], 13
; GCN: buffer_store_dword [[THIRTEEN]], off, s[0:3], s33 offset:24


; GCN-NOT: s_add_u32 s32, s32, 0x800

; GCN: buffer_load_dword [[LOAD0:v[0-9]+]], off, s[0:3], s33 offset:8
; GCN: buffer_load_dword [[LOAD1:v[0-9]+]], off, s[0:3], s33 offset:12
; GCN: buffer_load_dword [[LOAD2:v[0-9]+]], off, s[0:3], s33 offset:16
; GCN: buffer_load_dword [[LOAD3:v[0-9]+]], off, s[0:3], s33 offset:20

; GCN-NOT: s_add_u32 s32, s32, 0x800
; GCN-DAG: s_add_u32 s32, s33, 0xc00{{$}}

; GCN: buffer_store_dword [[LOAD3]], off, s[0:3], s32 offset:12
; GCN: buffer_store_dword [[LOAD2]], off, s[0:3], s32 offset:8
; GCN: buffer_store_dword [[LOAD1]], off, s[0:3], s32 offset:4
; GCN: buffer_store_dword [[LOAD0]], off, s[0:3], s32{{$}}


; GCN-DAG: buffer_load_dword [[LOAD4:v[0-9]+]], off, s[0:3], s33 offset:24
; GCN-DAG: buffer_load_dword [[LOAD5:v[0-9]+]], off, s[0:3], s33 offset:28
; GCN-DAG: buffer_load_dword [[LOAD6:v[0-9]+]], off, s[0:3], s33 offset:32
; GCN-DAG: buffer_load_dword [[LOAD7:v[0-9]+]], off, s[0:3], s33 offset:36

; GCN-DAG: buffer_store_dword [[LOAD4]], off, s[0:3], s32 offset:16
; GCN-DAG: buffer_store_dword [[LOAD5]], off, s[0:3], s32 offset:20
; GCN-DAG: buffer_store_dword [[LOAD6]], off, s[0:3], s32 offset:24
; GCN-DAG: buffer_store_dword [[LOAD7]], off, s[0:3], s32 offset:28


; GCN: s_swappc_b64
; GCN-NOT: s_sub_u32 s32
; GCN: s_endpgm
define amdgpu_kernel void @call_void_func_byval_struct_align8_kernel() #1 {
entry:
%arg0 = alloca %struct.ByValStruct, align 8, addrspace(5)
%arg1 = alloca %struct.ByValStruct, align 8, addrspace(5)
%tmp = bitcast %struct.ByValStruct addrspace(5)* %arg0 to i8 addrspace(5)*
call void @llvm.lifetime.start.p5i8(i64 32, i8 addrspace(5)* %tmp)
%tmp1 = bitcast %struct.ByValStruct addrspace(5)* %arg1 to i8 addrspace(5)*
call void @llvm.lifetime.start.p5i8(i64 32, i8 addrspace(5)* %tmp1)
%arrayidx = getelementptr inbounds %struct.ByValStruct, %struct.ByValStruct addrspace(5)* %arg0, i32 0, i32 0, i32 0
store volatile i32 9, i32 addrspace(5)* %arrayidx, align 8
%arrayidx2 = getelementptr inbounds %struct.ByValStruct, %struct.ByValStruct addrspace(5)* %arg1, i32 0, i32 0, i32 0
store volatile i32 13, i32 addrspace(5)* %arrayidx2, align 8
call void @void_func_byval_struct_align8(%struct.ByValStruct addrspace(5)* byval nonnull align 8 %arg0, %struct.ByValStruct addrspace(5)* byval nonnull align 8 %arg1)
call void @llvm.lifetime.end.p5i8(i64 32, i8 addrspace(5)* %tmp1)
call void @llvm.lifetime.end.p5i8(i64 32, i8 addrspace(5)* %tmp)
ret void
}

; GCN-LABEL: {{^}}call_void_func_byval_struct_align8_func:
; GCN: s_mov_b32 s34, s32
; GCN-DAG: s_add_u32 s32, s32, 0xc00{{$}}
; GCN-DAG: v_writelane_b32

; GCN-DAG: v_mov_b32_e32 [[NINE:v[0-9]+]], 9
; GCN-DAG: v_mov_b32_e32 [[THIRTEEN:v[0-9]+]], 13

; GCN-DAG: buffer_store_dword [[NINE]], off, s[0:3], s34{{$}}
; GCN-DAG: buffer_store_dword [[THIRTEEN]], off, s[0:3], s34 offset:16

; GCN-DAG: buffer_load_dword [[LOAD0:v[0-9]+]], off, s[0:3], s34{{$}}
; GCN-DAG: buffer_load_dword [[LOAD1:v[0-9]+]], off, s[0:3], s34 offset:4
; GCN-DAG: buffer_load_dword [[LOAD2:v[0-9]+]], off, s[0:3], s34 offset:8
; GCN-DAG: buffer_load_dword [[LOAD3:v[0-9]+]], off, s[0:3], s34 offset:12

; GCN-NOT: s_add_u32 s32, s32, 0x800

; GCN-DAG: buffer_store_dword [[LOAD0]], off, s[0:3], s32{{$}}
; GCN-DAG: buffer_store_dword [[LOAD1]], off, s[0:3], s32 offset:4
; GCN-DAG: buffer_store_dword [[LOAD2]], off, s[0:3], s32 offset:8
; GCN-DAG: buffer_store_dword [[LOAD3]], off, s[0:3], s32 offset:12

; GCN: buffer_load_dword [[LOAD4:v[0-9]+]], off, s[0:3], s34 offset:16
; GCN: buffer_load_dword [[LOAD5:v[0-9]+]], off, s[0:3], s34 offset:20
; GCN: buffer_load_dword [[LOAD6:v[0-9]+]], off, s[0:3], s34 offset:24
; GCN: buffer_load_dword [[LOAD7:v[0-9]+]], off, s[0:3], s34 offset:28

; GCN: s_waitcnt vmcnt(0)
; GCN-DAG: buffer_store_dword [[LOAD4]], off, s[0:3], s32 offset:16
; GCN-DAG: buffer_store_dword [[LOAD5]], off, s[0:3], s32 offset:20
; GCN-DAG: buffer_store_dword [[LOAD6]], off, s[0:3], s32 offset:24
; GCN-DAG: buffer_store_dword [[LOAD7]], off, s[0:3], s32 offset:28

; GCN: s_swappc_b64
; GCN-NOT: v_readlane_b32 s32
; GCN: v_readlane_b32
; GCN-NOT: v_readlane_b32 s32

; GCN-NOT: s_sub_u32 s32, s32, 0x800

; GCN: s_sub_u32 s32, s32, 0xc00{{$}}
; GCN: v_readlane_b32 s34, v
; GCN: s_waitcnt
; GCN-NEXT: s_setpc_b64
define void @call_void_func_byval_struct_align8_func() #0 {
entry:
%arg0 = alloca %struct.ByValStruct, align 8, addrspace(5)
%arg1 = alloca %struct.ByValStruct, align 8, addrspace(5)
%tmp = bitcast %struct.ByValStruct addrspace(5)* %arg0 to i8 addrspace(5)*
call void @llvm.lifetime.start.p5i8(i64 32, i8 addrspace(5)* %tmp)
%tmp1 = bitcast %struct.ByValStruct addrspace(5)* %arg1 to i8 addrspace(5)*
call void @llvm.lifetime.start.p5i8(i64 32, i8 addrspace(5)* %tmp1)
%arrayidx = getelementptr inbounds %struct.ByValStruct, %struct.ByValStruct addrspace(5)* %arg0, i32 0, i32 0, i32 0
store volatile i32 9, i32 addrspace(5)* %arrayidx, align 8
%arrayidx2 = getelementptr inbounds %struct.ByValStruct, %struct.ByValStruct addrspace(5)* %arg1, i32 0, i32 0, i32 0
store volatile i32 13, i32 addrspace(5)* %arrayidx2, align 8
call void @void_func_byval_struct_align8(%struct.ByValStruct addrspace(5)* byval nonnull align 8 %arg0, %struct.ByValStruct addrspace(5)* byval nonnull align 8 %arg1)
call void @llvm.lifetime.end.p5i8(i64 32, i8 addrspace(5)* %tmp1)
call void @llvm.lifetime.end.p5i8(i64 32, i8 addrspace(5)* %tmp)
ret void
}

; GCN-LABEL: {{^}}call_void_func_byval_struct_kernel_no_frame_pointer_elim:
define amdgpu_kernel void @call_void_func_byval_struct_kernel_no_frame_pointer_elim() #2 {
entry:
%arg0 = alloca %struct.ByValStruct, align 4, addrspace(5)
%arg1 = alloca %struct.ByValStruct, align 4, addrspace(5)
%tmp = bitcast %struct.ByValStruct addrspace(5)* %arg0 to i8 addrspace(5)*
call void @llvm.lifetime.start.p5i8(i64 32, i8 addrspace(5)* %tmp)
%tmp1 = bitcast %struct.ByValStruct addrspace(5)* %arg1 to i8 addrspace(5)*
call void @llvm.lifetime.start.p5i8(i64 32, i8 addrspace(5)* %tmp1)
%arrayidx = getelementptr inbounds %struct.ByValStruct, %struct.ByValStruct addrspace(5)* %arg0, i32 0, i32 0, i32 0
store volatile i32 9, i32 addrspace(5)* %arrayidx, align 4
%arrayidx2 = getelementptr inbounds %struct.ByValStruct, %struct.ByValStruct addrspace(5)* %arg1, i32 0, i32 0, i32 0
store volatile i32 13, i32 addrspace(5)* %arrayidx2, align 4
call void @void_func_byval_struct(%struct.ByValStruct addrspace(5)* byval nonnull align 4 %arg0, %struct.ByValStruct addrspace(5)* byval nonnull align 4 %arg1)
call void @llvm.lifetime.end.p5i8(i64 32, i8 addrspace(5)* %tmp1)
call void @llvm.lifetime.end.p5i8(i64 32, i8 addrspace(5)* %tmp)
ret void
}

declare hidden void @external_void_func_void() #0		declare hidden void @external_void_func_void() #0

declare void @llvm.lifetime.start.p5i8(i64, i8 addrspace(5)* nocapture) #3		declare void @llvm.lifetime.start.p5i8(i64, i8 addrspace(5)* nocapture) #3
declare void @llvm.lifetime.end.p5i8(i64, i8 addrspace(5)* nocapture) #3		declare void @llvm.lifetime.end.p5i8(i64, i8 addrspace(5)* nocapture) #3

attributes #0 = { nounwind }		attributes #0 = { nounwind }
attributes #1 = { noinline norecurse nounwind }		attributes #1 = { noinline norecurse nounwind }
attributes #2 = { nounwind norecurse "frame-pointer"="all" }		attributes #2 = { nounwind norecurse "frame-pointer"="all" }

llvm/test/CodeGen/AMDGPU/call-argument-types.ll

[72 lines not shown]
; GCN: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}		; GCN: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
define amdgpu_kernel void @test_call_external_void_func_i1_imm() #0 {		define amdgpu_kernel void @test_call_external_void_func_i1_imm() #0 {
call void @external_void_func_i1(i1 true)		call void @external_void_func_i1(i1 true)
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_call_external_void_func_i1_signext:		; GCN-LABEL: {{^}}test_call_external_void_func_i1_signext:
; MESA: s_mov_b32 s33, s3{{$}}
; HSA: s_mov_b32 s33, s9{{$}}

; HSA: buffer_load_ubyte [[VAR:v[0-9]+]]		; HSA: buffer_load_ubyte [[VAR:v[0-9]+]]
; HSA: s_mov_b32 s32, s33		; HSA: s_mov_b32 s32, 0
; MESA-DAG: buffer_load_ubyte [[VAR:v[0-9]+]]		; MESA-DAG: buffer_load_ubyte [[VAR:v[0-9]+]]
; MESA-DAG: s_mov_b32 s32, s33{{$}}		; MESA-DAG: s_mov_b32 s32, 0{{$}}


; GCN: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}		; GCN: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}
; GCN-NEXT: s_add_u32 s[[PC_LO]], s[[PC_LO]], external_void_func_i1_signext@rel32@lo+4		; GCN-NEXT: s_add_u32 s[[PC_LO]], s[[PC_LO]], external_void_func_i1_signext@rel32@lo+4
; GCN-NEXT: s_addc_u32 s[[PC_HI]], s[[PC_HI]], external_void_func_i1_signext@rel32@hi+4		; GCN-NEXT: s_addc_u32 s[[PC_HI]], s[[PC_HI]], external_void_func_i1_signext@rel32@hi+4

; GCN: s_waitcnt vmcnt(0)		; GCN: s_waitcnt vmcnt(0)
; GCN-NEXT: v_bfe_i32 v0, v0, 0, 1		; GCN-NEXT: v_bfe_i32 v0, v0, 0, 1
; GCN-NEXT: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}		; GCN-NEXT: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
define amdgpu_kernel void @test_call_external_void_func_i1_signext(i32) #0 {		define amdgpu_kernel void @test_call_external_void_func_i1_signext(i32) #0 {
%var = load volatile i1, i1 addrspace(1)* undef		%var = load volatile i1, i1 addrspace(1)* undef
call void @external_void_func_i1_signext(i1 %var)		call void @external_void_func_i1_signext(i1 %var)
ret void		ret void
}		}

; FIXME: load should be scheduled before getpc		; FIXME: load should be scheduled before getpc
; GCN-LABEL: {{^}}test_call_external_void_func_i1_zeroext:		; GCN-LABEL: {{^}}test_call_external_void_func_i1_zeroext:
; MESA: s_mov_b32 s33, s3{{$}}

; HSA: buffer_load_ubyte v0		; HSA: buffer_load_ubyte v0
; HSA-DAG: s_mov_b32 s32, s33{{$}}		; HSA-DAG: s_mov_b32 s32, 0{{$}}

; MESA: buffer_load_ubyte v0		; MESA: buffer_load_ubyte v0
; MESA-DAG: s_mov_b32 s32, s33{{$}}		; MESA-DAG: s_mov_b32 s32, 0{{$}}

; GCN: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}		; GCN: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}
; GCN-NEXT: s_add_u32 s[[PC_LO]], s[[PC_LO]], external_void_func_i1_zeroext@rel32@lo+4		; GCN-NEXT: s_add_u32 s[[PC_LO]], s[[PC_LO]], external_void_func_i1_zeroext@rel32@lo+4
; GCN-NEXT: s_addc_u32 s[[PC_HI]], s[[PC_HI]], external_void_func_i1_zeroext@rel32@hi+4		; GCN-NEXT: s_addc_u32 s[[PC_HI]], s[[PC_HI]], external_void_func_i1_zeroext@rel32@hi+4


; GCN: s_waitcnt vmcnt(0)		; GCN: s_waitcnt vmcnt(0)
; GCN-NEXT: v_and_b32_e32 v0, 1, v0		; GCN-NEXT: v_and_b32_e32 v0, 1, v0
; GCN-NEXT: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}		; GCN-NEXT: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
define amdgpu_kernel void @test_call_external_void_func_i1_zeroext(i32) #0 {		define amdgpu_kernel void @test_call_external_void_func_i1_zeroext(i32) #0 {
%var = load volatile i1, i1 addrspace(1)* undef		%var = load volatile i1, i1 addrspace(1)* undef
call void @external_void_func_i1_zeroext(i1 %var)		call void @external_void_func_i1_zeroext(i1 %var)
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_call_external_void_func_i8_imm:		; GCN-LABEL: {{^}}test_call_external_void_func_i8_imm:
; MESA-DAG: s_mov_b32 s33, s3{{$}}

; GCN-DAG: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}		; GCN-DAG: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}
; GCN-DAG: s_add_u32 s[[PC_LO]], s[[PC_LO]], external_void_func_i8@rel32@lo+4		; GCN-DAG: s_add_u32 s[[PC_LO]], s[[PC_LO]], external_void_func_i8@rel32@lo+4
; GCN-DAG: s_addc_u32 s[[PC_HI]], s[[PC_HI]], external_void_func_i8@rel32@hi+4		; GCN-DAG: s_addc_u32 s[[PC_HI]], s[[PC_HI]], external_void_func_i8@rel32@hi+4
; GCN-DAG: v_mov_b32_e32 v0, 0x7b		; GCN-DAG: v_mov_b32_e32 v0, 0x7b

; GCN-DAG: s_mov_b32 s32, s33{{$}}		; GCN-DAG: s_mov_b32 s32, 0{{$}}

; GCN: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}		; GCN: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
define amdgpu_kernel void @test_call_external_void_func_i8_imm(i32) #0 {		define amdgpu_kernel void @test_call_external_void_func_i8_imm(i32) #0 {
call void @external_void_func_i8(i8 123)		call void @external_void_func_i8(i8 123)
ret void		ret void
}		}

; FIXME: don't wait before call		; FIXME: don't wait before call
; GCN-LABEL: {{^}}test_call_external_void_func_i8_signext:		; GCN-LABEL: {{^}}test_call_external_void_func_i8_signext:
; HSA-DAG: s_mov_b32 s33, s9{{$}}
; MESA-DAG: s_mov_b32 s33, s3{{$}}

; GCN-DAG: buffer_load_sbyte v0		; GCN-DAG: buffer_load_sbyte v0
; GCN-DAG: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}		; GCN-DAG: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}
; GCN-DAG: s_add_u32 s[[PC_LO]], s[[PC_LO]], external_void_func_i8_signext@rel32@lo+4		; GCN-DAG: s_add_u32 s[[PC_LO]], s[[PC_LO]], external_void_func_i8_signext@rel32@lo+4
; GCN-DAG: s_addc_u32 s[[PC_HI]], s[[PC_HI]], external_void_func_i8_signext@rel32@hi+4		; GCN-DAG: s_addc_u32 s[[PC_HI]], s[[PC_HI]], external_void_func_i8_signext@rel32@hi+4

; GCN-DAG: s_mov_b32 s32, s3		; GCN-DAG: s_mov_b32 s32, 0

; GCN-NOT: s_waitcnt		; GCN-NOT: s_waitcnt
; GCN-NEXT: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}		; GCN-NEXT: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
define amdgpu_kernel void @test_call_external_void_func_i8_signext(i32) #0 {		define amdgpu_kernel void @test_call_external_void_func_i8_signext(i32) #0 {
%var = load volatile i8, i8 addrspace(1)* undef		%var = load volatile i8, i8 addrspace(1)* undef
call void @external_void_func_i8_signext(i8 %var)		call void @external_void_func_i8_signext(i8 %var)
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_call_external_void_func_i8_zeroext:		; GCN-LABEL: {{^}}test_call_external_void_func_i8_zeroext:
; MESA-DAG: s_mov_b32 s33, s3{{$}}
; HSA-DAG: s_mov_b32 s33, s9{{$}}

; GCN-DAG: buffer_load_ubyte v0		; GCN-DAG: buffer_load_ubyte v0
; GCN-DAG: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}		; GCN-DAG: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}
; GCN-DAG: s_add_u32 s[[PC_LO]], s[[PC_LO]], external_void_func_i8_zeroext@rel32@lo+4		; GCN-DAG: s_add_u32 s[[PC_LO]], s[[PC_LO]], external_void_func_i8_zeroext@rel32@lo+4
; GCN-DAG: s_addc_u32 s[[PC_HI]], s[[PC_HI]], external_void_func_i8_zeroext@rel32@hi+4		; GCN-DAG: s_addc_u32 s[[PC_HI]], s[[PC_HI]], external_void_func_i8_zeroext@rel32@hi+4

; GCN-DAG: s_mov_b32 s32, s33		; GCN-DAG: s_mov_b32 s32, 0

; GCN-NOT: s_waitcnt		; GCN-NOT: s_waitcnt
; GCN-NEXT: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}		; GCN-NEXT: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
define amdgpu_kernel void @test_call_external_void_func_i8_zeroext(i32) #0 {		define amdgpu_kernel void @test_call_external_void_func_i8_zeroext(i32) #0 {
%var = load volatile i8, i8 addrspace(1)* undef		%var = load volatile i8, i8 addrspace(1)* undef
call void @external_void_func_i8_zeroext(i8 %var)		call void @external_void_func_i8_zeroext(i8 %var)
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_call_external_void_func_i16_imm:		; GCN-LABEL: {{^}}test_call_external_void_func_i16_imm:
; GCN-DAG: v_mov_b32_e32 v0, 0x7b{{$}}		; GCN-DAG: v_mov_b32_e32 v0, 0x7b{{$}}

; GCN-DAG: s_mov_b32 s32, s33		; GCN-DAG: s_mov_b32 s32, 0

; GCN: s_swappc_b64		; GCN: s_swappc_b64
define amdgpu_kernel void @test_call_external_void_func_i16_imm() #0 {		define amdgpu_kernel void @test_call_external_void_func_i16_imm() #0 {
call void @external_void_func_i16(i16 123)		call void @external_void_func_i16(i16 123)
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_call_external_void_func_i16_signext:		; GCN-LABEL: {{^}}test_call_external_void_func_i16_signext:
; MESA-DAG: s_mov_b32 s33, s3{{$}}

; GCN-DAG: buffer_load_sshort v0		; GCN-DAG: buffer_load_sshort v0
; GCN-DAG: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}		; GCN-DAG: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}
; GCN-DAG: s_add_u32 s[[PC_LO]], s[[PC_LO]], external_void_func_i16_signext@rel32@lo+4		; GCN-DAG: s_add_u32 s[[PC_LO]], s[[PC_LO]], external_void_func_i16_signext@rel32@lo+4
; GCN-DAG: s_addc_u32 s[[PC_HI]], s[[PC_HI]], external_void_func_i16_signext@rel32@hi+4		; GCN-DAG: s_addc_u32 s[[PC_HI]], s[[PC_HI]], external_void_func_i16_signext@rel32@hi+4

; GCN-DAG: s_mov_b32 s32, s33		; GCN-DAG: s_mov_b32 s32, 0

; GCN-NOT: s_waitcnt		; GCN-NOT: s_waitcnt
; GCN-NEXT: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}		; GCN-NEXT: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
define amdgpu_kernel void @test_call_external_void_func_i16_signext(i32) #0 {		define amdgpu_kernel void @test_call_external_void_func_i16_signext(i32) #0 {
%var = load volatile i16, i16 addrspace(1)* undef		%var = load volatile i16, i16 addrspace(1)* undef
call void @external_void_func_i16_signext(i16 %var)		call void @external_void_func_i16_signext(i16 %var)
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_call_external_void_func_i16_zeroext:		; GCN-LABEL: {{^}}test_call_external_void_func_i16_zeroext:
; MESA-DAG: s_mov_b32 s33, s3{{$}}


; GCN-DAG: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}		; GCN-DAG: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}
; GCN-DAG: s_add_u32 s[[PC_LO]], s[[PC_LO]], external_void_func_i16_zeroext@rel32@lo+4		; GCN-DAG: s_add_u32 s[[PC_LO]], s[[PC_LO]], external_void_func_i16_zeroext@rel32@lo+4
; GCN-DAG: s_addc_u32 s[[PC_HI]], s[[PC_HI]], external_void_func_i16_zeroext@rel32@hi+4		; GCN-DAG: s_addc_u32 s[[PC_HI]], s[[PC_HI]], external_void_func_i16_zeroext@rel32@hi+4

; GCN-DAG: s_mov_b32 s32, s33		; GCN-DAG: s_mov_b32 s32, 0

; GCN-NOT: s_waitcnt		; GCN-NOT: s_waitcnt
; GCN-NEXT: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}		; GCN-NEXT: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
define amdgpu_kernel void @test_call_external_void_func_i16_zeroext(i32) #0 {		define amdgpu_kernel void @test_call_external_void_func_i16_zeroext(i32) #0 {
%var = load volatile i16, i16 addrspace(1)* undef		%var = load volatile i16, i16 addrspace(1)* undef
call void @external_void_func_i16_zeroext(i16 %var)		call void @external_void_func_i16_zeroext(i16 %var)
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_call_external_void_func_i32_imm:		; GCN-LABEL: {{^}}test_call_external_void_func_i32_imm:
; MESA-DAG: s_mov_b32 s33, s3{{$}}

; GCN-DAG: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}		; GCN-DAG: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}
; GCN-DAG: s_add_u32 s[[PC_LO]], s[[PC_LO]], external_void_func_i32@rel32@lo+4		; GCN-DAG: s_add_u32 s[[PC_LO]], s[[PC_LO]], external_void_func_i32@rel32@lo+4
; GCN-DAG: s_addc_u32 s[[PC_HI]], s[[PC_HI]], external_void_func_i32@rel32@hi+4		; GCN-DAG: s_addc_u32 s[[PC_HI]], s[[PC_HI]], external_void_func_i32@rel32@hi+4
; GCN-DAG: v_mov_b32_e32 v0, 42		; GCN-DAG: v_mov_b32_e32 v0, 42
; GCN-DAG: s_mov_b32 s32, s33		; GCN-DAG: s_mov_b32 s32, 0

; GCN: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}		; GCN: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
define amdgpu_kernel void @test_call_external_void_func_i32_imm(i32) #0 {		define amdgpu_kernel void @test_call_external_void_func_i32_imm(i32) #0 {
call void @external_void_func_i32(i32 42)		call void @external_void_func_i32(i32 42)
ret void		ret void
}		}

[240 lines not shown]
; GCN-DAG: v_mov_b32_e32 v0, 1		; GCN-DAG: v_mov_b32_e32 v0, 1
; GCN-DAG: v_mov_b32_e32 v1, 2		; GCN-DAG: v_mov_b32_e32 v1, 2
; GCN: s_swappc_b64		; GCN: s_swappc_b64
define amdgpu_kernel void @test_call_external_void_func_v2i32_imm() #0 {		define amdgpu_kernel void @test_call_external_void_func_v2i32_imm() #0 {
call void @external_void_func_v2i32(<2 x i32> <i32 1, i32 2>)		call void @external_void_func_v2i32(<2 x i32> <i32 1, i32 2>)
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_call_external_void_func_v3i32_imm:		; GCN-LABEL: {{^}}test_call_external_void_func_v3i32_imm: {{.*}}
; HSA-DAG: s_mov_b32 s33, s9
; MESA-DAG: s_mov_b32 s33, s3{{$}}

; GCN-NOT: v3		; GCN-NOT: v3
; GCN-DAG: v_mov_b32_e32 v0, 3		; GCN-DAG: v_mov_b32_e32 v0, 3
; GCN-DAG: v_mov_b32_e32 v1, 4		; GCN-DAG: v_mov_b32_e32 v1, 4
; GCN-DAG: v_mov_b32_e32 v2, 5		; GCN-DAG: v_mov_b32_e32 v2, 5

; GCN: s_swappc_b64		; GCN: s_swappc_b64
define amdgpu_kernel void @test_call_external_void_func_v3i32_imm(i32) #0 {		define amdgpu_kernel void @test_call_external_void_func_v3i32_imm(i32) #0 {
[100 lines not shown]
define amdgpu_kernel void @test_call_external_void_func_v32i32() #0 {		define amdgpu_kernel void @test_call_external_void_func_v32i32() #0 {
%ptr = load <32 x i32> addrspace(1)*, <32 x i32> addrspace(1)* addrspace(4)* undef		%ptr = load <32 x i32> addrspace(1)*, <32 x i32> addrspace(1)* addrspace(4)* undef
%val = load <32 x i32>, <32 x i32> addrspace(1)* %ptr		%val = load <32 x i32>, <32 x i32> addrspace(1)* %ptr
call void @external_void_func_v32i32(<32 x i32> %val)		call void @external_void_func_v32i32(<32 x i32> %val)
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_call_external_void_func_v32i32_i32:		; GCN-LABEL: {{^}}test_call_external_void_func_v32i32_i32:
; HSA-DAG: s_mov_b32 s33, s9
; HSA-NOT: s_add_u32 s32		; HSA-NOT: s_add_u32 s32

; MESA-DAG: s_mov_b32 s33, s3{{$}}
; MESA-NOT: s_add_u32 s32		; MESA-NOT: s_add_u32 s32

; GCN-DAG: buffer_load_dword [[VAL1:v[0-9]+]], off, s[{{[0-9]+}}:{{[0-9]+}}], 0{{$}}		; GCN-DAG: buffer_load_dword [[VAL1:v[0-9]+]], off, s[{{[0-9]+}}:{{[0-9]+}}], 0{{$}}
; GCN-DAG: buffer_load_dwordx4 v[0:3], off		; GCN-DAG: buffer_load_dwordx4 v[0:3], off
; GCN-DAG: buffer_load_dwordx4 v[4:7], off		; GCN-DAG: buffer_load_dwordx4 v[4:7], off
; GCN-DAG: buffer_load_dwordx4 v[8:11], off		; GCN-DAG: buffer_load_dwordx4 v[8:11], off
; GCN-DAG: buffer_load_dwordx4 v[12:15], off		; GCN-DAG: buffer_load_dwordx4 v[12:15], off
; GCN-DAG: buffer_load_dwordx4 v[16:19], off		; GCN-DAG: buffer_load_dwordx4 v[16:19], off
[34 lines not shown]	define amdgpu_kernel void @test_call_external_void_func_struct_i8_i32() #0 {
%val = load { i8, i32 }, { i8, i32 } addrspace(1)* %ptr0		%val = load { i8, i32 }, { i8, i32 } addrspace(1)* %ptr0
call void @external_void_func_struct_i8_i32({ i8, i32 } %val)		call void @external_void_func_struct_i8_i32({ i8, i32 } %val)
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_call_external_void_func_byval_struct_i8_i32:		; GCN-LABEL: {{^}}test_call_external_void_func_byval_struct_i8_i32:
; GCN-DAG: v_mov_b32_e32 [[VAL0:v[0-9]+]], 3		; GCN-DAG: v_mov_b32_e32 [[VAL0:v[0-9]+]], 3
; GCN-DAG: v_mov_b32_e32 [[VAL1:v[0-9]+]], 8		; GCN-DAG: v_mov_b32_e32 [[VAL1:v[0-9]+]], 8
; MESA-DAG: buffer_store_byte [[VAL0]], off, s[36:39], s33 offset:8		; MESA-DAG: buffer_store_byte [[VAL0]], off, s[36:39], 0 offset:8
; MESA-DAG: buffer_store_dword [[VAL1]], off, s[36:39], s33 offset:12		; MESA-DAG: buffer_store_dword [[VAL1]], off, s[36:39], 0 offset:12

; HSA-DAG: buffer_store_byte [[VAL0]], off, s[0:3], s33 offset:8		; HSA-DAG: buffer_store_byte [[VAL0]], off, s[0:3], 0 offset:8
; HSA-DAG: buffer_store_dword [[VAL1]], off, s[0:3], s33 offset:12		; HSA-DAG: buffer_store_dword [[VAL1]], off, s[0:3], 0 offset:12

; HSA: buffer_load_dword [[RELOAD_VAL0:v[0-9]+]], off, s[0:3], s33 offset:8		; HSA: buffer_load_dword [[RELOAD_VAL0:v[0-9]+]], off, s[0:3], 0 offset:8
; HSA: buffer_load_dword [[RELOAD_VAL1:v[0-9]+]], off, s[0:3], s33 offset:12		; HSA: buffer_load_dword [[RELOAD_VAL1:v[0-9]+]], off, s[0:3], 0 offset:12

; MESA: buffer_load_dword [[RELOAD_VAL0:v[0-9]+]], off, s[36:39], s33 offset:8		; MESA: buffer_load_dword [[RELOAD_VAL0:v[0-9]+]], off, s[36:39], 0 offset:8
; MESA: buffer_load_dword [[RELOAD_VAL1:v[0-9]+]], off, s[36:39], s33 offset:12		; MESA: buffer_load_dword [[RELOAD_VAL1:v[0-9]+]], off, s[36:39], 0 offset:12

; GCN-DAG: s_add_u32 [[SP:s[0-9]+]], s33, 0x400{{$}}		; GCN-DAG: s_movk_i32 [[SP:s[0-9]+]], 0x400{{$}}

; HSA-DAG: buffer_store_dword [[RELOAD_VAL0]], off, s[0:3], [[SP]]{{$}}		; HSA-DAG: buffer_store_dword [[RELOAD_VAL0]], off, s[0:3], [[SP]]{{$}}
; HSA-DAG: buffer_store_dword [[RELOAD_VAL1]], off, s[0:3], [[SP]] offset:4		; HSA-DAG: buffer_store_dword [[RELOAD_VAL1]], off, s[0:3], [[SP]] offset:4

; MESA-DAG: buffer_store_dword [[RELOAD_VAL0]], off, s[36:39], [[SP]]{{$}}		; MESA-DAG: buffer_store_dword [[RELOAD_VAL0]], off, s[36:39], [[SP]]{{$}}
; MESA-DAG: buffer_store_dword [[RELOAD_VAL1]], off, s[36:39], [[SP]] offset:4		; MESA-DAG: buffer_store_dword [[RELOAD_VAL1]], off, s[36:39], [[SP]] offset:4

; GCN-NEXT: s_swappc_b64		; GCN-NEXT: s_swappc_b64
; GCN-NOT: [[SP]]		; GCN-NOT: [[SP]]
define amdgpu_kernel void @test_call_external_void_func_byval_struct_i8_i32() #0 {		define amdgpu_kernel void @test_call_external_void_func_byval_struct_i8_i32() #0 {
%val = alloca { i8, i32 }, align 4, addrspace(5)		%val = alloca { i8, i32 }, align 4, addrspace(5)
%gep0 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %val, i32 0, i32 0		%gep0 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %val, i32 0, i32 0
%gep1 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %val, i32 0, i32 1		%gep1 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %val, i32 0, i32 1
store i8 3, i8 addrspace(5)* %gep0		store i8 3, i8 addrspace(5)* %gep0
store i32 8, i32 addrspace(5)* %gep1		store i32 8, i32 addrspace(5)* %gep1
call void @external_void_func_byval_struct_i8_i32({ i8, i32 } addrspace(5)* %val)		call void @external_void_func_byval_struct_i8_i32({ i8, i32 } addrspace(5)* %val)
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32:		; GCN-LABEL: {{^}}test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32:
; MESA-DAG: s_add_u32 [[SP:s[0-9]+]], [[FP_REG:s[0-9]+]], 0x800{{$}}		; GCN-DAG: s_movk_i32 [[SP:s[0-9]+]], 0x800{{$}}
; HSA-DAG: s_add_u32 [[SP:s[0-9]+]], [[FP_REG:s[0-9]+]], 0x800{{$}}

; GCN-DAG: v_mov_b32_e32 [[VAL0:v[0-9]+]], 3		; GCN-DAG: v_mov_b32_e32 [[VAL0:v[0-9]+]], 3
; GCN-DAG: v_mov_b32_e32 [[VAL1:v[0-9]+]], 8		; GCN-DAG: v_mov_b32_e32 [[VAL1:v[0-9]+]], 8
; GCN-DAG: buffer_store_byte [[VAL0]], off, s{{\[[0-9]+:[0-9]+\]}}, [[FP_REG]] offset:8		; GCN-DAG: buffer_store_byte [[VAL0]], off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:8
; GCN-DAG: buffer_store_dword [[VAL1]], off, s{{\[[0-9]+:[0-9]+\]}}, [[FP_REG]] offset:12		; GCN-DAG: buffer_store_dword [[VAL1]], off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:12

; GCN-DAG: buffer_load_dword [[RELOAD_VAL0:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, [[FP_REG]] offset:8		; GCN-DAG: buffer_load_dword [[RELOAD_VAL0:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:8
; GCN-DAG: buffer_load_dword [[RELOAD_VAL1:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, [[FP_REG]] offset:12		; GCN-DAG: buffer_load_dword [[RELOAD_VAL1:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:12

; GCN-NOT: s_add_u32 [[SP]]		; GCN-NOT: s_add_u32 [[SP]]
; GCN-DAG: buffer_store_dword [[RELOAD_VAL0]], off, s{{\[[0-9]+:[0-9]+\]}}, [[SP]]{{$}}		; GCN-DAG: buffer_store_dword [[RELOAD_VAL0]], off, s{{\[[0-9]+:[0-9]+\]}}, [[SP]]{{$}}
; GCN-DAG: buffer_store_dword [[RELOAD_VAL1]], off, s{{\[[0-9]+:[0-9]+\]}}, [[SP]] offset:4		; GCN-DAG: buffer_store_dword [[RELOAD_VAL1]], off, s{{\[[0-9]+:[0-9]+\]}}, [[SP]] offset:4
; GCN: s_swappc_b64		; GCN: s_swappc_b64
; GCN-DAG: buffer_load_ubyte [[LOAD_OUT_VAL0:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, [[FP_REG]] offset:16		; GCN-DAG: buffer_load_ubyte [[LOAD_OUT_VAL0:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:16
; GCN-DAG: buffer_load_dword [[LOAD_OUT_VAL1:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, [[FP_REG]] offset:20		; GCN-DAG: buffer_load_dword [[LOAD_OUT_VAL1:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:20
; GCN-NOT: s_sub_u32 [[SP]]		; GCN-NOT: s_sub_u32 [[SP]]

; GCN: buffer_store_byte [[LOAD_OUT_VAL0]], off		; GCN: buffer_store_byte [[LOAD_OUT_VAL0]], off
; GCN: buffer_store_dword [[LOAD_OUT_VAL1]], off		; GCN: buffer_store_dword [[LOAD_OUT_VAL1]], off
define amdgpu_kernel void @test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32(i32) #0 {		define amdgpu_kernel void @test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32(i32) #0 {
%in.val = alloca { i8, i32 }, align 4, addrspace(5)		%in.val = alloca { i8, i32 }, align 4, addrspace(5)
%out.val = alloca { i8, i32 }, align 4, addrspace(5)		%out.val = alloca { i8, i32 }, align 4, addrspace(5)
%in.gep0 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %in.val, i32 0, i32 0		%in.gep0 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %in.val, i32 0, i32 0
[213 lines not shown]

llvm/test/CodeGen/AMDGPU/call-constant.ll

	; RUN: llc -mtriple=amdgcn-amd-amdhsa < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa < %s | FileCheck -check-prefix=GCN %s

	; FIXME: Emitting unnecessary flat_scratch setup			; FIXME: Emitting unnecessary flat_scratch setup

	; GCN-LABEL: {{^}}test_call_undef:			; GCN-LABEL: {{^}}test_call_undef:
	; GCN: s_mov_b32 s8, s7
	; GCN: s_mov_b32 flat_scratch_lo, s5			; GCN: s_mov_b32 flat_scratch_lo, s5
	; GCN: s_add_u32 s4, s4, s8			; GCN: s_add_u32 s4, s4, s7
	; GCN: s_lshr_b32			; GCN: s_lshr_b32
	; GCN: s_endpgm			; GCN: s_endpgm
	define amdgpu_kernel void @test_call_undef() #0 {			define amdgpu_kernel void @test_call_undef() #0 {
	%val = call i32 undef(i32 1)			%val = call i32 undef(i32 1)
	%op = add i32 %val, 1			%op = add i32 %val, 1
	store volatile i32 %op, i32 addrspace(1)* undef			store volatile i32 %op, i32 addrspace(1)* undef
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}test_tail_call_undef:			; GCN-LABEL: {{^}}test_tail_call_undef:
	; GCN: s_waitcnt			; GCN: s_waitcnt
	; GCN-NEXT: .Lfunc_end			; GCN-NEXT: .Lfunc_end
	define i32 @test_tail_call_undef() #0 {			define i32 @test_tail_call_undef() #0 {
	%call = tail call i32 undef(i32 1)			%call = tail call i32 undef(i32 1)
	ret i32 %call			ret i32 %call
	}			}

	; GCN-LABEL: {{^}}test_call_null:			; GCN-LABEL: {{^}}test_call_null:
	; GCN: s_mov_b32 s8, s7
	; GCN: s_mov_b32 flat_scratch_lo, s5			; GCN: s_mov_b32 flat_scratch_lo, s5
	; GCN: s_add_u32 s4, s4, s8			; GCN: s_add_u32 s4, s4, s7
	; GCN: s_lshr_b32			; GCN: s_lshr_b32
	; GCN: s_endpgm			; GCN: s_endpgm
	define amdgpu_kernel void @test_call_null() #0 {			define amdgpu_kernel void @test_call_null() #0 {
	%val = call i32 null(i32 1)			%val = call i32 null(i32 1)
	%op = add i32 %val, 1			%op = add i32 %val, 1
	store volatile i32 %op, i32 addrspace(1)* null			store volatile i32 %op, i32 addrspace(1)* null
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}test_tail_call_null:			; GCN-LABEL: {{^}}test_tail_call_null:
	; GCN: s_waitcnt			; GCN: s_waitcnt
	; GCN-NEXT: .Lfunc_end			; GCN-NEXT: .Lfunc_end
	define i32 @test_tail_call_null() #0 {			define i32 @test_tail_call_null() #0 {
	%call = tail call i32 null(i32 1)			%call = tail call i32 null(i32 1)
	ret i32 %call			ret i32 %call
	}			}

llvm/test/CodeGen/AMDGPU/call-preserved-registers.ll

; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=fiji -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s		; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=fiji -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s		; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s		; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

declare hidden void @external_void_func_void() #0		declare hidden void @external_void_func_void() #0

; GCN-LABEL: {{^}}test_kernel_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void:		; GCN-LABEL: {{^}}test_kernel_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void:
; GCN: s_mov_b32 s33, s7
; GCN: s_getpc_b64 s[34:35]		; GCN: s_getpc_b64 s[34:35]
; GCN-NEXT: s_add_u32 s34, s34,		; GCN-NEXT: s_add_u32 s34, s34,
; GCN-NEXT: s_addc_u32 s35, s35,		; GCN-NEXT: s_addc_u32 s35, s35,
; GCN-NEXT: s_mov_b32 s32, s33		; GCN-NEXT: s_mov_b32 s32, 0
; GCN: s_swappc_b64 s[30:31], s[34:35]		; GCN: s_swappc_b64 s[30:31], s[34:35]

; GCN-NEXT: #ASMSTART		; GCN-NEXT: #ASMSTART
; GCN-NEXT: #ASMEND		; GCN-NEXT: #ASMEND
; GCN-NEXT: s_swappc_b64 s[30:31], s[34:35]		; GCN-NEXT: s_swappc_b64 s[30:31], s[34:35]
define amdgpu_kernel void @test_kernel_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void() #0 {		define amdgpu_kernel void @test_kernel_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void() #0 {
call void @external_void_func_void()		call void @external_void_func_void()
call void asm sideeffect "", ""() #0		call void asm sideeffect "", ""() #0
[79 lines hidden]	define amdgpu_kernel void @test_call_void_func_void_clobber_vcc(i32 addrspace(1)* %out) #0 {
call void @void_func_void_clobber_vcc()		call void @void_func_void_clobber_vcc()
%val0 = load volatile i32, i32 addrspace(1)* undef		%val0 = load volatile i32, i32 addrspace(1)* undef
%val1 = load volatile i32, i32 addrspace(1)* undef		%val1 = load volatile i32, i32 addrspace(1)* undef
call void asm sideeffect "; use $0", "{vcc}"(i64 %vcc)		call void asm sideeffect "; use $0", "{vcc}"(i64 %vcc)
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_call_void_func_void_mayclobber_s31:		; GCN-LABEL: {{^}}test_call_void_func_void_mayclobber_s31:
; GCN: s_mov_b32 s34, s31		; GCN: s_mov_b32 s33, s31
; GCN-NEXT: s_swappc_b64		; GCN-NEXT: s_swappc_b64
; GCN-NEXT: s_mov_b32 s31, s34		; GCN-NEXT: s_mov_b32 s31, s33
define amdgpu_kernel void @test_call_void_func_void_mayclobber_s31(i32 addrspace(1)* %out) #0 {		define amdgpu_kernel void @test_call_void_func_void_mayclobber_s31(i32 addrspace(1)* %out) #0 {
%s31 = call i32 asm sideeffect "; def $0", "={s31}"()		%s31 = call i32 asm sideeffect "; def $0", "={s31}"()
call void @external_void_func_void()		call void @external_void_func_void()
call void asm sideeffect "; use $0", "{s31}"(i32 %s31)		call void asm sideeffect "; use $0", "{s31}"(i32 %s31)
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_call_void_func_void_mayclobber_v31:		; GCN-LABEL: {{^}}test_call_void_func_void_mayclobber_v31:
; GCN: v_mov_b32_e32 v32, v31		; GCN: v_mov_b32_e32 v32, v31
; GCN-NEXT: s_swappc_b64		; GCN-NEXT: s_swappc_b64
; GCN-NEXT: v_mov_b32_e32 v31, v32		; GCN-NEXT: v_mov_b32_e32 v31, v32
define amdgpu_kernel void @test_call_void_func_void_mayclobber_v31(i32 addrspace(1)* %out) #0 {		define amdgpu_kernel void @test_call_void_func_void_mayclobber_v31(i32 addrspace(1)* %out) #0 {
%v31 = call i32 asm sideeffect "; def $0", "={v31}"()		%v31 = call i32 asm sideeffect "; def $0", "={v31}"()
call void @external_void_func_void()		call void @external_void_func_void()
call void asm sideeffect "; use $0", "{v31}"(i32 %v31)		call void asm sideeffect "; use $0", "{v31}"(i32 %v31)
ret void		ret void
}		}

; FIXME: What is the expected behavior for reserved registers here?

; GCN-LABEL: {{^}}test_call_void_func_void_preserves_s33:		; GCN-LABEL: {{^}}test_call_void_func_void_preserves_s33:
; GCN: s_mov_b32 s33, s9
; GCN: s_mov_b32 s32, s33
; GCN: s_getpc_b64 s[4:5]		; GCN: s_getpc_b64 s[4:5]
; GCN-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4		; GCN-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
; GCN-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+4		; GCN-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+4
		; GCN: s_mov_b32 s32, 0
; GCN: #ASMSTART		; GCN: #ASMSTART
; GCN-NEXT: ; def s33		; GCN-NEXT: ; def s33
; GCN-NEXT: #ASMEND		; GCN-NEXT: #ASMEND
; GCN: s_swappc_b64 s[30:31], s[4:5]		; GCN: s_swappc_b64 s[30:31], s[4:5]
; GCN: ;;#ASMSTART		; GCN: ;;#ASMSTART
; GCN-NEXT: ; use s33		; GCN-NEXT: ; use s33
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NOT: s33		; GCN-NOT: s33
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
define amdgpu_kernel void @test_call_void_func_void_preserves_s33(i32 addrspace(1)* %out) #0 {		define amdgpu_kernel void @test_call_void_func_void_preserves_s33(i32 addrspace(1)* %out) #0 {
%s33 = call i32 asm sideeffect "; def $0", "={s33}"()		%s33 = call i32 asm sideeffect "; def $0", "={s33}"()
call void @external_void_func_void()		call void @external_void_func_void()
call void asm sideeffect "; use $0", "{s33}"(i32 %s33)		call void asm sideeffect "; use $0", "{s33}"(i32 %s33)
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_call_void_func_void_preserves_s34:		; FIXME: What is the expected behavior for reserved registers here?
; GCN: s_mov_b32 s33, s9
; GCN-NOT: s34		; GCN-LABEL: {{^}}test_call_void_func_void_preserves_s34: {{.*}}
; GCN-NOT: s34		; GCN-NOT: s34

; GCN: s_getpc_b64 s[4:5]		; GCN: s_getpc_b64 s[4:5]
; GCN-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4		; GCN-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
; GCN-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+4		; GCN-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+4
		; GCN: s_mov_b32 s32, 0

; GCN-NOT: s34		; GCN-NOT: s34
; GCN: ;;#ASMSTART		; GCN: ;;#ASMSTART
; GCN-NEXT: ; def s34		; GCN-NEXT: ; def s34
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND

; GCN-NOT: s34		; GCN-NOT: s34
; GCN: s_swappc_b64 s[30:31], s[4:5]		; GCN: s_swappc_b64 s[30:31], s[4:5]

; GCN-NOT: s34		; GCN-NOT: s34

; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ; use s34		; GCN-NEXT: ; use s34
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
define amdgpu_kernel void @test_call_void_func_void_preserves_s34(i32 addrspace(1)* %out) #0 {		define amdgpu_kernel void @test_call_void_func_void_preserves_s34(i32 addrspace(1)* %out) #0 {
%s34 = call i32 asm sideeffect "; def $0", "={s34}"()		%s34 = call i32 asm sideeffect "; def $0", "={s34}"()
call void @external_void_func_void()		call void @external_void_func_void()
call void asm sideeffect "; use $0", "{s34}"(i32 %s34)		call void asm sideeffect "; use $0", "{s34}"(i32 %s34)
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_call_void_func_void_preserves_v32:		; GCN-LABEL: {{^}}test_call_void_func_void_preserves_v32: {{.*}}
; GCN: s_mov_b32 s33, s9

; GCN-NOT: v32		; GCN-NOT: v32
; GCN: s_getpc_b64 s[4:5]		; GCN: s_getpc_b64 s[4:5]
; GCN-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4		; GCN-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
; GCN-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+4		; GCN-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+4
		; GCN: s_mov_b32 s32, 0
; GCN-NOT: v32		; GCN-NOT: v32
; GCN-DAG: s_mov_b32 s32, s33

; GCN: ;;#ASMSTART		; GCN: ;;#ASMSTART
; GCN-NEXT: ; def v32		; GCN-NEXT: ; def v32
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND

; GCN: s_swappc_b64 s[30:31], s[4:5]		; GCN: s_swappc_b64 s[30:31], s[4:5]

; GCN-NOT: v32		; GCN-NOT: v32
[29 lines hidden]
; GCN-NEXT: v_readlane_b32 s34, v0, 0		; GCN-NEXT: v_readlane_b32 s34, v0, 0
; GCN: s_setpc_b64		; GCN: s_setpc_b64
define hidden void @void_func_void_clobber_s34() #2 {		define hidden void @void_func_void_clobber_s34() #2 {
call void asm sideeffect "; clobber", "~{s34}"() #0		call void asm sideeffect "; clobber", "~{s34}"() #0
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_call_void_func_void_clobber_s33:		; GCN-LABEL: {{^}}test_call_void_func_void_clobber_s33:
; GCN: s_mov_b32 s33, s7

; GCN: s_getpc_b64		; GCN: s_getpc_b64
; GCN-NEXT: s_add_u32		; GCN-NEXT: s_add_u32
; GCN-NEXT: s_addc_u32		; GCN-NEXT: s_addc_u32
; GCN-NEXT: s_mov_b32 s32, s33		; GCN-NEXT: s_mov_b32 s32, 0
; GCN: s_swappc_b64		; GCN: s_swappc_b64
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
define amdgpu_kernel void @test_call_void_func_void_clobber_s33() #0 {		define amdgpu_kernel void @test_call_void_func_void_clobber_s33() #0 {
call void @void_func_void_clobber_s33()		call void @void_func_void_clobber_s33()
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_call_void_func_void_clobber_s34:		; GCN-LABEL: {{^}}test_call_void_func_void_clobber_s34:
; GCN: s_mov_b32 s33, s7
; GCN: s_getpc_b64		; GCN: s_getpc_b64
; GCN-NEXT: s_add_u32		; GCN-NEXT: s_add_u32
; GCN-NEXT: s_addc_u32		; GCN-NEXT: s_addc_u32
; GCN-NEXT: s_mov_b32 s32, s33		; GCN-NEXT: s_mov_b32 s32, 0
; GCN: s_swappc_b64		; GCN: s_swappc_b64
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
define amdgpu_kernel void @test_call_void_func_void_clobber_s34() #0 {		define amdgpu_kernel void @test_call_void_func_void_clobber_s34() #0 {
call void @void_func_void_clobber_s34()		call void @void_func_void_clobber_s34()
ret void		ret void
}		}

; GCN-LABEL: {{^}}callee_saved_sgpr_func:		; GCN-LABEL: {{^}}callee_saved_sgpr_func:
[69 lines hidden]

llvm/test/CodeGen/AMDGPU/call-waitcnt.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck -enable-var-scope -check-prefix=GCN %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck -enable-var-scope -check-prefix=GCN %s

	; Load argument depends on waitcnt which should be skipped.			; Load argument depends on waitcnt which should be skipped.
	define amdgpu_kernel void @call_memory_arg_load(i32 addrspace(3)* %ptr, i32) #0 {			define amdgpu_kernel void @call_memory_arg_load(i32 addrspace(3)* %ptr, i32) #0 {
	; GCN-LABEL: call_memory_arg_load:			; GCN-LABEL: call_memory_arg_load:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_load_dword s4, s[4:5], 0x0			; GCN-NEXT: s_load_dword s4, s[4:5], 0x0
	; GCN-NEXT: s_mov_b32 s33, s9			; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s9
	; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s33
	; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0			; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0
	; GCN-NEXT: s_mov_b32 s32, s33			; GCN-NEXT: s_add_u32 s0, s0, s9
				; GCN-NEXT: s_addc_u32 s1, s1, 0
	; GCN-NEXT: s_waitcnt lgkmcnt(0)			; GCN-NEXT: s_waitcnt lgkmcnt(0)
	; GCN-NEXT: v_mov_b32_e32 v0, s4			; GCN-NEXT: v_mov_b32_e32 v0, s4
	; GCN-NEXT: ds_read_b32 v0, v0			; GCN-NEXT: ds_read_b32 v0, v0
	; GCN-NEXT: s_getpc_b64 s[4:5]			; GCN-NEXT: s_getpc_b64 s[4:5]
	; GCN-NEXT: s_add_u32 s4, s4, func@rel32@lo+4			; GCN-NEXT: s_add_u32 s4, s4, func@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s5, s5, func@rel32@hi+4			; GCN-NEXT: s_addc_u32 s5, s5, func@rel32@hi+4
				; GCN-NEXT: s_mov_b32 s32, 0
	; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	%vgpr = load volatile i32, i32 addrspace(3)* %ptr			%vgpr = load volatile i32, i32 addrspace(3)* %ptr
	call void @func(i32 %vgpr)			call void @func(i32 %vgpr)
	ret void			ret void
	}			}

	; Memory waitcnt with no register dependence on the call			; Memory waitcnt with no register dependence on the call
	define amdgpu_kernel void @call_memory_no_dep(i32 addrspace(1)* %ptr, i32) #0 {			define amdgpu_kernel void @call_memory_no_dep(i32 addrspace(1)* %ptr, i32) #0 {
	; GCN-LABEL: call_memory_no_dep:			; GCN-LABEL: call_memory_no_dep:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GCN-NEXT: s_mov_b32 s33, s9			; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s9
	; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s33
	; GCN-NEXT: v_mov_b32_e32 v2, 0
	; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0			; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0
				; GCN-NEXT: s_add_u32 s0, s0, s9
				; GCN-NEXT: v_mov_b32_e32 v2, 0
	; GCN-NEXT: s_waitcnt lgkmcnt(0)			; GCN-NEXT: s_waitcnt lgkmcnt(0)
	; GCN-NEXT: v_mov_b32_e32 v0, s4			; GCN-NEXT: v_mov_b32_e32 v0, s4
	; GCN-NEXT: v_mov_b32_e32 v1, s5			; GCN-NEXT: v_mov_b32_e32 v1, s5
				; GCN-NEXT: s_addc_u32 s1, s1, 0
	; GCN-NEXT: global_store_dword v[0:1], v2, off			; GCN-NEXT: global_store_dword v[0:1], v2, off
	; GCN-NEXT: v_mov_b32_e32 v0, 0			; GCN-NEXT: v_mov_b32_e32 v0, 0
	; GCN-NEXT: s_getpc_b64 s[6:7]			; GCN-NEXT: s_getpc_b64 s[6:7]
	; GCN-NEXT: s_add_u32 s6, s6, func@rel32@lo+4			; GCN-NEXT: s_add_u32 s6, s6, func@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s7, s7, func@rel32@hi+4			; GCN-NEXT: s_addc_u32 s7, s7, func@rel32@hi+4
	; GCN-NEXT: s_mov_b32 s32, s33			; GCN-NEXT: s_mov_b32 s32, 0
	; GCN-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GCN-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	store i32 0, i32 addrspace(1)* %ptr			store i32 0, i32 addrspace(1)* %ptr
	call void @func(i32 0)			call void @func(i32 0)
	ret void			ret void
	}			}

	; Should not wait after the call before memory			; Should not wait after the call before memory
	define amdgpu_kernel void @call_no_wait_after_call(i32 addrspace(1)* %ptr, i32) #0 {			define amdgpu_kernel void @call_no_wait_after_call(i32 addrspace(1)* %ptr, i32) #0 {
	; GCN-LABEL: call_no_wait_after_call:			; GCN-LABEL: call_no_wait_after_call:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
				; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s9
	; GCN-NEXT: s_load_dwordx2 s[34:35], s[4:5], 0x0			; GCN-NEXT: s_load_dwordx2 s[34:35], s[4:5], 0x0
	; GCN-NEXT: s_mov_b32 s33, s9
	; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s33
	; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0			; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0
				; GCN-NEXT: s_add_u32 s0, s0, s9
				; GCN-NEXT: s_addc_u32 s1, s1, 0
	; GCN-NEXT: v_mov_b32_e32 v0, 0			; GCN-NEXT: v_mov_b32_e32 v0, 0
	; GCN-NEXT: s_getpc_b64 s[4:5]			; GCN-NEXT: s_getpc_b64 s[4:5]
	; GCN-NEXT: s_add_u32 s4, s4, func@rel32@lo+4			; GCN-NEXT: s_add_u32 s4, s4, func@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s5, s5, func@rel32@hi+4			; GCN-NEXT: s_addc_u32 s5, s5, func@rel32@hi+4
	; GCN-NEXT: s_mov_b32 s32, s33			; GCN-NEXT: s_mov_b32 s32, 0
	; GCN-NEXT: v_mov_b32_e32 v32, 0			; GCN-NEXT: v_mov_b32_e32 v32, 0
	; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GCN-NEXT: v_mov_b32_e32 v0, s34			; GCN-NEXT: v_mov_b32_e32 v0, s34
	; GCN-NEXT: v_mov_b32_e32 v1, s35			; GCN-NEXT: v_mov_b32_e32 v1, s35
	; GCN-NEXT: global_store_dword v[0:1], v32, off			; GCN-NEXT: global_store_dword v[0:1], v32, off
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	call void @func(i32 0)			call void @func(i32 0)
	store i32 0, i32 addrspace(1)* %ptr			store i32 0, i32 addrspace(1)* %ptr
	ret void			ret void
	}			}

	define amdgpu_kernel void @call_no_wait_after_call_return_val(i32 addrspace(1)* %ptr, i32) #0 {			define amdgpu_kernel void @call_no_wait_after_call_return_val(i32 addrspace(1)* %ptr, i32) #0 {
	; GCN-LABEL: call_no_wait_after_call_return_val:			; GCN-LABEL: call_no_wait_after_call_return_val:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
				; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s9
	; GCN-NEXT: s_load_dwordx2 s[34:35], s[4:5], 0x0			; GCN-NEXT: s_load_dwordx2 s[34:35], s[4:5], 0x0
	; GCN-NEXT: s_mov_b32 s33, s9
	; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s33
	; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0			; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0
				; GCN-NEXT: s_add_u32 s0, s0, s9
				; GCN-NEXT: s_addc_u32 s1, s1, 0
	; GCN-NEXT: v_mov_b32_e32 v0, 0			; GCN-NEXT: v_mov_b32_e32 v0, 0
	; GCN-NEXT: s_getpc_b64 s[4:5]			; GCN-NEXT: s_getpc_b64 s[4:5]
	; GCN-NEXT: s_add_u32 s4, s4, func.return@rel32@lo+4			; GCN-NEXT: s_add_u32 s4, s4, func.return@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s5, s5, func.return@rel32@hi+4			; GCN-NEXT: s_addc_u32 s5, s5, func.return@rel32@hi+4
	; GCN-NEXT: s_mov_b32 s32, s33			; GCN-NEXT: s_mov_b32 s32, 0
	; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GCN-NEXT: v_mov_b32_e32 v1, s34			; GCN-NEXT: v_mov_b32_e32 v1, s34
	; GCN-NEXT: v_mov_b32_e32 v2, s35			; GCN-NEXT: v_mov_b32_e32 v2, s35
	; GCN-NEXT: global_store_dword v[1:2], v0, off			; GCN-NEXT: global_store_dword v[1:2], v0, off
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	%rv = call i32 @func.return(i32 0)			%rv = call i32 @func.return(i32 0)
	store i32 %rv, i32 addrspace(1)* %ptr			store i32 %rv, i32 addrspace(1)* %ptr
	ret void			ret void
	}			}

	; Need to wait for the address dependency			; Need to wait for the address dependency
	define amdgpu_kernel void @call_got_load(i32 addrspace(1)* %ptr, i32) #0 {			define amdgpu_kernel void @call_got_load(i32 addrspace(1)* %ptr, i32) #0 {
	; GCN-LABEL: call_got_load:			; GCN-LABEL: call_got_load:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_mov_b32 s33, s9			; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s9
	; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s33
	; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0			; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0
				; GCN-NEXT: s_add_u32 s0, s0, s9
				; GCN-NEXT: s_addc_u32 s1, s1, 0
	; GCN-NEXT: s_getpc_b64 s[4:5]			; GCN-NEXT: s_getpc_b64 s[4:5]
	; GCN-NEXT: s_add_u32 s4, s4, got.func@gotpcrel32@lo+4			; GCN-NEXT: s_add_u32 s4, s4, got.func@gotpcrel32@lo+4
	; GCN-NEXT: s_addc_u32 s5, s5, got.func@gotpcrel32@hi+4			; GCN-NEXT: s_addc_u32 s5, s5, got.func@gotpcrel32@hi+4
	; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GCN-NEXT: v_mov_b32_e32 v0, 0			; GCN-NEXT: v_mov_b32_e32 v0, 0
	; GCN-NEXT: s_mov_b32 s32, s33			; GCN-NEXT: s_mov_b32 s32, 0
	; GCN-NEXT: s_waitcnt lgkmcnt(0)			; GCN-NEXT: s_waitcnt lgkmcnt(0)
	; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	call void @got.func(i32 0)			call void @got.func(i32 0)
	ret void			ret void
	}			}

	; Need to wait for the address dependency			; Need to wait for the address dependency
	[35 lines hidden]

llvm/test/CodeGen/AMDGPU/callee-special-input-sgprs-fixed-abi.ll

[237 lines hidden]	define hidden void @use_every_sgpr_input() #1 {

%val6 = call i32 @llvm.amdgcn.workgroup.id.z()		%val6 = call i32 @llvm.amdgcn.workgroup.id.z()
call void asm sideeffect "; use $0", "s"(i32 %val6)		call void asm sideeffect "; use $0", "s"(i32 %val6)

ret void		ret void
}		}

; GCN-LABEL: {{^}}kern_indirect_use_every_sgpr_input:		; GCN-LABEL: {{^}}kern_indirect_use_every_sgpr_input:
; GCN: s_mov_b32 s33, s17
; GCN: s_mov_b32 s12, s14		; GCN: s_mov_b32 s12, s14
; GCN: s_mov_b32 s13, s15		; GCN: s_mov_b32 s13, s15
; GCN: s_mov_b32 s14, s16		; GCN: s_mov_b32 s14, s16
; GCN: s_mov_b32 s32, s33		; GCN: s_mov_b32 s32, 0
; GCN: s_swappc_b64		; GCN: s_swappc_b64

; GCN: .amdhsa_user_sgpr_private_segment_buffer 1		; GCN: .amdhsa_user_sgpr_private_segment_buffer 1
; GCN: .amdhsa_user_sgpr_dispatch_ptr 1		; GCN: .amdhsa_user_sgpr_dispatch_ptr 1
; GCN: .amdhsa_user_sgpr_queue_ptr 1		; GCN: .amdhsa_user_sgpr_queue_ptr 1
; GCN: .amdhsa_user_sgpr_kernarg_segment_ptr 1		; GCN: .amdhsa_user_sgpr_kernarg_segment_ptr 1
; GCN: .amdhsa_user_sgpr_dispatch_id 1		; GCN: .amdhsa_user_sgpr_dispatch_id 1
; GCN: .amdhsa_user_sgpr_flat_scratch_init 1		; GCN: .amdhsa_user_sgpr_flat_scratch_init 1
[85 lines hidden]

llvm/test/CodeGen/AMDGPU/callee-special-input-sgprs.ll

[197 lines hidden]
; GCN: enable_sgpr_workgroup_id_y = 0		; GCN: enable_sgpr_workgroup_id_y = 0
; GCN: enable_sgpr_workgroup_id_z = 0		; GCN: enable_sgpr_workgroup_id_z = 0

; GCN-NOT: s6		; GCN-NOT: s6
; GCN: s_mov_b32 s4, s6		; GCN: s_mov_b32 s4, s6
; GCN-NEXT: s_getpc_b64 s[6:7]		; GCN-NEXT: s_getpc_b64 s[6:7]
; GCN-NEXT: s_add_u32 s6, s6, use_workgroup_id_x@rel32@lo+4		; GCN-NEXT: s_add_u32 s6, s6, use_workgroup_id_x@rel32@lo+4
; GCN-NEXT: s_addc_u32 s7, s7, use_workgroup_id_x@rel32@hi+4		; GCN-NEXT: s_addc_u32 s7, s7, use_workgroup_id_x@rel32@hi+4
; GCN: s_mov_b32 s32, s33		; GCN: s_mov_b32 s32, 0
; GCN: s_swappc_b64		; GCN: s_swappc_b64
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
define amdgpu_kernel void @kern_indirect_use_workgroup_id_x() #1 {		define amdgpu_kernel void @kern_indirect_use_workgroup_id_x() #1 {
call void @use_workgroup_id_x()		call void @use_workgroup_id_x()
ret void		ret void
}		}

; GCN-LABEL: {{^}}kern_indirect_use_workgroup_id_y:		; GCN-LABEL: {{^}}kern_indirect_use_workgroup_id_y:
; GCN: enable_sgpr_workgroup_id_x = 1		; GCN: enable_sgpr_workgroup_id_x = 1
; GCN: enable_sgpr_workgroup_id_y = 1		; GCN: enable_sgpr_workgroup_id_y = 1
; GCN: enable_sgpr_workgroup_id_z = 0		; GCN: enable_sgpr_workgroup_id_z = 0

; GCN: s_mov_b32 s33, s8		; GCN: s_mov_b32 s4, s7
; GCN-DAG: s_mov_b32 s4, s7		; GCN: s_mov_b32 s32, 0
; GCN: s_mov_b32 s32, s33
; GCN: s_swappc_b64		; GCN: s_swappc_b64
define amdgpu_kernel void @kern_indirect_use_workgroup_id_y() #1 {		define amdgpu_kernel void @kern_indirect_use_workgroup_id_y() #1 {
call void @use_workgroup_id_y()		call void @use_workgroup_id_y()
ret void		ret void
}		}

; GCN-LABEL: {{^}}kern_indirect_use_workgroup_id_z:		; GCN-LABEL: {{^}}kern_indirect_use_workgroup_id_z:
; GCN: enable_sgpr_workgroup_id_x = 1		; GCN: enable_sgpr_workgroup_id_x = 1
; GCN: enable_sgpr_workgroup_id_y = 0		; GCN: enable_sgpr_workgroup_id_y = 0
; GCN: enable_sgpr_workgroup_id_z = 1		; GCN: enable_sgpr_workgroup_id_z = 1
; GCN: s_mov_b32 s33, s8
; GCN: s_mov_b32 s4, s7		; GCN: s_mov_b32 s4, s7

		; GCN: s_mov_b32 s32, 0
; GCN: s_swappc_b64		; GCN: s_swappc_b64
define amdgpu_kernel void @kern_indirect_use_workgroup_id_z() #1 {		define amdgpu_kernel void @kern_indirect_use_workgroup_id_z() #1 {
call void @use_workgroup_id_z()		call void @use_workgroup_id_z()
ret void		ret void
}		}

; GCN-LABEL: {{^}}kern_indirect_use_workgroup_id_xy:		; GCN-LABEL: {{^}}kern_indirect_use_workgroup_id_xy:
; GCN: enable_sgpr_workgroup_id_x = 1		; GCN: enable_sgpr_workgroup_id_x = 1
; GCN: enable_sgpr_workgroup_id_y = 1		; GCN: enable_sgpr_workgroup_id_y = 1
; GCN: enable_sgpr_workgroup_id_z = 0		; GCN: enable_sgpr_workgroup_id_z = 0

; GCN: s_mov_b32 s33, s8

; GCN: s_mov_b32 s5, s7		; GCN: s_mov_b32 s5, s7
; GCN: s_mov_b32 s4, s6		; GCN: s_mov_b32 s4, s6
; GCN: s_mov_b32 s32, s33
		; GCN: s_mov_b32 s32, 0
; GCN: s_swappc_b64		; GCN: s_swappc_b64
define amdgpu_kernel void @kern_indirect_use_workgroup_id_xy() #1 {		define amdgpu_kernel void @kern_indirect_use_workgroup_id_xy() #1 {
call void @use_workgroup_id_xy()		call void @use_workgroup_id_xy()
ret void		ret void
}		}

; GCN-LABEL: {{^}}kern_indirect_use_workgroup_id_xyz:		; GCN-LABEL: {{^}}kern_indirect_use_workgroup_id_xyz:
; GCN: enable_sgpr_workgroup_id_x = 1		; GCN: enable_sgpr_workgroup_id_x = 1
; GCN: enable_sgpr_workgroup_id_y = 1		; GCN: enable_sgpr_workgroup_id_y = 1
; GCN: enable_sgpr_workgroup_id_z = 1		; GCN: enable_sgpr_workgroup_id_z = 1

; GCN: s_mov_b32 s33, s9

; GCN: s_mov_b32 s4, s6		; GCN: s_mov_b32 s4, s6
; GCN: s_mov_b32 s5, s7		; GCN: s_mov_b32 s5, s7
; GCN: s_mov_b32 s6, s8		; GCN: s_mov_b32 s6, s8

; GCN: s_mov_b32 s32, s33		; GCN: s_mov_b32 s32, 0
; GCN: s_swappc_b64		; GCN: s_swappc_b64
define amdgpu_kernel void @kern_indirect_use_workgroup_id_xyz() #1 {		define amdgpu_kernel void @kern_indirect_use_workgroup_id_xyz() #1 {
call void @use_workgroup_id_xyz()		call void @use_workgroup_id_xyz()
ret void		ret void
}		}

; GCN-LABEL: {{^}}kern_indirect_use_workgroup_id_xz:		; GCN-LABEL: {{^}}kern_indirect_use_workgroup_id_xz:
; GCN: enable_sgpr_workgroup_id_x = 1		; GCN: enable_sgpr_workgroup_id_x = 1
; GCN: enable_sgpr_workgroup_id_y = 0		; GCN: enable_sgpr_workgroup_id_y = 0
; GCN: enable_sgpr_workgroup_id_z = 1		; GCN: enable_sgpr_workgroup_id_z = 1

; GCN: s_mov_b32 s33, s8
; GCN: s_mov_b32 s5, s7		; GCN: s_mov_b32 s5, s7
; GCN: s_mov_b32 s4, s6		; GCN: s_mov_b32 s4, s6

; GCN: s_mov_b32 s32, s33		; GCN: s_mov_b32 s32, 0

; GCN: s_swappc_b64		; GCN: s_swappc_b64
define amdgpu_kernel void @kern_indirect_use_workgroup_id_xz() #1 {		define amdgpu_kernel void @kern_indirect_use_workgroup_id_xz() #1 {
call void @use_workgroup_id_xz()		call void @use_workgroup_id_xz()
ret void		ret void
}		}

; GCN-LABEL: {{^}}kern_indirect_use_workgroup_id_yz:		; GCN-LABEL: {{^}}kern_indirect_use_workgroup_id_yz:
; GCN: enable_sgpr_workgroup_id_x = 1		; GCN: enable_sgpr_workgroup_id_x = 1
; GCN: enable_sgpr_workgroup_id_y = 1		; GCN: enable_sgpr_workgroup_id_y = 1
; GCN: enable_sgpr_workgroup_id_z = 1		; GCN: enable_sgpr_workgroup_id_z = 1

; GCN: s_mov_b32 s33, s9
; GCN: s_mov_b32 s4, s7		; GCN: s_mov_b32 s4, s7
; GCN: s_mov_b32 s5, s8		; GCN: s_mov_b32 s5, s8
; GCN: s_mov_b32 s32, s33
		; GCN: s_mov_b32 s32, 0
; GCN: s_swappc_b64		; GCN: s_swappc_b64
define amdgpu_kernel void @kern_indirect_use_workgroup_id_yz() #1 {		define amdgpu_kernel void @kern_indirect_use_workgroup_id_yz() #1 {
call void @use_workgroup_id_yz()		call void @use_workgroup_id_yz()
ret void		ret void
}		}

; Argument is in right place already		; Argument is in right place already
; GCN-LABEL: {{^}}func_indirect_use_workgroup_id_x:		; GCN-LABEL: {{^}}func_indirect_use_workgroup_id_x:
[50 lines hidden]	define hidden void @other_arg_use_workgroup_id_z(i32 %arg0) #1 {
ret void		ret void
}		}

; GCN-LABEL: {{^}}kern_indirect_other_arg_use_workgroup_id_x:		; GCN-LABEL: {{^}}kern_indirect_other_arg_use_workgroup_id_x:
; GCN: enable_sgpr_workgroup_id_x = 1		; GCN: enable_sgpr_workgroup_id_x = 1
; GCN: enable_sgpr_workgroup_id_y = 0		; GCN: enable_sgpr_workgroup_id_y = 0
; GCN: enable_sgpr_workgroup_id_z = 0		; GCN: enable_sgpr_workgroup_id_z = 0

; GCN-DAG: s_mov_b32 s33, s7
; GCN-DAG: v_mov_b32_e32 v0, 0x22b		; GCN-DAG: v_mov_b32_e32 v0, 0x22b
; GCN-DAG: s_mov_b32 s4, s6		; GCN-DAG: s_mov_b32 s4, s6
; GCN-DAG: s_mov_b32 s32, s33
		; GCN-DAG: s_mov_b32 s32, 0
; GCN-NOT: s4		; GCN-NOT: s4
; GCN: s_swappc_b64		; GCN: s_swappc_b64
define amdgpu_kernel void @kern_indirect_other_arg_use_workgroup_id_x() #1 {		define amdgpu_kernel void @kern_indirect_other_arg_use_workgroup_id_x() #1 {
call void @other_arg_use_workgroup_id_x(i32 555)		call void @other_arg_use_workgroup_id_x(i32 555)
ret void		ret void
}		}

; GCN-LABEL: {{^}}kern_indirect_other_arg_use_workgroup_id_y:		; GCN-LABEL: {{^}}kern_indirect_other_arg_use_workgroup_id_y:
; GCN: enable_sgpr_workgroup_id_x = 1		; GCN: enable_sgpr_workgroup_id_x = 1
; GCN: enable_sgpr_workgroup_id_y = 1		; GCN: enable_sgpr_workgroup_id_y = 1
; GCN: enable_sgpr_workgroup_id_z = 0		; GCN: enable_sgpr_workgroup_id_z = 0

; GCN-DAG: s_mov_b32 s33, s8
; GCN-DAG: v_mov_b32_e32 v0, 0x22b		; GCN-DAG: v_mov_b32_e32 v0, 0x22b
; GCN-DAG: s_mov_b32 s4, s7		; GCN-DAG: s_mov_b32 s4, s7

; GCN-DAG: s_mov_b32 s32, s33		; GCN-DAG: s_mov_b32 s32, 0
; GCN: s_swappc_b64		; GCN: s_swappc_b64
define amdgpu_kernel void @kern_indirect_other_arg_use_workgroup_id_y() #1 {		define amdgpu_kernel void @kern_indirect_other_arg_use_workgroup_id_y() #1 {
call void @other_arg_use_workgroup_id_y(i32 555)		call void @other_arg_use_workgroup_id_y(i32 555)
ret void		ret void
}		}

; GCN-LABEL: {{^}}kern_indirect_other_arg_use_workgroup_id_z:		; GCN-LABEL: {{^}}kern_indirect_other_arg_use_workgroup_id_z:
; GCN: enable_sgpr_workgroup_id_x = 1		; GCN: enable_sgpr_workgroup_id_x = 1
; GCN: enable_sgpr_workgroup_id_y = 0		; GCN: enable_sgpr_workgroup_id_y = 0
; GCN: enable_sgpr_workgroup_id_z = 1		; GCN: enable_sgpr_workgroup_id_z = 1

; GCN-DAG: s_mov_b32 s33, s8
; GCN-DAG: v_mov_b32_e32 v0, 0x22b		; GCN-DAG: v_mov_b32_e32 v0, 0x22b

; GCN: s_mov_b32 s32, s33		; GCN: s_mov_b32 s32, 0
; GCN: s_swappc_b64		; GCN: s_swappc_b64
define amdgpu_kernel void @kern_indirect_other_arg_use_workgroup_id_z() #1 {		define amdgpu_kernel void @kern_indirect_other_arg_use_workgroup_id_z() #1 {
call void @other_arg_use_workgroup_id_z(i32 555)		call void @other_arg_use_workgroup_id_z(i32 555)
ret void		ret void
}		}

; GCN-LABEL: {{^}}use_every_sgpr_input:		; GCN-LABEL: {{^}}use_every_sgpr_input:
; GCN: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s32{{$}}		; GCN: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s32{{$}}
[49 lines hidden]

; GCN: enable_sgpr_private_segment_buffer = 1		; GCN: enable_sgpr_private_segment_buffer = 1
; GCN: enable_sgpr_dispatch_ptr = 1		; GCN: enable_sgpr_dispatch_ptr = 1
; GCN: enable_sgpr_queue_ptr = 1		; GCN: enable_sgpr_queue_ptr = 1
; GCN: enable_sgpr_kernarg_segment_ptr = 1		; GCN: enable_sgpr_kernarg_segment_ptr = 1
; GCN: enable_sgpr_dispatch_id = 1		; GCN: enable_sgpr_dispatch_id = 1
; GCN: enable_sgpr_flat_scratch_init = 1		; GCN: enable_sgpr_flat_scratch_init = 1

; GCN: s_mov_b32 s33, s17
; GCN: s_mov_b32 s12, s14		; GCN: s_mov_b32 s12, s14
; GCN: s_mov_b32 s13, s15		; GCN: s_mov_b32 s13, s15
; GCN: s_mov_b32 s14, s16		; GCN: s_mov_b32 s14, s16
; GCN: s_mov_b32 s32, s33		; GCN: s_mov_b32 s32, 0
; GCN: s_swappc_b64		; GCN: s_swappc_b64
define amdgpu_kernel void @kern_indirect_use_every_sgpr_input() #1 {		define amdgpu_kernel void @kern_indirect_use_every_sgpr_input() #1 {
call void @use_every_sgpr_input()		call void @use_every_sgpr_input()
ret void		ret void
}		}

; GCN-LABEL: {{^}}func_indirect_use_every_sgpr_input:		; GCN-LABEL: {{^}}func_indirect_use_every_sgpr_input:
; GCN-NOT: s6		; GCN-NOT: s6
[138 lines hidden]

llvm/test/CodeGen/AMDGPU/callee-special-input-vgprs.ll

[451 lines hidden]	define void @too_many_args_use_workitem_id_x(
store volatile i32 %arg31, i32 addrspace(1)* undef		store volatile i32 %arg31, i32 addrspace(1)* undef

ret void		ret void
}		}

; GCN-LABEL: {{^}}kern_call_too_many_args_use_workitem_id_x:		; GCN-LABEL: {{^}}kern_call_too_many_args_use_workitem_id_x:
; VARABI: enable_vgpr_workitem_id = 0		; VARABI: enable_vgpr_workitem_id = 0

; VARABI: s_mov_b32 s33, s7		; VARABI: s_mov_b32 s32, 0
; VARABI: s_mov_b32 s32, s33
; VARABI: buffer_store_dword v0, off, s[0:3], s32{{$}}		; VARABI: buffer_store_dword v0, off, s[0:3], s32{{$}}
; VARABI: s_swappc_b64		; VARABI: s_swappc_b64


; FIXEDABI: enable_vgpr_workitem_id = 2		; FIXEDABI: enable_vgpr_workitem_id = 2
; FIXEDABI: s_mov_b32 s33, s17		; FIXEDABI-DAG: s_mov_b32 s32, 0
; FIXEDABI-DAG: s_mov_b32 s32, s33
; FIXEDABI-DAG: v_mov_b32_e32 [[K:v[0-9]+]], 0x140{{$}}		; FIXEDABI-DAG: v_mov_b32_e32 [[K:v[0-9]+]], 0x140{{$}}
; FIXEDABI-DAG: v_lshlrev_b32_e32 [[TMP1:v[0-9]+]], 10, v1		; FIXEDABI-DAG: v_lshlrev_b32_e32 [[TMP1:v[0-9]+]], 10, v1
; FIXEDABI-DAG: v_lshlrev_b32_e32 [[TMP0:v[0-9]+]], 20, v2		; FIXEDABI-DAG: v_lshlrev_b32_e32 [[TMP0:v[0-9]+]], 20, v2
; FIXEDABI-DAG: v_or_b32_e32 [[TMP2:v[0-9]+]], v0, [[TMP1]]		; FIXEDABI-DAG: v_or_b32_e32 [[TMP2:v[0-9]+]], v0, [[TMP1]]
; FIXEDABI-DAG: v_or_b32_e32 v31, [[TMP2]], [[TMP0]]		; FIXEDABI-DAG: v_or_b32_e32 v31, [[TMP2]], [[TMP0]]
; FIXEDABI: buffer_store_dword [[K]], off, s[0:3], s32{{$}}		; FIXEDABI: buffer_store_dword [[K]], off, s[0:3], s32{{$}}

; FIXEDABI: s_swappc_b64		; FIXEDABI: s_swappc_b64
[134 lines hidden]

; var abi stack layout:		; var abi stack layout:
; sp[0] = byval		; sp[0] = byval
; sp[1] = ??		; sp[1] = ??
; sp[2] = stack passed workitem ID x		; sp[2] = stack passed workitem ID x

; GCN-LABEL: {{^}}kern_call_too_many_args_use_workitem_id_x_byval:		; GCN-LABEL: {{^}}kern_call_too_many_args_use_workitem_id_x_byval:
; VARABI: enable_vgpr_workitem_id = 0		; VARABI: enable_vgpr_workitem_id = 0
; VARABI-DAG: s_mov_b32 s33, s7		; VARABI: v_mov_b32_e32 [[K:v[0-9]+]], 0x3e7{{$}}
; VARABI-DAG: v_mov_b32_e32 [[K:v[0-9]+]], 0x3e7{{$}}		; VARABI: buffer_store_dword [[K]], off, s[0:3], 0 offset:4
; VARABI: buffer_store_dword [[K]], off, s[0:3], s33 offset:4		; VARABI: buffer_load_dword [[RELOAD_BYVAL:v[0-9]+]], off, s[0:3], 0 offset:4
; VARABI: buffer_load_dword [[RELOAD_BYVAL:v[0-9]+]], off, s[0:3], s33 offset:4		; VARABI: s_movk_i32 s32, 0x400{{$}}
; VARABI: s_add_u32 s32, s33, 0x400{{$}}

; VARABI-NOT: s32		; VARABI-NOT: s32
; VARABI: buffer_store_dword v0, off, s[0:3], s32 offset:4		; VARABI: buffer_store_dword v0, off, s[0:3], s32 offset:4

; VARABI: buffer_store_dword [[RELOAD_BYVAL]], off, s[0:3], s32{{$}}		; VARABI: buffer_store_dword [[RELOAD_BYVAL]], off, s[0:3], s32{{$}}
; VARABI: v_mov_b32_e32 [[RELOAD_BYVAL]],		; VARABI: v_mov_b32_e32 [[RELOAD_BYVAL]],
; VARABI: s_swappc_b64		; VARABI: s_swappc_b64


; FIXEDABI: s_mov_b32 s33, s17		; FIXEDABI: v_mov_b32_e32 [[K0:v[0-9]+]], 0x3e7
; FIXEDABI-DAG: s_add_u32 s32, s33, 0x400		; FIXEDABI: buffer_store_dword [[K0]], off, s[0:3], 0 offset:4{{$}}
; FIXEDABI-DAG: v_mov_b32_e32 [[K0:v[0-9]+]], 0x3e7
; FIXEDABI: buffer_store_dword [[K0]], off, s[0:3], s33 offset:4{{$}}		; FIXEDABI: s_movk_i32 s32, 0x400{{$}}

; FIXEDABI: v_mov_b32_e32 [[K1:v[0-9]+]], 0x140		; FIXEDABI: v_mov_b32_e32 [[K1:v[0-9]+]], 0x140
; FIXEDABI: buffer_store_dword [[K1]], off, s[0:3], s32{{$}}		; FIXEDABI: buffer_store_dword [[K1]], off, s[0:3], s32{{$}}

; FIXME: Why this reload?		; FIXME: Why this reload?
; FIXEDABI: buffer_load_dword [[RELOAD:v[0-9]+]], off, s[0:3], s33 offset:4{{$}}		; FIXEDABI: buffer_load_dword [[RELOAD:v[0-9]+]], off, s[0:3], 0 offset:4{{$}}

; FIXEDABI-DAG: v_lshlrev_b32_e32 [[TMP1:v[0-9]+]], 10, v1		; FIXEDABI-DAG: v_lshlrev_b32_e32 [[TMP1:v[0-9]+]], 10, v1
; FIXEDABI-DAG: v_lshlrev_b32_e32 [[TMP0:v[0-9]+]], 20, v2		; FIXEDABI-DAG: v_lshlrev_b32_e32 [[TMP0:v[0-9]+]], 20, v2
; FIXEDABI-DAG: v_or_b32_e32 [[TMP2:v[0-9]+]], v0, [[TMP1]]		; FIXEDABI-DAG: v_or_b32_e32 [[TMP2:v[0-9]+]], v0, [[TMP1]]
; FIXEDABI: v_or_b32_e32 v31, [[TMP2]], [[TMP0]]		; FIXEDABI: v_or_b32_e32 v31, [[TMP2]], [[TMP0]]

; FIXEDABI-NOT: s32		; FIXEDABI-NOT: s32
; FIXEDABI: buffer_store_dword [[RELOAD]], off, s[0:3], s32 offset:4		; FIXEDABI: buffer_store_dword [[RELOAD]], off, s[0:3], s32 offset:4
define void @too_many_args_use_workitem_id_xyz(
store volatile i32 %arg31, i32 addrspace(1)* undef		store volatile i32 %arg31, i32 addrspace(1)* undef

ret void		ret void
}		}

; GCN-LABEL: {{^}}kern_call_too_many_args_use_workitem_id_xyz:		; GCN-LABEL: {{^}}kern_call_too_many_args_use_workitem_id_xyz:
; GCN: enable_vgpr_workitem_id = 2		; GCN: enable_vgpr_workitem_id = 2

; VARABI-DAG: s_mov_b32 s33, s7		; GCN-DAG: s_mov_b32 s32, 0
; FIXEDABI-DAG: s_mov_b32 s33, s17
; GCN-DAG: s_mov_b32 s32, s33

; GCN-DAG: v_lshlrev_b32_e32 [[TMP1:v[0-9]+]], 10, v1		; GCN-DAG: v_lshlrev_b32_e32 [[TMP1:v[0-9]+]], 10, v1
; GCN-DAG: v_lshlrev_b32_e32 [[TMP0:v[0-9]+]], 20, v2		; GCN-DAG: v_lshlrev_b32_e32 [[TMP0:v[0-9]+]], 20, v2
; GCN-DAG: v_or_b32_e32 [[TMP2:v[0-9]+]], v0, [[TMP1]]		; GCN-DAG: v_or_b32_e32 [[TMP2:v[0-9]+]], v0, [[TMP1]]
; VARABI-DAG: v_or_b32_e32 [[PACKEDID:v[0-9]+]], [[TMP2]], [[TMP0]]		; VARABI-DAG: v_or_b32_e32 [[PACKEDID:v[0-9]+]], [[TMP2]], [[TMP0]]
; VARABI: buffer_store_dword [[PACKEDID]], off, s[0:3], s32{{$}}		; VARABI: buffer_store_dword [[PACKEDID]], off, s[0:3], s32{{$}}

; FIXEDABI-DAG: v_or_b32_e32 v31, [[TMP2]], [[TMP0]]		; FIXEDABI-DAG: v_or_b32_e32 v31, [[TMP2]], [[TMP0]]
define void @too_many_args_use_workitem_id_x_stack_yz(
store volatile i32 %arg30, i32 addrspace(1)* undef		store volatile i32 %arg30, i32 addrspace(1)* undef

ret void		ret void
}		}

; GCN-LABEL: {{^}}kern_call_too_many_args_use_workitem_id_x_stack_yz:		; GCN-LABEL: {{^}}kern_call_too_many_args_use_workitem_id_x_stack_yz:
; GCN: enable_vgpr_workitem_id = 2		; GCN: enable_vgpr_workitem_id = 2

; VARABI: s_mov_b32 s33, s7
; FIXEDABI: s_mov_b32 s33, s17

; GCN-NOT: v0		; GCN-NOT: v0
; GCN-DAG: v_lshlrev_b32_e32 v1, 10, v1		; GCN-DAG: v_lshlrev_b32_e32 v1, 10, v1
; GCN-DAG: v_or_b32_e32 v0, v0, v1		; GCN-DAG: v_or_b32_e32 v0, v0, v1
; GCN-DAG: v_lshlrev_b32_e32 v2, 20, v2		; GCN-DAG: v_lshlrev_b32_e32 v2, 20, v2
; GCN-DAG: v_or_b32_e32 v31, v0, v2		; GCN-DAG: v_or_b32_e32 v31, v0, v2

; GCN: s_mov_b32 s32, s33		; GCN: s_mov_b32 s32, 0
; GCN: s_swappc_b64		; GCN: s_swappc_b64
define amdgpu_kernel void @kern_call_too_many_args_use_workitem_id_x_stack_yz() #1 {		define amdgpu_kernel void @kern_call_too_many_args_use_workitem_id_x_stack_yz() #1 {
call void @too_many_args_use_workitem_id_x_stack_yz(		call void @too_many_args_use_workitem_id_x_stack_yz(
i32 10, i32 20, i32 30, i32 40,		i32 10, i32 20, i32 30, i32 40,
i32 50, i32 60, i32 70, i32 80,		i32 50, i32 60, i32 70, i32 80,
i32 90, i32 100, i32 110, i32 120,		i32 90, i32 100, i32 110, i32 120,
i32 130, i32 140, i32 150, i32 160,		i32 130, i32 140, i32 150, i32 160,
i32 170, i32 180, i32 190, i32 200,		i32 170, i32 180, i32 190, i32 200,
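
The FIXEDABI checks above expect the three workitem IDs to be combined into v31 with a shift by 10, a shift by 20, and two ORs. A minimal sketch of that packing, written in Python purely for illustration: the helper names and the 10-bit range check are assumptions inferred from the shift amounts, not something this test file states.

```python
# Illustrative only: mirrors the v_lshlrev_b32 / v_or_b32 sequence in the checks.
# FIELD_BITS = 10 is an assumption inferred from the shift amounts (10 and 20).
FIELD_BITS = 10
FIELD_MASK = (1 << FIELD_BITS) - 1


def pack_workitem_id(x, y, z):
    """Pack x, y, z into one 32-bit value: x | (y << 10) | (z << 20)."""
    for value in (x, y, z):
        if not 0 <= value <= FIELD_MASK:
            raise ValueError("workitem ID component does not fit in 10 bits")
    return x | (y << FIELD_BITS) | (z << (2 * FIELD_BITS))


def unpack_workitem_id(packed):
    """Recover (x, y, z) from a value produced by pack_workitem_id."""
    return (
        packed & FIELD_MASK,
        (packed >> FIELD_BITS) & FIELD_MASK,
        (packed >> (2 * FIELD_BITS)) & FIELD_MASK,
    )
```

In this sketch, `pack_workitem_id(v0, v1, v2)` corresponds to the value the checks expect in v31, and `unpack_workitem_id` is what a callee would do to read a single component back.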

llvm/test/CodeGen/AMDGPU/captured-frame-index.ll

define amdgpu_kernel void @stored_fi_to_lds(float addrspace(5)* addrspace(3)* %ptr) #0 {
store float 4.0, float addrspace(5)*%tmp		store float 4.0, float addrspace(5)*%tmp
store float addrspace(5)* %tmp, float addrspace(5)* addrspace(3)* %ptr		store float addrspace(5)* %tmp, float addrspace(5)* addrspace(3)* %ptr
ret void		ret void
}		}

; Offset is applied		; Offset is applied
; GCN-LABEL: {{^}}stored_fi_to_lds_2_small_objects:		; GCN-LABEL: {{^}}stored_fi_to_lds_2_small_objects:
; GCN-DAG: v_mov_b32_e32 [[ZERO:v[0-9]+]], 4{{$}}		; GCN-DAG: v_mov_b32_e32 [[ZERO:v[0-9]+]], 4{{$}}
; GCN-DAG: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:4{{$}}		; GCN-DAG: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:4{{$}}
; GCN-DAG: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:8{{$}}		; GCN-DAG: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:8{{$}}

; GCN-DAG: s_load_dword [[LDSPTR:s[0-9]+]]		; GCN-DAG: s_load_dword [[LDSPTR:s[0-9]+]]

; GCN-DAG: v_mov_b32_e32 [[VLDSPTR:v[0-9]+]], [[LDSPTR]]		; GCN-DAG: v_mov_b32_e32 [[VLDSPTR:v[0-9]+]], [[LDSPTR]]
; GCN: ds_write_b32 [[VLDSPTR]], [[ZERO]]		; GCN: ds_write_b32 [[VLDSPTR]], [[ZERO]]

; GCN-DAG: v_mov_b32_e32 [[FI1:v[0-9]+]], 8{{$}}		; GCN-DAG: v_mov_b32_e32 [[FI1:v[0-9]+]], 8{{$}}
; GCN: ds_write_b32 [[VLDSPTR]], [[FI1]]		; GCN: ds_write_b32 [[VLDSPTR]], [[FI1]]
define amdgpu_kernel void @stored_fi_to_lds_2_small_objects(float addrspace(5)* addrspace(3)* %ptr) #0 {		define amdgpu_kernel void @stored_fi_to_lds_2_small_objects(float addrspace(5)* addrspace(3)* %ptr) #0 {
%tmp0 = alloca float, addrspace(5)		%tmp0 = alloca float, addrspace(5)
%tmp1 = alloca float, addrspace(5)		%tmp1 = alloca float, addrspace(5)
store float 4.0, float addrspace(5)* %tmp0		store float 4.0, float addrspace(5)* %tmp0
store float 4.0, float addrspace(5)* %tmp1		store float 4.0, float addrspace(5)* %tmp1
store volatile float addrspace(5)* %tmp0, float addrspace(5)* addrspace(3)* %ptr		store volatile float addrspace(5)* %tmp0, float addrspace(5)* addrspace(3)* %ptr
store volatile float addrspace(5)* %tmp1, float addrspace(5)* addrspace(3)* %ptr		store volatile float addrspace(5)* %tmp1, float addrspace(5)* addrspace(3)* %ptr
ret void		ret void
}		}

; Same frame index is used multiple times in the store		; Same frame index is used multiple times in the store
; GCN-LABEL: {{^}}stored_fi_to_self:		; GCN-LABEL: {{^}}stored_fi_to_self:
; GCN-DAG: v_mov_b32_e32 [[K:v[0-9]+]], 0x4d2{{$}}		; GCN-DAG: v_mov_b32_e32 [[K:v[0-9]+]], 0x4d2{{$}}
; GCN: buffer_store_dword [[K]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:4{{$}}		; GCN: buffer_store_dword [[K]], off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:4{{$}}
; GCN-DAG: v_mov_b32_e32 [[ZERO:v[0-9]+]], 4{{$}}		; GCN-DAG: v_mov_b32_e32 [[ZERO:v[0-9]+]], 4{{$}}
; GCN: buffer_store_dword [[ZERO]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:4{{$}}		; GCN: buffer_store_dword [[ZERO]], off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:4{{$}}
define amdgpu_kernel void @stored_fi_to_self() #0 {		define amdgpu_kernel void @stored_fi_to_self() #0 {
%tmp = alloca i32 addrspace(5)*, addrspace(5)		%tmp = alloca i32 addrspace(5)*, addrspace(5)

; Avoid optimizing everything out		; Avoid optimizing everything out
store volatile i32 addrspace(5)* inttoptr (i32 1234 to i32 addrspace(5)*), i32 addrspace(5)* addrspace(5)* %tmp		store volatile i32 addrspace(5)* inttoptr (i32 1234 to i32 addrspace(5)*), i32 addrspace(5)* addrspace(5)* %tmp
%bitcast = bitcast i32 addrspace(5)* addrspace(5)* %tmp to i32 addrspace(5)*		%bitcast = bitcast i32 addrspace(5)* addrspace(5)* %tmp to i32 addrspace(5)*
store volatile i32 addrspace(5)* %bitcast, i32 addrspace(5)* addrspace(5)* %tmp		store volatile i32 addrspace(5)* %bitcast, i32 addrspace(5)* addrspace(5)* %tmp
ret void		ret void
}		}

; GCN-LABEL: {{^}}stored_fi_to_self_offset:		; GCN-LABEL: {{^}}stored_fi_to_self_offset:
; GCN-DAG: v_mov_b32_e32 [[K0:v[0-9]+]], 32{{$}}		; GCN-DAG: v_mov_b32_e32 [[K0:v[0-9]+]], 32{{$}}
; GCN: buffer_store_dword [[K0]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:4{{$}}		; GCN: buffer_store_dword [[K0]], off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:4{{$}}

; GCN-DAG: v_mov_b32_e32 [[K1:v[0-9]+]], 0x4d2{{$}}		; GCN-DAG: v_mov_b32_e32 [[K1:v[0-9]+]], 0x4d2{{$}}
; GCN: buffer_store_dword [[K1]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:2052{{$}}		; GCN: buffer_store_dword [[K1]], off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:2052{{$}}

; GCN: v_mov_b32_e32 [[OFFSETK:v[0-9]+]], 0x804{{$}}		; GCN: v_mov_b32_e32 [[OFFSETK:v[0-9]+]], 0x804{{$}}
; GCN: buffer_store_dword [[OFFSETK]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:2052{{$}}		; GCN: buffer_store_dword [[OFFSETK]], off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:2052{{$}}
define amdgpu_kernel void @stored_fi_to_self_offset() #0 {		define amdgpu_kernel void @stored_fi_to_self_offset() #0 {
%tmp0 = alloca [512 x i32], addrspace(5)		%tmp0 = alloca [512 x i32], addrspace(5)
%tmp1 = alloca i32 addrspace(5)*, addrspace(5)		%tmp1 = alloca i32 addrspace(5)*, addrspace(5)

; Avoid optimizing everything out		; Avoid optimizing everything out
%tmp0.cast = bitcast [512 x i32] addrspace(5)* %tmp0 to i32 addrspace(5)*		%tmp0.cast = bitcast [512 x i32] addrspace(5)* %tmp0 to i32 addrspace(5)*
store volatile i32 32, i32 addrspace(5)* %tmp0.cast		store volatile i32 32, i32 addrspace(5)* %tmp0.cast

store volatile i32 addrspace(5)* inttoptr (i32 1234 to i32 addrspace(5)*), i32 addrspace(5)* addrspace(5)* %tmp1		store volatile i32 addrspace(5)* inttoptr (i32 1234 to i32 addrspace(5)*), i32 addrspace(5)* addrspace(5)* %tmp1

%bitcast = bitcast i32 addrspace(5)* addrspace(5)* %tmp1 to i32 addrspace(5)*		%bitcast = bitcast i32 addrspace(5)* addrspace(5)* %tmp1 to i32 addrspace(5)*
store volatile i32 addrspace(5)* %bitcast, i32 addrspace(5)* addrspace(5)* %tmp1		store volatile i32 addrspace(5)* %bitcast, i32 addrspace(5)* addrspace(5)* %tmp1
ret void		ret void
}		}

; GCN-LABEL: {{^}}stored_fi_to_fi:		; GCN-LABEL: {{^}}stored_fi_to_fi:
; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:4{{$}}		; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:4{{$}}
; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:8{{$}}		; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:8{{$}}
; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:12{{$}}		; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:12{{$}}

; GCN: v_mov_b32_e32 [[FI1:v[0-9]+]], 8{{$}}		; GCN: v_mov_b32_e32 [[FI1:v[0-9]+]], 8{{$}}
; GCN: buffer_store_dword [[FI1]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:12{{$}}		; GCN: buffer_store_dword [[FI1]], off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:12{{$}}

; GCN: v_mov_b32_e32 [[FI2:v[0-9]+]], 12{{$}}		; GCN: v_mov_b32_e32 [[FI2:v[0-9]+]], 12{{$}}
; GCN: buffer_store_dword [[FI2]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:8{{$}}		; GCN: buffer_store_dword [[FI2]], off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:8{{$}}
define amdgpu_kernel void @stored_fi_to_fi() #0 {		define amdgpu_kernel void @stored_fi_to_fi() #0 {
%tmp0 = alloca i32 addrspace(5)*, addrspace(5)		%tmp0 = alloca i32 addrspace(5)*, addrspace(5)
%tmp1 = alloca i32 addrspace(5)*, addrspace(5)		%tmp1 = alloca i32 addrspace(5)*, addrspace(5)
%tmp2 = alloca i32 addrspace(5)*, addrspace(5)		%tmp2 = alloca i32 addrspace(5)*, addrspace(5)
store volatile i32 addrspace(5)* inttoptr (i32 1234 to i32 addrspace(5)*), i32 addrspace(5)* addrspace(5)* %tmp0		store volatile i32 addrspace(5)* inttoptr (i32 1234 to i32 addrspace(5)*), i32 addrspace(5)* addrspace(5)* %tmp0
store volatile i32 addrspace(5)* inttoptr (i32 5678 to i32 addrspace(5)*), i32 addrspace(5)* addrspace(5)* %tmp1		store volatile i32 addrspace(5)* inttoptr (i32 5678 to i32 addrspace(5)*), i32 addrspace(5)* addrspace(5)* %tmp1
store volatile i32 addrspace(5)* inttoptr (i32 9999 to i32 addrspace(5)*), i32 addrspace(5)* addrspace(5)* %tmp2		store volatile i32 addrspace(5)* inttoptr (i32 9999 to i32 addrspace(5)*), i32 addrspace(5)* addrspace(5)* %tmp2

%bitcast1 = bitcast i32 addrspace(5)* addrspace(5)* %tmp1 to i32 addrspace(5)*		%bitcast1 = bitcast i32 addrspace(5)* addrspace(5)* %tmp1 to i32 addrspace(5)*
%bitcast2 = bitcast i32 addrspace(5)* addrspace(5)* %tmp2 to i32 addrspace(5)* ; at offset 8		%bitcast2 = bitcast i32 addrspace(5)* addrspace(5)* %tmp2 to i32 addrspace(5)* ; at offset 8

store volatile i32 addrspace(5)* %bitcast1, i32 addrspace(5)* addrspace(5)* %tmp2 ; store offset 4 at offset 8		store volatile i32 addrspace(5)* %bitcast1, i32 addrspace(5)* addrspace(5)* %tmp2 ; store offset 4 at offset 8
store volatile i32 addrspace(5)* %bitcast2, i32 addrspace(5)* addrspace(5)* %tmp1 ; store offset 8 at offset 4		store volatile i32 addrspace(5)* %bitcast2, i32 addrspace(5)* addrspace(5)* %tmp1 ; store offset 8 at offset 4
ret void		ret void
}		}

; GCN-LABEL: {{^}}stored_fi_to_global:		; GCN-LABEL: {{^}}stored_fi_to_global:
; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:4{{$}}		; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:4{{$}}
; GCN: v_mov_b32_e32 [[FI:v[0-9]+]], 4{{$}}		; GCN: v_mov_b32_e32 [[FI:v[0-9]+]], 4{{$}}
; GCN: buffer_store_dword [[FI]]		; GCN: buffer_store_dword [[FI]]
define amdgpu_kernel void @stored_fi_to_global(float addrspace(5)* addrspace(1)* %ptr) #0 {		define amdgpu_kernel void @stored_fi_to_global(float addrspace(5)* addrspace(1)* %ptr) #0 {
%tmp = alloca float, addrspace(5)		%tmp = alloca float, addrspace(5)
store float 0.0, float addrspace(5)*%tmp		store float 0.0, float addrspace(5)*%tmp
store float addrspace(5)* %tmp, float addrspace(5)* addrspace(1)* %ptr		store float addrspace(5)* %tmp, float addrspace(5)* addrspace(1)* %ptr
ret void		ret void
}		}

; Offset is applied		; Offset is applied
; GCN-LABEL: {{^}}stored_fi_to_global_2_small_objects:		; GCN-LABEL: {{^}}stored_fi_to_global_2_small_objects:
; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:4{{$}}		; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:4{{$}}
; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:8{{$}}		; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:8{{$}}
; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:12{{$}}		; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:12{{$}}

; GCN: v_mov_b32_e32 [[FI1:v[0-9]+]], 8{{$}}		; GCN: v_mov_b32_e32 [[FI1:v[0-9]+]], 8{{$}}
; GCN: buffer_store_dword [[FI1]], off, s{{\[[0-9]+:[0-9]+\]}}, 0{{$}}		; GCN: buffer_store_dword [[FI1]], off, s{{\[[0-9]+:[0-9]+\]}}, 0{{$}}

; GCN-DAG: v_mov_b32_e32 [[FI2:v[0-9]+]], 12{{$}}		; GCN-DAG: v_mov_b32_e32 [[FI2:v[0-9]+]], 12{{$}}
; GCN: buffer_store_dword [[FI2]], off, s{{\[[0-9]+:[0-9]+\]}}, 0{{$}}		; GCN: buffer_store_dword [[FI2]], off, s{{\[[0-9]+:[0-9]+\]}}, 0{{$}}
define amdgpu_kernel void @stored_fi_to_global_2_small_objects(float addrspace(5)* addrspace(1)* %ptr) #0 {		define amdgpu_kernel void @stored_fi_to_global_2_small_objects(float addrspace(5)* addrspace(1)* %ptr) #0 {
%tmp0 = alloca float, addrspace(5)		%tmp0 = alloca float, addrspace(5)
%tmp1 = alloca float, addrspace(5)		%tmp1 = alloca float, addrspace(5)
%tmp2 = alloca float, addrspace(5)		%tmp2 = alloca float, addrspace(5)
store volatile float 0.0, float addrspace(5)*%tmp0		store volatile float 0.0, float addrspace(5)*%tmp0
store volatile float 0.0, float addrspace(5)*%tmp1		store volatile float 0.0, float addrspace(5)*%tmp1
store volatile float 0.0, float addrspace(5)*%tmp2		store volatile float 0.0, float addrspace(5)*%tmp2
store volatile float addrspace(5)* %tmp1, float addrspace(5)* addrspace(1)* %ptr		store volatile float addrspace(5)* %tmp1, float addrspace(5)* addrspace(1)* %ptr
store volatile float addrspace(5)* %tmp2, float addrspace(5)* addrspace(1)* %ptr		store volatile float addrspace(5)* %tmp2, float addrspace(5)* addrspace(1)* %ptr
ret void		ret void
}		}

; GCN-LABEL: {{^}}stored_fi_to_global_huge_frame_offset:		; GCN-LABEL: {{^}}stored_fi_to_global_huge_frame_offset:
; GCN: v_mov_b32_e32 [[BASE_0:v[0-9]+]], 0{{$}}		; GCN: v_mov_b32_e32 [[BASE_0:v[0-9]+]], 0{{$}}
; GCN: buffer_store_dword [[BASE_0]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:4{{$}}		; GCN: buffer_store_dword [[BASE_0]], off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:4{{$}}

; FIXME: Re-initialize		; FIXME: Re-initialize
; GCN: v_mov_b32_e32 [[BASE_0_1:v[0-9]+]], 4{{$}}		; GCN: v_mov_b32_e32 [[BASE_0_1:v[0-9]+]], 4{{$}}

; GCN-DAG: v_mov_b32_e32 [[K:v[0-9]+]], 0x3e7{{$}}		; GCN-DAG: v_mov_b32_e32 [[K:v[0-9]+]], 0x3e7{{$}}
; GCN-DAG: v_add_i32_e32 [[BASE_1_OFF_1:v[0-9]+]], vcc, 0x3ffc, [[BASE_0_1]]		; GCN-DAG: v_add_i32_e32 [[BASE_1_OFF_1:v[0-9]+]], vcc, 0x3ffc, [[BASE_0_1]]


; GCN: v_add_i32_e32 [[BASE_1_OFF_2:v[0-9]+]], vcc, 56, [[BASE_0_1]]		; GCN: v_add_i32_e32 [[BASE_1_OFF_2:v[0-9]+]], vcc, 56, [[BASE_0_1]]
; GCN: buffer_store_dword [[K]], [[BASE_1_OFF_1]], s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offen{{$}}		; GCN: buffer_store_dword [[K]], [[BASE_1_OFF_1]], s{{\[[0-9]+:[0-9]+\]}}, 0 offen{{$}}

; GCN: buffer_store_dword [[BASE_1_OFF_2]], off, s{{\[[0-9]+:[0-9]+\]}}, 0{{$}}		; GCN: buffer_store_dword [[BASE_1_OFF_2]], off, s{{\[[0-9]+:[0-9]+\]}}, 0{{$}}
define amdgpu_kernel void @stored_fi_to_global_huge_frame_offset(i32 addrspace(5)* addrspace(1)* %ptr) #0 {		define amdgpu_kernel void @stored_fi_to_global_huge_frame_offset(i32 addrspace(5)* addrspace(1)* %ptr) #0 {
%tmp0 = alloca [4096 x i32], addrspace(5)		%tmp0 = alloca [4096 x i32], addrspace(5)
%tmp1 = alloca [4096 x i32], addrspace(5)		%tmp1 = alloca [4096 x i32], addrspace(5)
%gep0.tmp0 = getelementptr [4096 x i32], [4096 x i32] addrspace(5)* %tmp0, i32 0, i32 0		%gep0.tmp0 = getelementptr [4096 x i32], [4096 x i32] addrspace(5)* %tmp0, i32 0, i32 0
store volatile i32 0, i32 addrspace(5)* %gep0.tmp0		store volatile i32 0, i32 addrspace(5)* %gep0.tmp0
%gep1.tmp0 = getelementptr [4096 x i32], [4096 x i32] addrspace(5)* %tmp0, i32 0, i32 4095		%gep1.tmp0 = getelementptr [4096 x i32], [4096 x i32] addrspace(5)* %tmp0, i32 0, i32 4095

llvm/test/CodeGen/AMDGPU/cc-update.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx803 < %s | FileCheck --check-prefix=GFX803 %s
				; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck --check-prefix=GFX900 %s
				; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 < %s | FileCheck --check-prefix=GFX1010 %s

				define amdgpu_kernel void @test_kern_empty() local_unnamed_addr #0 {
				; GFX803-LABEL: test_kern_empty:
				; GFX803: ; %bb.0: ; %entry
				; GFX803-NEXT: s_endpgm
				;
				; GFX900-LABEL: test_kern_empty:
				; GFX900: ; %bb.0: ; %entry
				; GFX900-NEXT: s_endpgm
				;
				; GFX1010-LABEL: test_kern_empty:
				; GFX1010: ; %bb.0: ; %entry
				; GFX1010-NEXT: s_endpgm
				entry:
				ret void
				}

				define amdgpu_kernel void @test_kern_stack() local_unnamed_addr #0 {
				; GFX803-LABEL: test_kern_stack:
				; GFX803: ; %bb.0: ; %entry
				; GFX803-NEXT: s_add_u32 s4, s4, s7
				; GFX803-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8
				; GFX803-NEXT: s_add_u32 s0, s0, s7
				; GFX803-NEXT: s_addc_u32 s1, s1, 0
				; GFX803-NEXT: v_mov_b32_e32 v0, 0
				; GFX803-NEXT: s_mov_b32 flat_scratch_lo, s5
				; GFX803-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:4
				; GFX803-NEXT: s_endpgm
				;
				; GFX900-LABEL: test_kern_stack:
				; GFX900: ; %bb.0: ; %entry
				; GFX900-NEXT: s_add_u32 flat_scratch_lo, s4, s7
				; GFX900-NEXT: s_addc_u32 flat_scratch_hi, s5, 0
				; GFX900-NEXT: s_add_u32 s0, s0, s7
				; GFX900-NEXT: s_addc_u32 s1, s1, 0
				; GFX900-NEXT: v_mov_b32_e32 v0, 0
				; GFX900-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:4
				; GFX900-NEXT: s_endpgm
				;
				; GFX1010-LABEL: test_kern_stack:
				; GFX1010: ; %bb.0: ; %entry
				; GFX1010-NEXT: s_add_u32 s4, s4, s7
				; GFX1010-NEXT: s_addc_u32 s5, s5, 0
				; GFX1010-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s4
				; GFX1010-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s5
				; GFX1010-NEXT: s_add_u32 s0, s0, s7
				; GFX1010-NEXT: s_addc_u32 s1, s1, 0
				; GFX1010-NEXT: v_mov_b32_e32 v0, 0
				; GFX1010-NEXT: ; implicit-def: $vcc_hi
				; GFX1010-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:4
				; GFX1010-NEXT: s_endpgm
				entry:
				%x = alloca i32, align 4, addrspace(5)
				store volatile i32 0, i32 addrspace(5)* %x, align 4
				ret void
				}

				define amdgpu_kernel void @test_kern_call() local_unnamed_addr #0 {
				; GFX803-LABEL: test_kern_call:
				; GFX803: ; %bb.0: ; %entry
				; GFX803-NEXT: s_add_u32 s4, s4, s7
				; GFX803-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8
				; GFX803-NEXT: s_add_u32 s0, s0, s7
				; GFX803-NEXT: s_addc_u32 s1, s1, 0
				; GFX803-NEXT: s_mov_b32 flat_scratch_lo, s5
				; GFX803-NEXT: s_getpc_b64 s[4:5]
				; GFX803-NEXT: s_add_u32 s4, s4, ex@rel32@lo+4
				; GFX803-NEXT: s_addc_u32 s5, s5, ex@rel32@hi+4
				; GFX803-NEXT: s_mov_b32 s32, 0
				; GFX803-NEXT: s_swappc_b64 s[30:31], s[4:5]
				; GFX803-NEXT: s_endpgm
				;
				; GFX900-LABEL: test_kern_call:
				; GFX900: ; %bb.0: ; %entry
				; GFX900-NEXT: s_add_u32 flat_scratch_lo, s4, s7
				; GFX900-NEXT: s_addc_u32 flat_scratch_hi, s5, 0
				; GFX900-NEXT: s_add_u32 s0, s0, s7
				; GFX900-NEXT: s_addc_u32 s1, s1, 0
				; GFX900-NEXT: s_getpc_b64 s[4:5]
				; GFX900-NEXT: s_add_u32 s4, s4, ex@rel32@lo+4
				; GFX900-NEXT: s_addc_u32 s5, s5, ex@rel32@hi+4
				; GFX900-NEXT: s_mov_b32 s32, 0
				; GFX900-NEXT: s_swappc_b64 s[30:31], s[4:5]
				; GFX900-NEXT: s_endpgm
				;
				; GFX1010-LABEL: test_kern_call:
				; GFX1010: ; %bb.0: ; %entry
				; GFX1010-NEXT: s_add_u32 s4, s4, s7
				; GFX1010-NEXT: s_mov_b32 s32, 0
				; GFX1010-NEXT: s_addc_u32 s5, s5, 0
				; GFX1010-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s4
				; GFX1010-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s5
				; GFX1010-NEXT: s_add_u32 s0, s0, s7
				; GFX1010-NEXT: s_addc_u32 s1, s1, 0
				; GFX1010-NEXT: s_getpc_b64 s[4:5]
				; GFX1010-NEXT: s_add_u32 s4, s4, ex@rel32@lo+4
				; GFX1010-NEXT: s_addc_u32 s5, s5, ex@rel32@hi+4
				; GFX1010-NEXT: ; implicit-def: $vcc_hi
				; GFX1010-NEXT: s_swappc_b64 s[30:31], s[4:5]
				; GFX1010-NEXT: s_endpgm
				entry:
				tail call void @ex() #0
				ret void
				}

				define amdgpu_kernel void @test_kern_stack_and_call() local_unnamed_addr #0 {
				; GFX803-LABEL: test_kern_stack_and_call:
				; GFX803: ; %bb.0: ; %entry
				; GFX803-NEXT: s_add_u32 s4, s4, s7
				; GFX803-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8
				; GFX803-NEXT: s_add_u32 s0, s0, s7
				; GFX803-NEXT: s_addc_u32 s1, s1, 0
				; GFX803-NEXT: v_mov_b32_e32 v0, 0
				; GFX803-NEXT: s_mov_b32 flat_scratch_lo, s5
				; GFX803-NEXT: s_getpc_b64 s[4:5]
				; GFX803-NEXT: s_add_u32 s4, s4, ex@rel32@lo+4
				; GFX803-NEXT: s_addc_u32 s5, s5, ex@rel32@hi+4
				; GFX803-NEXT: s_movk_i32 s32, 0x400
				; GFX803-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:4
				; GFX803-NEXT: s_swappc_b64 s[30:31], s[4:5]
				; GFX803-NEXT: s_endpgm
				;
				; GFX900-LABEL: test_kern_stack_and_call:
				; GFX900: ; %bb.0: ; %entry
				; GFX900-NEXT: s_add_u32 flat_scratch_lo, s4, s7
				; GFX900-NEXT: s_addc_u32 flat_scratch_hi, s5, 0
				; GFX900-NEXT: s_add_u32 s0, s0, s7
				; GFX900-NEXT: s_addc_u32 s1, s1, 0
				; GFX900-NEXT: v_mov_b32_e32 v0, 0
				; GFX900-NEXT: s_getpc_b64 s[4:5]
				; GFX900-NEXT: s_add_u32 s4, s4, ex@rel32@lo+4
				; GFX900-NEXT: s_addc_u32 s5, s5, ex@rel32@hi+4
				; GFX900-NEXT: s_movk_i32 s32, 0x400
				; GFX900-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:4
				; GFX900-NEXT: s_swappc_b64 s[30:31], s[4:5]
				; GFX900-NEXT: s_endpgm
				;
				; GFX1010-LABEL: test_kern_stack_and_call:
				; GFX1010: ; %bb.0: ; %entry
				; GFX1010-NEXT: s_add_u32 s4, s4, s7
				; GFX1010-NEXT: s_movk_i32 s32, 0x200
				; GFX1010-NEXT: s_addc_u32 s5, s5, 0
				; GFX1010-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s4
				; GFX1010-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s5
				; GFX1010-NEXT: s_add_u32 s0, s0, s7
				; GFX1010-NEXT: s_addc_u32 s1, s1, 0
				; GFX1010-NEXT: v_mov_b32_e32 v0, 0
				; GFX1010-NEXT: s_getpc_b64 s[4:5]
				; GFX1010-NEXT: s_add_u32 s4, s4, ex@rel32@lo+4
				; GFX1010-NEXT: s_addc_u32 s5, s5, ex@rel32@hi+4
				; GFX1010-NEXT: ; implicit-def: $vcc_hi
				; GFX1010-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:4
				; GFX1010-NEXT: s_swappc_b64 s[30:31], s[4:5]
				; GFX1010-NEXT: s_endpgm
				entry:
				%x = alloca i32, align 4, addrspace(5)
				store volatile i32 0, i32 addrspace(5)* %x, align 4
				tail call void @ex() #0
				ret void
				}

				define amdgpu_kernel void @test_force_fp_kern_empty() local_unnamed_addr #2 {
				; GFX803-LABEL: test_force_fp_kern_empty:
				; GFX803: ; %bb.0: ; %entry
				; GFX803-NEXT: s_mov_b32 s34, 0
				; GFX803-NEXT: s_endpgm
				;
				; GFX900-LABEL: test_force_fp_kern_empty:
				; GFX900: ; %bb.0: ; %entry
				; GFX900-NEXT: s_mov_b32 s34, 0
				; GFX900-NEXT: s_endpgm
				;
				; GFX1010-LABEL: test_force_fp_kern_empty:
				; GFX1010: ; %bb.0: ; %entry
				; GFX1010-NEXT: s_mov_b32 s34, 0
				; GFX1010-NEXT: s_endpgm
				entry:
				ret void
				}

				define amdgpu_kernel void @test_force_fp_kern_stack() local_unnamed_addr #2 {
				; GFX803-LABEL: test_force_fp_kern_stack:
				; GFX803: ; %bb.0: ; %entry
				; GFX803-NEXT: s_add_u32 s4, s4, s7
				; GFX803-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8
				; GFX803-NEXT: s_add_u32 s0, s0, s7
				; GFX803-NEXT: s_mov_b32 s34, 0
				; GFX803-NEXT: s_addc_u32 s1, s1, 0
				; GFX803-NEXT: v_mov_b32_e32 v0, 0
				; GFX803-NEXT: s_mov_b32 flat_scratch_lo, s5
				; GFX803-NEXT: buffer_store_dword v0, off, s[0:3], s34 offset:4
				; GFX803-NEXT: s_endpgm
				;
				; GFX900-LABEL: test_force_fp_kern_stack:
				; GFX900: ; %bb.0: ; %entry
				; GFX900-NEXT: s_add_u32 flat_scratch_lo, s4, s7
				; GFX900-NEXT: s_addc_u32 flat_scratch_hi, s5, 0
				; GFX900-NEXT: s_add_u32 s0, s0, s7
				; GFX900-NEXT: s_mov_b32 s34, 0
				; GFX900-NEXT: s_addc_u32 s1, s1, 0
				; GFX900-NEXT: v_mov_b32_e32 v0, 0
				; GFX900-NEXT: buffer_store_dword v0, off, s[0:3], s34 offset:4
				; GFX900-NEXT: s_endpgm
				;
				; GFX1010-LABEL: test_force_fp_kern_stack:
				; GFX1010: ; %bb.0: ; %entry
				; GFX1010-NEXT: s_add_u32 s4, s4, s7
				; GFX1010-NEXT: s_mov_b32 s34, 0
				; GFX1010-NEXT: s_addc_u32 s5, s5, 0
				; GFX1010-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s4
				; GFX1010-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s5
				; GFX1010-NEXT: s_add_u32 s0, s0, s7
				; GFX1010-NEXT: s_addc_u32 s1, s1, 0
				; GFX1010-NEXT: v_mov_b32_e32 v0, 0
				; GFX1010-NEXT: ; implicit-def: $vcc_hi
				; GFX1010-NEXT: buffer_store_dword v0, off, s[0:3], s34 offset:4
				; GFX1010-NEXT: s_endpgm
				entry:
				%x = alloca i32, align 4, addrspace(5)
				store volatile i32 0, i32 addrspace(5)* %x, align 4
				ret void
				}

				define amdgpu_kernel void @test_force_fp_kern_call() local_unnamed_addr #2 {
				; GFX803-LABEL: test_force_fp_kern_call:
				; GFX803: ; %bb.0: ; %entry
				; GFX803-NEXT: s_add_u32 s4, s4, s7
				; GFX803-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8
				; GFX803-NEXT: s_add_u32 s0, s0, s7
				; GFX803-NEXT: s_addc_u32 s1, s1, 0
				; GFX803-NEXT: s_mov_b32 flat_scratch_lo, s5
				; GFX803-NEXT: s_getpc_b64 s[4:5]
				; GFX803-NEXT: s_add_u32 s4, s4, ex@rel32@lo+4
				; GFX803-NEXT: s_addc_u32 s5, s5, ex@rel32@hi+4
				; GFX803-NEXT: s_mov_b32 s32, 0
				; GFX803-NEXT: s_mov_b32 s34, 0
				; GFX803-NEXT: s_swappc_b64 s[30:31], s[4:5]
				; GFX803-NEXT: s_endpgm
				;
				; GFX900-LABEL: test_force_fp_kern_call:
				; GFX900: ; %bb.0: ; %entry
				; GFX900-NEXT: s_add_u32 flat_scratch_lo, s4, s7
				; GFX900-NEXT: s_addc_u32 flat_scratch_hi, s5, 0
				; GFX900-NEXT: s_add_u32 s0, s0, s7
				; GFX900-NEXT: s_addc_u32 s1, s1, 0
				; GFX900-NEXT: s_getpc_b64 s[4:5]
				; GFX900-NEXT: s_add_u32 s4, s4, ex@rel32@lo+4
				; GFX900-NEXT: s_addc_u32 s5, s5, ex@rel32@hi+4
				; GFX900-NEXT: s_mov_b32 s32, 0
				; GFX900-NEXT: s_mov_b32 s34, 0
				; GFX900-NEXT: s_swappc_b64 s[30:31], s[4:5]
				; GFX900-NEXT: s_endpgm
				;
				; GFX1010-LABEL: test_force_fp_kern_call:
				; GFX1010: ; %bb.0: ; %entry
				; GFX1010-NEXT: s_add_u32 s4, s4, s7
				; GFX1010-NEXT: s_mov_b32 s32, 0
				; GFX1010-NEXT: s_mov_b32 s34, 0
				; GFX1010-NEXT: s_addc_u32 s5, s5, 0
				; GFX1010-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s4
				; GFX1010-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s5
				; GFX1010-NEXT: s_add_u32 s0, s0, s7
				; GFX1010-NEXT: s_addc_u32 s1, s1, 0
				; GFX1010-NEXT: s_getpc_b64 s[4:5]
				; GFX1010-NEXT: s_add_u32 s4, s4, ex@rel32@lo+4
				; GFX1010-NEXT: s_addc_u32 s5, s5, ex@rel32@hi+4
				; GFX1010-NEXT: ; implicit-def: $vcc_hi
				; GFX1010-NEXT: s_swappc_b64 s[30:31], s[4:5]
				; GFX1010-NEXT: s_endpgm
				entry:
				tail call void @ex() #2
				ret void
				}

				define amdgpu_kernel void @test_force_fp_kern_stack_and_call() local_unnamed_addr #2 {
				; GFX803-LABEL: test_force_fp_kern_stack_and_call:
				; GFX803: ; %bb.0: ; %entry
				; GFX803-NEXT: s_add_u32 s4, s4, s7
				; GFX803-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8
				; GFX803-NEXT: s_add_u32 s0, s0, s7
				; GFX803-NEXT: s_mov_b32 s34, 0
				; GFX803-NEXT: s_addc_u32 s1, s1, 0
				; GFX803-NEXT: v_mov_b32_e32 v0, 0
				; GFX803-NEXT: s_mov_b32 flat_scratch_lo, s5
				; GFX803-NEXT: s_getpc_b64 s[4:5]
				; GFX803-NEXT: s_add_u32 s4, s4, ex@rel32@lo+4
				; GFX803-NEXT: s_addc_u32 s5, s5, ex@rel32@hi+4
				; GFX803-NEXT: s_movk_i32 s32, 0x400
				; GFX803-NEXT: buffer_store_dword v0, off, s[0:3], s34 offset:4
				; GFX803-NEXT: s_swappc_b64 s[30:31], s[4:5]
				; GFX803-NEXT: s_endpgm
				;
				; GFX900-LABEL: test_force_fp_kern_stack_and_call:
				; GFX900: ; %bb.0: ; %entry
				; GFX900-NEXT: s_add_u32 flat_scratch_lo, s4, s7
				; GFX900-NEXT: s_addc_u32 flat_scratch_hi, s5, 0
				; GFX900-NEXT: s_add_u32 s0, s0, s7
				; GFX900-NEXT: s_addc_u32 s1, s1, 0
				; GFX900-NEXT: s_mov_b32 s34, 0
				; GFX900-NEXT: v_mov_b32_e32 v0, 0
				; GFX900-NEXT: s_getpc_b64 s[4:5]
				; GFX900-NEXT: s_add_u32 s4, s4, ex@rel32@lo+4
				; GFX900-NEXT: s_addc_u32 s5, s5, ex@rel32@hi+4
				; GFX900-NEXT: s_movk_i32 s32, 0x400
				; GFX900-NEXT: buffer_store_dword v0, off, s[0:3], s34 offset:4
				; GFX900-NEXT: s_swappc_b64 s[30:31], s[4:5]
				; GFX900-NEXT: s_endpgm
				;
				; GFX1010-LABEL: test_force_fp_kern_stack_and_call:
				; GFX1010: ; %bb.0: ; %entry
				; GFX1010-NEXT: s_add_u32 s4, s4, s7
				; GFX1010-NEXT: s_movk_i32 s32, 0x200
				; GFX1010-NEXT: s_mov_b32 s34, 0
				; GFX1010-NEXT: s_addc_u32 s5, s5, 0
				; GFX1010-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s4
				; GFX1010-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s5
				; GFX1010-NEXT: s_add_u32 s0, s0, s7
				; GFX1010-NEXT: s_addc_u32 s1, s1, 0
				; GFX1010-NEXT: v_mov_b32_e32 v0, 0
				; GFX1010-NEXT: s_getpc_b64 s[4:5]
				; GFX1010-NEXT: s_add_u32 s4, s4, ex@rel32@lo+4
				; GFX1010-NEXT: s_addc_u32 s5, s5, ex@rel32@hi+4
				; GFX1010-NEXT: ; implicit-def: $vcc_hi
				; GFX1010-NEXT: buffer_store_dword v0, off, s[0:3], s34 offset:4
				; GFX1010-NEXT: s_swappc_b64 s[30:31], s[4:5]
				; GFX1010-NEXT: s_endpgm
				entry:
				%x = alloca i32, align 4, addrspace(5)
				store volatile i32 0, i32 addrspace(5)* %x, align 4
				tail call void @ex() #2
				ret void
				}

				define amdgpu_kernel void @test_sgpr_offset_kernel() #1 {
				; GFX803-LABEL: test_sgpr_offset_kernel:
				; GFX803: ; %bb.0: ; %entry
				; GFX803-NEXT: s_add_u32 s4, s4, s7
				; GFX803-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8
				; GFX803-NEXT: s_add_u32 s0, s0, s7
				; GFX803-NEXT: s_addc_u32 s1, s1, 0
				; GFX803-NEXT: buffer_load_dword v0, off, s[0:3], 0 offset:8
				; GFX803-NEXT: s_mov_b32 s4, 0x40000
				; GFX803-NEXT: s_mov_b32 flat_scratch_lo, s5
				; GFX803-NEXT: s_waitcnt vmcnt(0)
				; GFX803-NEXT: buffer_store_dword v0, off, s[0:3], s4 ; 4-byte Folded Spill
				; GFX803-NEXT: ;;#ASMSTART
				; GFX803-NEXT: ;;#ASMEND
				; GFX803-NEXT: s_mov_b32 s4, 0x40000
				; GFX803-NEXT: buffer_load_dword v0, off, s[0:3], s4 ; 4-byte Folded Reload
				; GFX803-NEXT: s_waitcnt vmcnt(0)
				; GFX803-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:8
				; GFX803-NEXT: s_endpgm
				;
				; GFX900-LABEL: test_sgpr_offset_kernel:
				; GFX900: ; %bb.0: ; %entry
				; GFX900-NEXT: s_add_u32 flat_scratch_lo, s4, s7
				; GFX900-NEXT: s_addc_u32 flat_scratch_hi, s5, 0
				; GFX900-NEXT: s_add_u32 s0, s0, s7
				; GFX900-NEXT: s_addc_u32 s1, s1, 0
				; GFX900-NEXT: buffer_load_dword v0, off, s[0:3], 0 offset:8
				; GFX900-NEXT: s_mov_b32 s6, 0x40000
				; GFX900-NEXT: s_waitcnt vmcnt(0)
				; GFX900-NEXT: buffer_store_dword v0, off, s[0:3], s6 ; 4-byte Folded Spill
				; GFX900-NEXT: ;;#ASMSTART
				; GFX900-NEXT: ;;#ASMEND
				; GFX900-NEXT: s_mov_b32 s6, 0x40000
				; GFX900-NEXT: buffer_load_dword v0, off, s[0:3], s6 ; 4-byte Folded Reload
				; GFX900-NEXT: s_waitcnt vmcnt(0)
				; GFX900-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:8
				; GFX900-NEXT: s_endpgm
				;
				; GFX1010-LABEL: test_sgpr_offset_kernel:
				; GFX1010: ; %bb.0: ; %entry
				; GFX1010-NEXT: s_add_u32 s4, s4, s7
				; GFX1010-NEXT: s_addc_u32 s5, s5, 0
				; GFX1010-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s4
				; GFX1010-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s5
				; GFX1010-NEXT: s_add_u32 s0, s0, s7
				; GFX1010-NEXT: s_addc_u32 s1, s1, 0
				; GFX1010-NEXT: s_mov_b32 s6, 0x20000
				; GFX1010-NEXT: ; implicit-def: $vcc_hi
				; GFX1010-NEXT: buffer_load_dword v0, off, s[0:3], 0 offset:8
				; GFX1010-NEXT: s_waitcnt vmcnt(0)
				; GFX1010-NEXT: buffer_store_dword v0, off, s[0:3], s6 ; 4-byte Folded Spill
				; GFX1010-NEXT: v_nop
				; GFX1010-NEXT: s_mov_b32 s6, 0x20000
				; GFX1010-NEXT: ;;#ASMSTART
				; GFX1010-NEXT: ;;#ASMEND
				; GFX1010-NEXT: buffer_load_dword v0, off, s[0:3], s6 ; 4-byte Folded Reload
				; GFX1010-NEXT: s_waitcnt vmcnt(0)
				; GFX1010-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:8
				; GFX1010-NEXT: s_endpgm
				entry:
				; Occupy 4096 bytes of scratch, so the offset of the spill of %a does not
				; fit in the instruction, and has to live in the SGPR offset.
				%alloca = alloca i8, i32 4092, align 4, addrspace(5)
				%buf = bitcast i8 addrspace(5)* %alloca to i32 addrspace(5)*

				%aptr = getelementptr i32, i32 addrspace(5)* %buf, i32 1
				; 0x40000 / 64 = 4096 (for wave64)
				; CHECK: s_add_u32 s6, s7, 0x40000
				; CHECK: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s6 ; 4-byte Folded Spill
				%a = load volatile i32, i32 addrspace(5)* %aptr

				; Force %a to spill
				call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()

				%outptr = getelementptr i32, i32 addrspace(5)* %buf, i32 1
				store volatile i32 %a, i32 addrspace(5)* %outptr

				ret void
				}

				declare hidden void @ex() local_unnamed_addr #0

				attributes #0 = { nounwind }
				attributes #1 = { nounwind "amdgpu-num-vgpr"="8" }
				attributes #2 = { nounwind "frame-pointer"="all" }
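The comments in `test_sgpr_offset_kernel` above rely on a small piece of arithmetic: a spill slot at per-lane byte offset 4096 does not fit in the instruction's immediate offset, so the offset is materialized in an SGPR, scaled by the wavefront size. The following is a minimal Python sketch of that arithmetic, not code from the patch. The helper names are hypothetical, and the 12-bit immediate limit is an assumption consistent with `offset:4092` being encodable in the checks above. The `0x20000` constant in the GFX1010 checks is consistent with wave32.

```python
# Hypothetical helpers illustrating the scratch-offset arithmetic in the test.
# Assumption: the MUBUF immediate offset is a 12-bit unsigned field (max 4095).
MUBUF_MAX_IMM_OFFSET = 4095


def fits_in_immediate(byte_offset: int) -> bool:
    """Return True if a per-lane byte offset can be encoded as `offset:N`."""
    return 0 <= byte_offset <= MUBUF_MAX_IMM_OFFSET


def sgpr_offset(byte_offset: int, wave_size: int) -> int:
    """Scale a per-lane byte offset by the wavefront size.

    This mirrors the test comment "0x40000 / 64 = 4096 (for wave64)".
    """
    return byte_offset * wave_size


if __name__ == "__main__":
    print(fits_in_immediate(4092))     # offset used by the volatile load/store
    print(fits_in_immediate(4096))     # offset of the spill slot
    print(hex(sgpr_offset(4096, 64)))  # value seen in the GFX803/GFX900 checks
    print(hex(sgpr_offset(4096, 32)))  # value seen in the GFX1010 checks
```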

llvm/test/CodeGen/AMDGPU/cgp-addressing-modes.ll

	[129 lines not shown]

	; OPT-LABEL: @test_sink_scratch_small_offset_i32(			; OPT-LABEL: @test_sink_scratch_small_offset_i32(
	; OPT-NOT: getelementptr [512 x i32]			; OPT-NOT: getelementptr [512 x i32]
	; OPT: br i1			; OPT: br i1
	; OPT: getelementptr i8,			; OPT: getelementptr i8,

	; GCN-LABEL: {{^}}test_sink_scratch_small_offset_i32:			; GCN-LABEL: {{^}}test_sink_scratch_small_offset_i32:
	; GCN: s_and_saveexec_b64			; GCN: s_and_saveexec_b64
	; GCN: buffer_store_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, {{s[0-9]+}} offset:4092{{$}}			; GCN: buffer_store_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, 0 offset:4092{{$}}
	; GCN: buffer_load_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, {{s[0-9]+}} offset:4092{{$}}			; GCN: buffer_load_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, 0 offset:4092{{$}}
	; GCN: {{^}}BB4_2:			; GCN: {{^}}BB4_2:
	define amdgpu_kernel void @test_sink_scratch_small_offset_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %in, i32 %arg) {			define amdgpu_kernel void @test_sink_scratch_small_offset_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %in, i32 %arg) {
	entry:			entry:
	%alloca = alloca [512 x i32], align 4, addrspace(5)			%alloca = alloca [512 x i32], align 4, addrspace(5)
	%out.gep.0 = getelementptr i32, i32 addrspace(1)* %out, i64 999998			%out.gep.0 = getelementptr i32, i32 addrspace(1)* %out, i64 999998
	%out.gep.1 = getelementptr i32, i32 addrspace(1)* %out, i64 999999			%out.gep.1 = getelementptr i32, i32 addrspace(1)* %out, i64 999999
	%add.arg = add i32 %arg, 8			%add.arg = add i32 %arg, 8
	%alloca.gep = getelementptr [512 x i32], [512 x i32] addrspace(5)* %alloca, i32 0, i32 1022			%alloca.gep = getelementptr [512 x i32], [512 x i32] addrspace(5)* %alloca, i32 0, i32 1022
	[21 lines not shown]
	; OPT-LABEL: @test_sink_scratch_small_offset_i32_reserved(			; OPT-LABEL: @test_sink_scratch_small_offset_i32_reserved(
	; OPT-NOT: getelementptr [512 x i32]			; OPT-NOT: getelementptr [512 x i32]
	; OPT: br i1			; OPT: br i1
	; OPT: getelementptr i8,			; OPT: getelementptr i8,

	; GCN-LABEL: {{^}}test_sink_scratch_small_offset_i32_reserved:			; GCN-LABEL: {{^}}test_sink_scratch_small_offset_i32_reserved:
	; GCN: s_and_saveexec_b64			; GCN: s_and_saveexec_b64
	; GCN: v_mov_b32_e32 [[BASE_FI0:v[0-9]+]], 4			; GCN: v_mov_b32_e32 [[BASE_FI0:v[0-9]+]], 4
	; GCN: buffer_store_dword {{v[0-9]+}}, [[BASE_FI0]], {{s\[[0-9]+:[0-9]+\]}}, {{s[0-9]+}} offen offset:4092{{$}}			; GCN: buffer_store_dword {{v[0-9]+}}, [[BASE_FI0]], {{s\[[0-9]+:[0-9]+\]}}, 0 offen offset:4092{{$}}
	; GCN: v_mov_b32_e32 [[BASE_FI1:v[0-9]+]], 4			; GCN: v_mov_b32_e32 [[BASE_FI1:v[0-9]+]], 4
	; GCN: buffer_load_dword {{v[0-9]+}}, [[BASE_FI1]], {{s\[[0-9]+:[0-9]+\]}}, {{s[0-9]+}} offen offset:4092{{$}}			; GCN: buffer_load_dword {{v[0-9]+}}, [[BASE_FI1]], {{s\[[0-9]+:[0-9]+\]}}, 0 offen offset:4092{{$}}
	; GCN: {{^BB[0-9]+}}_2:			; GCN: {{^BB[0-9]+}}_2:

	define amdgpu_kernel void @test_sink_scratch_small_offset_i32_reserved(i32 addrspace(1)* %out, i32 addrspace(1)* %in, i32 %arg) {			define amdgpu_kernel void @test_sink_scratch_small_offset_i32_reserved(i32 addrspace(1)* %out, i32 addrspace(1)* %in, i32 %arg) {
	entry:			entry:
	%alloca = alloca [512 x i32], align 4, addrspace(5)			%alloca = alloca [512 x i32], align 4, addrspace(5)
	%out.gep.0 = getelementptr i32, i32 addrspace(1)* %out, i64 999998			%out.gep.0 = getelementptr i32, i32 addrspace(1)* %out, i64 999998
	%out.gep.1 = getelementptr i32, i32 addrspace(1)* %out, i64 999999			%out.gep.1 = getelementptr i32, i32 addrspace(1)* %out, i64 999999
	%add.arg = add i32 %arg, 8			%add.arg = add i32 %arg, 8
	[20 lines not shown]

	; OPT-LABEL: @test_no_sink_scratch_large_offset_i32(			; OPT-LABEL: @test_no_sink_scratch_large_offset_i32(
	; OPT: %alloca.gep = getelementptr [512 x i32], [512 x i32] addrspace(5)* %alloca, i32 0, i32 1024			; OPT: %alloca.gep = getelementptr [512 x i32], [512 x i32] addrspace(5)* %alloca, i32 0, i32 1024
	; OPT: br i1			; OPT: br i1
	; OPT-NOT: ptrtoint			; OPT-NOT: ptrtoint

	; GCN-LABEL: {{^}}test_no_sink_scratch_large_offset_i32:			; GCN-LABEL: {{^}}test_no_sink_scratch_large_offset_i32:
	; GCN: s_and_saveexec_b64			; GCN: s_and_saveexec_b64
	; GCN: buffer_store_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, {{s[0-9]+}} offen{{$}}			; GCN: buffer_store_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen{{$}}
	; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, {{s[0-9]+}} offen{{$}}			; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen{{$}}
	; GCN: {{^BB[0-9]+}}_2:			; GCN: {{^BB[0-9]+}}_2:
	define amdgpu_kernel void @test_no_sink_scratch_large_offset_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %in, i32 %arg) {			define amdgpu_kernel void @test_no_sink_scratch_large_offset_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %in, i32 %arg) {
	entry:			entry:
	%alloca = alloca [512 x i32], align 4, addrspace(5)			%alloca = alloca [512 x i32], align 4, addrspace(5)
	%out.gep.0 = getelementptr i32, i32 addrspace(1)* %out, i64 999998			%out.gep.0 = getelementptr i32, i32 addrspace(1)* %out, i64 999998
	%out.gep.1 = getelementptr i32, i32 addrspace(1)* %out, i64 999999			%out.gep.1 = getelementptr i32, i32 addrspace(1)* %out, i64 999999
	%add.arg = add i32 %arg, 8			%add.arg = add i32 %arg, 8
	%alloca.gep = getelementptr [512 x i32], [512 x i32] addrspace(5)* %alloca, i32 0, i32 1024			%alloca.gep = getelementptr [512 x i32], [512 x i32] addrspace(5)* %alloca, i32 0, i32 1024
	[532 lines not shown]

llvm/test/CodeGen/AMDGPU/chain-hi-to-lo.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX900 %s		; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX900 %s

define <2 x half> @chain_hi_to_lo_private() {		define <2 x half> @chain_hi_to_lo_private() {
; GCN-LABEL: chain_hi_to_lo_private:		; GCN-LABEL: chain_hi_to_lo_private:
; GCN: ; %bb.0: ; %bb		; GCN: ; %bb.0: ; %bb
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: buffer_load_ushort v0, off, s[0:3], s33 offset:2		; GCN-NEXT: buffer_load_ushort v0, off, s[0:3], 0 offset:2
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: buffer_load_short_d16_hi v0, off, s[0:3], s33		; GCN-NEXT: buffer_load_short_d16_hi v0, off, s[0:3], 0
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
bb:		bb:
%gep_lo = getelementptr inbounds half, half addrspace(5)* null, i64 1		%gep_lo = getelementptr inbounds half, half addrspace(5)* null, i64 1
%load_lo = load half, half addrspace(5)* %gep_lo		%load_lo = load half, half addrspace(5)* %gep_lo
%gep_hi = getelementptr inbounds half, half addrspace(5)* null, i64 0		%gep_hi = getelementptr inbounds half, half addrspace(5)* null, i64 0
%load_hi = load half, half addrspace(5)* %gep_hi		%load_hi = load half, half addrspace(5)* %gep_hi

%temp = insertelement <2 x half> undef, half %load_lo, i32 0		%temp = insertelement <2 x half> undef, half %load_lo, i32 0
%result = insertelement <2 x half> %temp, half %load_hi, i32 1		%result = insertelement <2 x half> %temp, half %load_hi, i32 1

ret <2 x half> %result		ret <2 x half> %result
}		}

define <2 x half> @chain_hi_to_lo_private_different_bases(half addrspace(5)* %base_lo, half addrspace(5)* %base_hi) {		define <2 x half> @chain_hi_to_lo_private_different_bases(half addrspace(5)* %base_lo, half addrspace(5)* %base_hi) {
; GCN-LABEL: chain_hi_to_lo_private_different_bases:		; GCN-LABEL: chain_hi_to_lo_private_different_bases:
; GCN: ; %bb.0: ; %bb		; GCN: ; %bb.0: ; %bb
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: buffer_load_ushort v0, v0, s[0:3], s33 offen		; GCN-NEXT: buffer_load_ushort v0, v0, s[0:3], 0 offen
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: buffer_load_short_d16_hi v0, v1, s[0:3], s33 offen		; GCN-NEXT: buffer_load_short_d16_hi v0, v1, s[0:3], 0 offen
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
bb:		bb:
%load_lo = load half, half addrspace(5)* %base_lo		%load_lo = load half, half addrspace(5)* %base_lo
%load_hi = load half, half addrspace(5)* %base_hi		%load_hi = load half, half addrspace(5)* %base_hi

%temp = insertelement <2 x half> undef, half %load_lo, i32 0		%temp = insertelement <2 x half> undef, half %load_lo, i32 0
%result = insertelement <2 x half> %temp, half %load_hi, i32 1		%result = insertelement <2 x half> %temp, half %load_hi, i32 1

ret <2 x half> %result		ret <2 x half> %result
}		}

define <2 x half> @chain_hi_to_lo_arithmatic(half addrspace(5)* %base, half %in) {		define <2 x half> @chain_hi_to_lo_arithmatic(half addrspace(5)* %base, half %in) {
; GCN-LABEL: chain_hi_to_lo_arithmatic:		; GCN-LABEL: chain_hi_to_lo_arithmatic:
; GCN: ; %bb.0: ; %bb		; GCN: ; %bb.0: ; %bb
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: v_add_f16_e32 v1, 1.0, v1		; GCN-NEXT: v_add_f16_e32 v1, 1.0, v1
; GCN-NEXT: buffer_load_short_d16_hi v1, v0, s[0:3], s33 offen		; GCN-NEXT: buffer_load_short_d16_hi v1, v0, s[0:3], 0 offen
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: v_mov_b32_e32 v0, v1		; GCN-NEXT: v_mov_b32_e32 v0, v1
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
bb:		bb:
%arith_lo = fadd half %in, 1.0		%arith_lo = fadd half %in, 1.0
%load_hi = load half, half addrspace(5)* %base		%load_hi = load half, half addrspace(5)* %base

%temp = insertelement <2 x half> undef, half %arith_lo, i32 0		%temp = insertelement <2 x half> undef, half %arith_lo, i32 0
[133 lines not shown]

; Make sure we don't lose any of the private stores.		; Make sure we don't lose any of the private stores.
define amdgpu_kernel void @vload2_private(i16 addrspace(1)* nocapture readonly %in, <2 x i16> addrspace(1)* nocapture %out) #0 {		define amdgpu_kernel void @vload2_private(i16 addrspace(1)* nocapture readonly %in, <2 x i16> addrspace(1)* nocapture %out) #0 {
; GCN-LABEL: vload2_private:		; GCN-LABEL: vload2_private:
; GCN: ; %bb.0: ; %entry		; GCN: ; %bb.0: ; %entry
; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s9		; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s9
; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0		; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0
; GCN-NEXT: s_load_dwordx4 s[4:7], s[4:5], 0x0		; GCN-NEXT: s_load_dwordx4 s[4:7], s[4:5], 0x0
		; GCN-NEXT: s_add_u32 s0, s0, s9
		; GCN-NEXT: s_addc_u32 s1, s1, 0
; GCN-NEXT: s_waitcnt lgkmcnt(0)		; GCN-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: v_mov_b32_e32 v2, s4		; GCN-NEXT: v_mov_b32_e32 v2, s4
; GCN-NEXT: v_mov_b32_e32 v3, s5		; GCN-NEXT: v_mov_b32_e32 v3, s5
; GCN-NEXT: global_load_ushort v4, v[2:3], off		; GCN-NEXT: global_load_ushort v4, v[2:3], off
; GCN-NEXT: v_mov_b32_e32 v0, s6		; GCN-NEXT: v_mov_b32_e32 v0, s6
; GCN-NEXT: v_mov_b32_e32 v1, s7		; GCN-NEXT: v_mov_b32_e32 v1, s7
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: buffer_store_short v4, off, s[0:3], s9 offset:4		; GCN-NEXT: buffer_store_short v4, off, s[0:3], 0 offset:4
; GCN-NEXT: global_load_ushort v4, v[2:3], off offset:2		; GCN-NEXT: global_load_ushort v4, v[2:3], off offset:2
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: buffer_store_short v4, off, s[0:3], s9 offset:6		; GCN-NEXT: buffer_store_short v4, off, s[0:3], 0 offset:6
; GCN-NEXT: global_load_ushort v2, v[2:3], off offset:4		; GCN-NEXT: global_load_ushort v2, v[2:3], off offset:4
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: buffer_store_short v2, off, s[0:3], s9 offset:8		; GCN-NEXT: buffer_store_short v2, off, s[0:3], 0 offset:8
; GCN-NEXT: buffer_load_ushort v2, off, s[0:3], s9 offset:4		; GCN-NEXT: buffer_load_ushort v2, off, s[0:3], 0 offset:4
; GCN-NEXT: buffer_load_ushort v4, off, s[0:3], s9 offset:6		; GCN-NEXT: buffer_load_ushort v4, off, s[0:3], 0 offset:6
; GCN-NEXT: s_waitcnt vmcnt(1)		; GCN-NEXT: s_waitcnt vmcnt(1)
; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v2		; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v2
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: v_mov_b32_e32 v3, v4		; GCN-NEXT: v_mov_b32_e32 v3, v4
; GCN-NEXT: buffer_load_short_d16_hi v3, off, s[0:3], s9 offset:8		; GCN-NEXT: buffer_load_short_d16_hi v3, off, s[0:3], 0 offset:8
; GCN-NEXT: v_lshl_or_b32 v2, v4, 16, v2		; GCN-NEXT: v_lshl_or_b32 v2, v4, 16, v2
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: global_store_dwordx2 v[0:1], v[2:3], off		; GCN-NEXT: global_store_dwordx2 v[0:1], v[2:3], off
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
entry:		entry:
%loc = alloca [3 x i16], align 2, addrspace(5)		%loc = alloca [3 x i16], align 2, addrspace(5)
%loc.0.sroa_cast1 = bitcast [3 x i16] addrspace(5)* %loc to i8 addrspace(5)*		%loc.0.sroa_cast1 = bitcast [3 x i16] addrspace(5)* %loc to i8 addrspace(5)*
%tmp = load i16, i16 addrspace(1)* %in, align 2		%tmp = load i16, i16 addrspace(1)* %in, align 2
[65 lines not shown]
%result = insertelement <2 x i16> %op.hi, i16 %load_lo, i32 0		%result = insertelement <2 x i16> %op.hi, i16 %load_lo, i32 0
ret <2 x i16> %result		ret <2 x i16> %result
}		}

define <2 x i16> @chain_hi_to_lo_private_other_dep(i16 addrspace(5)* %ptr) {		define <2 x i16> @chain_hi_to_lo_private_other_dep(i16 addrspace(5)* %ptr) {
; GCN-LABEL: chain_hi_to_lo_private_other_dep:		; GCN-LABEL: chain_hi_to_lo_private_other_dep:
; GCN: ; %bb.0: ; %bb		; GCN: ; %bb.0: ; %bb
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: buffer_load_short_d16_hi v1, v0, s[0:3], s33 offen		; GCN-NEXT: buffer_load_short_d16_hi v1, v0, s[0:3], 0 offen
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: v_pk_sub_u16 v1, v1, -12 op_sel_hi:[1,0]		; GCN-NEXT: v_pk_sub_u16 v1, v1, -12 op_sel_hi:[1,0]
; GCN-NEXT: buffer_load_short_d16 v1, v0, s[0:3], s33 offen offset:2		; GCN-NEXT: buffer_load_short_d16 v1, v0, s[0:3], 0 offen offset:2
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: v_mov_b32_e32 v0, v1		; GCN-NEXT: v_mov_b32_e32 v0, v1
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
bb:		bb:
%gep_lo = getelementptr inbounds i16, i16 addrspace(5)* %ptr, i64 1		%gep_lo = getelementptr inbounds i16, i16 addrspace(5)* %ptr, i64 1
%load_lo = load i16, i16 addrspace(5)* %gep_lo		%load_lo = load i16, i16 addrspace(5)* %gep_lo
%gep_hi = getelementptr inbounds i16, i16 addrspace(5)* %ptr, i64 0		%gep_hi = getelementptr inbounds i16, i16 addrspace(5)* %ptr, i64 0
%load_hi = load i16, i16 addrspace(5)* %gep_hi		%load_hi = load i16, i16 addrspace(5)* %gep_hi
[73 lines not shown]

llvm/test/CodeGen/AMDGPU/collapse-endcf.ll

	[219 lines not shown]
	; GCN: s_or_b64 exec, exec, s{{\[[0-9]+:[0-9]+\]}}			; GCN: s_or_b64 exec, exec, s{{\[[0-9]+:[0-9]+\]}}
	; GCN: s_andn2_b64			; GCN: s_andn2_b64
	; GCN-NEXT: s_cbranch_execz			; GCN-NEXT: s_cbranch_execz

	; GCN: [[BB1_LOOP:BB[0-9]+_[0-9]+]]:			; GCN: [[BB1_LOOP:BB[0-9]+_[0-9]+]]:
	; GCN: s_andn2_b64 exec, exec,			; GCN: s_andn2_b64 exec, exec,
	; GCN-NEXT: s_cbranch_execnz [[BB1_LOOP]]			; GCN-NEXT: s_cbranch_execnz [[BB1_LOOP]]

	; GCN: buffer_load_dword v{{[0-9]+}}, v{{[0-9]+}}, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offen			; GCN: buffer_load_dword v{{[0-9]+}}, v{{[0-9]+}}, s{{\[[0-9]+:[0-9]+\]}}, 0 offen

	; GCN: s_and_saveexec_b64 [[SAVEEXEC_OUTER]], {{vcc|s\[[0-9:]+\]}}			; GCN: s_and_saveexec_b64 [[SAVEEXEC_OUTER]], {{vcc|s\[[0-9:]+\]}}
	; GCN-NEXT: s_cbranch_execz [[BB1_OUTER_LOOP]]			; GCN-NEXT: s_cbranch_execz [[BB1_OUTER_LOOP]]

	; GCN-NOT: s_or_b64 exec, exec			; GCN-NOT: s_or_b64 exec, exec

	; GCN: s_or_b64 exec, exec, s{{\[[0-9]+:[0-9]+\]}}			; GCN: s_or_b64 exec, exec, s{{\[[0-9]+:[0-9]+\]}}
	; GCN: buffer_store_dword			; GCN: buffer_store_dword
	[47 lines not shown]

llvm/test/CodeGen/AMDGPU/control-flow-fastregalloc.ll

	[16 lines not shown]
	; GCN: s_mov_b32 m0, -1			; GCN: s_mov_b32 m0, -1
	; GCN: ds_read_b32 [[LOAD0:v[0-9]+]]			; GCN: ds_read_b32 [[LOAD0:v[0-9]+]]

	; GCN: v_cmp_eq_u32_e64 [[CMP0:s\[[0-9]+:[0-9]\]]], s{{[0-9]+}}, v0			; GCN: v_cmp_eq_u32_e64 [[CMP0:s\[[0-9]+:[0-9]\]]], s{{[0-9]+}}, v0
	; GCN: s_mov_b64 s{{\[}}[[SAVEEXEC_LO:[0-9]+]]:[[SAVEEXEC_HI:[0-9]+]]{{\]}}, exec			; GCN: s_mov_b64 s{{\[}}[[SAVEEXEC_LO:[0-9]+]]:[[SAVEEXEC_HI:[0-9]+]]{{\]}}, exec
	; GCN: s_and_b64 s{{\[}}[[ANDEXEC_LO:[0-9]+]]:[[ANDEXEC_HI:[0-9]+]]{{\]}}, s{{\[}}[[SAVEEXEC_LO]]:[[SAVEEXEC_HI]]{{\]}}, [[CMP0]]			; GCN: s_and_b64 s{{\[}}[[ANDEXEC_LO:[0-9]+]]:[[ANDEXEC_HI:[0-9]+]]{{\]}}, s{{\[}}[[SAVEEXEC_LO]]:[[SAVEEXEC_HI]]{{\]}}, [[CMP0]]

	; Spill load			; Spill load
	; GCN: buffer_store_dword [[LOAD0]], off, s[0:3], s7 offset:[[LOAD0_OFFSET:[0-9]+]] ; 4-byte Folded Spill			; GCN: buffer_store_dword [[LOAD0]], off, s[0:3], 0 offset:[[LOAD0_OFFSET:[0-9]+]] ; 4-byte Folded Spill

	; Spill saved exec			; Spill saved exec
	; VGPR: v_writelane_b32 [[SPILL_VGPR:v[0-9]+]], s[[SAVEEXEC_LO]], [[SAVEEXEC_LO_LANE:[0-9]+]]			; VGPR: v_writelane_b32 [[SPILL_VGPR:v[0-9]+]], s[[SAVEEXEC_LO]], [[SAVEEXEC_LO_LANE:[0-9]+]]
	; VGPR: v_writelane_b32 [[SPILL_VGPR]], s[[SAVEEXEC_HI]], [[SAVEEXEC_HI_LANE:[0-9]+]]			; VGPR: v_writelane_b32 [[SPILL_VGPR]], s[[SAVEEXEC_HI]], [[SAVEEXEC_HI_LANE:[0-9]+]]

	; VMEM: v_mov_b32_e32 v[[V_SAVEEXEC_LO:[0-9]+]], s[[SAVEEXEC_LO]]			; VMEM: v_mov_b32_e32 v[[V_SAVEEXEC_LO:[0-9]+]], s[[SAVEEXEC_LO]]
	; VMEM: buffer_store_dword v[[V_SAVEEXEC_LO]], off, s[0:3], s7 offset:20 ; 4-byte Folded Spill			; VMEM: buffer_store_dword v[[V_SAVEEXEC_LO]], off, s[0:3], 0 offset:20 ; 4-byte Folded Spill
	; VMEM: v_mov_b32_e32 v[[V_SAVEEXEC_HI:[0-9]+]], s[[SAVEEXEC_HI]]			; VMEM: v_mov_b32_e32 v[[V_SAVEEXEC_HI:[0-9]+]], s[[SAVEEXEC_HI]]
	; VMEM: buffer_store_dword v[[V_SAVEEXEC_HI]], off, s[0:3], s7 offset:24 ; 4-byte Folded Spill			; VMEM: buffer_store_dword v[[V_SAVEEXEC_HI]], off, s[0:3], 0 offset:24 ; 4-byte Folded Spill

	; GCN: s_mov_b64 exec, s{{\[}}[[ANDEXEC_LO]]:[[ANDEXEC_HI]]{{\]}}			; GCN: s_mov_b64 exec, s{{\[}}[[ANDEXEC_LO]]:[[ANDEXEC_HI]]{{\]}}

	; GCN: s_cbranch_execz [[ENDIF:BB[0-9]+_[0-9]+]]			; GCN: s_cbranch_execz [[ENDIF:BB[0-9]+_[0-9]+]]

	; GCN: ; %bb.{{[0-9]+}}: ; %if			; GCN: ; %bb.{{[0-9]+}}: ; %if
	; GCN: s_mov_b32 m0, -1			; GCN: s_mov_b32 m0, -1
	; GCN: ds_read_b32 [[LOAD1:v[0-9]+]]			; GCN: ds_read_b32 [[LOAD1:v[0-9]+]]
	; GCN: buffer_load_dword [[RELOAD_LOAD0:v[0-9]+]], off, s[0:3], s7 offset:[[LOAD0_OFFSET]] ; 4-byte Folded Reload			; GCN: buffer_load_dword [[RELOAD_LOAD0:v[0-9]+]], off, s[0:3], 0 offset:[[LOAD0_OFFSET]] ; 4-byte Folded Reload
	; GCN: s_waitcnt vmcnt(0) lgkmcnt(0)			; GCN: s_waitcnt vmcnt(0) lgkmcnt(0)


	; Spill val register			; Spill val register
	; GCN: v_add_i32_e32 [[VAL:v[0-9]+]], vcc, [[LOAD1]], [[RELOAD_LOAD0]]			; GCN: v_add_i32_e32 [[VAL:v[0-9]+]], vcc, [[LOAD1]], [[RELOAD_LOAD0]]
	; GCN: buffer_store_dword [[VAL]], off, s[0:3], s7 offset:[[VAL_OFFSET:[0-9]+]] ; 4-byte Folded Spill			; GCN: buffer_store_dword [[VAL]], off, s[0:3], 0 offset:[[VAL_OFFSET:[0-9]+]] ; 4-byte Folded Spill

	; VMEM: [[ENDIF]]:			; VMEM: [[ENDIF]]:

	; Reload and restore exec mask			; Reload and restore exec mask
	; VGPR: v_readlane_b32 s[[S_RELOAD_SAVEEXEC_LO:[0-9]+]], [[SPILL_VGPR]], [[SAVEEXEC_LO_LANE]]			; VGPR: v_readlane_b32 s[[S_RELOAD_SAVEEXEC_LO:[0-9]+]], [[SPILL_VGPR]], [[SAVEEXEC_LO_LANE]]
	; VGPR: v_readlane_b32 s[[S_RELOAD_SAVEEXEC_HI:[0-9]+]], [[SPILL_VGPR]], [[SAVEEXEC_HI_LANE]]			; VGPR: v_readlane_b32 s[[S_RELOAD_SAVEEXEC_HI:[0-9]+]], [[SPILL_VGPR]], [[SAVEEXEC_HI_LANE]]



	; VMEM: buffer_load_dword v[[V_RELOAD_SAVEEXEC_LO:[0-9]+]], off, s[0:3], s7 offset:20 ; 4-byte Folded Reload			; VMEM: buffer_load_dword v[[V_RELOAD_SAVEEXEC_LO:[0-9]+]], off, s[0:3], 0 offset:20 ; 4-byte Folded Reload
	; VMEM: s_waitcnt vmcnt(0)			; VMEM: s_waitcnt vmcnt(0)
	; VMEM: v_readfirstlane_b32 s[[S_RELOAD_SAVEEXEC_LO:[0-9]+]], v[[V_RELOAD_SAVEEXEC_LO]]			; VMEM: v_readfirstlane_b32 s[[S_RELOAD_SAVEEXEC_LO:[0-9]+]], v[[V_RELOAD_SAVEEXEC_LO]]

	; VMEM: buffer_load_dword v[[V_RELOAD_SAVEEXEC_HI:[0-9]+]], off, s[0:3], s7 offset:24 ; 4-byte Folded Reload			; VMEM: buffer_load_dword v[[V_RELOAD_SAVEEXEC_HI:[0-9]+]], off, s[0:3], 0 offset:24 ; 4-byte Folded Reload
	; VMEM: s_waitcnt vmcnt(0)			; VMEM: s_waitcnt vmcnt(0)
	; VMEM: v_readfirstlane_b32 s[[S_RELOAD_SAVEEXEC_HI:[0-9]+]], v[[V_RELOAD_SAVEEXEC_HI]]			; VMEM: v_readfirstlane_b32 s[[S_RELOAD_SAVEEXEC_HI:[0-9]+]], v[[V_RELOAD_SAVEEXEC_HI]]

	; GCN: s_or_b64 exec, exec, s{{\[}}[[S_RELOAD_SAVEEXEC_LO]]:[[S_RELOAD_SAVEEXEC_HI]]{{\]}}			; GCN: s_or_b64 exec, exec, s{{\[}}[[S_RELOAD_SAVEEXEC_LO]]:[[S_RELOAD_SAVEEXEC_HI]]{{\]}}

	; Restore val			; Restore val
	; GCN: buffer_load_dword [[RELOAD_VAL:v[0-9]+]], off, s[0:3], s7 offset:[[VAL_OFFSET]] ; 4-byte Folded Reload			; GCN: buffer_load_dword [[RELOAD_VAL:v[0-9]+]], off, s[0:3], 0 offset:[[VAL_OFFSET]] ; 4-byte Folded Reload

	; GCN: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[RELOAD_VAL]]			; GCN: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[RELOAD_VAL]]
	define amdgpu_kernel void @divergent_if_endif(i32 addrspace(1)* %out) #0 {			define amdgpu_kernel void @divergent_if_endif(i32 addrspace(1)* %out) #0 {
	entry:			entry:
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%load0 = load volatile i32, i32 addrspace(3)* undef			%load0 = load volatile i32, i32 addrspace(3)* undef
	%cmp0 = icmp eq i32 %tid, 0			%cmp0 = icmp eq i32 %tid, 0
	br i1 %cmp0, label %if, label %endif			br i1 %cmp0, label %if, label %endif
	[18 lines not shown]
	; GCN: ds_read_b32 [[LOAD0:v[0-9]+]]			; GCN: ds_read_b32 [[LOAD0:v[0-9]+]]

	; GCN: v_cmp_eq_u32_e64 [[CMP0:s\[[0-9]+:[0-9]\]]], s{{[0-9]+}}, v0			; GCN: v_cmp_eq_u32_e64 [[CMP0:s\[[0-9]+:[0-9]\]]], s{{[0-9]+}}, v0

	; GCN: s_mov_b64 s{{\[}}[[SAVEEXEC_LO:[0-9]+]]:[[SAVEEXEC_HI:[0-9]+]]{{\]}}, exec			; GCN: s_mov_b64 s{{\[}}[[SAVEEXEC_LO:[0-9]+]]:[[SAVEEXEC_HI:[0-9]+]]{{\]}}, exec
	; GCN: s_and_b64 s{{\[}}[[ANDEXEC_LO:[0-9]+]]:[[ANDEXEC_HI:[0-9]+]]{{\]}}, s{{\[}}[[SAVEEXEC_LO:[0-9]+]]:[[SAVEEXEC_HI:[0-9]+]]{{\]}}, [[CMP0]]			; GCN: s_and_b64 s{{\[}}[[ANDEXEC_LO:[0-9]+]]:[[ANDEXEC_HI:[0-9]+]]{{\]}}, s{{\[}}[[SAVEEXEC_LO:[0-9]+]]:[[SAVEEXEC_HI:[0-9]+]]{{\]}}, [[CMP0]]

	; Spill load			; Spill load
	; GCN: buffer_store_dword [[LOAD0]], off, s[0:3], s7 offset:[[LOAD0_OFFSET:[0-9]+]] ; 4-byte Folded Spill			; GCN: buffer_store_dword [[LOAD0]], off, s[0:3], 0 offset:[[LOAD0_OFFSET:[0-9]+]] ; 4-byte Folded Spill

	; Spill saved exec			; Spill saved exec
	; VGPR: v_writelane_b32 [[SPILL_VGPR:v[0-9]+]], s[[SAVEEXEC_LO]], [[SAVEEXEC_LO_LANE:[0-9]+]]			; VGPR: v_writelane_b32 [[SPILL_VGPR:v[0-9]+]], s[[SAVEEXEC_LO]], [[SAVEEXEC_LO_LANE:[0-9]+]]
	; VGPR: v_writelane_b32 [[SPILL_VGPR]], s[[SAVEEXEC_HI]], [[SAVEEXEC_HI_LANE:[0-9]+]]			; VGPR: v_writelane_b32 [[SPILL_VGPR]], s[[SAVEEXEC_HI]], [[SAVEEXEC_HI_LANE:[0-9]+]]


	; VMEM: v_mov_b32_e32 v[[V_SAVEEXEC_LO:[0-9]+]], s[[SAVEEXEC_LO]]			; VMEM: v_mov_b32_e32 v[[V_SAVEEXEC_LO:[0-9]+]], s[[SAVEEXEC_LO]]
	; VMEM: buffer_store_dword v[[V_SAVEEXEC_LO]], off, s[0:3], s7 offset:24 ; 4-byte Folded Spill			; VMEM: buffer_store_dword v[[V_SAVEEXEC_LO]], off, s[0:3], 0 offset:24 ; 4-byte Folded Spill
	; VMEM: v_mov_b32_e32 v[[V_SAVEEXEC_HI:[0-9]+]], s[[SAVEEXEC_HI]]			; VMEM: v_mov_b32_e32 v[[V_SAVEEXEC_HI:[0-9]+]], s[[SAVEEXEC_HI]]
	; VMEM: buffer_store_dword v[[V_SAVEEXEC_HI]], off, s[0:3], s7 offset:28 ; 4-byte Folded Spill			; VMEM: buffer_store_dword v[[V_SAVEEXEC_HI]], off, s[0:3], 0 offset:28 ; 4-byte Folded Spill

	; GCN: s_mov_b64 exec, s{{\[}}[[ANDEXEC_LO]]:[[ANDEXEC_HI]]{{\]}}			; GCN: s_mov_b64 exec, s{{\[}}[[ANDEXEC_LO]]:[[ANDEXEC_HI]]{{\]}}

	; GCN-NEXT: s_cbranch_execz [[END:BB[0-9]+_[0-9]+]]			; GCN-NEXT: s_cbranch_execz [[END:BB[0-9]+_[0-9]+]]


	; GCN: [[LOOP:BB[0-9]+_[0-9]+]]:			; GCN: [[LOOP:BB[0-9]+_[0-9]+]]:
	; GCN: buffer_load_dword v[[VAL_LOOP_RELOAD:[0-9]+]], off, s[0:3], s7 offset:[[LOAD0_OFFSET]] ; 4-byte Folded Reload			; GCN: buffer_load_dword v[[VAL_LOOP_RELOAD:[0-9]+]], off, s[0:3], 0 offset:[[LOAD0_OFFSET]] ; 4-byte Folded Reload
	; GCN: v_subrev_i32_e32 [[VAL_LOOP:v[0-9]+]], vcc, v{{[0-9]+}}, v[[VAL_LOOP_RELOAD]]			; GCN: v_subrev_i32_e32 [[VAL_LOOP:v[0-9]+]], vcc, v{{[0-9]+}}, v[[VAL_LOOP_RELOAD]]
	; GCN: s_cmp_lg_u32			; GCN: s_cmp_lg_u32
	; GCN: buffer_store_dword [[VAL_LOOP]], off, s[0:3], s7 offset:[[VAL_SUB_OFFSET:[0-9]+]] ; 4-byte Folded Spill			; GCN: buffer_store_dword [[VAL_LOOP]], off, s[0:3], 0 offset:[[VAL_SUB_OFFSET:[0-9]+]] ; 4-byte Folded Spill
	; GCN-NEXT: s_cbranch_scc1 [[LOOP]]			; GCN-NEXT: s_cbranch_scc1 [[LOOP]]


	; GCN: [[END]]:			; GCN: [[END]]:
	; VGPR: v_readlane_b32 s[[S_RELOAD_SAVEEXEC_LO:[0-9]+]], [[SPILL_VGPR]], [[SAVEEXEC_LO_LANE]]			; VGPR: v_readlane_b32 s[[S_RELOAD_SAVEEXEC_LO:[0-9]+]], [[SPILL_VGPR]], [[SAVEEXEC_LO_LANE]]
	; VGPR: v_readlane_b32 s[[S_RELOAD_SAVEEXEC_HI:[0-9]+]], [[SPILL_VGPR]], [[SAVEEXEC_HI_LANE]]			; VGPR: v_readlane_b32 s[[S_RELOAD_SAVEEXEC_HI:[0-9]+]], [[SPILL_VGPR]], [[SAVEEXEC_HI_LANE]]

	; VMEM: buffer_load_dword v[[V_RELOAD_SAVEEXEC_LO:[0-9]+]], off, s[0:3], s7 offset:24 ; 4-byte Folded Reload			; VMEM: buffer_load_dword v[[V_RELOAD_SAVEEXEC_LO:[0-9]+]], off, s[0:3], 0 offset:24 ; 4-byte Folded Reload
	; VMEM: s_waitcnt vmcnt(0)			; VMEM: s_waitcnt vmcnt(0)
	; VMEM: v_readfirstlane_b32 s[[S_RELOAD_SAVEEXEC_LO:[0-9]+]], v[[V_RELOAD_SAVEEXEC_LO]]			; VMEM: v_readfirstlane_b32 s[[S_RELOAD_SAVEEXEC_LO:[0-9]+]], v[[V_RELOAD_SAVEEXEC_LO]]

	; VMEM: buffer_load_dword v[[V_RELOAD_SAVEEXEC_HI:[0-9]+]], off, s[0:3], s7 offset:28 ; 4-byte Folded Reload			; VMEM: buffer_load_dword v[[V_RELOAD_SAVEEXEC_HI:[0-9]+]], off, s[0:3], 0 offset:28 ; 4-byte Folded Reload
	; VMEM: s_waitcnt vmcnt(0)			; VMEM: s_waitcnt vmcnt(0)
	; VMEM: v_readfirstlane_b32 s[[S_RELOAD_SAVEEXEC_HI:[0-9]+]], v[[V_RELOAD_SAVEEXEC_HI]]			; VMEM: v_readfirstlane_b32 s[[S_RELOAD_SAVEEXEC_HI:[0-9]+]], v[[V_RELOAD_SAVEEXEC_HI]]

	; GCN: s_or_b64 exec, exec, s{{\[}}[[S_RELOAD_SAVEEXEC_LO]]:[[S_RELOAD_SAVEEXEC_HI]]{{\]}}			; GCN: s_or_b64 exec, exec, s{{\[}}[[S_RELOAD_SAVEEXEC_LO]]:[[S_RELOAD_SAVEEXEC_HI]]{{\]}}
	; GCN: buffer_load_dword v[[VAL_END:[0-9]+]], off, s[0:3], s7 offset:[[VAL_SUB_OFFSET]] ; 4-byte Folded Reload			; GCN: buffer_load_dword v[[VAL_END:[0-9]+]], off, s[0:3], 0 offset:[[VAL_SUB_OFFSET]] ; 4-byte Folded Reload

	; GCN: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, v[[VAL_END]]			; GCN: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, v[[VAL_END]]
	define amdgpu_kernel void @divergent_loop(i32 addrspace(1)* %out) #0 {			define amdgpu_kernel void @divergent_loop(i32 addrspace(1)* %out) #0 {
	entry:			entry:
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%load0 = load volatile i32, i32 addrspace(3)* undef			%load0 = load volatile i32, i32 addrspace(3)* undef
	%cmp0 = icmp eq i32 %tid, 0			%cmp0 = icmp eq i32 %tid, 0
	br i1 %cmp0, label %loop, label %end			br i1 %cmp0, label %loop, label %end
	[... 22 lines not shown ...]
	; GCN: s_mov_b32 [[ZERO:s[0-9]+]], 0			; GCN: s_mov_b32 [[ZERO:s[0-9]+]], 0
	; GCN: v_cmp_ne_u32_e64 [[CMP0:s\[[0-9]+:[0-9]\]]], [[ZERO]], v0			; GCN: v_cmp_ne_u32_e64 [[CMP0:s\[[0-9]+:[0-9]\]]], [[ZERO]], v0

	; GCN: s_mov_b64 s{{\[}}[[SAVEEXEC_LO:[0-9]+]]:[[SAVEEXEC_HI:[0-9]+]]{{\]}}, exec			; GCN: s_mov_b64 s{{\[}}[[SAVEEXEC_LO:[0-9]+]]:[[SAVEEXEC_HI:[0-9]+]]{{\]}}, exec
	; GCN: s_and_b64 s{{\[}}[[ANDEXEC_LO:[0-9]+]]:[[ANDEXEC_HI:[0-9]+]]{{\]}}, s{{\[}}[[SAVEEXEC_LO:[0-9]+]]:[[SAVEEXEC_HI:[0-9]+]]{{\]}}, [[CMP0]]			; GCN: s_and_b64 s{{\[}}[[ANDEXEC_LO:[0-9]+]]:[[ANDEXEC_HI:[0-9]+]]{{\]}}, s{{\[}}[[SAVEEXEC_LO:[0-9]+]]:[[SAVEEXEC_HI:[0-9]+]]{{\]}}, [[CMP0]]
	; GCN: s_xor_b64 s{{\[}}[[SAVEEXEC_LO]]:[[SAVEEXEC_HI]]{{\]}}, s{{\[}}[[ANDEXEC_LO]]:[[ANDEXEC_HI]]{{\]}}, s{{\[}}[[SAVEEXEC_LO]]:[[SAVEEXEC_HI]]{{\]}}			; GCN: s_xor_b64 s{{\[}}[[SAVEEXEC_LO]]:[[SAVEEXEC_HI]]{{\]}}, s{{\[}}[[ANDEXEC_LO]]:[[ANDEXEC_HI]]{{\]}}, s{{\[}}[[SAVEEXEC_LO]]:[[SAVEEXEC_HI]]{{\]}}

	; Spill load			; Spill load
	; GCN: buffer_store_dword [[LOAD0]], off, s[0:3], s7 offset:[[LOAD0_OFFSET:[0-9]+]] ; 4-byte Folded Spill			; GCN: buffer_store_dword [[LOAD0]], off, s[0:3], 0 offset:[[LOAD0_OFFSET:[0-9]+]] ; 4-byte Folded Spill

	; Spill saved exec			; Spill saved exec
	; VGPR: v_writelane_b32 [[SPILL_VGPR:v[0-9]+]], s[[SAVEEXEC_LO]], [[SAVEEXEC_LO_LANE:[0-9]+]]			; VGPR: v_writelane_b32 [[SPILL_VGPR:v[0-9]+]], s[[SAVEEXEC_LO]], [[SAVEEXEC_LO_LANE:[0-9]+]]
	; VGPR: v_writelane_b32 [[SPILL_VGPR]], s[[SAVEEXEC_HI]], [[SAVEEXEC_HI_LANE:[0-9]+]]			; VGPR: v_writelane_b32 [[SPILL_VGPR]], s[[SAVEEXEC_HI]], [[SAVEEXEC_HI_LANE:[0-9]+]]

	; VMEM: v_mov_b32_e32 v[[V_SAVEEXEC_LO:[0-9]+]], s[[SAVEEXEC_LO]]			; VMEM: v_mov_b32_e32 v[[V_SAVEEXEC_LO:[0-9]+]], s[[SAVEEXEC_LO]]
	; VMEM: buffer_store_dword v[[V_SAVEEXEC_LO]], off, s[0:3], s7 offset:[[SAVEEXEC_LO_OFFSET:[0-9]+]] ; 4-byte Folded Spill			; VMEM: buffer_store_dword v[[V_SAVEEXEC_LO]], off, s[0:3], 0 offset:[[SAVEEXEC_LO_OFFSET:[0-9]+]] ; 4-byte Folded Spill
	; VMEM: v_mov_b32_e32 v[[V_SAVEEXEC_HI:[0-9]+]], s[[SAVEEXEC_HI]]			; VMEM: v_mov_b32_e32 v[[V_SAVEEXEC_HI:[0-9]+]], s[[SAVEEXEC_HI]]
	; VMEM: buffer_store_dword v[[V_SAVEEXEC_HI]], off, s[0:3], s7 offset:[[SAVEEXEC_HI_OFFSET:[0-9]+]] ; 4-byte Folded Spill			; VMEM: buffer_store_dword v[[V_SAVEEXEC_HI]], off, s[0:3], 0 offset:[[SAVEEXEC_HI_OFFSET:[0-9]+]] ; 4-byte Folded Spill

	; GCN: s_mov_b64 exec, [[CMP0]]			; GCN: s_mov_b64 exec, [[CMP0]]

	; FIXME: It makes no sense to put this skip here			; FIXME: It makes no sense to put this skip here
	; GCN: s_cbranch_execz [[FLOW:BB[0-9]+_[0-9]+]]			; GCN: s_cbranch_execz [[FLOW:BB[0-9]+_[0-9]+]]
	; GCN-NEXT: s_branch [[ELSE:BB[0-9]+_[0-9]+]]			; GCN-NEXT: s_branch [[ELSE:BB[0-9]+_[0-9]+]]

	; GCN: [[FLOW]]: ; %Flow			; GCN: [[FLOW]]: ; %Flow
	; VGPR: v_readlane_b32 s[[FLOW_S_RELOAD_SAVEEXEC_LO:[0-9]+]], [[SPILL_VGPR]], [[SAVEEXEC_LO_LANE]]			; VGPR: v_readlane_b32 s[[FLOW_S_RELOAD_SAVEEXEC_LO:[0-9]+]], [[SPILL_VGPR]], [[SAVEEXEC_LO_LANE]]
	; VGPR: v_readlane_b32 s[[FLOW_S_RELOAD_SAVEEXEC_HI:[0-9]+]], [[SPILL_VGPR]], [[SAVEEXEC_HI_LANE]]			; VGPR: v_readlane_b32 s[[FLOW_S_RELOAD_SAVEEXEC_HI:[0-9]+]], [[SPILL_VGPR]], [[SAVEEXEC_HI_LANE]]


	; VMEM: buffer_load_dword v[[FLOW_V_RELOAD_SAVEEXEC_LO:[0-9]+]], off, s[0:3], s7 offset:[[SAVEEXEC_LO_OFFSET]]			; VMEM: buffer_load_dword v[[FLOW_V_RELOAD_SAVEEXEC_LO:[0-9]+]], off, s[0:3], 0 offset:[[SAVEEXEC_LO_OFFSET]]
	; VMEM: s_waitcnt vmcnt(0)			; VMEM: s_waitcnt vmcnt(0)
	; VMEM: v_readfirstlane_b32 s[[FLOW_S_RELOAD_SAVEEXEC_LO:[0-9]+]], v[[FLOW_V_RELOAD_SAVEEXEC_LO]]			; VMEM: v_readfirstlane_b32 s[[FLOW_S_RELOAD_SAVEEXEC_LO:[0-9]+]], v[[FLOW_V_RELOAD_SAVEEXEC_LO]]

	; VMEM: buffer_load_dword v[[FLOW_V_RELOAD_SAVEEXEC_HI:[0-9]+]], off, s[0:3], s7 offset:[[SAVEEXEC_HI_OFFSET]] ; 4-byte Folded Reload			; VMEM: buffer_load_dword v[[FLOW_V_RELOAD_SAVEEXEC_HI:[0-9]+]], off, s[0:3], 0 offset:[[SAVEEXEC_HI_OFFSET]] ; 4-byte Folded Reload
	; VMEM: s_waitcnt vmcnt(0)			; VMEM: s_waitcnt vmcnt(0)
	; VMEM: v_readfirstlane_b32 s[[FLOW_S_RELOAD_SAVEEXEC_HI:[0-9]+]], v[[FLOW_V_RELOAD_SAVEEXEC_HI]]			; VMEM: v_readfirstlane_b32 s[[FLOW_S_RELOAD_SAVEEXEC_HI:[0-9]+]], v[[FLOW_V_RELOAD_SAVEEXEC_HI]]

	; GCN: s_or_saveexec_b64 s{{\[}}[[FLOW_S_RELOAD_SAVEEXEC_LO]]:[[FLOW_S_RELOAD_SAVEEXEC_HI]]{{\]}}, s{{\[}}[[FLOW_S_RELOAD_SAVEEXEC_LO]]:[[FLOW_S_RELOAD_SAVEEXEC_HI]]{{\]}}			; GCN: s_or_saveexec_b64 s{{\[}}[[FLOW_S_RELOAD_SAVEEXEC_LO]]:[[FLOW_S_RELOAD_SAVEEXEC_HI]]{{\]}}, s{{\[}}[[FLOW_S_RELOAD_SAVEEXEC_LO]]:[[FLOW_S_RELOAD_SAVEEXEC_HI]]{{\]}}

	; Regular spill value restored after exec modification			; Regular spill value restored after exec modification
	; GCN: buffer_load_dword [[FLOW_VAL:v[0-9]+]], off, s[0:3], s7 offset:[[FLOW_VAL_OFFSET:[0-9]+]] ; 4-byte Folded Reload			; GCN: buffer_load_dword [[FLOW_VAL:v[0-9]+]], off, s[0:3], 0 offset:[[FLOW_VAL_OFFSET:[0-9]+]] ; 4-byte Folded Reload


	; Spill saved exec			; Spill saved exec
	; VGPR: v_writelane_b32 [[SPILL_VGPR]], s[[FLOW_S_RELOAD_SAVEEXEC_LO]], [[FLOW_SAVEEXEC_LO_LANE:[0-9]+]]			; VGPR: v_writelane_b32 [[SPILL_VGPR]], s[[FLOW_S_RELOAD_SAVEEXEC_LO]], [[FLOW_SAVEEXEC_LO_LANE:[0-9]+]]
	; VGPR: v_writelane_b32 [[SPILL_VGPR]], s[[FLOW_S_RELOAD_SAVEEXEC_HI]], [[FLOW_SAVEEXEC_HI_LANE:[0-9]+]]			; VGPR: v_writelane_b32 [[SPILL_VGPR]], s[[FLOW_S_RELOAD_SAVEEXEC_HI]], [[FLOW_SAVEEXEC_HI_LANE:[0-9]+]]


	; VMEM: v_mov_b32_e32 v[[FLOW_V_SAVEEXEC_LO:[0-9]+]], s[[FLOW_S_RELOAD_SAVEEXEC_LO]]			; VMEM: v_mov_b32_e32 v[[FLOW_V_SAVEEXEC_LO:[0-9]+]], s[[FLOW_S_RELOAD_SAVEEXEC_LO]]
	; VMEM: buffer_store_dword v[[FLOW_V_SAVEEXEC_LO]], off, s[0:3], s7 offset:[[FLOW_SAVEEXEC_LO_OFFSET:[0-9]+]] ; 4-byte Folded Spill			; VMEM: buffer_store_dword v[[FLOW_V_SAVEEXEC_LO]], off, s[0:3], 0 offset:[[FLOW_SAVEEXEC_LO_OFFSET:[0-9]+]] ; 4-byte Folded Spill
	; VMEM: v_mov_b32_e32 v[[FLOW_V_SAVEEXEC_HI:[0-9]+]], s[[FLOW_S_RELOAD_SAVEEXEC_HI]]			; VMEM: v_mov_b32_e32 v[[FLOW_V_SAVEEXEC_HI:[0-9]+]], s[[FLOW_S_RELOAD_SAVEEXEC_HI]]
	; VMEM: buffer_store_dword v[[FLOW_V_SAVEEXEC_HI]], off, s[0:3], s7 offset:[[FLOW_SAVEEXEC_HI_OFFSET:[0-9]+]] ; 4-byte Folded Spill			; VMEM: buffer_store_dword v[[FLOW_V_SAVEEXEC_HI]], off, s[0:3], 0 offset:[[FLOW_SAVEEXEC_HI_OFFSET:[0-9]+]] ; 4-byte Folded Spill

	; GCN: buffer_store_dword [[FLOW_VAL]], off, s[0:3], s7 offset:[[RESULT_OFFSET:[0-9]+]] ; 4-byte Folded Spill			; GCN: buffer_store_dword [[FLOW_VAL]], off, s[0:3], 0 offset:[[RESULT_OFFSET:[0-9]+]] ; 4-byte Folded Spill
	; GCN: s_xor_b64 exec, exec, s{{\[}}[[FLOW_S_RELOAD_SAVEEXEC_LO]]:[[FLOW_S_RELOAD_SAVEEXEC_HI]]{{\]}}			; GCN: s_xor_b64 exec, exec, s{{\[}}[[FLOW_S_RELOAD_SAVEEXEC_LO]]:[[FLOW_S_RELOAD_SAVEEXEC_HI]]{{\]}}
	; GCN-NEXT: s_cbranch_execz [[ENDIF:BB[0-9]+_[0-9]+]]			; GCN-NEXT: s_cbranch_execz [[ENDIF:BB[0-9]+_[0-9]+]]


	; GCN: ; %bb.{{[0-9]+}}: ; %if			; GCN: ; %bb.{{[0-9]+}}: ; %if
	; GCN: ds_read_b32			; GCN: ds_read_b32
	; GCN: buffer_load_dword v[[LOAD0_RELOAD:[0-9]+]], off, s[0:3], s7 offset:[[LOAD0_OFFSET]] ; 4-byte Folded Reload			; GCN: buffer_load_dword v[[LOAD0_RELOAD:[0-9]+]], off, s[0:3], 0 offset:[[LOAD0_OFFSET]] ; 4-byte Folded Reload
	; GCN: v_add_i32_e32 [[ADD:v[0-9]+]], vcc, v{{[0-9]+}}, v[[LOAD0_RELOAD]]			; GCN: v_add_i32_e32 [[ADD:v[0-9]+]], vcc, v{{[0-9]+}}, v[[LOAD0_RELOAD]]
	; GCN: buffer_store_dword [[ADD]], off, s[0:3], s7 offset:[[RESULT_OFFSET]] ; 4-byte Folded Spill			; GCN: buffer_store_dword [[ADD]], off, s[0:3], 0 offset:[[RESULT_OFFSET]] ; 4-byte Folded Spill
	; GCN-NEXT: s_branch [[ENDIF:BB[0-9]+_[0-9]+]]			; GCN-NEXT: s_branch [[ENDIF:BB[0-9]+_[0-9]+]]

	; GCN: [[ELSE]]: ; %else			; GCN: [[ELSE]]: ; %else
	; GCN: buffer_load_dword v[[LOAD0_RELOAD:[0-9]+]], off, s[0:3], s7 offset:[[LOAD0_OFFSET]] ; 4-byte Folded Reload			; GCN: buffer_load_dword v[[LOAD0_RELOAD:[0-9]+]], off, s[0:3], 0 offset:[[LOAD0_OFFSET]] ; 4-byte Folded Reload
	; GCN: v_subrev_i32_e32 [[SUB:v[0-9]+]], vcc, v{{[0-9]+}}, v[[LOAD0_RELOAD]]			; GCN: v_subrev_i32_e32 [[SUB:v[0-9]+]], vcc, v{{[0-9]+}}, v[[LOAD0_RELOAD]]
	; GCN: buffer_store_dword [[ADD]], off, s[0:3], s7 offset:[[FLOW_RESULT_OFFSET:[0-9]+]] ; 4-byte Folded Spill			; GCN: buffer_store_dword [[ADD]], off, s[0:3], 0 offset:[[FLOW_RESULT_OFFSET:[0-9]+]] ; 4-byte Folded Spill
	; GCN-NEXT: s_branch [[FLOW]]			; GCN-NEXT: s_branch [[FLOW]]

	; GCN: [[ENDIF]]:			; GCN: [[ENDIF]]:
	; VGPR: v_readlane_b32 s[[S_RELOAD_SAVEEXEC_LO:[0-9]+]], [[SPILL_VGPR]], [[FLOW_SAVEEXEC_LO_LANE]]			; VGPR: v_readlane_b32 s[[S_RELOAD_SAVEEXEC_LO:[0-9]+]], [[SPILL_VGPR]], [[FLOW_SAVEEXEC_LO_LANE]]
	; VGPR: v_readlane_b32 s[[S_RELOAD_SAVEEXEC_HI:[0-9]+]], [[SPILL_VGPR]], [[FLOW_SAVEEXEC_HI_LANE]]			; VGPR: v_readlane_b32 s[[S_RELOAD_SAVEEXEC_HI:[0-9]+]], [[SPILL_VGPR]], [[FLOW_SAVEEXEC_HI_LANE]]


	; VMEM: buffer_load_dword v[[V_RELOAD_SAVEEXEC_LO:[0-9]+]], off, s[0:3], s7 offset:[[FLOW_SAVEEXEC_LO_OFFSET]] ; 4-byte Folded Reload			; VMEM: buffer_load_dword v[[V_RELOAD_SAVEEXEC_LO:[0-9]+]], off, s[0:3], 0 offset:[[FLOW_SAVEEXEC_LO_OFFSET]] ; 4-byte Folded Reload
	; VMEM: s_waitcnt vmcnt(0)			; VMEM: s_waitcnt vmcnt(0)
	; VMEM: v_readfirstlane_b32 s[[S_RELOAD_SAVEEXEC_LO:[0-9]+]], v[[V_RELOAD_SAVEEXEC_LO]]			; VMEM: v_readfirstlane_b32 s[[S_RELOAD_SAVEEXEC_LO:[0-9]+]], v[[V_RELOAD_SAVEEXEC_LO]]

	; VMEM: buffer_load_dword v[[V_RELOAD_SAVEEXEC_HI:[0-9]+]], off, s[0:3], s7 offset:[[FLOW_SAVEEXEC_HI_OFFSET]] ; 4-byte Folded Reload			; VMEM: buffer_load_dword v[[V_RELOAD_SAVEEXEC_HI:[0-9]+]], off, s[0:3], 0 offset:[[FLOW_SAVEEXEC_HI_OFFSET]] ; 4-byte Folded Reload
	; VMEM: s_waitcnt vmcnt(0)			; VMEM: s_waitcnt vmcnt(0)
	; VMEM: v_readfirstlane_b32 s[[S_RELOAD_SAVEEXEC_HI:[0-9]+]], v[[V_RELOAD_SAVEEXEC_HI]]			; VMEM: v_readfirstlane_b32 s[[S_RELOAD_SAVEEXEC_HI:[0-9]+]], v[[V_RELOAD_SAVEEXEC_HI]]

	; GCN: s_or_b64 exec, exec, s{{\[}}[[S_RELOAD_SAVEEXEC_LO]]:[[S_RELOAD_SAVEEXEC_HI]]{{\]}}			; GCN: s_or_b64 exec, exec, s{{\[}}[[S_RELOAD_SAVEEXEC_LO]]:[[S_RELOAD_SAVEEXEC_HI]]{{\]}}

	; GCN: buffer_load_dword v[[RESULT:[0-9]+]], off, s[0:3], s7 offset:[[RESULT_OFFSET]] ; 4-byte Folded Reload			; GCN: buffer_load_dword v[[RESULT:[0-9]+]], off, s[0:3], 0 offset:[[RESULT_OFFSET]] ; 4-byte Folded Reload
	; GCN: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, v[[RESULT]]			; GCN: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, v[[RESULT]]
	define amdgpu_kernel void @divergent_if_else_endif(i32 addrspace(1)* %out) #0 {			define amdgpu_kernel void @divergent_if_else_endif(i32 addrspace(1)* %out) #0 {
	entry:			entry:
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%load0 = load volatile i32, i32 addrspace(3)* undef			%load0 = load volatile i32, i32 addrspace(3)* undef
	%cmp0 = icmp eq i32 %tid, 0			%cmp0 = icmp eq i32 %tid, 0
	br i1 %cmp0, label %if, label %else			br i1 %cmp0, label %if, label %else

	[... 20 lines not shown ...]
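
The exec-mask sequence checked for @divergent_if_else_endif above can be followed with a small lane-mask model. This is only an illustrative sketch, not part of the test: the function name and the 4-lane example are hypothetical, and it assumes the compare result has no bits set outside exec, so that writing exec from the compare mask (as the check line does) and from the AND result are the same thing.

```python
MASK64 = (1 << 64) - 1  # exec is a 64-bit lane mask on these targets


def divergent_if_else(exec_mask, cmp_mask):
    """Return (lanes run in %else, lanes run in %if, exec after %endif)."""
    saved = exec_mask & MASK64             # s_mov_b64 saved, exec
    and_mask = saved & cmp_mask            # s_and_b64 and, saved, cmp
    saved = and_mask ^ saved               # s_xor_b64 saved, and, saved
    exec_mask = and_mask                   # s_mov_b64 exec, ... (assumption above)
    else_lanes = exec_mask

    # %Flow: s_or_saveexec_b64 saved, saved
    # (destination gets the old exec, exec becomes source | old exec)
    saved, exec_mask = exec_mask, saved | exec_mask
    exec_mask ^= saved                     # s_xor_b64 exec, exec, saved
    if_lanes = exec_mask

    exec_mask |= saved                     # %endif: s_or_b64 exec, exec, saved
    return else_lanes, if_lanes, exec_mask


# Four active lanes; lane 0 has tid == 0, so v_cmp_ne sets lanes 1..3.
else_lanes, if_lanes, final_exec = divergent_if_else(0b1111, 0b1110)
print(bin(else_lanes), bin(if_lanes), bin(final_exec))
```

In this model the two branches run on disjoint lane sets and the original exec mask is restored at %endif, which is why the saved mask has to survive the spill and reload in between.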

llvm/test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll

[... lines not shown ...]	bb1:
%ins1 = insertvalue { i32, half } %ins0, half %extract1, 1		%ins1 = insertvalue { i32, half } %ins0, half %extract1, 1
ret { i32, half } %ins1		ret { i32, half } %ins1
}		}

define amdgpu_kernel void @v3i16_registers(i1 %cond) #0 {		define amdgpu_kernel void @v3i16_registers(i1 %cond) #0 {
; GCN-LABEL: v3i16_registers:		; GCN-LABEL: v3i16_registers:
; GCN: ; %bb.0: ; %entry		; GCN: ; %bb.0: ; %entry
; GCN-NEXT: s_load_dword s4, s[4:5], 0x0		; GCN-NEXT: s_load_dword s4, s[4:5], 0x0
; GCN-NEXT: s_mov_b32 s33, s9		; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s9
; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s33
; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0		; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0
; GCN-NEXT: s_mov_b32 s32, s33		; GCN-NEXT: s_add_u32 s0, s0, s9
		; GCN-NEXT: s_addc_u32 s1, s1, 0
; GCN-NEXT: s_waitcnt lgkmcnt(0)		; GCN-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: s_and_b32 s4, 1, s4		; GCN-NEXT: s_and_b32 s4, 1, s4
; GCN-NEXT: v_cmp_eq_u32_e64 s[4:5], s4, 1		; GCN-NEXT: v_cmp_eq_u32_e64 s[4:5], s4, 1
; GCN-NEXT: s_and_b64 vcc, exec, s[4:5]		; GCN-NEXT: s_and_b64 vcc, exec, s[4:5]
		; GCN-NEXT: s_mov_b32 s32, 0
; GCN-NEXT: s_cbranch_vccz BB4_2		; GCN-NEXT: s_cbranch_vccz BB4_2
; GCN-NEXT: ; %bb.1:		; GCN-NEXT: ; %bb.1:
; GCN-NEXT: s_mov_b32 s4, 0		; GCN-NEXT: s_mov_b32 s4, 0
; GCN-NEXT: s_mov_b32 s5, s4		; GCN-NEXT: s_mov_b32 s5, s4
; GCN-NEXT: v_mov_b32_e32 v0, s4		; GCN-NEXT: v_mov_b32_e32 v0, s4
; GCN-NEXT: v_mov_b32_e32 v1, s5		; GCN-NEXT: v_mov_b32_e32 v1, s5
; GCN-NEXT: s_branch BB4_3		; GCN-NEXT: s_branch BB4_3
; GCN-NEXT: BB4_2: ; %if.else		; GCN-NEXT: BB4_2: ; %if.else
[... 20 lines not shown ...]	if.end: ; preds = %if.else, %if.then
store <3 x i16> %call6.sink, <3 x i16> addrspace(1)* undef		store <3 x i16> %call6.sink, <3 x i16> addrspace(1)* undef
ret void		ret void
}		}

define amdgpu_kernel void @v3f16_registers(i1 %cond) #0 {		define amdgpu_kernel void @v3f16_registers(i1 %cond) #0 {
; GCN-LABEL: v3f16_registers:		; GCN-LABEL: v3f16_registers:
; GCN: ; %bb.0: ; %entry		; GCN: ; %bb.0: ; %entry
; GCN-NEXT: s_load_dword s4, s[4:5], 0x0		; GCN-NEXT: s_load_dword s4, s[4:5], 0x0
; GCN-NEXT: s_mov_b32 s33, s9		; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s9
; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s33
; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0		; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0
; GCN-NEXT: s_mov_b32 s32, s33		; GCN-NEXT: s_add_u32 s0, s0, s9
		; GCN-NEXT: s_addc_u32 s1, s1, 0
; GCN-NEXT: s_waitcnt lgkmcnt(0)		; GCN-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: s_and_b32 s4, 1, s4		; GCN-NEXT: s_and_b32 s4, 1, s4
; GCN-NEXT: v_cmp_eq_u32_e64 s[4:5], s4, 1		; GCN-NEXT: v_cmp_eq_u32_e64 s[4:5], s4, 1
; GCN-NEXT: s_and_b64 vcc, exec, s[4:5]		; GCN-NEXT: s_and_b64 vcc, exec, s[4:5]
		; GCN-NEXT: s_mov_b32 s32, 0
; GCN-NEXT: s_cbranch_vccz BB5_2		; GCN-NEXT: s_cbranch_vccz BB5_2
; GCN-NEXT: ; %bb.1:		; GCN-NEXT: ; %bb.1:
; GCN-NEXT: s_mov_b32 s4, 0		; GCN-NEXT: s_mov_b32 s4, 0
; GCN-NEXT: s_mov_b32 s5, s4		; GCN-NEXT: s_mov_b32 s5, s4
; GCN-NEXT: v_mov_b32_e32 v0, s4		; GCN-NEXT: v_mov_b32_e32 v0, s4
; GCN-NEXT: v_mov_b32_e32 v1, s5		; GCN-NEXT: v_mov_b32_e32 v1, s5
; GCN-NEXT: s_branch BB5_3		; GCN-NEXT: s_branch BB5_3
; GCN-NEXT: BB5_2: ; %if.else		; GCN-NEXT: BB5_2: ; %if.else
[... 34 lines not shown ...]

llvm/test/CodeGen/AMDGPU/extload-private.ll

	; RUN: llc -march=amdgcn -mattr=-promote-alloca -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mattr=-promote-alloca -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
	; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-promote-alloca -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-promote-alloca -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s

	; FUNC-LABEL: {{^}}load_i8_sext_private:			; FUNC-LABEL: {{^}}load_i8_sext_private:
	; SI: buffer_load_sbyte v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:4{{$}}			; SI: buffer_load_sbyte v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], 0 offset:4{{$}}
	define amdgpu_kernel void @load_i8_sext_private(i32 addrspace(1)* %out) {			define amdgpu_kernel void @load_i8_sext_private(i32 addrspace(1)* %out) {
	entry:			entry:
	%tmp0 = alloca i8, addrspace(5)			%tmp0 = alloca i8, addrspace(5)
	%tmp1 = load i8, i8 addrspace(5)* %tmp0			%tmp1 = load i8, i8 addrspace(5)* %tmp0
	%tmp2 = sext i8 %tmp1 to i32			%tmp2 = sext i8 %tmp1 to i32
	store i32 %tmp2, i32 addrspace(1)* %out			store i32 %tmp2, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}load_i8_zext_private:			; FUNC-LABEL: {{^}}load_i8_zext_private:
	; SI: buffer_load_ubyte v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:4{{$}}			; SI: buffer_load_ubyte v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], 0 offset:4{{$}}
	define amdgpu_kernel void @load_i8_zext_private(i32 addrspace(1)* %out) {			define amdgpu_kernel void @load_i8_zext_private(i32 addrspace(1)* %out) {
	entry:			entry:
	%tmp0 = alloca i8, addrspace(5)			%tmp0 = alloca i8, addrspace(5)
	%tmp1 = load i8, i8 addrspace(5)* %tmp0			%tmp1 = load i8, i8 addrspace(5)* %tmp0
	%tmp2 = zext i8 %tmp1 to i32			%tmp2 = zext i8 %tmp1 to i32
	store i32 %tmp2, i32 addrspace(1)* %out			store i32 %tmp2, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}load_i16_sext_private:			; FUNC-LABEL: {{^}}load_i16_sext_private:
	; SI: buffer_load_sshort v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:4{{$}}			; SI: buffer_load_sshort v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], 0 offset:4{{$}}
	define amdgpu_kernel void @load_i16_sext_private(i32 addrspace(1)* %out) {			define amdgpu_kernel void @load_i16_sext_private(i32 addrspace(1)* %out) {
	entry:			entry:
	%tmp0 = alloca i16, addrspace(5)			%tmp0 = alloca i16, addrspace(5)
	%tmp1 = load i16, i16 addrspace(5)* %tmp0			%tmp1 = load i16, i16 addrspace(5)* %tmp0
	%tmp2 = sext i16 %tmp1 to i32			%tmp2 = sext i16 %tmp1 to i32
	store i32 %tmp2, i32 addrspace(1)* %out			store i32 %tmp2, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}load_i16_zext_private:			; FUNC-LABEL: {{^}}load_i16_zext_private:
	; SI: buffer_load_ushort v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:4{{$}}			; SI: buffer_load_ushort v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], 0 offset:4{{$}}
	define amdgpu_kernel void @load_i16_zext_private(i32 addrspace(1)* %out) {			define amdgpu_kernel void @load_i16_zext_private(i32 addrspace(1)* %out) {
	entry:			entry:
	%tmp0 = alloca i16, addrspace(5)			%tmp0 = alloca i16, addrspace(5)
	%tmp1 = load volatile i16, i16 addrspace(5)* %tmp0			%tmp1 = load volatile i16, i16 addrspace(5)* %tmp0
	%tmp2 = zext i16 %tmp1 to i32			%tmp2 = zext i16 %tmp1 to i32
	store i32 %tmp2, i32 addrspace(1)* %out			store i32 %tmp2, i32 addrspace(1)* %out
	ret void			ret void
	}			}
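
The four tests above differ only in element width and in whether the loaded value is sign- or zero-extended to i32 (buffer_load_sbyte/sshort versus buffer_load_ubyte/ushort). A minimal sketch of the two extensions, with hypothetical helper names that do not come from the test:

```python
def zext(value, bits):
    """Zero-extend: keep the low `bits` bits, upper bits become 0."""
    return value & ((1 << bits) - 1)


def sext(value, bits):
    """Sign-extend: replicate bit (bits - 1) into the upper bits."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def as_u32(value):
    """View a Python int as the 32-bit pattern that would be stored."""
    return value & 0xFFFFFFFF


# The byte 0x80 read as i8 and widened to i32 both ways.
print(hex(as_u32(sext(0x80, 8))), hex(as_u32(zext(0x80, 8))))
```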

llvm/test/CodeGen/AMDGPU/fast-unaligned-load-store.private.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -mattr=-unaligned-scratch-access < %s | FileCheck -check-prefixes=GCN,GFX7-ALIGNED %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -mattr=-unaligned-scratch-access < %s | FileCheck -check-prefixes=GCN,GFX7-ALIGNED %s
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -mattr=+unaligned-scratch-access < %s | FileCheck -check-prefixes=GCN,GFX7-UNALIGNED %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -mattr=+unaligned-scratch-access < %s | FileCheck -check-prefixes=GCN,GFX7-UNALIGNED %s
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=+unaligned-scratch-access < %s | FileCheck -check-prefixes=GCN,GFX9 %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=+unaligned-scratch-access < %s | FileCheck -check-prefixes=GCN,GFX9 %s

	; Should not merge this to a dword load			; Should not merge this to a dword load
	define i32 @private_load_2xi16_align2(i16 addrspace(5)* %p) #0 {			define i32 @private_load_2xi16_align2(i16 addrspace(5)* %p) #0 {
	; GFX7-ALIGNED-LABEL: private_load_2xi16_align2:			; GFX7-ALIGNED-LABEL: private_load_2xi16_align2:
	; GFX7-ALIGNED: ; %bb.0:			; GFX7-ALIGNED: ; %bb.0:
	; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-ALIGNED-NEXT: v_add_i32_e32 v1, vcc, 2, v0			; GFX7-ALIGNED-NEXT: v_add_i32_e32 v1, vcc, 2, v0
	; GFX7-ALIGNED-NEXT: buffer_load_ushort v0, v0, s[0:3], s33 offen			; GFX7-ALIGNED-NEXT: buffer_load_ushort v0, v0, s[0:3], 0 offen
	; GFX7-ALIGNED-NEXT: buffer_load_ushort v1, v1, s[0:3], s33 offen			; GFX7-ALIGNED-NEXT: buffer_load_ushort v1, v1, s[0:3], 0 offen
	; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0)			; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0)
	; GFX7-ALIGNED-NEXT: v_lshlrev_b32_e32 v1, 16, v1			; GFX7-ALIGNED-NEXT: v_lshlrev_b32_e32 v1, 16, v1
	; GFX7-ALIGNED-NEXT: v_or_b32_e32 v0, v0, v1			; GFX7-ALIGNED-NEXT: v_or_b32_e32 v0, v0, v1
	; GFX7-ALIGNED-NEXT: s_setpc_b64 s[30:31]			; GFX7-ALIGNED-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX7-UNALIGNED-LABEL: private_load_2xi16_align2:			; GFX7-UNALIGNED-LABEL: private_load_2xi16_align2:
	; GFX7-UNALIGNED: ; %bb.0:			; GFX7-UNALIGNED: ; %bb.0:
	; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-UNALIGNED-NEXT: v_add_i32_e32 v1, vcc, 2, v0			; GFX7-UNALIGNED-NEXT: v_add_i32_e32 v1, vcc, 2, v0
	; GFX7-UNALIGNED-NEXT: buffer_load_ushort v0, v0, s[0:3], s33 offen			; GFX7-UNALIGNED-NEXT: buffer_load_ushort v0, v0, s[0:3], 0 offen
	; GFX7-UNALIGNED-NEXT: buffer_load_ushort v1, v1, s[0:3], s33 offen			; GFX7-UNALIGNED-NEXT: buffer_load_ushort v1, v1, s[0:3], 0 offen
	; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0)			; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0)
	; GFX7-UNALIGNED-NEXT: v_lshlrev_b32_e32 v1, 16, v1			; GFX7-UNALIGNED-NEXT: v_lshlrev_b32_e32 v1, 16, v1
	; GFX7-UNALIGNED-NEXT: v_or_b32_e32 v0, v0, v1			; GFX7-UNALIGNED-NEXT: v_or_b32_e32 v0, v0, v1
	; GFX7-UNALIGNED-NEXT: s_setpc_b64 s[30:31]			; GFX7-UNALIGNED-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-LABEL: private_load_2xi16_align2:			; GFX9-LABEL: private_load_2xi16_align2:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: buffer_load_ushort v1, v0, s[0:3], s33 offen			; GFX9-NEXT: buffer_load_ushort v1, v0, s[0:3], 0 offen
	; GFX9-NEXT: buffer_load_ushort v0, v0, s[0:3], s33 offen offset:2			; GFX9-NEXT: buffer_load_ushort v0, v0, s[0:3], 0 offen offset:2
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_lshl_or_b32 v0, v0, 16, v1			; GFX9-NEXT: v_lshl_or_b32 v0, v0, 16, v1
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	%gep.p = getelementptr i16, i16 addrspace(5)* %p, i64 1			%gep.p = getelementptr i16, i16 addrspace(5)* %p, i64 1
	%p.0 = load i16, i16 addrspace(5)* %p, align 2			%p.0 = load i16, i16 addrspace(5)* %p, align 2
	%p.1 = load i16, i16 addrspace(5)* %gep.p, align 2			%p.1 = load i16, i16 addrspace(5)* %gep.p, align 2
	%zext.0 = zext i16 %p.0 to i32			%zext.0 = zext i16 %p.0 to i32
	%zext.1 = zext i16 %p.1 to i32			%zext.1 = zext i16 %p.1 to i32
	%shl.1 = shl i32 %zext.1, 16			%shl.1 = shl i32 %zext.1, 16
	%or = or i32 %zext.0, %shl.1			%or = or i32 %zext.0, %shl.1
	ret i32 %or			ret i32 %or
	}			}

	; Should not merge this to a dword store			; Should not merge this to a dword store
	define void @private_store_2xi16_align2(i16 addrspace(5)* %p, i16 addrspace(5)* %r) #0 {			define void @private_store_2xi16_align2(i16 addrspace(5)* %p, i16 addrspace(5)* %r) #0 {
	; GFX7-ALIGNED-LABEL: private_store_2xi16_align2:			; GFX7-ALIGNED-LABEL: private_store_2xi16_align2:
	; GFX7-ALIGNED: ; %bb.0:			; GFX7-ALIGNED: ; %bb.0:
	; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-ALIGNED-NEXT: v_mov_b32_e32 v3, 1			; GFX7-ALIGNED-NEXT: v_mov_b32_e32 v3, 1
	; GFX7-ALIGNED-NEXT: v_mov_b32_e32 v0, 2			; GFX7-ALIGNED-NEXT: v_mov_b32_e32 v0, 2
	; GFX7-ALIGNED-NEXT: v_add_i32_e32 v2, vcc, 2, v1			; GFX7-ALIGNED-NEXT: v_add_i32_e32 v2, vcc, 2, v1
	; GFX7-ALIGNED-NEXT: buffer_store_short v3, v1, s[0:3], s33 offen			; GFX7-ALIGNED-NEXT: buffer_store_short v3, v1, s[0:3], 0 offen
	; GFX7-ALIGNED-NEXT: buffer_store_short v0, v2, s[0:3], s33 offen			; GFX7-ALIGNED-NEXT: buffer_store_short v0, v2, s[0:3], 0 offen
	; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0)			; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0)
	; GFX7-ALIGNED-NEXT: s_setpc_b64 s[30:31]			; GFX7-ALIGNED-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX7-UNALIGNED-LABEL: private_store_2xi16_align2:			; GFX7-UNALIGNED-LABEL: private_store_2xi16_align2:
	; GFX7-UNALIGNED: ; %bb.0:			; GFX7-UNALIGNED: ; %bb.0:
	; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-UNALIGNED-NEXT: v_mov_b32_e32 v3, 1			; GFX7-UNALIGNED-NEXT: v_mov_b32_e32 v3, 1
	; GFX7-UNALIGNED-NEXT: v_mov_b32_e32 v0, 2			; GFX7-UNALIGNED-NEXT: v_mov_b32_e32 v0, 2
	; GFX7-UNALIGNED-NEXT: v_add_i32_e32 v2, vcc, 2, v1			; GFX7-UNALIGNED-NEXT: v_add_i32_e32 v2, vcc, 2, v1
	; GFX7-UNALIGNED-NEXT: buffer_store_short v3, v1, s[0:3], s33 offen			; GFX7-UNALIGNED-NEXT: buffer_store_short v3, v1, s[0:3], 0 offen
	; GFX7-UNALIGNED-NEXT: buffer_store_short v0, v2, s[0:3], s33 offen			; GFX7-UNALIGNED-NEXT: buffer_store_short v0, v2, s[0:3], 0 offen
	; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0)			; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0)
	; GFX7-UNALIGNED-NEXT: s_setpc_b64 s[30:31]			; GFX7-UNALIGNED-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-LABEL: private_store_2xi16_align2:			; GFX9-LABEL: private_store_2xi16_align2:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: v_mov_b32_e32 v0, 1			; GFX9-NEXT: v_mov_b32_e32 v0, 1
	; GFX9-NEXT: buffer_store_short v0, v1, s[0:3], s33 offen			; GFX9-NEXT: v_mov_b32_e32 v2, 2
	; GFX9-NEXT: v_mov_b32_e32 v0, 2			; GFX9-NEXT: buffer_store_short v0, v1, s[0:3], 0 offen
	; GFX9-NEXT: buffer_store_short v0, v1, s[0:3], s33 offen offset:2			; GFX9-NEXT: buffer_store_short v2, v1, s[0:3], 0 offen offset:2
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	%gep.r = getelementptr i16, i16 addrspace(5)* %r, i64 1			%gep.r = getelementptr i16, i16 addrspace(5)* %r, i64 1
	store i16 1, i16 addrspace(5)* %r, align 2			store i16 1, i16 addrspace(5)* %r, align 2
	store i16 2, i16 addrspace(5)* %gep.r, align 2			store i16 2, i16 addrspace(5)* %gep.r, align 2
	ret void			ret void
	}			}

	; Should produce align 1 dword when legal			; Should produce align 1 dword when legal
	define i32 @private_load_2xi16_align1(i16 addrspace(5)* %p) #0 {			define i32 @private_load_2xi16_align1(i16 addrspace(5)* %p) #0 {
	; GFX7-ALIGNED-LABEL: private_load_2xi16_align1:			; GFX7-ALIGNED-LABEL: private_load_2xi16_align1:
	; GFX7-ALIGNED: ; %bb.0:			; GFX7-ALIGNED: ; %bb.0:
	; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-ALIGNED-NEXT: v_add_i32_e32 v2, vcc, 1, v0
	; GFX7-ALIGNED-NEXT: buffer_load_ubyte v2, v2, s[0:3], s33 offen
	; GFX7-ALIGNED-NEXT: v_add_i32_e32 v1, vcc, 2, v0			; GFX7-ALIGNED-NEXT: v_add_i32_e32 v1, vcc, 2, v0
	; GFX7-ALIGNED-NEXT: v_add_i32_e32 v3, vcc, 3, v0			; GFX7-ALIGNED-NEXT: v_add_i32_e32 v2, vcc, 1, v0
	; GFX7-ALIGNED-NEXT: buffer_load_ubyte v3, v3, s[0:3], s33 offen			; GFX7-ALIGNED-NEXT: buffer_load_ubyte v3, v0, s[0:3], 0 offen
	; GFX7-ALIGNED-NEXT: buffer_load_ubyte v1, v1, s[0:3], s33 offen			; GFX7-ALIGNED-NEXT: v_add_i32_e32 v0, vcc, 3, v0
	; GFX7-ALIGNED-NEXT: buffer_load_ubyte v0, v0, s[0:3], s33 offen			; GFX7-ALIGNED-NEXT: buffer_load_ubyte v0, v0, s[0:3], 0 offen
	; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(3)			; GFX7-ALIGNED-NEXT: buffer_load_ubyte v2, v2, s[0:3], 0 offen
	; GFX7-ALIGNED-NEXT: v_lshlrev_b32_e32 v2, 8, v2			; GFX7-ALIGNED-NEXT: buffer_load_ubyte v1, v1, s[0:3], 0 offen
	; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(2)			; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(2)
	; GFX7-ALIGNED-NEXT: v_lshlrev_b32_e32 v3, 8, v3			; GFX7-ALIGNED-NEXT: v_lshlrev_b32_e32 v0, 8, v0
	; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(1)			; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(1)
	; GFX7-ALIGNED-NEXT: v_or_b32_e32 v1, v3, v1			; GFX7-ALIGNED-NEXT: v_lshlrev_b32_e32 v2, 8, v2
	; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0)			; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0)
	; GFX7-ALIGNED-NEXT: v_or_b32_e32 v0, v2, v0
	; GFX7-ALIGNED-NEXT: v_lshlrev_b32_e32 v1, 16, v1
	; GFX7-ALIGNED-NEXT: v_or_b32_e32 v0, v0, v1			; GFX7-ALIGNED-NEXT: v_or_b32_e32 v0, v0, v1
				; GFX7-ALIGNED-NEXT: v_or_b32_e32 v2, v2, v3
				; GFX7-ALIGNED-NEXT: v_lshlrev_b32_e32 v0, 16, v0
				; GFX7-ALIGNED-NEXT: v_or_b32_e32 v0, v2, v0
	; GFX7-ALIGNED-NEXT: s_setpc_b64 s[30:31]			; GFX7-ALIGNED-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX7-UNALIGNED-LABEL: private_load_2xi16_align1:			; GFX7-UNALIGNED-LABEL: private_load_2xi16_align1:
	; GFX7-UNALIGNED: ; %bb.0:			; GFX7-UNALIGNED: ; %bb.0:
	; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-UNALIGNED-NEXT: buffer_load_dword v0, v0, s[0:3], s33 offen			; GFX7-UNALIGNED-NEXT: buffer_load_dword v0, v0, s[0:3], 0 offen
	; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0)			; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0)
	; GFX7-UNALIGNED-NEXT: s_setpc_b64 s[30:31]			; GFX7-UNALIGNED-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-LABEL: private_load_2xi16_align1:			; GFX9-LABEL: private_load_2xi16_align1:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: buffer_load_dword v0, v0, s[0:3], s33 offen			; GFX9-NEXT: buffer_load_dword v0, v0, s[0:3], 0 offen
	; GFX9-NEXT: v_mov_b32_e32 v1, 0xffff			; GFX9-NEXT: v_mov_b32_e32 v1, 0xffff
	; GFX9-NEXT: s_mov_b32 s4, 0xffff			; GFX9-NEXT: s_mov_b32 s4, 0xffff
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_bfi_b32 v1, v1, 0, v0			; GFX9-NEXT: v_bfi_b32 v1, v1, 0, v0
	; GFX9-NEXT: v_and_or_b32 v0, v0, s4, v1			; GFX9-NEXT: v_and_or_b32 v0, v0, s4, v1
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	%gep.p = getelementptr i16, i16 addrspace(5)* %p, i64 1			%gep.p = getelementptr i16, i16 addrspace(5)* %p, i64 1
	%p.0 = load i16, i16 addrspace(5)* %p, align 1			%p.0 = load i16, i16 addrspace(5)* %p, align 1
	%p.1 = load i16, i16 addrspace(5)* %gep.p, align 1			%p.1 = load i16, i16 addrspace(5)* %gep.p, align 1
	%zext.0 = zext i16 %p.0 to i32			%zext.0 = zext i16 %p.0 to i32
	%zext.1 = zext i16 %p.1 to i32			%zext.1 = zext i16 %p.1 to i32
	%shl.1 = shl i32 %zext.1, 16			%shl.1 = shl i32 %zext.1, 16
	%or = or i32 %zext.0, %shl.1			%or = or i32 %zext.0, %shl.1
	ret i32 %or			ret i32 %or
	}			}

	; Should produce align 1 dword when legal			; Should produce align 1 dword when legal
	define void @private_store_2xi16_align1(i16 addrspace(5)* %p, i16 addrspace(5)* %r) #0 {			define void @private_store_2xi16_align1(i16 addrspace(5)* %p, i16 addrspace(5)* %r) #0 {
	; GFX7-ALIGNED-LABEL: private_store_2xi16_align1:			; GFX7-ALIGNED-LABEL: private_store_2xi16_align1:
	; GFX7-ALIGNED: ; %bb.0:			; GFX7-ALIGNED: ; %bb.0:
	; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-ALIGNED-NEXT: v_mov_b32_e32 v3, 1			; GFX7-ALIGNED-NEXT: v_mov_b32_e32 v3, 1
	; GFX7-ALIGNED-NEXT: buffer_store_byte v3, v1, s[0:3], s33 offen
	; GFX7-ALIGNED-NEXT: v_add_i32_e32 v2, vcc, 2, v1			; GFX7-ALIGNED-NEXT: v_add_i32_e32 v2, vcc, 2, v1
	; GFX7-ALIGNED-NEXT: v_add_i32_e32 v3, vcc, 1, v1			; GFX7-ALIGNED-NEXT: v_add_i32_e32 v4, vcc, 1, v1
	; GFX7-ALIGNED-NEXT: v_mov_b32_e32 v4, 0			; GFX7-ALIGNED-NEXT: v_mov_b32_e32 v5, 0
				; GFX7-ALIGNED-NEXT: buffer_store_byte v3, v1, s[0:3], 0 offen
				; GFX7-ALIGNED-NEXT: buffer_store_byte v5, v4, s[0:3], 0 offen
	; GFX7-ALIGNED-NEXT: v_add_i32_e32 v1, vcc, 3, v1			; GFX7-ALIGNED-NEXT: v_add_i32_e32 v1, vcc, 3, v1
	; GFX7-ALIGNED-NEXT: v_mov_b32_e32 v0, 2			; GFX7-ALIGNED-NEXT: v_mov_b32_e32 v0, 2
	; GFX7-ALIGNED-NEXT: buffer_store_byte v4, v3, s[0:3], s33 offen			; GFX7-ALIGNED-NEXT: buffer_store_byte v5, v1, s[0:3], 0 offen
	; GFX7-ALIGNED-NEXT: buffer_store_byte v4, v1, s[0:3], s33 offen			; GFX7-ALIGNED-NEXT: buffer_store_byte v0, v2, s[0:3], 0 offen
	; GFX7-ALIGNED-NEXT: buffer_store_byte v0, v2, s[0:3], s33 offen
	; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0)			; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0)
	; GFX7-ALIGNED-NEXT: s_setpc_b64 s[30:31]			; GFX7-ALIGNED-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX7-UNALIGNED-LABEL: private_store_2xi16_align1:			; GFX7-UNALIGNED-LABEL: private_store_2xi16_align1:
	; GFX7-UNALIGNED: ; %bb.0:			; GFX7-UNALIGNED: ; %bb.0:
	; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-UNALIGNED-NEXT: v_mov_b32_e32 v0, 0x20001			; GFX7-UNALIGNED-NEXT: v_mov_b32_e32 v0, 0x20001
	; GFX7-UNALIGNED-NEXT: buffer_store_dword v0, v1, s[0:3], s33 offen			; GFX7-UNALIGNED-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
	; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0)			; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0)
	; GFX7-UNALIGNED-NEXT: s_setpc_b64 s[30:31]			; GFX7-UNALIGNED-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-LABEL: private_store_2xi16_align1:			; GFX9-LABEL: private_store_2xi16_align1:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x20001			; GFX9-NEXT: v_mov_b32_e32 v0, 0x20001
	; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], s33 offen			; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	%gep.r = getelementptr i16, i16 addrspace(5)* %r, i64 1			%gep.r = getelementptr i16, i16 addrspace(5)* %r, i64 1
	store i16 1, i16 addrspace(5)* %r, align 1			store i16 1, i16 addrspace(5)* %r, align 1
	store i16 2, i16 addrspace(5)* %gep.r, align 1			store i16 2, i16 addrspace(5)* %gep.r, align 1
	ret void			ret void
	}			}

	; Should merge this to a dword load			; Should merge this to a dword load
	define i32 @private_load_2xi16_align4(i16 addrspace(5)* %p) #0 {			define i32 @private_load_2xi16_align4(i16 addrspace(5)* %p) #0 {
	; GFX7-LABEL: load_2xi16_align4:			; GFX7-LABEL: load_2xi16_align4:
	; GFX7: ; %bb.0:			; GFX7: ; %bb.0:
	; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-NEXT: flat_load_dword v0, v[0:1]			; GFX7-NEXT: flat_load_dword v0, v[0:1]
	; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX7-NEXT: s_setpc_b64 s[30:31]			; GFX7-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX7-ALIGNED-LABEL: private_load_2xi16_align4:			; GFX7-ALIGNED-LABEL: private_load_2xi16_align4:
	; GFX7-ALIGNED: ; %bb.0:			; GFX7-ALIGNED: ; %bb.0:
	; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-ALIGNED-NEXT: buffer_load_dword v0, v0, s[0:3], s33 offen			; GFX7-ALIGNED-NEXT: buffer_load_dword v0, v0, s[0:3], 0 offen
	; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0)			; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0)
	; GFX7-ALIGNED-NEXT: s_setpc_b64 s[30:31]			; GFX7-ALIGNED-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX7-UNALIGNED-LABEL: private_load_2xi16_align4:			; GFX7-UNALIGNED-LABEL: private_load_2xi16_align4:
	; GFX7-UNALIGNED: ; %bb.0:			; GFX7-UNALIGNED: ; %bb.0:
	; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-UNALIGNED-NEXT: buffer_load_dword v0, v0, s[0:3], s33 offen			; GFX7-UNALIGNED-NEXT: buffer_load_dword v0, v0, s[0:3], 0 offen
	; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0)			; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0)
	; GFX7-UNALIGNED-NEXT: s_setpc_b64 s[30:31]			; GFX7-UNALIGNED-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-LABEL: private_load_2xi16_align4:			; GFX9-LABEL: private_load_2xi16_align4:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: buffer_load_dword v0, v0, s[0:3], s33 offen			; GFX9-NEXT: buffer_load_dword v0, v0, s[0:3], 0 offen
	; GFX9-NEXT: v_mov_b32_e32 v1, 0xffff			; GFX9-NEXT: v_mov_b32_e32 v1, 0xffff
	; GFX9-NEXT: s_mov_b32 s4, 0xffff			; GFX9-NEXT: s_mov_b32 s4, 0xffff
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_bfi_b32 v1, v1, 0, v0			; GFX9-NEXT: v_bfi_b32 v1, v1, 0, v0
	; GFX9-NEXT: v_and_or_b32 v0, v0, s4, v1			; GFX9-NEXT: v_and_or_b32 v0, v0, s4, v1
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	%gep.p = getelementptr i16, i16 addrspace(5)* %p, i64 1			%gep.p = getelementptr i16, i16 addrspace(5)* %p, i64 1
	%p.0 = load i16, i16 addrspace(5)* %p, align 4			%p.0 = load i16, i16 addrspace(5)* %p, align 4
	; GFX7-NEXT: v_mov_b32_e32 v1, s1			; GFX7-NEXT: v_mov_b32_e32 v1, s1
	; GFX7-NEXT: flat_store_dword v[0:1], v2			; GFX7-NEXT: flat_store_dword v[0:1], v2
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GCN-LABEL: private_store_2xi16_align4:			; GCN-LABEL: private_store_2xi16_align4:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: v_mov_b32_e32 v0, 0x20001			; GCN-NEXT: v_mov_b32_e32 v0, 0x20001
	; GCN-NEXT: buffer_store_dword v0, v1, s[0:3], s33 offen			; GCN-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	%gep.r = getelementptr i16, i16 addrspace(5)* %r, i64 1			%gep.r = getelementptr i16, i16 addrspace(5)* %r, i64 1
	store i16 1, i16 addrspace(5)* %r, align 4			store i16 1, i16 addrspace(5)* %r, align 4
	store i16 2, i16 addrspace(5)* %gep.r, align 2			store i16 2, i16 addrspace(5)* %gep.r, align 2
	ret void			ret void
	}			}

llvm/test/CodeGen/AMDGPU/fold-fi-mubuf.mir

	# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	# RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -run-pass si-fold-operands,dead-mi-elimination %s -o - | FileCheck -check-prefix=GCN %s			# RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -run-pass si-fold-operands,dead-mi-elimination %s -o - | FileCheck -check-prefix=GCN %s

				# Kernels can have no FP
	---			---
	name: no_fold_fi_non_stack_rsrc_soffset			name: kernel_no_fold_fi_non_stack_rsrc_and_soffset
	tracksRegLiveness: true			tracksRegLiveness: true
	frameInfo:			frameInfo:
	maxAlignment: 4			maxAlignment: 4
	localFrameSize: 4			localFrameSize: 4
	stack:			stack:
	- { id: 0, size: 4, alignment: 4, local-offset: 0 }			- { id: 0, size: 4, alignment: 4, local-offset: 0 }
	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: true			isEntryFunction: true
	scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'			scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'
	scratchWaveOffsetReg: '$sgpr6'			stackPtrOffsetReg: '$sgpr32'
	frameOffsetReg: '$sgpr6'
	stackPtrOffsetReg: '$sgpr6'
	body: |			body: |
	bb.0:			bb.0:
	liveins: $sgpr12_sgpr13_sgpr14_sgpr15			liveins: $sgpr12_sgpr13_sgpr14_sgpr15

	; GCN-LABEL: name: no_fold_fi_non_stack_rsrc_soffset			; GCN-LABEL: name: kernel_no_fold_fi_non_stack_rsrc_and_soffset
	; GCN: liveins: $sgpr12_sgpr13_sgpr14_sgpr15			; GCN: liveins: $sgpr12_sgpr13_sgpr14_sgpr15
	; GCN: [[COPY:%[0-9]+]]:sgpr_128 = COPY $sgpr12_sgpr13_sgpr14_sgpr15			; GCN: [[COPY:%[0-9]+]]:sgpr_128 = COPY $sgpr12_sgpr13_sgpr14_sgpr15
	; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec			; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
	; GCN: [[BUFFER_LOAD_DWORD_IDXEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN [[V_MOV_B32_e32_]], [[COPY]], 0, 0, 0, 0, 0, 0, 0, implicit $exec			; GCN: [[BUFFER_LOAD_DWORD_IDXEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN [[V_MOV_B32_e32_]], [[COPY]], 0, 0, 0, 0, 0, 0, 0, implicit $exec
	; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_IDXEN]]			; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_IDXEN]]
	; GCN: SI_RETURN_TO_EPILOG $vgpr0			; GCN: SI_RETURN_TO_EPILOG $vgpr0
	%0:sgpr_128 = COPY $sgpr12_sgpr13_sgpr14_sgpr15			%0:sgpr_128 = COPY $sgpr12_sgpr13_sgpr14_sgpr15
	%1:sreg_32_xm0 = S_MOV_B32 0			%1:sreg_32_xm0 = S_MOV_B32 0
	%2:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec			%2:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
	%3:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN %2, %0, %1, 0, 0, 0, 0, 0, 0, implicit $exec			%3:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN %2, %0, %1, 0, 0, 0, 0, 0, 0, implicit $exec
	$vgpr0 = COPY %3			$vgpr0 = COPY %3
	SI_RETURN_TO_EPILOG $vgpr0			SI_RETURN_TO_EPILOG $vgpr0

	...			...

	---			---
	name: no_fold_fi_non_stack_rsrc			name: kernel_no_fold_fi_non_stack_rsrc
	tracksRegLiveness: true			tracksRegLiveness: true
	frameInfo:			frameInfo:
	maxAlignment: 4			maxAlignment: 4
	localFrameSize: 4			localFrameSize: 4
	stack:			stack:
	- { id: 0, size: 4, alignment: 4, local-offset: 0 }			- { id: 0, size: 4, alignment: 4, local-offset: 0 }
	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: true			isEntryFunction: true
	scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'			scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'
	scratchWaveOffsetReg: '$sgpr6'
	frameOffsetReg: '$sgpr6'
	stackPtrOffsetReg: '$sgpr32'			stackPtrOffsetReg: '$sgpr32'
	body: |			body: |
	bb.0:			bb.0:
	liveins: $sgpr12_sgpr13_sgpr14_sgpr15			liveins: $sgpr12_sgpr13_sgpr14_sgpr15

	; GCN-LABEL: name: no_fold_fi_non_stack_rsrc			; GCN-LABEL: name: kernel_no_fold_fi_non_stack_rsrc
	; GCN: liveins: $sgpr12_sgpr13_sgpr14_sgpr15			; GCN: liveins: $sgpr12_sgpr13_sgpr14_sgpr15
	; GCN: [[COPY:%[0-9]+]]:sgpr_128 = COPY $sgpr12_sgpr13_sgpr14_sgpr15			; GCN: [[COPY:%[0-9]+]]:sgpr_128 = COPY $sgpr12_sgpr13_sgpr14_sgpr15
	; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec			; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
	; GCN: [[BUFFER_LOAD_DWORD_IDXEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN [[V_MOV_B32_e32_]], [[COPY]], $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec			; GCN: [[BUFFER_LOAD_DWORD_IDXEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN [[V_MOV_B32_e32_]], [[COPY]], $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec
	; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_IDXEN]]			; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_IDXEN]]
	; GCN: SI_RETURN_TO_EPILOG $vgpr0			; GCN: SI_RETURN_TO_EPILOG $vgpr0
	%0:sgpr_128 = COPY $sgpr12_sgpr13_sgpr14_sgpr15			%0:sgpr_128 = COPY $sgpr12_sgpr13_sgpr14_sgpr15
	%2:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec			%2:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
	%3:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN %2, %0, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec			%3:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN %2, %0, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec
	$vgpr0 = COPY %3			$vgpr0 = COPY %3
	SI_RETURN_TO_EPILOG $vgpr0			SI_RETURN_TO_EPILOG $vgpr0

	...			...

	# Offset is from global scratch wave offset.
	---			---
	name: fold_fi_mubuf_scratch_scratch_wave_offset			name: kernel_no_fold_fi_non_stack_soffset
	tracksRegLiveness: true			tracksRegLiveness: true
	frameInfo:			frameInfo:
	maxAlignment: 4			maxAlignment: 4
	localFrameSize: 4			localFrameSize: 4
	stack:			stack:
	- { id: 0, size: 4, alignment: 4, local-offset: 0 }			- { id: 0, size: 4, alignment: 4, local-offset: 0 }
	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: true			isEntryFunction: true
	scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'			scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
	scratchWaveOffsetReg: '$sgpr33'
	stackPtrOffsetReg: '$sgpr32'			stackPtrOffsetReg: '$sgpr32'
	body: |			body: |
	bb.0:			bb.0:

	; GCN-LABEL: name: fold_fi_mubuf_scratch_scratch_wave_offset			; GCN-LABEL: name: kernel_no_fold_fi_non_stack_soffset
				; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
				; GCN: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 7, implicit $exec
				; GCN: BUFFER_STORE_DWORD_OFFEN [[V_MOV_B32_e32_1]], [[V_MOV_B32_e32_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
				; GCN: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN [[V_MOV_B32_e32_]], $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
				; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFEN]]
				; GCN: S_ENDPGM 0, implicit $vgpr0
				%0:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
				%1:vgpr_32 = V_MOV_B32_e32 7, implicit $exec
				%2:sreg_32_xm0 = S_MOV_B32 0

				BUFFER_STORE_DWORD_OFFEN %1:vgpr_32, %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, %2, 0, 0, 0, 0, 0, 0, implicit $exec
				%3:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, %2, 0, 0, 0, 0, 0, 0, implicit $exec
				$vgpr0 = COPY %3
				S_ENDPGM 0, implicit $vgpr0

				...

				---
				name: kernel_fold_fi_mubuf
				tracksRegLiveness: true
				frameInfo:
				maxAlignment: 4
				localFrameSize: 4
				stack:
				- { id: 0, size: 4, alignment: 4, local-offset: 0 }
				machineFunctionInfo:
				isEntryFunction: true
				scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
				stackPtrOffsetReg: '$sgpr32'
				body: |
				bb.0:

				; GCN-LABEL: name: kernel_fold_fi_mubuf
	; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 7, implicit $exec			; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 7, implicit $exec
	; GCN: BUFFER_STORE_DWORD_OFFEN [[V_MOV_B32_e32_]], %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec			; GCN: BUFFER_STORE_DWORD_OFFEN [[V_MOV_B32_e32_]], %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec
	; GCN: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec			; GCN: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec
	; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFEN]]			; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFEN]]
	; GCN: S_ENDPGM 0, implicit $vgpr0			; GCN: S_ENDPGM 0, implicit $vgpr0
	%0:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec			%0:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
	%1:vgpr_32 = V_MOV_B32_e32 7, implicit $exec			%1:vgpr_32 = V_MOV_B32_e32 7, implicit $exec

	BUFFER_STORE_DWORD_OFFEN %1:vgpr_32, %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr33, 0, 0, 0, 0, 0, 0, implicit $exec			BUFFER_STORE_DWORD_OFFEN %1:vgpr_32, %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec
	%2:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr33, 0, 0, 0, 0, 0, 0, implicit $exec			%2:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec
	$vgpr0 = COPY %2			$vgpr0 = COPY %2
	S_ENDPGM 0, implicit $vgpr0			S_ENDPGM 0, implicit $vgpr0

	...			...


				# Functions have an unswizzled SP/FP relative to the wave offset
	---			---
	name: no_fold_fi_mubuf_scratch_sp_offset			name: function_no_fold_fi_non_stack_rsrc_and_soffset
	tracksRegLiveness: true			tracksRegLiveness: true
	frameInfo:			frameInfo:
	maxAlignment: 4			maxAlignment: 4
	localFrameSize: 4			localFrameSize: 4
	stack:			stack:
	- { id: 0, size: 4, alignment: 4, local-offset: 0 }			- { id: 0, size: 4, alignment: 4, local-offset: 0 }
	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: true			isEntryFunction: false
				scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'
				frameOffsetReg: '$sgpr32'
				stackPtrOffsetReg: '$sgpr32'
				body: |
				bb.0:
				liveins: $sgpr12_sgpr13_sgpr14_sgpr15

				; GCN-LABEL: name: function_no_fold_fi_non_stack_rsrc_and_soffset
				; GCN: liveins: $sgpr12_sgpr13_sgpr14_sgpr15
				; GCN: [[COPY:%[0-9]+]]:sgpr_128 = COPY $sgpr12_sgpr13_sgpr14_sgpr15
				; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
				; GCN: [[BUFFER_LOAD_DWORD_IDXEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN [[V_MOV_B32_e32_]], [[COPY]], 0, 0, 0, 0, 0, 0, 0, implicit $exec
				; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_IDXEN]]
				; GCN: SI_RETURN_TO_EPILOG $vgpr0
				%0:sgpr_128 = COPY $sgpr12_sgpr13_sgpr14_sgpr15
				%1:sreg_32_xm0 = S_MOV_B32 0
				%2:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
				%3:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN %2, %0, %1, 0, 0, 0, 0, 0, 0, implicit $exec
				$vgpr0 = COPY %3
				SI_RETURN_TO_EPILOG $vgpr0

				...

				---
				name: function_no_fold_fi_non_stack_rsrc
				tracksRegLiveness: true
				frameInfo:
				maxAlignment: 4
				localFrameSize: 4
				stack:
				- { id: 0, size: 4, alignment: 4, local-offset: 0 }
				machineFunctionInfo:
				isEntryFunction: false
				scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'
				frameOffsetReg: '$sgpr32'
				stackPtrOffsetReg: '$sgpr32'
				body: |
				bb.0:
				liveins: $sgpr12_sgpr13_sgpr14_sgpr15

				; GCN-LABEL: name: function_no_fold_fi_non_stack_rsrc
				; GCN: liveins: $sgpr12_sgpr13_sgpr14_sgpr15
				; GCN: [[COPY:%[0-9]+]]:sgpr_128 = COPY $sgpr12_sgpr13_sgpr14_sgpr15
				; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
				; GCN: [[BUFFER_LOAD_DWORD_IDXEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN [[V_MOV_B32_e32_]], [[COPY]], $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec
				; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_IDXEN]]
				; GCN: SI_RETURN_TO_EPILOG $vgpr0
				%0:sgpr_128 = COPY $sgpr12_sgpr13_sgpr14_sgpr15
				%2:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
				%3:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN %2, %0, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec
				$vgpr0 = COPY %3
				SI_RETURN_TO_EPILOG $vgpr0

				...

				---
				name: function_no_fold_fi_non_stack_soffset
				tracksRegLiveness: true
				frameInfo:
				maxAlignment: 4
				localFrameSize: 4
				stack:
				- { id: 0, size: 4, alignment: 4, local-offset: 0 }
				machineFunctionInfo:
				isEntryFunction: false
				scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
				frameOffsetReg: '$sgpr32'
				stackPtrOffsetReg: '$sgpr32'
				body: |
				bb.0:

				; GCN-LABEL: name: function_no_fold_fi_non_stack_soffset
				; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 7, implicit $exec
				; GCN: BUFFER_STORE_DWORD_OFFEN [[V_MOV_B32_e32_]], %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec
				; GCN: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec
				; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFEN]]
				; GCN: S_ENDPGM 0, implicit $vgpr0
				%0:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
				%1:vgpr_32 = V_MOV_B32_e32 7, implicit $exec

				BUFFER_STORE_DWORD_OFFEN %1:vgpr_32, %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
				%2:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
				$vgpr0 = COPY %2
				S_ENDPGM 0, implicit $vgpr0

				...

				---
				name: function_fold_fi_mubuf_wave_relative
				tracksRegLiveness: true
				frameInfo:
				maxAlignment: 4
				localFrameSize: 4
				stack:
				- { id: 0, size: 4, alignment: 4, local-offset: 0 }
				machineFunctionInfo:
				isEntryFunction: false
				scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
				frameOffsetReg: '$sgpr32'
				stackPtrOffsetReg: '$sgpr32'
				body: |
				bb.0:

				; GCN-LABEL: name: function_fold_fi_mubuf_wave_relative
				; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 7, implicit $exec
				; GCN: BUFFER_STORE_DWORD_OFFEN [[V_MOV_B32_e32_]], %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec
				; GCN: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec
				; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFEN]]
				; GCN: S_ENDPGM 0, implicit $vgpr0
				%0:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
				%1:vgpr_32 = V_MOV_B32_e32 7, implicit $exec

				BUFFER_STORE_DWORD_OFFEN %1:vgpr_32, %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
				%2:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
				$vgpr0 = COPY %2
				S_ENDPGM 0, implicit $vgpr0

				...

				---
				name: function_fold_fi_mubuf_stack_relative
				tracksRegLiveness: true
				frameInfo:
				maxAlignment: 4
				localFrameSize: 4
				stack:
				- { id: 0, size: 4, alignment: 4, local-offset: 0 }
				machineFunctionInfo:
				isEntryFunction: false
	scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'			scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
	scratchWaveOffsetReg: '$sgpr33'			frameOffsetReg: '$sgpr32'
	stackPtrOffsetReg: '$sgpr32'			stackPtrOffsetReg: '$sgpr32'
	body: |			body: |
	bb.0:			bb.0:

	; GCN-LABEL: name: no_fold_fi_mubuf_scratch_sp_offset			; GCN-LABEL: name: function_fold_fi_mubuf_stack_relative
	; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 7, implicit $exec			; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 7, implicit $exec
	; GCN: BUFFER_STORE_DWORD_OFFEN [[V_MOV_B32_e32_]], %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec			; GCN: BUFFER_STORE_DWORD_OFFEN [[V_MOV_B32_e32_]], %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec
	; GCN: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec			; GCN: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec
	; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFEN]]			; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFEN]]
	; GCN: S_ENDPGM 0, implicit $vgpr0			; GCN: S_ENDPGM 0, implicit $vgpr0
	%0:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec			%0:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
	%1:vgpr_32 = V_MOV_B32_e32 7, implicit $exec			%1:vgpr_32 = V_MOV_B32_e32 7, implicit $exec

	BUFFER_STORE_DWORD_OFFEN %1:vgpr_32, %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec			BUFFER_STORE_DWORD_OFFEN %1:vgpr_32, %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec
	%2:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec			%2:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec
	$vgpr0 = COPY %2			$vgpr0 = COPY %2
	S_ENDPGM 0, implicit $vgpr0			S_ENDPGM 0, implicit $vgpr0

	...			...

llvm/test/CodeGen/AMDGPU/frame-index-elimination.ll

; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=kaveri -mattr=-promote-alloca -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,CI %s		; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=kaveri -mattr=-promote-alloca -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,CI %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=-promote-alloca -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX9 %s		; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=-promote-alloca -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX9 %s

; Test that non-entry function frame indices are expanded properly to		; Test that non-entry function frame indices are expanded properly to
; give an index relative to the scratch wave offset register		; give an index relative to the scratch wave offset register

; Materialize into a mov. Make sure there isn't an unnecessary copy.		; Materialize into a mov. Make sure there isn't an unnecessary copy.
; GCN-LABEL: {{^}}func_mov_fi_i32:		; GCN-LABEL: {{^}}func_mov_fi_i32:
; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN: s_sub_u32 [[SUB:s[0-9]+|vcc_lo|vcc_hi]], s32, s33

; CI-NEXT: v_lshr_b32_e64 v0, [[SUB]], 6		; CI-NEXT: v_lshr_b32_e64 v0, s32, 6
; GFX9-NEXT: v_lshrrev_b32_e64 v0, 6, [[SUB]]		; GFX9-NEXT: v_lshrrev_b32_e64 v0, 6, s32

; GCN-NOT: v_mov		; GCN-NOT: v_mov
; GCN: ds_write_b32 v0, v0		; GCN: ds_write_b32 v0, v0
define void @func_mov_fi_i32() #0 {		define void @func_mov_fi_i32() #0 {
%alloca = alloca i32, addrspace(5)		%alloca = alloca i32, addrspace(5)
store volatile i32 addrspace(5)* %alloca, i32 addrspace(5)* addrspace(3)* undef		store volatile i32 addrspace(5)* %alloca, i32 addrspace(5)* addrspace(3)* undef
ret void		ret void
}		}

; Offset due to different objects		; Offset due to different objects
; GCN-LABEL: {{^}}func_mov_fi_i32_offset:		; GCN-LABEL: {{^}}func_mov_fi_i32_offset:
; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)

; CI: s_sub_u32 [[SUB0:s[0-9]+|vcc_lo|vcc_hi]], s32, s33		; CI-DAG: v_lshr_b32_e64 v0, s32, 6
; CI-NEXT: s_sub_u32 [[SUB1:s[0-9]+|vcc_lo|vcc_hi]], s32, s33		; CI-DAG: v_lshr_b32_e64 [[SCALED:v[0-9]+]], s32, 6
; CI-DAG: v_lshr_b32_e64 v0, [[SUB0]], 6
; CI-DAG: v_lshr_b32_e64 [[SCALED:v[0-9]+]], [[SUB1]], 6
; CI-NOT: v_mov		; CI-NOT: v_mov
; CI: ds_write_b32 v0, v0		; CI: ds_write_b32 v0, v0
; CI-NEXT: v_add_i32_e64 v0, s{{\[[0-9]+:[0-9]+\]}}, 4, [[SCALED]]		; CI-NEXT: v_add_i32_e{{32|64}} v0, {{s\[[0-9]+:[0-9]+\]|vcc}}, 4, [[SCALED]]
; CI-NEXT: ds_write_b32 v0, v0		; CI-NEXT: ds_write_b32 v0, v0

; GFX9: s_sub_u32 [[SUB0:s[0-9]+|vcc_lo|vcc_hi]], s32, s33		; GFX9: v_lshrrev_b32_e64 v0, 6, s32
; GFX9-NEXT: s_sub_u32 [[SUB1:s[0-9]+|vcc_lo|vcc_hi]], s32, s33		; GFX9-NEXT: v_lshrrev_b32_e64 [[SCALED:v[0-9]+]], 6, s32
; GFX9-NEXT: v_lshrrev_b32_e64 v0, 6, [[SUB0]]
; GFX9-NEXT: v_lshrrev_b32_e64 [[SCALED:v[0-9]+]], 6, [[SUB1]]
; GFX9-DAG: ds_write_b32 v0, v0		; GFX9-DAG: ds_write_b32 v0, v0
; GFX9-NEXT: v_add_u32_e32 v0, 4, [[SCALED]]		; GFX9-NEXT: v_add_u32_e32 v0, 4, [[SCALED]]
; GFX9-NEXT: ds_write_b32 v0, v0		; GFX9-NEXT: ds_write_b32 v0, v0
define void @func_mov_fi_i32_offset() #0 {		define void @func_mov_fi_i32_offset() #0 {
%alloca0 = alloca i32, addrspace(5)		%alloca0 = alloca i32, addrspace(5)
%alloca1 = alloca i32, addrspace(5)		%alloca1 = alloca i32, addrspace(5)
store volatile i32 addrspace(5)* %alloca0, i32 addrspace(5)* addrspace(3)* undef		store volatile i32 addrspace(5)* %alloca0, i32 addrspace(5)* addrspace(3)* undef
store volatile i32 addrspace(5)* %alloca1, i32 addrspace(5)* addrspace(3)* undef		store volatile i32 addrspace(5)* %alloca1, i32 addrspace(5)* addrspace(3)* undef
ret void		ret void
}		}

; Materialize into an add of a constant offset from the FI.		; Materialize into an add of a constant offset from the FI.
; FIXME: Should be able to merge adds		; FIXME: Should be able to merge adds

; GCN-LABEL: {{^}}func_add_constant_to_fi_i32:		; GCN-LABEL: {{^}}func_add_constant_to_fi_i32:
; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN: s_sub_u32 [[SUB:s[0-9]+|vcc_lo|vcc_hi]], s32, s33

; CI-NEXT: v_lshr_b32_e64 [[SCALED:v[0-9]+]], [[SUB]], 6		; CI: v_lshr_b32_e64 [[SCALED:v[0-9]+]], s32, 6
; CI-NEXT: v_add_i32_e32 v0, vcc, 4, [[SCALED]]		; CI-NEXT: v_add_i32_e32 v0, vcc, 4, [[SCALED]]

; GFX9-NEXT: v_lshrrev_b32_e64 [[SCALED:v[0-9]+]], 6, [[SUB]]		; GFX9: v_lshrrev_b32_e64 [[SCALED:v[0-9]+]], 6, s32
; GFX9-NEXT: v_add_u32_e32 v0, 4, [[SCALED]]		; GFX9-NEXT: v_add_u32_e32 v0, 4, [[SCALED]]


; GCN-NOT: v_mov		; GCN-NOT: v_mov
; GCN: ds_write_b32 v0, v0		; GCN: ds_write_b32 v0, v0
define void @func_add_constant_to_fi_i32() #0 {		define void @func_add_constant_to_fi_i32() #0 {
%alloca = alloca [2 x i32], align 4, addrspace(5)		%alloca = alloca [2 x i32], align 4, addrspace(5)
%gep0 = getelementptr inbounds [2 x i32], [2 x i32] addrspace(5)* %alloca, i32 0, i32 1		%gep0 = getelementptr inbounds [2 x i32], [2 x i32] addrspace(5)* %alloca, i32 0, i32 1
store volatile i32 addrspace(5)* %gep0, i32 addrspace(5)* addrspace(3)* undef		store volatile i32 addrspace(5)* %gep0, i32 addrspace(5)* addrspace(3)* undef
ret void		ret void
}		}

; A user the materialized frame index can't be meaningfully folded		; A user the materialized frame index can't be meaningfully folded
; into.		; into.

; GCN-LABEL: {{^}}func_other_fi_user_i32:		; GCN-LABEL: {{^}}func_other_fi_user_i32:
; GCN: s_sub_u32 [[SUB:s[0-9]+|vcc_lo|vcc_hi]], s32, s33

; CI-NEXT: v_lshr_b32_e64 v0, [[SUB]], 6		; CI: v_lshr_b32_e64 v0, s32, 6

; GFX9-NEXT: v_lshrrev_b32_e64 v0, 6, [[SUB]]		; GFX9: v_lshrrev_b32_e64 v0, 6, s32

; GCN-NEXT: v_mul_u32_u24_e32 v0, 9, v0		; GCN-NEXT: v_mul_u32_u24_e32 v0, 9, v0
; GCN-NOT: v_mov		; GCN-NOT: v_mov
; GCN: ds_write_b32 v0, v0		; GCN: ds_write_b32 v0, v0
define void @func_other_fi_user_i32() #0 {		define void @func_other_fi_user_i32() #0 {
%alloca = alloca [2 x i32], align 4, addrspace(5)		%alloca = alloca [2 x i32], align 4, addrspace(5)
%ptrtoint = ptrtoint [2 x i32] addrspace(5)* %alloca to i32		%ptrtoint = ptrtoint [2 x i32] addrspace(5)* %alloca to i32
%mul = mul i32 %ptrtoint, 9		%mul = mul i32 %ptrtoint, 9
store volatile i32 %mul, i32 addrspace(3)* undef		store volatile i32 %mul, i32 addrspace(3)* undef
ret void		ret void
}		}

; GCN-LABEL: {{^}}func_store_private_arg_i32_ptr:		; GCN-LABEL: {{^}}func_store_private_arg_i32_ptr:
; GCN: v_mov_b32_e32 v1, 15{{$}}		; GCN: v_mov_b32_e32 v1, 15{{$}}
; GCN: buffer_store_dword v1, v0, s[0:3], s33 offen{{$}}		; GCN: buffer_store_dword v1, v0, s[0:3], 0 offen{{$}}
define void @func_store_private_arg_i32_ptr(i32 addrspace(5)* %ptr) #0 {		define void @func_store_private_arg_i32_ptr(i32 addrspace(5)* %ptr) #0 {
store volatile i32 15, i32 addrspace(5)* %ptr		store volatile i32 15, i32 addrspace(5)* %ptr
ret void		ret void
}		}

; GCN-LABEL: {{^}}func_load_private_arg_i32_ptr:		; GCN-LABEL: {{^}}func_load_private_arg_i32_ptr:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT: buffer_load_dword v0, v0, s[0:3], s33 offen{{$}}		; GCN-NEXT: buffer_load_dword v0, v0, s[0:3], 0 offen{{$}}
define void @func_load_private_arg_i32_ptr(i32 addrspace(5)* %ptr) #0 {		define void @func_load_private_arg_i32_ptr(i32 addrspace(5)* %ptr) #0 {
%val = load volatile i32, i32 addrspace(5)* %ptr		%val = load volatile i32, i32 addrspace(5)* %ptr
ret void		ret void
}		}

; GCN-LABEL: {{^}}void_func_byval_struct_i8_i32_ptr:		; GCN-LABEL: {{^}}void_func_byval_struct_i8_i32_ptr:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT: s_sub_u32 [[SUB_OFFSET:s[0-9]+|vcc_lo|vcc_hi]], s32, s33

; CI-NEXT: v_lshr_b32_e64 [[SHIFT:v[0-9]+]], [[SUB_OFFSET]], 6		; CI: v_lshr_b32_e64 [[SHIFT:v[0-9]+]], s32, 6
; CI-NEXT: v_or_b32_e32 v0, 4, [[SHIFT]]		; CI-NEXT: v_or_b32_e32 v0, 4, [[SHIFT]]

; GFX9-NEXT: v_lshrrev_b32_e64 [[SHIFT:v[0-9]+]], 6, [[SUB_OFFSET]]		; GFX9: v_lshrrev_b32_e64 [[SHIFT:v[0-9]+]], 6, s32
; GFX9-NEXT: v_or_b32_e32 v0, 4, [[SHIFT]]		; GFX9-NEXT: v_or_b32_e32 v0, 4, [[SHIFT]]

; GCN-NOT: v_mov		; GCN-NOT: v_mov
; GCN: ds_write_b32 v0, v0		; GCN: ds_write_b32 v0, v0
define void @void_func_byval_struct_i8_i32_ptr({ i8, i32 } addrspace(5)* byval %arg0) #0 {		define void @void_func_byval_struct_i8_i32_ptr({ i8, i32 } addrspace(5)* byval %arg0) #0 {
%gep0 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %arg0, i32 0, i32 0		%gep0 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %arg0, i32 0, i32 0
%gep1 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %arg0, i32 0, i32 1		%gep1 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %arg0, i32 0, i32 1
%load1 = load i32, i32 addrspace(5)* %gep1		%load1 = load i32, i32 addrspace(5)* %gep1
Show All 11 Lines	define void @void_func_byval_struct_i8_i32_ptr_value({ i8, i32 } addrspace(5)* byval %arg0) #0 {
%load0 = load i8, i8 addrspace(5)* %gep0		%load0 = load i8, i8 addrspace(5)* %gep0
%load1 = load i32, i32 addrspace(5)* %gep1		%load1 = load i32, i32 addrspace(5)* %gep1
store volatile i8 %load0, i8 addrspace(3)* undef		store volatile i8 %load0, i8 addrspace(3)* undef
store volatile i32 %load1, i32 addrspace(3)* undef		store volatile i32 %load1, i32 addrspace(3)* undef
ret void		ret void
}		}

; GCN-LABEL: {{^}}void_func_byval_struct_i8_i32_ptr_nonentry_block:		; GCN-LABEL: {{^}}void_func_byval_struct_i8_i32_ptr_nonentry_block:
; GCN: s_sub_u32 [[SUB_OFFSET:s[0-9]+]], s32, s33

; CI: v_lshr_b32_e64 [[SHIFT:v[0-9]+]], [[SUB_OFFSET]], 6		; CI: v_lshr_b32_e64 [[SHIFT:v[0-9]+]], s32, 6

; GFX9: v_lshrrev_b32_e64 [[SHIFT:v[0-9]+]], 6, [[SUB_OFFSET]]		; GFX9: v_lshrrev_b32_e64 [[SHIFT:v[0-9]+]], 6, s32

; GCN: s_and_saveexec_b64		; GCN: s_and_saveexec_b64

; CI: v_add_i32_e32 [[GEP:v[0-9]+]], vcc, 4, [[SHIFT]]		; CI: v_add_i32_e32 [[GEP:v[0-9]+]], vcc, 4, [[SHIFT]]
; CI: buffer_load_dword v{{[0-9]+}}, off, s[0:3], s32 offset:4{{$}}		; CI: buffer_load_dword v{{[0-9]+}}, off, s[0:3], s32 offset:4{{$}}

; GFX9: v_add_u32_e32 [[GEP:v[0-9]+]], 4, [[SHIFT]]		; GFX9: v_add_u32_e32 [[GEP:v[0-9]+]], 4, [[SHIFT]]
; GFX9: buffer_load_dword v{{[0-9]+}}, off, s[0:3], s32 offset:4{{$}}		; GFX9: buffer_load_dword v{{[0-9]+}}, off, s[0:3], s32 offset:4{{$}}
[11 lines not shown]	bb:
br label %ret		br label %ret

ret:		ret:
ret void		ret void
}		}

; Added offset can't be used with VOP3 add		; Added offset can't be used with VOP3 add
; GCN-LABEL: {{^}}func_other_fi_user_non_inline_imm_offset_i32:		; GCN-LABEL: {{^}}func_other_fi_user_non_inline_imm_offset_i32:
; GCN: s_sub_u32 [[SUB:s[0-9]+\|vcc_lo\|vcc_hi]], s32, s33
; CI-DAG: s_movk_i32 [[K:s[0-9]+\|vcc_lo\|vcc_hi]], 0x200

; CI-DAG: v_lshr_b32_e64 [[SCALED:v[0-9]+]], [[SUB]], 6		; CI-DAG: s_movk_i32 [[K:s[0-9]+\|vcc_lo\|vcc_hi]], 0x200
		; CI-DAG: v_lshr_b32_e64 [[SCALED:v[0-9]+]], s32, 6
; CI: v_add_i32_e32 [[VZ:v[0-9]+]], vcc, [[K]], [[SCALED]]		; CI: v_add_i32_e32 [[VZ:v[0-9]+]], vcc, [[K]], [[SCALED]]

; GFX9-DAG: v_lshrrev_b32_e64 [[SCALED:v[0-9]+]], 6, [[SUB]]		; GFX9-DAG: v_lshrrev_b32_e64 [[SCALED:v[0-9]+]], 6, s32
; GFX9: v_add_u32_e32 [[VZ:v[0-9]+]], 0x200, [[SCALED]]		; GFX9: v_add_u32_e32 [[VZ:v[0-9]+]], 0x200, [[SCALED]]

; GCN: v_mul_u32_u24_e32 [[VZ]], 9, [[VZ]]		; GCN: v_mul_u32_u24_e32 [[VZ]], 9, [[VZ]]
; GCN: ds_write_b32 v0, [[VZ]]		; GCN: ds_write_b32 v0, [[VZ]]
define void @func_other_fi_user_non_inline_imm_offset_i32() #0 {		define void @func_other_fi_user_non_inline_imm_offset_i32() #0 {
%alloca0 = alloca [128 x i32], align 4, addrspace(5)		%alloca0 = alloca [128 x i32], align 4, addrspace(5)
%alloca1 = alloca [8 x i32], align 4, addrspace(5)		%alloca1 = alloca [8 x i32], align 4, addrspace(5)
%gep0 = getelementptr inbounds [128 x i32], [128 x i32] addrspace(5)* %alloca0, i32 0, i32 65		%gep0 = getelementptr inbounds [128 x i32], [128 x i32] addrspace(5)* %alloca0, i32 0, i32 65
%gep1 = getelementptr inbounds [8 x i32], [8 x i32] addrspace(5)* %alloca1, i32 0, i32 0		%gep1 = getelementptr inbounds [8 x i32], [8 x i32] addrspace(5)* %alloca1, i32 0, i32 0
store volatile i32 7, i32 addrspace(5)* %gep0		store volatile i32 7, i32 addrspace(5)* %gep0
%ptrtoint = ptrtoint i32 addrspace(5)* %gep1 to i32		%ptrtoint = ptrtoint i32 addrspace(5)* %gep1 to i32
%mul = mul i32 %ptrtoint, 9		%mul = mul i32 %ptrtoint, 9
store volatile i32 %mul, i32 addrspace(3)* undef		store volatile i32 %mul, i32 addrspace(3)* undef
ret void		ret void
}		}

; GCN-LABEL: {{^}}func_other_fi_user_non_inline_imm_offset_i32_vcc_live:		; GCN-LABEL: {{^}}func_other_fi_user_non_inline_imm_offset_i32_vcc_live:
; GCN: s_sub_u32 [[DIFF:s[0-9]+]], s32, s33
; CI-DAG: s_movk_i32 [[OFFSET:s[0-9]+]], 0x200

; CI-DAG: v_lshr_b32_e64 [[SCALED:v[0-9]+]], [[DIFF]], 6		; CI-DAG: s_movk_i32 [[OFFSET:s[0-9]+]], 0x200
		; CI-DAG: v_lshr_b32_e64 [[SCALED:v[0-9]+]], s32, 6
; CI: v_add_i32_e64 [[VZ:v[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, [[OFFSET]], [[SCALED]]		; CI: v_add_i32_e64 [[VZ:v[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, [[OFFSET]], [[SCALED]]

; GFX9-DAG: v_lshrrev_b32_e64 [[SCALED:v[0-9]+]], 6, [[DIFF]]		; GFX9-DAG: v_lshrrev_b32_e64 [[SCALED:v[0-9]+]], 6, s32
; GFX9: v_add_u32_e32 [[VZ:v[0-9]+]], 0x200, [[SCALED]]		; GFX9: v_add_u32_e32 [[VZ:v[0-9]+]], 0x200, [[SCALED]]

; GCN: v_mul_u32_u24_e32 [[VZ]], 9, [[VZ]]		; GCN: v_mul_u32_u24_e32 [[VZ]], 9, [[VZ]]
; GCN: ds_write_b32 v0, [[VZ]]		; GCN: ds_write_b32 v0, [[VZ]]
define void @func_other_fi_user_non_inline_imm_offset_i32_vcc_live() #0 {		define void @func_other_fi_user_non_inline_imm_offset_i32_vcc_live() #0 {
%alloca0 = alloca [128 x i32], align 4, addrspace(5)		%alloca0 = alloca [128 x i32], align 4, addrspace(5)
%alloca1 = alloca [8 x i32], align 4, addrspace(5)		%alloca1 = alloca [8 x i32], align 4, addrspace(5)
%vcc = call i64 asm sideeffect "; def $0", "={vcc}"()		%vcc = call i64 asm sideeffect "; def $0", "={vcc}"()
[34 lines not shown]

bb5:		bb5:
ret void		ret void
}		}

; GCN-LABEL: {{^}}alloca_ptr_nonentry_block:		; GCN-LABEL: {{^}}alloca_ptr_nonentry_block:
; GCN: s_and_saveexec_b64		; GCN: s_and_saveexec_b64
; GCN: buffer_load_dword v{{[0-9]+}}, off, s[0:3], s32 offset:4		; GCN: buffer_load_dword v{{[0-9]+}}, off, s[0:3], s32 offset:4
; GCN: s_sub_u32 [[SUB_OFFSET:s[0-9]+\|vcc_lo\|vcc_hi]], s32, s33

; CI: v_lshr_b32_e64 [[SHIFT:v[0-9]+]], [[SUB_OFFSET]], 6		; CI: v_lshr_b32_e64 [[SHIFT:v[0-9]+]], s32, 6
; CI-NEXT: v_or_b32_e32 [[PTR:v[0-9]+]], 4, [[SHIFT]]		; CI-NEXT: v_or_b32_e32 [[PTR:v[0-9]+]], 4, [[SHIFT]]

; GFX9: v_lshrrev_b32_e64 [[SHIFT:v[0-9]+]], 6, [[SUB_OFFSET]]		; GFX9: v_lshrrev_b32_e64 [[SHIFT:v[0-9]+]], 6, s32
; GFX9-NEXT: v_or_b32_e32 [[PTR:v[0-9]+]], 4, [[SHIFT]]		; GFX9-NEXT: v_or_b32_e32 [[PTR:v[0-9]+]], 4, [[SHIFT]]

; GCN: ds_write_b32 v{{[0-9]+}}, [[PTR]]		; GCN: ds_write_b32 v{{[0-9]+}}, [[PTR]]
define void @alloca_ptr_nonentry_block(i32 %arg0) #0 {		define void @alloca_ptr_nonentry_block(i32 %arg0) #0 {
%alloca0 = alloca { i8, i32 }, align 4, addrspace(5)		%alloca0 = alloca { i8, i32 }, align 4, addrspace(5)
%cmp = icmp eq i32 %arg0, 0		%cmp = icmp eq i32 %arg0, 0
br i1 %cmp, label %bb, label %ret		br i1 %cmp, label %bb, label %ret

[12 lines not shown]

llvm/test/CodeGen/AMDGPU/frame-lowering-entry-all-sgpr-used.mir

[20 lines not shown]	liveins:
- { reg: '$sgpr9' }		- { reg: '$sgpr9' }
machineFunctionInfo:		machineFunctionInfo:
explicitKernArgSize: 84		explicitKernArgSize: 84
maxKernArgAlign: 8		maxKernArgAlign: 8
ldsSize: 20496		ldsSize: 20496
isEntryFunction: true		isEntryFunction: true
waveLimiter: true		waveLimiter: true
scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'		scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'
scratchWaveOffsetReg: '$sgpr101'
frameOffsetReg: '$sgpr101'		frameOffsetReg: '$sgpr101'
stackPtrOffsetReg: '$sgpr32'		stackPtrOffsetReg: '$sgpr32'
argumentInfo:		argumentInfo:
privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }		privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
dispatchPtr: { reg: '$sgpr4_sgpr5' }		dispatchPtr: { reg: '$sgpr4_sgpr5' }
kernargSegmentPtr: { reg: '$sgpr6_sgpr7' }		kernargSegmentPtr: { reg: '$sgpr6_sgpr7' }
workGroupIDX: { reg: '$sgpr8' }		workGroupIDX: { reg: '$sgpr8' }
workGroupIDY: { reg: '$sgpr9' }		workGroupIDY: { reg: '$sgpr9' }
[17 lines not shown]

llvm/test/CodeGen/AMDGPU/frame-lowering-fp-adjusted.mir

	[23 lines not shown]
	stack:			stack:
	- { id: 0, type: spill-slot, size: 4, alignment: 4 }			- { id: 0, type: spill-slot, size: 4, alignment: 4 }
	machineFunctionInfo:			machineFunctionInfo:
	explicitKernArgSize: 660			explicitKernArgSize: 660
	maxKernArgAlign: 4			maxKernArgAlign: 4
	isEntryFunction: true			isEntryFunction: true
	waveLimiter: true			waveLimiter: true
	scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'			scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'
	scratchWaveOffsetReg: '$sgpr101'
	frameOffsetReg: '$sgpr101'
	stackPtrOffsetReg: '$sgpr32'			stackPtrOffsetReg: '$sgpr32'
				frameOffsetReg: '$sgpr34'
	argumentInfo:			argumentInfo:
	privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }			privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	dispatchPtr: { reg: '$sgpr4_sgpr5' }			dispatchPtr: { reg: '$sgpr4_sgpr5' }
	kernargSegmentPtr: { reg: '$sgpr6_sgpr7' }			kernargSegmentPtr: { reg: '$sgpr6_sgpr7' }
	workGroupIDX: { reg: '$sgpr8' }			workGroupIDX: { reg: '$sgpr8' }
	privateSegmentWaveByteOffset: { reg: '$sgpr9' }			privateSegmentWaveByteOffset: { reg: '$sgpr9' }
	body: \|			body: \|
	bb.0:			bb.0:
	successors: %bb.1			successors: %bb.1
	liveins: $sgpr8, $vgpr0, $sgpr4_sgpr5, $sgpr6_sgpr7			liveins: $sgpr8, $vgpr0, $sgpr4_sgpr5, $sgpr6_sgpr7

	bb.1:			bb.1:
	liveins: $sgpr4, $sgpr5, $sgpr9, $sgpr22, $vgpr0, $sgpr6_sgpr7			liveins: $sgpr4, $sgpr5, $sgpr9, $sgpr22, $vgpr0, $sgpr6_sgpr7

	renamable $vgpr2 = IMPLICIT_DEF			renamable $vgpr2 = IMPLICIT_DEF
	SI_SPILL_V32_SAVE killed $vgpr2, %stack.0, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr32, 0, implicit $exec :: (store 4 into %stack.0, addrspace 5)			SI_SPILL_V32_SAVE killed $vgpr2, %stack.0, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr32, 0, implicit $exec :: (store 4 into %stack.0, addrspace 5)

llvm/test/CodeGen/AMDGPU/function-returns.ll

	[451 lines not shown]
	define {i8, i32} @struct_i8_i32_func_void() #0 {			define {i8, i32} @struct_i8_i32_func_void() #0 {
	%val = load { i8, i32 }, { i8, i32 } addrspace(1)* undef			%val = load { i8, i32 }, { i8, i32 } addrspace(1)* undef
	ret { i8, i32 } %val			ret { i8, i32 } %val
	}			}

	; GCN-LABEL: {{^}}void_func_sret_struct_i8_i32:			; GCN-LABEL: {{^}}void_func_sret_struct_i8_i32:
	; GCN: buffer_load_ubyte [[VAL0:v[0-9]+]]			; GCN: buffer_load_ubyte [[VAL0:v[0-9]+]]
	; GCN: buffer_load_dword [[VAL1:v[0-9]+]]			; GCN: buffer_load_dword [[VAL1:v[0-9]+]]
	; GCN: buffer_store_byte [[VAL0]], v0, s[0:3], s33 offen{{$}}			; GCN: buffer_store_byte [[VAL0]], v0, s[0:3], 0 offen{{$}}
	; GCN: buffer_store_dword [[VAL1]], v0, s[0:3], s33 offen offset:4{{$}}			; GCN: buffer_store_dword [[VAL1]], v0, s[0:3], 0 offen offset:4{{$}}
	define void @void_func_sret_struct_i8_i32({ i8, i32 } addrspace(5)* sret %arg0) #0 {			define void @void_func_sret_struct_i8_i32({ i8, i32 } addrspace(5)* sret %arg0) #0 {
	%val0 = load volatile i8, i8 addrspace(1)* undef			%val0 = load volatile i8, i8 addrspace(1)* undef
	%val1 = load volatile i32, i32 addrspace(1)* undef			%val1 = load volatile i32, i32 addrspace(1)* undef
	%gep0 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %arg0, i32 0, i32 0			%gep0 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %arg0, i32 0, i32 0
	%gep1 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %arg0, i32 0, i32 1			%gep1 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %arg0, i32 0, i32 1
	store i8 %val0, i8 addrspace(5)* %gep0			store i8 %val0, i8 addrspace(5)* %gep0
	store i32 %val1, i32 addrspace(5)* %gep1			store i32 %val1, i32 addrspace(5)* %gep1
	ret void			ret void
	}			}

	; FIXME: Should be able to fold offsets in all of these pre-gfx9. Call			; FIXME: Should be able to fold offsets in all of these pre-gfx9. Call
	; lowering introduces an extra CopyToReg/CopyFromReg obscuring the			; lowering introduces an extra CopyToReg/CopyFromReg obscuring the
	; AssertZext inserted. Not using it introduces the spills.			; AssertZext inserted. Not using it introduces the spills.

	; GCN-LABEL: {{^}}v33i32_func_void:			; GCN-LABEL: {{^}}v33i32_func_void:
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:4{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:4{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:8{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:8{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:12{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:12{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:16{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:16{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:20{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:20{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:24{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:24{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:28{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:28{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:32{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:32{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:36{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:36{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:40{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:40{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:44{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:44{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:48{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:48{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:52{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:52{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:56{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:56{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:60{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:60{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:64{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:64{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:68{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:68{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:72{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:72{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:76{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:76{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:80{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:80{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:84{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:84{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:88{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:88{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:92{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:92{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:96{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:96{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:100{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:100{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:104{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:104{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:108{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:108{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:112{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:112{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:116{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:116{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:120{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:120{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:124{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:124{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:128{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:128{{$}}
	; GFX9: s_waitcnt vmcnt(0)			; GFX9: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64			; GFX9-NEXT: s_setpc_b64
	define <33 x i32> @v33i32_func_void() #0 {			define <33 x i32> @v33i32_func_void() #0 {
	%ptr = load volatile <33 x i32> addrspace(1), <33 x i32> addrspace(1) addrspace(4)* undef			%ptr = load volatile <33 x i32> addrspace(1), <33 x i32> addrspace(1) addrspace(4)* undef
	%val = load <33 x i32>, <33 x i32> addrspace(1)* %ptr			%val = load <33 x i32>, <33 x i32> addrspace(1)* %ptr
	ret <33 x i32> %val			ret <33 x i32> %val
	}			}

	; GCN-LABEL: {{^}}struct_v32i32_i32_func_void:			; GCN-LABEL: {{^}}struct_v32i32_i32_func_void:
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:4{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:4{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:8{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:8{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:12{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:12{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:16{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:16{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:20{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:20{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:24{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:24{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:28{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:28{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:32{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:32{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:36{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:36{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:40{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:40{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:44{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:44{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:48{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:48{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:52{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:52{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:56{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:56{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:60{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:60{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:64{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:64{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:68{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:68{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:72{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:72{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:76{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:76{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:80{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:80{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:84{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:84{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:88{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:88{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:92{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:92{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:96{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:96{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:100{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:100{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:104{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:104{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:108{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:108{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:112{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:112{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:116{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:116{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:120{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:120{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:124{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:124{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:128{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:128{{$}}
	; GFX9: s_waitcnt vmcnt(0)			; GFX9: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64			; GFX9-NEXT: s_setpc_b64
	define { <32 x i32>, i32 } @struct_v32i32_i32_func_void() #0 {			define { <32 x i32>, i32 } @struct_v32i32_i32_func_void() #0 {
	%ptr = load volatile { <32 x i32>, i32 } addrspace(1), { <32 x i32>, i32 } addrspace(1) addrspace(4)* undef			%ptr = load volatile { <32 x i32>, i32 } addrspace(1), { <32 x i32>, i32 } addrspace(1) addrspace(4)* undef
	%val = load { <32 x i32>, i32 }, { <32 x i32>, i32 } addrspace(1)* %ptr			%val = load { <32 x i32>, i32 }, { <32 x i32>, i32 } addrspace(1)* %ptr
	ret { <32 x i32>, i32 }%val			ret { <32 x i32>, i32 }%val
	}			}

	; GCN-LABEL: {{^}}struct_i32_v32i32_func_void:			; GCN-LABEL: {{^}}struct_i32_v32i32_func_void:
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:128{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:128{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:132{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:132{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:136{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:136{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:140{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:140{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:144{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:144{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:148{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:148{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:152{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:152{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:156{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:156{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:160{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:160{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:164{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:164{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:168{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:168{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:172{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:172{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:176{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:176{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:180{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:180{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:184{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:184{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:188{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:188{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:192{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:192{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:196{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:196{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:200{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:200{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:204{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:204{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:208{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:208{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:212{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:212{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:216{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:216{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:220{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:220{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:224{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:224{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:228{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:228{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:232{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:232{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:236{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:236{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:240{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:240{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:244{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:244{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:248{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:248{{$}}
	; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], s33 offen offset:252{{$}}			; GFX9-DAG: buffer_store_dword v{{[0-9]+}}, v0, s[0:3], 0 offen offset:252{{$}}
	; GFX9: s_waitcnt vmcnt(0)			; GFX9: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64			; GFX9-NEXT: s_setpc_b64
	define { i32, <32 x i32> } @struct_i32_v32i32_func_void() #0 {			define { i32, <32 x i32> } @struct_i32_v32i32_func_void() #0 {
	%ptr = load volatile { i32, <32 x i32> } addrspace(1), { i32, <32 x i32> } addrspace(1) addrspace(4)* undef			%ptr = load volatile { i32, <32 x i32> } addrspace(1), { i32, <32 x i32> } addrspace(1) addrspace(4)* undef
	%val = load { i32, <32 x i32> }, { i32, <32 x i32> } addrspace(1)* %ptr			%val = load { i32, <32 x i32> }, { i32, <32 x i32> } addrspace(1)* %ptr
	ret { i32, <32 x i32> }%val			ret { i32, <32 x i32> }%val
	}			}

	[60 lines not shown]

llvm/test/CodeGen/AMDGPU/hsa-metadata-kernel-code-props-v3.ll

[42 lines not shown]	entry:
%a.val = load half, half addrspace(1)* %a		%a.val = load half, half addrspace(1)* %a
%b.val = load half, half addrspace(1)* %b		%b.val = load half, half addrspace(1)* %b
%r.val = fadd half %a.val, %b.val		%r.val = fadd half %a.val, %b.val
store half %r.val, half addrspace(1)* %r		store half %r.val, half addrspace(1)* %r
ret void		ret void
}		}

; CHECK: .name: num_spilled_sgprs		; CHECK: .name: num_spilled_sgprs
; GFX700: .sgpr_spill_count: 40		; GFX700: .sgpr_spill_count: 38
; GFX803: .sgpr_spill_count: 24		; GFX803: .sgpr_spill_count: 22
; GFX900: .sgpr_spill_count: 24		; GFX900: .sgpr_spill_count: 22
; GFX1010: .sgpr_spill_count: 24		; GFX1010: .sgpr_spill_count: 22
; CHECK: .symbol: num_spilled_sgprs.kd		; CHECK: .symbol: num_spilled_sgprs.kd
define amdgpu_kernel void @num_spilled_sgprs(		define amdgpu_kernel void @num_spilled_sgprs(
i32 addrspace(1)* %out0, i32 addrspace(1)* %out1, [8 x i32],		i32 addrspace(1)* %out0, i32 addrspace(1)* %out1, [8 x i32],
i32 addrspace(1)* %out2, i32 addrspace(1)* %out3, [8 x i32],		i32 addrspace(1)* %out2, i32 addrspace(1)* %out3, [8 x i32],
i32 addrspace(1)* %out4, i32 addrspace(1)* %out5, [8 x i32],		i32 addrspace(1)* %out4, i32 addrspace(1)* %out5, [8 x i32],
i32 addrspace(1)* %out6, i32 addrspace(1)* %out7, [8 x i32],		i32 addrspace(1)* %out6, i32 addrspace(1)* %out7, [8 x i32],
i32 addrspace(1)* %out8, i32 addrspace(1)* %out9, [8 x i32],		i32 addrspace(1)* %out8, i32 addrspace(1)* %out9, [8 x i32],
i32 addrspace(1)* %outa, i32 addrspace(1)* %outb, [8 x i32],		i32 addrspace(1)* %outa, i32 addrspace(1)* %outb, [8 x i32],
[104 lines not shown]

llvm/test/CodeGen/AMDGPU/hsa-metadata-kernel-code-props.ll

[51 lines not shown]	entry:
%r.val = fadd half %a.val, %b.val		%r.val = fadd half %a.val, %b.val
store half %r.val, half addrspace(1)* %r		store half %r.val, half addrspace(1)* %r
ret void		ret void
}		}

; CHECK-LABEL: - Name: num_spilled_sgprs		; CHECK-LABEL: - Name: num_spilled_sgprs
; CHECK: SymbolName: 'num_spilled_sgprs@kd'		; CHECK: SymbolName: 'num_spilled_sgprs@kd'
; CHECK: CodeProps:		; CHECK: CodeProps:
; GFX700: NumSpilledSGPRs: 40		; GFX700: NumSpilledSGPRs: 38
; GFX803: NumSpilledSGPRs: 24		; GFX803: NumSpilledSGPRs: 22
; GFX900: NumSpilledSGPRs: 24		; GFX900: NumSpilledSGPRs: 22
define amdgpu_kernel void @num_spilled_sgprs(		define amdgpu_kernel void @num_spilled_sgprs(
i32 addrspace(1)* %out0, i32 addrspace(1)* %out1, [8 x i32],		i32 addrspace(1)* %out0, i32 addrspace(1)* %out1, [8 x i32],
i32 addrspace(1)* %out2, i32 addrspace(1)* %out3, [8 x i32],		i32 addrspace(1)* %out2, i32 addrspace(1)* %out3, [8 x i32],
i32 addrspace(1)* %out4, i32 addrspace(1)* %out5, [8 x i32],		i32 addrspace(1)* %out4, i32 addrspace(1)* %out5, [8 x i32],
i32 addrspace(1)* %out6, i32 addrspace(1)* %out7, [8 x i32],		i32 addrspace(1)* %out6, i32 addrspace(1)* %out7, [8 x i32],
i32 addrspace(1)* %out8, i32 addrspace(1)* %out9, [8 x i32],		i32 addrspace(1)* %out8, i32 addrspace(1)* %out9, [8 x i32],
i32 addrspace(1)* %outa, i32 addrspace(1)* %outb, [8 x i32],		i32 addrspace(1)* %outa, i32 addrspace(1)* %outb, [8 x i32],
i32 addrspace(1)* %outc, i32 addrspace(1)* %outd, [8 x i32],		i32 addrspace(1)* %outc, i32 addrspace(1)* %outd, [8 x i32],

llvm/test/CodeGen/AMDGPU/idot8s.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=amdgcn -mcpu=gfx700 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX7 %s		; RUN: llc -mtriple=amdgcn -mcpu=gfx700 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX7 %s
; RUN: llc -mtriple=amdgcn -mcpu=gfx803 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX8 %s		; RUN: llc -mtriple=amdgcn -mcpu=gfx803 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX8 %s
; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX9 %s		; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX9 %s
; RUN: llc -mtriple=amdgcn -mcpu=gfx906 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX9-DL %s		; RUN: llc -mtriple=amdgcn -mcpu=gfx906 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX9-DL %s
; RUN: llc -mtriple=amdgcn -mcpu=gfx1011 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10-DL %s		; RUN: llc -mtriple=amdgcn -mcpu=gfx1011 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10-DL %s
; RUN: llc -mtriple=amdgcn -mcpu=gfx1012 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10-DL %s		; RUN: llc -mtriple=amdgcn -mcpu=gfx1012 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10-DL %s

define amdgpu_kernel void @idot8_acc32(<8 x i4> addrspace(1)* %src1,		define amdgpu_kernel void @idot8_acc32(<8 x i4> addrspace(1)* %src1,
; GFX7-LABEL: idot8_acc32:		; GFX7-LABEL: idot8_acc32:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s7, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s6, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0		; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX7-NEXT: s_load_dword s1, s[10:11], 0x0		; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0
; GFX7-NEXT: s_load_dword s21, s[4:5], 0x0		; GFX7-NEXT: s_load_dword s20, s[0:1], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_bfe_i32 s2, s0, 0x40000		; GFX7-NEXT: s_bfe_i32 s6, s4, 0x40000
; GFX7-NEXT: s_bfe_i32 s8, s1, 0x40000		; GFX7-NEXT: s_bfe_i32 s7, s5, 0x40000
; GFX7-NEXT: s_bfe_i32 s10, s1, 0x40004		; GFX7-NEXT: s_bfe_i32 s9, s5, 0x40004
; GFX7-NEXT: v_mov_b32_e32 v0, s8		; GFX7-NEXT: v_mov_b32_e32 v0, s7
; GFX7-NEXT: v_mov_b32_e32 v1, s21
; GFX7-NEXT: v_mad_i32_i24 v0, s2, v0, v1
; GFX7-NEXT: s_bfe_i32 s9, s0, 0x40004
; GFX7-NEXT: v_mov_b32_e32 v1, s10
; GFX7-NEXT: s_bfe_i32 s12, s1, 0x40008
; GFX7-NEXT: v_mad_i32_i24 v0, s9, v1, v0
; GFX7-NEXT: s_bfe_i32 s11, s0, 0x40008
; GFX7-NEXT: v_mov_b32_e32 v1, s12
; GFX7-NEXT: s_bfe_i32 s14, s1, 0x4000c
; GFX7-NEXT: v_mad_i32_i24 v0, s11, v1, v0
; GFX7-NEXT: s_bfe_i32 s13, s0, 0x4000c
; GFX7-NEXT: v_mov_b32_e32 v1, s14
; GFX7-NEXT: s_bfe_i32 s16, s1, 0x40010
; GFX7-NEXT: v_mad_i32_i24 v0, s13, v1, v0
; GFX7-NEXT: s_bfe_i32 s15, s0, 0x40010
; GFX7-NEXT: v_mov_b32_e32 v1, s16
; GFX7-NEXT: s_bfe_i32 s18, s1, 0x40014
; GFX7-NEXT: s_bfe_i32 s20, s1, 0x40018
; GFX7-NEXT: v_mad_i32_i24 v0, s15, v1, v0
; GFX7-NEXT: s_bfe_i32 s17, s0, 0x40014
; GFX7-NEXT: v_mov_b32_e32 v1, s18
; GFX7-NEXT: s_bfe_i32 s19, s0, 0x40018
; GFX7-NEXT: v_mad_i32_i24 v0, s17, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s20		; GFX7-NEXT: v_mov_b32_e32 v1, s20
; GFX7-NEXT: s_ashr_i32 s1, s1, 28		; GFX7-NEXT: v_mad_i32_i24 v0, s6, v0, v1
; GFX7-NEXT: v_mad_i32_i24 v0, s19, v1, v0		; GFX7-NEXT: s_bfe_i32 s8, s4, 0x40004
; GFX7-NEXT: s_ashr_i32 s0, s0, 28		; GFX7-NEXT: v_mov_b32_e32 v1, s9
; GFX7-NEXT: v_mov_b32_e32 v1, s1		; GFX7-NEXT: s_bfe_i32 s11, s5, 0x40008
; GFX7-NEXT: v_mad_i32_i24 v0, s0, v1, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s8, v1, v0
; GFX7-NEXT: buffer_store_dword v0, off, s[4:7], 0		; GFX7-NEXT: s_bfe_i32 s10, s4, 0x40008
		; GFX7-NEXT: v_mov_b32_e32 v1, s11
		; GFX7-NEXT: s_bfe_i32 s13, s5, 0x4000c
		; GFX7-NEXT: v_mad_i32_i24 v0, s10, v1, v0
		; GFX7-NEXT: s_bfe_i32 s12, s4, 0x4000c
		; GFX7-NEXT: v_mov_b32_e32 v1, s13
		; GFX7-NEXT: s_bfe_i32 s15, s5, 0x40010
		; GFX7-NEXT: v_mad_i32_i24 v0, s12, v1, v0
		; GFX7-NEXT: s_bfe_i32 s14, s4, 0x40010
		; GFX7-NEXT: v_mov_b32_e32 v1, s15
		; GFX7-NEXT: s_bfe_i32 s17, s5, 0x40014
		; GFX7-NEXT: s_bfe_i32 s19, s5, 0x40018
		; GFX7-NEXT: v_mad_i32_i24 v0, s14, v1, v0
		; GFX7-NEXT: s_bfe_i32 s16, s4, 0x40014
		; GFX7-NEXT: v_mov_b32_e32 v1, s17
		; GFX7-NEXT: s_bfe_i32 s18, s4, 0x40018
		; GFX7-NEXT: v_mad_i32_i24 v0, s16, v1, v0
		; GFX7-NEXT: v_mov_b32_e32 v1, s19
		; GFX7-NEXT: s_ashr_i32 s5, s5, 28
		; GFX7-NEXT: v_mad_i32_i24 v0, s18, v1, v0
		; GFX7-NEXT: s_ashr_i32 s4, s4, 28
		; GFX7-NEXT: v_mov_b32_e32 v1, s5
		; GFX7-NEXT: v_mad_i32_i24 v0, s4, v1, v0
		; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: idot8_acc32:		; GFX8-LABEL: idot8_acc32:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX8-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX8-NEXT: s_load_dword s4, s[6:7], 0x0		; GFX8-NEXT: s_load_dword s3, s[6:7], 0x0
; GFX8-NEXT: s_load_dword s19, s[0:1], 0x0		; GFX8-NEXT: s_load_dword s18, s[0:1], 0x0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_bfe_i32 s5, s2, 0x40000		; GFX8-NEXT: s_bfe_i32 s4, s2, 0x40000
; GFX8-NEXT: s_bfe_i32 s6, s4, 0x40000		; GFX8-NEXT: s_bfe_i32 s5, s3, 0x40000
; GFX8-NEXT: s_bfe_i32 s8, s4, 0x40004		; GFX8-NEXT: s_bfe_i32 s7, s3, 0x40004
; GFX8-NEXT: v_mov_b32_e32 v0, s6		; GFX8-NEXT: v_mov_b32_e32 v0, s5
; GFX8-NEXT: v_mov_b32_e32 v1, s19
; GFX8-NEXT: v_mad_i32_i24 v0, s5, v0, v1
; GFX8-NEXT: s_bfe_i32 s7, s2, 0x40004
; GFX8-NEXT: v_mov_b32_e32 v1, s8
; GFX8-NEXT: s_bfe_i32 s10, s4, 0x40008
; GFX8-NEXT: v_mad_i32_i24 v0, s7, v1, v0
; GFX8-NEXT: s_bfe_i32 s9, s2, 0x40008
; GFX8-NEXT: v_mov_b32_e32 v1, s10
; GFX8-NEXT: s_bfe_i32 s12, s4, 0x4000c
; GFX8-NEXT: v_mad_i32_i24 v0, s9, v1, v0
; GFX8-NEXT: s_bfe_i32 s11, s2, 0x4000c
; GFX8-NEXT: v_mov_b32_e32 v1, s12
; GFX8-NEXT: s_bfe_i32 s14, s4, 0x40010
; GFX8-NEXT: v_mad_i32_i24 v0, s11, v1, v0
; GFX8-NEXT: s_bfe_i32 s13, s2, 0x40010
; GFX8-NEXT: v_mov_b32_e32 v1, s14
; GFX8-NEXT: s_bfe_i32 s16, s4, 0x40014
; GFX8-NEXT: s_bfe_i32 s18, s4, 0x40018
; GFX8-NEXT: v_mad_i32_i24 v0, s13, v1, v0
; GFX8-NEXT: s_bfe_i32 s15, s2, 0x40014
; GFX8-NEXT: v_mov_b32_e32 v1, s16
; GFX8-NEXT: s_bfe_i32 s17, s2, 0x40018
; GFX8-NEXT: v_mad_i32_i24 v0, s15, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v1, s18		; GFX8-NEXT: v_mov_b32_e32 v1, s18
; GFX8-NEXT: s_ashr_i32 s4, s4, 28		; GFX8-NEXT: v_mad_i32_i24 v0, s4, v0, v1
; GFX8-NEXT: v_mad_i32_i24 v0, s17, v1, v0		; GFX8-NEXT: s_bfe_i32 s6, s2, 0x40004
		; GFX8-NEXT: v_mov_b32_e32 v1, s7
		; GFX8-NEXT: s_bfe_i32 s9, s3, 0x40008
		; GFX8-NEXT: v_mad_i32_i24 v0, s6, v1, v0
		; GFX8-NEXT: s_bfe_i32 s8, s2, 0x40008
		; GFX8-NEXT: v_mov_b32_e32 v1, s9
		; GFX8-NEXT: s_bfe_i32 s11, s3, 0x4000c
		; GFX8-NEXT: v_mad_i32_i24 v0, s8, v1, v0
		; GFX8-NEXT: s_bfe_i32 s10, s2, 0x4000c
		; GFX8-NEXT: v_mov_b32_e32 v1, s11
		; GFX8-NEXT: s_bfe_i32 s13, s3, 0x40010
		; GFX8-NEXT: v_mad_i32_i24 v0, s10, v1, v0
		; GFX8-NEXT: s_bfe_i32 s12, s2, 0x40010
		; GFX8-NEXT: v_mov_b32_e32 v1, s13
		; GFX8-NEXT: s_bfe_i32 s15, s3, 0x40014
		; GFX8-NEXT: s_bfe_i32 s17, s3, 0x40018
		; GFX8-NEXT: v_mad_i32_i24 v0, s12, v1, v0
		; GFX8-NEXT: s_bfe_i32 s14, s2, 0x40014
		; GFX8-NEXT: v_mov_b32_e32 v1, s15
		; GFX8-NEXT: s_bfe_i32 s16, s2, 0x40018
		; GFX8-NEXT: v_mad_i32_i24 v0, s14, v1, v0
		; GFX8-NEXT: v_mov_b32_e32 v1, s17
		; GFX8-NEXT: s_ashr_i32 s3, s3, 28
		; GFX8-NEXT: v_mad_i32_i24 v0, s16, v1, v0
; GFX8-NEXT: s_ashr_i32 s2, s2, 28		; GFX8-NEXT: s_ashr_i32 s2, s2, 28
; GFX8-NEXT: v_mov_b32_e32 v1, s4		; GFX8-NEXT: v_mov_b32_e32 v1, s3
; GFX8-NEXT: v_mad_i32_i24 v2, s2, v1, v0		; GFX8-NEXT: v_mad_i32_i24 v2, s2, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_store_dword v[0:1], v2		; GFX8-NEXT: flat_store_dword v[0:1], v2
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: idot8_acc32:		; GFX9-LABEL: idot8_acc32:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX9-NEXT: s_load_dword s4, s[6:7], 0x0		; GFX9-NEXT: s_load_dword s3, s[6:7], 0x0
; GFX9-NEXT: s_load_dword s19, s[0:1], 0x0		; GFX9-NEXT: s_load_dword s18, s[0:1], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_bfe_i32 s5, s2, 0x40000		; GFX9-NEXT: s_bfe_i32 s4, s2, 0x40000
; GFX9-NEXT: s_bfe_i32 s6, s4, 0x40000		; GFX9-NEXT: s_bfe_i32 s5, s3, 0x40000
; GFX9-NEXT: s_bfe_i32 s8, s4, 0x40004		; GFX9-NEXT: s_bfe_i32 s7, s3, 0x40004
; GFX9-NEXT: v_mov_b32_e32 v0, s6		; GFX9-NEXT: v_mov_b32_e32 v0, s5
; GFX9-NEXT: v_mov_b32_e32 v1, s19
; GFX9-NEXT: v_mad_i32_i24 v0, s5, v0, v1
; GFX9-NEXT: s_bfe_i32 s7, s2, 0x40004
; GFX9-NEXT: v_mov_b32_e32 v1, s8
; GFX9-NEXT: s_bfe_i32 s10, s4, 0x40008
; GFX9-NEXT: v_mad_i32_i24 v0, s7, v1, v0
; GFX9-NEXT: s_bfe_i32 s9, s2, 0x40008
; GFX9-NEXT: v_mov_b32_e32 v1, s10
; GFX9-NEXT: s_bfe_i32 s12, s4, 0x4000c
; GFX9-NEXT: v_mad_i32_i24 v0, s9, v1, v0
; GFX9-NEXT: s_bfe_i32 s11, s2, 0x4000c
; GFX9-NEXT: v_mov_b32_e32 v1, s12
; GFX9-NEXT: s_bfe_i32 s14, s4, 0x40010
; GFX9-NEXT: v_mad_i32_i24 v0, s11, v1, v0
; GFX9-NEXT: s_bfe_i32 s13, s2, 0x40010
; GFX9-NEXT: v_mov_b32_e32 v1, s14
; GFX9-NEXT: s_bfe_i32 s16, s4, 0x40014
; GFX9-NEXT: s_bfe_i32 s18, s4, 0x40018
; GFX9-NEXT: v_mad_i32_i24 v0, s13, v1, v0
; GFX9-NEXT: s_bfe_i32 s15, s2, 0x40014
; GFX9-NEXT: v_mov_b32_e32 v1, s16
; GFX9-NEXT: s_bfe_i32 s17, s2, 0x40018
; GFX9-NEXT: v_mad_i32_i24 v0, s15, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v1, s18		; GFX9-NEXT: v_mov_b32_e32 v1, s18
; GFX9-NEXT: s_ashr_i32 s4, s4, 28		; GFX9-NEXT: v_mad_i32_i24 v0, s4, v0, v1
; GFX9-NEXT: v_mad_i32_i24 v0, s17, v1, v0		; GFX9-NEXT: s_bfe_i32 s6, s2, 0x40004
		; GFX9-NEXT: v_mov_b32_e32 v1, s7
		; GFX9-NEXT: s_bfe_i32 s9, s3, 0x40008
		; GFX9-NEXT: v_mad_i32_i24 v0, s6, v1, v0
		; GFX9-NEXT: s_bfe_i32 s8, s2, 0x40008
		; GFX9-NEXT: v_mov_b32_e32 v1, s9
		; GFX9-NEXT: s_bfe_i32 s11, s3, 0x4000c
		; GFX9-NEXT: v_mad_i32_i24 v0, s8, v1, v0
		; GFX9-NEXT: s_bfe_i32 s10, s2, 0x4000c
		; GFX9-NEXT: v_mov_b32_e32 v1, s11
		; GFX9-NEXT: s_bfe_i32 s13, s3, 0x40010
		; GFX9-NEXT: v_mad_i32_i24 v0, s10, v1, v0
		; GFX9-NEXT: s_bfe_i32 s12, s2, 0x40010
		; GFX9-NEXT: v_mov_b32_e32 v1, s13
		; GFX9-NEXT: s_bfe_i32 s15, s3, 0x40014
		; GFX9-NEXT: s_bfe_i32 s17, s3, 0x40018
		; GFX9-NEXT: v_mad_i32_i24 v0, s12, v1, v0
		; GFX9-NEXT: s_bfe_i32 s14, s2, 0x40014
		; GFX9-NEXT: v_mov_b32_e32 v1, s15
		; GFX9-NEXT: s_bfe_i32 s16, s2, 0x40018
		; GFX9-NEXT: v_mad_i32_i24 v0, s14, v1, v0
		; GFX9-NEXT: v_mov_b32_e32 v1, s17
		; GFX9-NEXT: s_ashr_i32 s3, s3, 28
		; GFX9-NEXT: v_mad_i32_i24 v0, s16, v1, v0
; GFX9-NEXT: s_ashr_i32 s2, s2, 28		; GFX9-NEXT: s_ashr_i32 s2, s2, 28
; GFX9-NEXT: v_mov_b32_e32 v1, s4		; GFX9-NEXT: v_mov_b32_e32 v1, s3
; GFX9-NEXT: v_mad_i32_i24 v2, s2, v1, v0		; GFX9-NEXT: v_mad_i32_i24 v2, s2, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: global_store_dword v[0:1], v2, off		; GFX9-NEXT: global_store_dword v[0:1], v2, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX9-DL-LABEL: idot8_acc32:		; GFX9-DL-LABEL: idot8_acc32:
; GFX9-DL: ; %bb.0: ; %entry		; GFX9-DL: ; %bb.0: ; %entry
; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_load_dword s2, s[6:7], 0x0		; GFX9-DL-NEXT: s_load_dword s2, s[6:7], 0x0
; GFX9-DL-NEXT: s_load_dword s6, s[0:1], 0x0		; GFX9-DL-NEXT: s_load_dword s3, s[0:1], 0x0
; GFX9-DL-NEXT: s_load_dword s4, s[4:5], 0x0		; GFX9-DL-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s2		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s2
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s6		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s3
; GFX9-DL-NEXT: v_dot8_i32_i4 v2, s4, v0, v1		; GFX9-DL-NEXT: v_dot8_i32_i4 v2, s4, v0, v1
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off		; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: idot8_acc32:		; GFX10-DL-LABEL: idot8_acc32:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_load_dword s0, s[8:9], 0x0		; GFX10-DL-NEXT: s_load_dword s6, s[4:5], 0x0
; GFX10-DL-NEXT: s_load_dword s1, s[4:5], 0x0		; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
; GFX10-DL-NEXT: s_load_dword s2, s[6:7], 0x0		; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s6
; GFX10-DL-NEXT: v_dot8_i32_i4 v2, s1, s2, v0		; GFX10-DL-NEXT: v_dot8_i32_i4 v2, s0, s1, v0
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s8		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s4
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s9		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s5
; GFX10-DL-NEXT: global_store_dword v[0:1], v2, off		; GFX10-DL-NEXT: global_store_dword v[0:1], v2, off
; GFX10-DL-NEXT: s_endpgm		; GFX10-DL-NEXT: s_endpgm
<8 x i4> addrspace(1)* %src2,		<8 x i4> addrspace(1)* %src2,
i32 addrspace(1)* nocapture %dst) {		i32 addrspace(1)* nocapture %dst) {
entry:		entry:
%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1		%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1
%vec2 = load <8 x i4>, <8 x i4> addrspace(1)* %src2		%vec2 = load <8 x i4>, <8 x i4> addrspace(1)* %src2

ret void		ret void
}		}

; TODO: Once the unnecessary zero extensions of the elements are removed;		; TODO: Once the unnecessary zero extensions of the elements are removed;
; pattern recognizer will kick in.		; pattern recognizer will kick in.
define amdgpu_kernel void @idot8_acc16(<8 x i4> addrspace(1)* %src1,		define amdgpu_kernel void @idot8_acc16(<8 x i4> addrspace(1)* %src1,
; GFX7-LABEL: idot8_acc16:		; GFX7-LABEL: idot8_acc16:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s7, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s6, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_mov_b32 s0, 0xffff		; GFX7-NEXT: s_mov_b32 s8, 0xffff
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: buffer_load_ushort v0, off, s[4:7], 0		; GFX7-NEXT: buffer_load_ushort v0, off, s[0:3], 0
; GFX7-NEXT: s_load_dword s1, s[8:9], 0x0		; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX7-NEXT: s_load_dword s2, s[10:11], 0x0		; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_bfe_i32 s8, s1, 0x40000		; GFX7-NEXT: s_bfe_i32 s6, s4, 0x40000
; GFX7-NEXT: s_bfe_i32 s9, s2, 0x40000		; GFX7-NEXT: s_bfe_i32 s7, s5, 0x40000
; GFX7-NEXT: s_bfe_i32 s11, s2, 0x40004		; GFX7-NEXT: s_bfe_i32 s10, s5, 0x40004
; GFX7-NEXT: s_and_b32 s9, s9, s0		; GFX7-NEXT: s_and_b32 s7, s7, s8
; GFX7-NEXT: s_bfe_i32 s10, s1, 0x40004		; GFX7-NEXT: s_bfe_i32 s9, s4, 0x40004
; GFX7-NEXT: s_bfe_i32 s13, s2, 0x40008		; GFX7-NEXT: s_bfe_i32 s12, s5, 0x40008
; GFX7-NEXT: s_and_b32 s11, s11, s0		; GFX7-NEXT: s_and_b32 s10, s10, s8
; GFX7-NEXT: s_and_b32 s8, s8, s0		; GFX7-NEXT: s_and_b32 s6, s6, s8
; GFX7-NEXT: v_mov_b32_e32 v1, s9		; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: s_bfe_i32 s12, s1, 0x40008		; GFX7-NEXT: s_bfe_i32 s11, s4, 0x40008
; GFX7-NEXT: s_bfe_i32 s15, s2, 0x4000c		; GFX7-NEXT: s_bfe_i32 s14, s5, 0x4000c
; GFX7-NEXT: s_and_b32 s13, s13, s0		; GFX7-NEXT: s_and_b32 s12, s12, s8
; GFX7-NEXT: s_and_b32 s10, s10, s0		; GFX7-NEXT: s_and_b32 s9, s9, s8
; GFX7-NEXT: v_mov_b32_e32 v2, s11		; GFX7-NEXT: v_mov_b32_e32 v2, s10
; GFX7-NEXT: s_bfe_i32 s14, s1, 0x4000c		; GFX7-NEXT: s_bfe_i32 s13, s4, 0x4000c
; GFX7-NEXT: s_bfe_i32 s17, s2, 0x40010		; GFX7-NEXT: s_bfe_i32 s16, s5, 0x40010
; GFX7-NEXT: s_and_b32 s15, s15, s0		; GFX7-NEXT: s_and_b32 s14, s14, s8
; GFX7-NEXT: s_and_b32 s12, s12, s0		; GFX7-NEXT: s_and_b32 s11, s11, s8
; GFX7-NEXT: v_mov_b32_e32 v3, s13		; GFX7-NEXT: v_mov_b32_e32 v3, s12
; GFX7-NEXT: s_bfe_i32 s16, s1, 0x40010		; GFX7-NEXT: s_bfe_i32 s15, s4, 0x40010
; GFX7-NEXT: s_bfe_i32 s19, s2, 0x40014		; GFX7-NEXT: s_bfe_i32 s18, s5, 0x40014
; GFX7-NEXT: s_and_b32 s17, s17, s0		; GFX7-NEXT: s_and_b32 s16, s16, s8
; GFX7-NEXT: s_and_b32 s14, s14, s0		; GFX7-NEXT: s_and_b32 s13, s13, s8
; GFX7-NEXT: v_mov_b32_e32 v4, s15		; GFX7-NEXT: v_mov_b32_e32 v4, s14
; GFX7-NEXT: s_bfe_i32 s21, s2, 0x40018		; GFX7-NEXT: s_bfe_i32 s20, s5, 0x40018
; GFX7-NEXT: s_bfe_i32 s18, s1, 0x40014		; GFX7-NEXT: s_bfe_i32 s17, s4, 0x40014
; GFX7-NEXT: s_and_b32 s19, s19, s0		; GFX7-NEXT: s_and_b32 s18, s18, s8
; GFX7-NEXT: s_and_b32 s16, s16, s0		; GFX7-NEXT: s_and_b32 s15, s15, s8
; GFX7-NEXT: v_mov_b32_e32 v5, s17		; GFX7-NEXT: v_mov_b32_e32 v5, s16
; GFX7-NEXT: s_bfe_i32 s20, s1, 0x40018		; GFX7-NEXT: s_bfe_i32 s19, s4, 0x40018
; GFX7-NEXT: s_ashr_i32 s2, s2, 28		; GFX7-NEXT: s_ashr_i32 s5, s5, 28
; GFX7-NEXT: s_and_b32 s21, s21, s0		; GFX7-NEXT: s_and_b32 s20, s20, s8
; GFX7-NEXT: s_and_b32 s18, s18, s0		; GFX7-NEXT: s_and_b32 s17, s17, s8
; GFX7-NEXT: v_mov_b32_e32 v6, s19		; GFX7-NEXT: v_mov_b32_e32 v6, s18
; GFX7-NEXT: s_ashr_i32 s1, s1, 28		; GFX7-NEXT: s_ashr_i32 s4, s4, 28
; GFX7-NEXT: s_and_b32 s20, s20, s0		; GFX7-NEXT: s_and_b32 s19, s19, s8
; GFX7-NEXT: s_and_b32 s2, s2, s0		; GFX7-NEXT: s_and_b32 s5, s5, s8
; GFX7-NEXT: v_mov_b32_e32 v7, s21		; GFX7-NEXT: v_mov_b32_e32 v7, s20
; GFX7-NEXT: s_and_b32 s0, s1, s0		; GFX7-NEXT: s_and_b32 s4, s4, s8
; GFX7-NEXT: s_waitcnt vmcnt(0)		; GFX7-NEXT: s_waitcnt vmcnt(0)
; GFX7-NEXT: v_mad_u32_u24 v0, s8, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s6, v1, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s10, v2, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s9, v2, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s12, v3, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s11, v3, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s14, v4, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s13, v4, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s16, v5, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s15, v5, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s18, v6, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s17, v6, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s20, v7, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s19, v7, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s2		; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mad_u32_u24 v0, s0, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s4, v1, v0
; GFX7-NEXT: buffer_store_short v0, off, s[4:7], 0		; GFX7-NEXT: buffer_store_short v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: idot8_acc16:		; GFX8-LABEL: idot8_acc16:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_load_ushort v2, v[0:1]		; GFX8-NEXT: flat_load_ushort v2, v[0:1]
; GFX8-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX8-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX8-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX8-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_bfe_i32 s5, s0, 0x40000		; GFX8-NEXT: s_bfe_i32 s4, s0, 0x40000
; GFX8-NEXT: s_bfe_i32 s6, s1, 0x40000		; GFX8-NEXT: s_bfe_i32 s5, s1, 0x40000
; GFX8-NEXT: s_bfe_i32 s8, s1, 0x40004		; GFX8-NEXT: s_bfe_i32 s7, s1, 0x40004
; GFX8-NEXT: s_bfe_i32 s10, s1, 0x40008		; GFX8-NEXT: s_bfe_i32 s9, s1, 0x40008
; GFX8-NEXT: v_mov_b32_e32 v6, s6		; GFX8-NEXT: v_mov_b32_e32 v6, s5
; GFX8-NEXT: s_lshr_b32 s2, s0, 12		; GFX8-NEXT: s_lshr_b32 s2, s0, 12
; GFX8-NEXT: s_lshr_b32 s4, s1, 12		; GFX8-NEXT: s_lshr_b32 s3, s1, 12
; GFX8-NEXT: s_bfe_i32 s7, s0, 0x40004		; GFX8-NEXT: s_bfe_i32 s6, s0, 0x40004
; GFX8-NEXT: s_bfe_i32 s9, s0, 0x40008		; GFX8-NEXT: s_bfe_i32 s8, s0, 0x40008
; GFX8-NEXT: v_mov_b32_e32 v3, s10		; GFX8-NEXT: v_mov_b32_e32 v3, s9
; GFX8-NEXT: v_mov_b32_e32 v7, s8		; GFX8-NEXT: v_mov_b32_e32 v7, s7
; GFX8-NEXT: v_lshlrev_b16_e64 v4, 12, s2		; GFX8-NEXT: v_lshlrev_b16_e64 v4, 12, s2
; GFX8-NEXT: v_lshlrev_b16_e64 v5, 12, s4		; GFX8-NEXT: v_lshlrev_b16_e64 v5, 12, s3
; GFX8-NEXT: v_mul_i32_i24_e32 v3, s9, v3		; GFX8-NEXT: v_mul_i32_i24_e32 v3, s8, v3
; GFX8-NEXT: s_bfe_i32 s12, s1, 0x40010		; GFX8-NEXT: s_bfe_i32 s11, s1, 0x40010
; GFX8-NEXT: v_ashrrev_i16_e32 v4, 12, v4		; GFX8-NEXT: v_ashrrev_i16_e32 v4, 12, v4
; GFX8-NEXT: v_ashrrev_i16_e32 v5, 12, v5		; GFX8-NEXT: v_ashrrev_i16_e32 v5, 12, v5
; GFX8-NEXT: s_bfe_i32 s14, s1, 0x40014		; GFX8-NEXT: s_bfe_i32 s13, s1, 0x40014
; GFX8-NEXT: s_bfe_i32 s11, s0, 0x40010		; GFX8-NEXT: s_bfe_i32 s10, s0, 0x40010
; GFX8-NEXT: v_mov_b32_e32 v8, s12		; GFX8-NEXT: v_mov_b32_e32 v8, s11
; GFX8-NEXT: s_bfe_i32 s16, s1, 0x40018		; GFX8-NEXT: s_bfe_i32 s15, s1, 0x40018
; GFX8-NEXT: s_bfe_i32 s13, s0, 0x40014		; GFX8-NEXT: s_bfe_i32 s12, s0, 0x40014
; GFX8-NEXT: v_mov_b32_e32 v9, s14		; GFX8-NEXT: v_mov_b32_e32 v9, s13
; GFX8-NEXT: s_bfe_i32 s15, s0, 0x40018		; GFX8-NEXT: s_bfe_i32 s14, s0, 0x40018
; GFX8-NEXT: s_ashr_i32 s1, s1, 28		; GFX8-NEXT: s_ashr_i32 s1, s1, 28
; GFX8-NEXT: v_mov_b32_e32 v10, s16		; GFX8-NEXT: v_mov_b32_e32 v10, s15
; GFX8-NEXT: s_ashr_i32 s0, s0, 28		; GFX8-NEXT: s_ashr_i32 s0, s0, 28
; GFX8-NEXT: s_waitcnt vmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0)
; GFX8-NEXT: v_mad_i32_i24 v2, s5, v6, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s4, v6, v2
; GFX8-NEXT: v_mad_i32_i24 v2, s7, v7, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s6, v7, v2
; GFX8-NEXT: v_add_u32_sdwa v2, vcc, v3, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:WORD_0		; GFX8-NEXT: v_add_u32_sdwa v2, vcc, v3, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:WORD_0
; GFX8-NEXT: v_mad_u32_u24 v2, v4, v5, v2		; GFX8-NEXT: v_mad_u32_u24 v2, v4, v5, v2
; GFX8-NEXT: v_mad_i32_i24 v2, s11, v8, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s10, v8, v2
; GFX8-NEXT: v_mad_i32_i24 v2, s13, v9, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s12, v9, v2
; GFX8-NEXT: v_mad_i32_i24 v2, s15, v10, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s14, v10, v2
; GFX8-NEXT: v_mov_b32_e32 v3, s1		; GFX8-NEXT: v_mov_b32_e32 v3, s1
; GFX8-NEXT: v_mad_i32_i24 v2, s0, v3, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s0, v3, v2
; GFX8-NEXT: flat_store_short v[0:1], v2		; GFX8-NEXT: flat_store_short v[0:1], v2
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: idot8_acc16:		; GFX9-LABEL: idot8_acc16:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: global_load_ushort v2, v[0:1], off		; GFX9-NEXT: global_load_ushort v2, v[0:1], off
; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX9-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_bfe_i32 s5, s0, 0x40000		; GFX9-NEXT: s_bfe_i32 s4, s0, 0x40000
; GFX9-NEXT: s_bfe_i32 s6, s1, 0x40000		; GFX9-NEXT: s_bfe_i32 s5, s1, 0x40000
; GFX9-NEXT: s_bfe_i32 s8, s1, 0x40004		; GFX9-NEXT: s_bfe_i32 s7, s1, 0x40004
; GFX9-NEXT: s_bfe_i32 s10, s1, 0x40008		; GFX9-NEXT: s_bfe_i32 s9, s1, 0x40008
; GFX9-NEXT: v_mov_b32_e32 v6, s6		; GFX9-NEXT: v_mov_b32_e32 v6, s5
; GFX9-NEXT: s_lshr_b32 s2, s0, 12		; GFX9-NEXT: s_lshr_b32 s2, s0, 12
; GFX9-NEXT: s_lshr_b32 s4, s1, 12		; GFX9-NEXT: s_lshr_b32 s3, s1, 12
; GFX9-NEXT: s_bfe_i32 s7, s0, 0x40004		; GFX9-NEXT: s_bfe_i32 s6, s0, 0x40004
; GFX9-NEXT: s_bfe_i32 s9, s0, 0x40008		; GFX9-NEXT: s_bfe_i32 s8, s0, 0x40008
; GFX9-NEXT: v_mov_b32_e32 v3, s10		; GFX9-NEXT: v_mov_b32_e32 v3, s9
; GFX9-NEXT: v_mov_b32_e32 v7, s8		; GFX9-NEXT: v_mov_b32_e32 v7, s7
; GFX9-NEXT: v_lshlrev_b16_e64 v4, 12, s2		; GFX9-NEXT: v_lshlrev_b16_e64 v4, 12, s2
; GFX9-NEXT: v_lshlrev_b16_e64 v5, 12, s4		; GFX9-NEXT: v_lshlrev_b16_e64 v5, 12, s3
; GFX9-NEXT: v_mul_i32_i24_e32 v3, s9, v3		; GFX9-NEXT: v_mul_i32_i24_e32 v3, s8, v3
; GFX9-NEXT: s_bfe_i32 s12, s1, 0x40010		; GFX9-NEXT: s_bfe_i32 s11, s1, 0x40010
; GFX9-NEXT: v_ashrrev_i16_e32 v4, 12, v4		; GFX9-NEXT: v_ashrrev_i16_e32 v4, 12, v4
; GFX9-NEXT: v_ashrrev_i16_e32 v5, 12, v5		; GFX9-NEXT: v_ashrrev_i16_e32 v5, 12, v5
; GFX9-NEXT: s_bfe_i32 s14, s1, 0x40014		; GFX9-NEXT: s_bfe_i32 s13, s1, 0x40014
; GFX9-NEXT: s_bfe_i32 s11, s0, 0x40010		; GFX9-NEXT: s_bfe_i32 s10, s0, 0x40010
; GFX9-NEXT: v_mov_b32_e32 v8, s12		; GFX9-NEXT: v_mov_b32_e32 v8, s11
; GFX9-NEXT: s_bfe_i32 s16, s1, 0x40018		; GFX9-NEXT: s_bfe_i32 s15, s1, 0x40018
; GFX9-NEXT: s_bfe_i32 s13, s0, 0x40014		; GFX9-NEXT: s_bfe_i32 s12, s0, 0x40014
; GFX9-NEXT: v_mov_b32_e32 v9, s14		; GFX9-NEXT: v_mov_b32_e32 v9, s13
; GFX9-NEXT: s_bfe_i32 s15, s0, 0x40018		; GFX9-NEXT: s_bfe_i32 s14, s0, 0x40018
; GFX9-NEXT: s_ashr_i32 s1, s1, 28		; GFX9-NEXT: s_ashr_i32 s1, s1, 28
; GFX9-NEXT: v_mov_b32_e32 v10, s16		; GFX9-NEXT: v_mov_b32_e32 v10, s15
; GFX9-NEXT: s_ashr_i32 s0, s0, 28		; GFX9-NEXT: s_ashr_i32 s0, s0, 28
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: v_mad_i32_i24 v2, s5, v6, v2		; GFX9-NEXT: v_mad_i32_i24 v2, s4, v6, v2
; GFX9-NEXT: v_mad_i32_i24 v2, s7, v7, v2		; GFX9-NEXT: v_mad_i32_i24 v2, s6, v7, v2
; GFX9-NEXT: v_add_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:WORD_0		; GFX9-NEXT: v_add_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:WORD_0
; GFX9-NEXT: v_mad_u32_u24 v2, v4, v5, v2		; GFX9-NEXT: v_mad_u32_u24 v2, v4, v5, v2
; GFX9-NEXT: v_mad_i32_i24 v2, s11, v8, v2		; GFX9-NEXT: v_mad_i32_i24 v2, s10, v8, v2
; GFX9-NEXT: v_mad_i32_i24 v2, s13, v9, v2		; GFX9-NEXT: v_mad_i32_i24 v2, s12, v9, v2
; GFX9-NEXT: v_mad_i32_i24 v2, s15, v10, v2		; GFX9-NEXT: v_mad_i32_i24 v2, s14, v10, v2
; GFX9-NEXT: v_mov_b32_e32 v3, s1		; GFX9-NEXT: v_mov_b32_e32 v3, s1
; GFX9-NEXT: v_mad_i32_i24 v2, s0, v3, v2		; GFX9-NEXT: v_mad_i32_i24 v2, s0, v3, v2
; GFX9-NEXT: global_store_short v[0:1], v2, off		; GFX9-NEXT: global_store_short v[0:1], v2, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX9-DL-LABEL: idot8_acc16:		; GFX9-DL-LABEL: idot8_acc16:
; GFX9-DL: ; %bb.0: ; %entry		; GFX9-DL: ; %bb.0: ; %entry
; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: global_load_ushort v2, v[0:1], off		; GFX9-DL-NEXT: global_load_ushort v2, v[0:1], off
; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX9-DL-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_bfe_i32 s5, s0, 0x40000		; GFX9-DL-NEXT: s_bfe_i32 s4, s0, 0x40000
; GFX9-DL-NEXT: s_bfe_i32 s6, s1, 0x40000		; GFX9-DL-NEXT: s_bfe_i32 s5, s1, 0x40000
; GFX9-DL-NEXT: s_bfe_i32 s8, s1, 0x40004		; GFX9-DL-NEXT: s_bfe_i32 s7, s1, 0x40004
; GFX9-DL-NEXT: s_bfe_i32 s10, s1, 0x40008		; GFX9-DL-NEXT: s_bfe_i32 s9, s1, 0x40008
; GFX9-DL-NEXT: v_mov_b32_e32 v6, s6		; GFX9-DL-NEXT: v_mov_b32_e32 v6, s5
; GFX9-DL-NEXT: s_lshr_b32 s2, s0, 12		; GFX9-DL-NEXT: s_lshr_b32 s2, s0, 12
; GFX9-DL-NEXT: s_lshr_b32 s4, s1, 12		; GFX9-DL-NEXT: s_lshr_b32 s3, s1, 12
; GFX9-DL-NEXT: s_bfe_i32 s7, s0, 0x40004		; GFX9-DL-NEXT: s_bfe_i32 s6, s0, 0x40004
; GFX9-DL-NEXT: s_bfe_i32 s9, s0, 0x40008		; GFX9-DL-NEXT: s_bfe_i32 s8, s0, 0x40008
; GFX9-DL-NEXT: v_mov_b32_e32 v3, s10		; GFX9-DL-NEXT: v_mov_b32_e32 v3, s9
; GFX9-DL-NEXT: v_mov_b32_e32 v7, s8		; GFX9-DL-NEXT: v_mov_b32_e32 v7, s7
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v4, 12, s2		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v4, 12, s2
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v5, 12, s4		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v5, 12, s3
; GFX9-DL-NEXT: v_mul_i32_i24_e32 v3, s9, v3		; GFX9-DL-NEXT: v_mul_i32_i24_e32 v3, s8, v3
; GFX9-DL-NEXT: s_bfe_i32 s12, s1, 0x40010		; GFX9-DL-NEXT: s_bfe_i32 s11, s1, 0x40010
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v4, 12, v4		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v4, 12, v4
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v5, 12, v5		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v5, 12, v5
; GFX9-DL-NEXT: s_bfe_i32 s14, s1, 0x40014		; GFX9-DL-NEXT: s_bfe_i32 s13, s1, 0x40014
; GFX9-DL-NEXT: s_bfe_i32 s11, s0, 0x40010		; GFX9-DL-NEXT: s_bfe_i32 s10, s0, 0x40010
; GFX9-DL-NEXT: v_mov_b32_e32 v8, s12		; GFX9-DL-NEXT: v_mov_b32_e32 v8, s11
; GFX9-DL-NEXT: s_bfe_i32 s16, s1, 0x40018		; GFX9-DL-NEXT: s_bfe_i32 s15, s1, 0x40018
; GFX9-DL-NEXT: s_bfe_i32 s13, s0, 0x40014		; GFX9-DL-NEXT: s_bfe_i32 s12, s0, 0x40014
; GFX9-DL-NEXT: v_mov_b32_e32 v9, s14		; GFX9-DL-NEXT: v_mov_b32_e32 v9, s13
; GFX9-DL-NEXT: s_bfe_i32 s15, s0, 0x40018		; GFX9-DL-NEXT: s_bfe_i32 s14, s0, 0x40018
; GFX9-DL-NEXT: s_ashr_i32 s1, s1, 28		; GFX9-DL-NEXT: s_ashr_i32 s1, s1, 28
; GFX9-DL-NEXT: v_mov_b32_e32 v10, s16		; GFX9-DL-NEXT: v_mov_b32_e32 v10, s15
; GFX9-DL-NEXT: s_ashr_i32 s0, s0, 28		; GFX9-DL-NEXT: s_ashr_i32 s0, s0, 28
; GFX9-DL-NEXT: s_waitcnt vmcnt(0)		; GFX9-DL-NEXT: s_waitcnt vmcnt(0)
; GFX9-DL-NEXT: v_mad_i32_i24 v2, s5, v6, v2		; GFX9-DL-NEXT: v_mad_i32_i24 v2, s4, v6, v2
; GFX9-DL-NEXT: v_mad_i32_i24 v2, s7, v7, v2		; GFX9-DL-NEXT: v_mad_i32_i24 v2, s6, v7, v2
; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:WORD_0		; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:WORD_0
; GFX9-DL-NEXT: v_mad_u32_u24 v2, v4, v5, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, v4, v5, v2
; GFX9-DL-NEXT: v_mad_i32_i24 v2, s11, v8, v2		; GFX9-DL-NEXT: v_mad_i32_i24 v2, s10, v8, v2
; GFX9-DL-NEXT: v_mad_i32_i24 v2, s13, v9, v2		; GFX9-DL-NEXT: v_mad_i32_i24 v2, s12, v9, v2
; GFX9-DL-NEXT: v_mad_i32_i24 v2, s15, v10, v2		; GFX9-DL-NEXT: v_mad_i32_i24 v2, s14, v10, v2
; GFX9-DL-NEXT: v_mov_b32_e32 v3, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v3, s1
; GFX9-DL-NEXT: v_mad_i32_i24 v2, s0, v3, v2		; GFX9-DL-NEXT: v_mad_i32_i24 v2, s0, v3, v2
; GFX9-DL-NEXT: global_store_short v[0:1], v2, off		; GFX9-DL-NEXT: global_store_short v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: idot8_acc16:		; GFX10-DL-LABEL: idot8_acc16:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s4		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s2
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s5		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s3
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX10-DL-NEXT: global_load_ushort v2, v[0:1], off		; GFX10-DL-NEXT: global_load_ushort v2, v[0:1], off
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_lshr_b32 s2, s0, 12		; GFX10-DL-NEXT: s_lshr_b32 s2, s0, 12
; GFX10-DL-NEXT: s_lshr_b32 s4, s1, 12		; GFX10-DL-NEXT: s_lshr_b32 s3, s1, 12
; GFX10-DL-NEXT: s_bfe_i32 s5, s0, 0x40000		; GFX10-DL-NEXT: s_bfe_i32 s4, s0, 0x40000
; GFX10-DL-NEXT: s_bfe_i32 s6, s1, 0x40000		; GFX10-DL-NEXT: s_bfe_i32 s5, s1, 0x40000
; GFX10-DL-NEXT: s_bfe_i32 s7, s0, 0x40004		; GFX10-DL-NEXT: s_bfe_i32 s6, s0, 0x40004
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v3, 12, s2		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v3, 12, s2
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v4, 12, s4		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v4, 12, s3
; GFX10-DL-NEXT: s_bfe_i32 s8, s0, 0x40008		; GFX10-DL-NEXT: s_bfe_i32 s7, s0, 0x40008
; GFX10-DL-NEXT: s_bfe_i32 s9, s1, 0x40008		; GFX10-DL-NEXT: s_bfe_i32 s8, s1, 0x40008
; GFX10-DL-NEXT: s_bfe_i32 s2, s1, 0x40004		; GFX10-DL-NEXT: s_bfe_i32 s2, s1, 0x40004
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v3, 12, v3		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v3, 12, v3
; GFX10-DL-NEXT: s_mov_b32 s4, 0xffff		; GFX10-DL-NEXT: s_mov_b32 s3, 0xffff
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v4, 12, v4		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v4, 12, v4
; GFX10-DL-NEXT: v_mul_i32_i24_e64 v5, s8, s9		; GFX10-DL-NEXT: v_mul_i32_i24_e64 v5, s7, s8
; GFX10-DL-NEXT: v_and_b32_e32 v3, s4, v3		; GFX10-DL-NEXT: v_and_b32_e32 v3, s3, v3
; GFX10-DL-NEXT: v_and_b32_e32 v4, s4, v4		; GFX10-DL-NEXT: v_and_b32_e32 v4, s3, v4
; GFX10-DL-NEXT: s_bfe_i32 s4, s1, 0x40010		; GFX10-DL-NEXT: s_bfe_i32 s3, s1, 0x40010
; GFX10-DL-NEXT: s_waitcnt vmcnt(0)		; GFX10-DL-NEXT: s_waitcnt vmcnt(0)
; GFX10-DL-NEXT: v_mad_i32_i24 v2, s5, s6, v2		; GFX10-DL-NEXT: v_mad_i32_i24 v2, s4, s5, v2
; GFX10-DL-NEXT: s_bfe_i32 s5, s0, 0x40014		; GFX10-DL-NEXT: s_bfe_i32 s4, s0, 0x40014
; GFX10-DL-NEXT: s_bfe_i32 s6, s1, 0x40014		; GFX10-DL-NEXT: s_bfe_i32 s5, s1, 0x40014
; GFX10-DL-NEXT: v_mad_i32_i24 v2, s7, s2, v2		; GFX10-DL-NEXT: v_mad_i32_i24 v2, s6, s2, v2
; GFX10-DL-NEXT: s_bfe_i32 s2, s0, 0x40010		; GFX10-DL-NEXT: s_bfe_i32 s2, s0, 0x40010
; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:WORD_0		; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:WORD_0
; GFX10-DL-NEXT: v_mad_u32_u24 v2, v3, v4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, v3, v4, v2
; GFX10-DL-NEXT: v_mad_i32_i24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_i32_i24 v2, s2, s3, v2
; GFX10-DL-NEXT: s_bfe_i32 s2, s0, 0x40018		; GFX10-DL-NEXT: s_bfe_i32 s2, s0, 0x40018
; GFX10-DL-NEXT: s_bfe_i32 s4, s1, 0x40018		; GFX10-DL-NEXT: s_bfe_i32 s3, s1, 0x40018
; GFX10-DL-NEXT: s_ashr_i32 s0, s0, 28		; GFX10-DL-NEXT: s_ashr_i32 s0, s0, 28
; GFX10-DL-NEXT: s_ashr_i32 s1, s1, 28		; GFX10-DL-NEXT: s_ashr_i32 s1, s1, 28
; GFX10-DL-NEXT: v_mad_i32_i24 v2, s5, s6, v2		; GFX10-DL-NEXT: v_mad_i32_i24 v2, s4, s5, v2
; GFX10-DL-NEXT: v_mad_i32_i24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_i32_i24 v2, s2, s3, v2
; GFX10-DL-NEXT: v_mad_i32_i24 v2, s0, s1, v2		; GFX10-DL-NEXT: v_mad_i32_i24 v2, s0, s1, v2
; GFX10-DL-NEXT: global_store_short v[0:1], v2, off		; GFX10-DL-NEXT: global_store_short v[0:1], v2, off
; GFX10-DL-NEXT: s_endpgm		; GFX10-DL-NEXT: s_endpgm
<8 x i4> addrspace(1)* %src2,		<8 x i4> addrspace(1)* %src2,
i16 addrspace(1)* nocapture %dst) {		i16 addrspace(1)* nocapture %dst) {
entry:		entry:
%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1		%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1
%vec2 = load <8 x i4>, <8 x i4> addrspace(1)* %src2		%vec2 = load <8 x i4>, <8 x i4> addrspace(1)* %src2
[... 59 lines not shown ...]
store i16 %add8, i16 addrspace(1)* %dst, align 4		store i16 %add8, i16 addrspace(1)* %dst, align 4
ret void		ret void
}		}

; TODO: Support this pattern.		; TODO: Support this pattern.
define amdgpu_kernel void @idot8_acc8(<8 x i4> addrspace(1)* %src1,		define amdgpu_kernel void @idot8_acc8(<8 x i4> addrspace(1)* %src1,
; GFX7-LABEL: idot8_acc8:		; GFX7-LABEL: idot8_acc8:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s7, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s6, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_movk_i32 s0, 0xff		; GFX7-NEXT: s_movk_i32 s8, 0xff
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: buffer_load_ubyte v0, off, s[4:7], 0		; GFX7-NEXT: buffer_load_ubyte v0, off, s[0:3], 0
; GFX7-NEXT: s_load_dword s1, s[8:9], 0x0		; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX7-NEXT: s_load_dword s2, s[10:11], 0x0		; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_bfe_i32 s8, s1, 0x40000		; GFX7-NEXT: s_bfe_i32 s6, s4, 0x40000
; GFX7-NEXT: s_bfe_i32 s9, s2, 0x40000		; GFX7-NEXT: s_bfe_i32 s7, s5, 0x40000
; GFX7-NEXT: s_bfe_i32 s11, s2, 0x40004		; GFX7-NEXT: s_bfe_i32 s10, s5, 0x40004
; GFX7-NEXT: s_and_b32 s9, s9, s0		; GFX7-NEXT: s_and_b32 s7, s7, s8
; GFX7-NEXT: s_bfe_i32 s10, s1, 0x40004		; GFX7-NEXT: s_bfe_i32 s9, s4, 0x40004
; GFX7-NEXT: s_bfe_i32 s13, s2, 0x40008		; GFX7-NEXT: s_bfe_i32 s12, s5, 0x40008
; GFX7-NEXT: s_and_b32 s11, s11, s0		; GFX7-NEXT: s_and_b32 s10, s10, s8
; GFX7-NEXT: s_and_b32 s8, s8, s0		; GFX7-NEXT: s_and_b32 s6, s6, s8
; GFX7-NEXT: v_mov_b32_e32 v1, s9		; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: s_bfe_i32 s12, s1, 0x40008		; GFX7-NEXT: s_bfe_i32 s11, s4, 0x40008
; GFX7-NEXT: s_bfe_i32 s15, s2, 0x4000c		; GFX7-NEXT: s_bfe_i32 s14, s5, 0x4000c
; GFX7-NEXT: s_and_b32 s13, s13, s0		; GFX7-NEXT: s_and_b32 s12, s12, s8
; GFX7-NEXT: s_and_b32 s10, s10, s0		; GFX7-NEXT: s_and_b32 s9, s9, s8
; GFX7-NEXT: v_mov_b32_e32 v2, s11		; GFX7-NEXT: v_mov_b32_e32 v2, s10
; GFX7-NEXT: s_bfe_i32 s14, s1, 0x4000c		; GFX7-NEXT: s_bfe_i32 s13, s4, 0x4000c
; GFX7-NEXT: s_bfe_i32 s17, s2, 0x40010		; GFX7-NEXT: s_bfe_i32 s16, s5, 0x40010
; GFX7-NEXT: s_and_b32 s15, s15, s0		; GFX7-NEXT: s_and_b32 s14, s14, s8
; GFX7-NEXT: s_and_b32 s12, s12, s0		; GFX7-NEXT: s_and_b32 s11, s11, s8
; GFX7-NEXT: v_mov_b32_e32 v3, s13		; GFX7-NEXT: v_mov_b32_e32 v3, s12
; GFX7-NEXT: s_bfe_i32 s16, s1, 0x40010		; GFX7-NEXT: s_bfe_i32 s15, s4, 0x40010
; GFX7-NEXT: s_bfe_i32 s19, s2, 0x40014		; GFX7-NEXT: s_bfe_i32 s18, s5, 0x40014
; GFX7-NEXT: s_and_b32 s17, s17, s0		; GFX7-NEXT: s_and_b32 s16, s16, s8
; GFX7-NEXT: s_and_b32 s14, s14, s0		; GFX7-NEXT: s_and_b32 s13, s13, s8
; GFX7-NEXT: v_mov_b32_e32 v4, s15		; GFX7-NEXT: v_mov_b32_e32 v4, s14
; GFX7-NEXT: s_bfe_i32 s21, s2, 0x40018		; GFX7-NEXT: s_bfe_i32 s20, s5, 0x40018
; GFX7-NEXT: s_bfe_i32 s18, s1, 0x40014		; GFX7-NEXT: s_bfe_i32 s17, s4, 0x40014
; GFX7-NEXT: s_and_b32 s19, s19, s0		; GFX7-NEXT: s_and_b32 s18, s18, s8
; GFX7-NEXT: s_and_b32 s16, s16, s0		; GFX7-NEXT: s_and_b32 s15, s15, s8
; GFX7-NEXT: v_mov_b32_e32 v5, s17		; GFX7-NEXT: v_mov_b32_e32 v5, s16
; GFX7-NEXT: s_bfe_i32 s20, s1, 0x40018		; GFX7-NEXT: s_bfe_i32 s19, s4, 0x40018
; GFX7-NEXT: s_ashr_i32 s2, s2, 28		; GFX7-NEXT: s_ashr_i32 s5, s5, 28
; GFX7-NEXT: s_and_b32 s21, s21, s0		; GFX7-NEXT: s_and_b32 s20, s20, s8
; GFX7-NEXT: s_and_b32 s18, s18, s0		; GFX7-NEXT: s_and_b32 s17, s17, s8
; GFX7-NEXT: v_mov_b32_e32 v6, s19		; GFX7-NEXT: v_mov_b32_e32 v6, s18
; GFX7-NEXT: s_ashr_i32 s1, s1, 28		; GFX7-NEXT: s_ashr_i32 s4, s4, 28
; GFX7-NEXT: s_and_b32 s20, s20, s0		; GFX7-NEXT: s_and_b32 s19, s19, s8
; GFX7-NEXT: s_and_b32 s2, s2, s0		; GFX7-NEXT: s_and_b32 s5, s5, s8
; GFX7-NEXT: v_mov_b32_e32 v7, s21		; GFX7-NEXT: v_mov_b32_e32 v7, s20
; GFX7-NEXT: s_and_b32 s0, s1, s0		; GFX7-NEXT: s_and_b32 s4, s4, s8
; GFX7-NEXT: s_waitcnt vmcnt(0)		; GFX7-NEXT: s_waitcnt vmcnt(0)
; GFX7-NEXT: v_mad_u32_u24 v0, s8, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s6, v1, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s10, v2, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s9, v2, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s12, v3, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s11, v3, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s14, v4, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s13, v4, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s16, v5, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s15, v5, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s18, v6, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s17, v6, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s20, v7, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s19, v7, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s2		; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mad_u32_u24 v0, s0, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s4, v1, v0
; GFX7-NEXT: buffer_store_byte v0, off, s[4:7], 0		; GFX7-NEXT: buffer_store_byte v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: idot8_acc8:		; GFX8-LABEL: idot8_acc8:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_movk_i32 s2, 0xff		; GFX8-NEXT: s_movk_i32 s2, 0xff
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_load_dword s6, s[6:7], 0x0		; GFX8-NEXT: s_load_dword s3, s[6:7], 0x0
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_load_ubyte v2, v[0:1]		; GFX8-NEXT: flat_load_ubyte v2, v[0:1]
; GFX8-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX8-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_bfe_i32 s7, s6, 0x40000		; GFX8-NEXT: s_bfe_i32 s6, s3, 0x40000
; GFX8-NEXT: s_lshr_b32 s4, s6, 12		; GFX8-NEXT: s_lshr_b32 s4, s3, 12
; GFX8-NEXT: s_bfe_i32 s9, s6, 0x40004		; GFX8-NEXT: s_bfe_i32 s8, s3, 0x40004
; GFX8-NEXT: s_bfe_i32 s11, s6, 0x40008		; GFX8-NEXT: s_bfe_i32 s10, s3, 0x40008
; GFX8-NEXT: s_lshr_b32 s1, s0, 12		; GFX8-NEXT: s_lshr_b32 s1, s0, 12
; GFX8-NEXT: s_bfe_i32 s5, s0, 0x40000		; GFX8-NEXT: s_bfe_i32 s5, s0, 0x40000
; GFX8-NEXT: v_mov_b32_e32 v6, s7		; GFX8-NEXT: v_mov_b32_e32 v6, s6
; GFX8-NEXT: v_lshlrev_b16_e64 v4, 12, s1		; GFX8-NEXT: v_lshlrev_b16_e64 v4, 12, s1
; GFX8-NEXT: v_lshlrev_b16_e64 v5, 12, s4		; GFX8-NEXT: v_lshlrev_b16_e64 v5, 12, s4
; GFX8-NEXT: s_bfe_i32 s8, s0, 0x40004		; GFX8-NEXT: s_bfe_i32 s7, s0, 0x40004
; GFX8-NEXT: s_bfe_i32 s10, s0, 0x40008		; GFX8-NEXT: s_bfe_i32 s9, s0, 0x40008
; GFX8-NEXT: v_mov_b32_e32 v3, s11		; GFX8-NEXT: v_mov_b32_e32 v3, s10
; GFX8-NEXT: v_mov_b32_e32 v7, s9		; GFX8-NEXT: v_mov_b32_e32 v7, s8
; GFX8-NEXT: v_ashrrev_i16_e32 v4, 12, v4		; GFX8-NEXT: v_ashrrev_i16_e32 v4, 12, v4
; GFX8-NEXT: v_ashrrev_i16_e32 v5, 12, v5		; GFX8-NEXT: v_ashrrev_i16_e32 v5, 12, v5
; GFX8-NEXT: v_mul_i32_i24_e32 v3, s10, v3		; GFX8-NEXT: v_mul_i32_i24_e32 v3, s9, v3
; GFX8-NEXT: s_bfe_i32 s13, s6, 0x40010		; GFX8-NEXT: s_bfe_i32 s12, s3, 0x40010
; GFX8-NEXT: v_and_b32_e32 v4, s2, v4		; GFX8-NEXT: v_and_b32_e32 v4, s2, v4
; GFX8-NEXT: v_and_b32_e32 v5, s2, v5		; GFX8-NEXT: v_and_b32_e32 v5, s2, v5
; GFX8-NEXT: s_bfe_i32 s15, s6, 0x40014		; GFX8-NEXT: s_bfe_i32 s14, s3, 0x40014
; GFX8-NEXT: s_bfe_i32 s12, s0, 0x40010		; GFX8-NEXT: s_bfe_i32 s11, s0, 0x40010
; GFX8-NEXT: v_mov_b32_e32 v8, s13		; GFX8-NEXT: v_mov_b32_e32 v8, s12
; GFX8-NEXT: s_bfe_i32 s17, s6, 0x40018		; GFX8-NEXT: s_bfe_i32 s16, s3, 0x40018
; GFX8-NEXT: s_bfe_i32 s14, s0, 0x40014		; GFX8-NEXT: s_bfe_i32 s13, s0, 0x40014
; GFX8-NEXT: v_mov_b32_e32 v9, s15		; GFX8-NEXT: v_mov_b32_e32 v9, s14
; GFX8-NEXT: s_bfe_i32 s16, s0, 0x40018		; GFX8-NEXT: s_bfe_i32 s15, s0, 0x40018
; GFX8-NEXT: s_ashr_i32 s6, s6, 28		; GFX8-NEXT: s_ashr_i32 s3, s3, 28
; GFX8-NEXT: v_mov_b32_e32 v10, s17		; GFX8-NEXT: v_mov_b32_e32 v10, s16
; GFX8-NEXT: s_ashr_i32 s0, s0, 28		; GFX8-NEXT: s_ashr_i32 s0, s0, 28
; GFX8-NEXT: s_waitcnt vmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0)
; GFX8-NEXT: v_mad_i32_i24 v2, s5, v6, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s5, v6, v2
; GFX8-NEXT: v_mad_i32_i24 v2, s8, v7, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s7, v7, v2
; GFX8-NEXT: v_add_u32_sdwa v2, vcc, v3, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_0		; GFX8-NEXT: v_add_u32_sdwa v2, vcc, v3, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_0
; GFX8-NEXT: v_mad_u32_u24 v2, v4, v5, v2		; GFX8-NEXT: v_mad_u32_u24 v2, v4, v5, v2
; GFX8-NEXT: v_mad_i32_i24 v2, s12, v8, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s11, v8, v2
; GFX8-NEXT: v_mad_i32_i24 v2, s14, v9, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s13, v9, v2
; GFX8-NEXT: v_mad_i32_i24 v2, s16, v10, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s15, v10, v2
; GFX8-NEXT: v_mov_b32_e32 v3, s6		; GFX8-NEXT: v_mov_b32_e32 v3, s3
; GFX8-NEXT: v_mad_i32_i24 v2, s0, v3, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s0, v3, v2
; GFX8-NEXT: flat_store_byte v[0:1], v2		; GFX8-NEXT: flat_store_byte v[0:1], v2
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: idot8_acc8:		; GFX9-LABEL: idot8_acc8:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-NEXT: s_movk_i32 s2, 0xff		; GFX9-NEXT: s_movk_i32 s2, 0xff
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_load_dword s6, s[6:7], 0x0		; GFX9-NEXT: s_load_dword s3, s[6:7], 0x0
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: global_load_ubyte v2, v[0:1], off		; GFX9-NEXT: global_load_ubyte v2, v[0:1], off
; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_bfe_i32 s7, s6, 0x40000		; GFX9-NEXT: s_bfe_i32 s6, s3, 0x40000
; GFX9-NEXT: s_lshr_b32 s4, s6, 12		; GFX9-NEXT: s_lshr_b32 s4, s3, 12
; GFX9-NEXT: s_bfe_i32 s9, s6, 0x40004		; GFX9-NEXT: s_bfe_i32 s8, s3, 0x40004
; GFX9-NEXT: s_bfe_i32 s11, s6, 0x40008		; GFX9-NEXT: s_bfe_i32 s10, s3, 0x40008
; GFX9-NEXT: s_lshr_b32 s1, s0, 12		; GFX9-NEXT: s_lshr_b32 s1, s0, 12
; GFX9-NEXT: s_bfe_i32 s5, s0, 0x40000		; GFX9-NEXT: s_bfe_i32 s5, s0, 0x40000
; GFX9-NEXT: v_mov_b32_e32 v6, s7		; GFX9-NEXT: v_mov_b32_e32 v6, s6
; GFX9-NEXT: v_lshlrev_b16_e64 v4, 12, s1		; GFX9-NEXT: v_lshlrev_b16_e64 v4, 12, s1
; GFX9-NEXT: v_lshlrev_b16_e64 v5, 12, s4		; GFX9-NEXT: v_lshlrev_b16_e64 v5, 12, s4
; GFX9-NEXT: s_bfe_i32 s8, s0, 0x40004		; GFX9-NEXT: s_bfe_i32 s7, s0, 0x40004
; GFX9-NEXT: s_bfe_i32 s10, s0, 0x40008		; GFX9-NEXT: s_bfe_i32 s9, s0, 0x40008
; GFX9-NEXT: v_mov_b32_e32 v3, s11		; GFX9-NEXT: v_mov_b32_e32 v3, s10
; GFX9-NEXT: v_mov_b32_e32 v7, s9		; GFX9-NEXT: v_mov_b32_e32 v7, s8
; GFX9-NEXT: v_ashrrev_i16_e32 v4, 12, v4		; GFX9-NEXT: v_ashrrev_i16_e32 v4, 12, v4
; GFX9-NEXT: v_ashrrev_i16_e32 v5, 12, v5		; GFX9-NEXT: v_ashrrev_i16_e32 v5, 12, v5
; GFX9-NEXT: v_mul_i32_i24_e32 v3, s10, v3		; GFX9-NEXT: v_mul_i32_i24_e32 v3, s9, v3
; GFX9-NEXT: s_bfe_i32 s13, s6, 0x40010		; GFX9-NEXT: s_bfe_i32 s12, s3, 0x40010
; GFX9-NEXT: v_and_b32_e32 v4, s2, v4		; GFX9-NEXT: v_and_b32_e32 v4, s2, v4
; GFX9-NEXT: v_and_b32_e32 v5, s2, v5		; GFX9-NEXT: v_and_b32_e32 v5, s2, v5
; GFX9-NEXT: s_bfe_i32 s15, s6, 0x40014		; GFX9-NEXT: s_bfe_i32 s14, s3, 0x40014
; GFX9-NEXT: s_bfe_i32 s12, s0, 0x40010		; GFX9-NEXT: s_bfe_i32 s11, s0, 0x40010
; GFX9-NEXT: v_mov_b32_e32 v8, s13		; GFX9-NEXT: v_mov_b32_e32 v8, s12
; GFX9-NEXT: s_bfe_i32 s17, s6, 0x40018		; GFX9-NEXT: s_bfe_i32 s16, s3, 0x40018
; GFX9-NEXT: s_bfe_i32 s14, s0, 0x40014		; GFX9-NEXT: s_bfe_i32 s13, s0, 0x40014
; GFX9-NEXT: v_mov_b32_e32 v9, s15		; GFX9-NEXT: v_mov_b32_e32 v9, s14
; GFX9-NEXT: s_bfe_i32 s16, s0, 0x40018		; GFX9-NEXT: s_bfe_i32 s15, s0, 0x40018
; GFX9-NEXT: s_ashr_i32 s6, s6, 28		; GFX9-NEXT: s_ashr_i32 s3, s3, 28
; GFX9-NEXT: v_mov_b32_e32 v10, s17		; GFX9-NEXT: v_mov_b32_e32 v10, s16
; GFX9-NEXT: s_ashr_i32 s0, s0, 28		; GFX9-NEXT: s_ashr_i32 s0, s0, 28
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: v_mad_i32_i24 v2, s5, v6, v2		; GFX9-NEXT: v_mad_i32_i24 v2, s5, v6, v2
; GFX9-NEXT: v_mad_i32_i24 v2, s8, v7, v2		; GFX9-NEXT: v_mad_i32_i24 v2, s7, v7, v2
; GFX9-NEXT: v_add_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_0		; GFX9-NEXT: v_add_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_0
; GFX9-NEXT: v_mad_u32_u24 v2, v4, v5, v2		; GFX9-NEXT: v_mad_u32_u24 v2, v4, v5, v2
; GFX9-NEXT: v_mad_i32_i24 v2, s12, v8, v2		; GFX9-NEXT: v_mad_i32_i24 v2, s11, v8, v2
; GFX9-NEXT: v_mad_i32_i24 v2, s14, v9, v2		; GFX9-NEXT: v_mad_i32_i24 v2, s13, v9, v2
; GFX9-NEXT: v_mad_i32_i24 v2, s16, v10, v2		; GFX9-NEXT: v_mad_i32_i24 v2, s15, v10, v2
; GFX9-NEXT: v_mov_b32_e32 v3, s6		; GFX9-NEXT: v_mov_b32_e32 v3, s3
; GFX9-NEXT: v_mad_i32_i24 v2, s0, v3, v2		; GFX9-NEXT: v_mad_i32_i24 v2, s0, v3, v2
; GFX9-NEXT: global_store_byte v[0:1], v2, off		; GFX9-NEXT: global_store_byte v[0:1], v2, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX9-DL-LABEL: idot8_acc8:		; GFX9-DL-LABEL: idot8_acc8:
; GFX9-DL: ; %bb.0: ; %entry		; GFX9-DL: ; %bb.0: ; %entry
; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-DL-NEXT: s_movk_i32 s2, 0xff		; GFX9-DL-NEXT: s_movk_i32 s2, 0xff
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_load_dword s6, s[6:7], 0x0		; GFX9-DL-NEXT: s_load_dword s3, s[6:7], 0x0
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: global_load_ubyte v2, v[0:1], off		; GFX9-DL-NEXT: global_load_ubyte v2, v[0:1], off
; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_bfe_i32 s7, s6, 0x40000		; GFX9-DL-NEXT: s_bfe_i32 s6, s3, 0x40000
; GFX9-DL-NEXT: s_lshr_b32 s4, s6, 12		; GFX9-DL-NEXT: s_lshr_b32 s4, s3, 12
; GFX9-DL-NEXT: s_bfe_i32 s9, s6, 0x40004		; GFX9-DL-NEXT: s_bfe_i32 s8, s3, 0x40004
; GFX9-DL-NEXT: s_bfe_i32 s11, s6, 0x40008		; GFX9-DL-NEXT: s_bfe_i32 s10, s3, 0x40008
; GFX9-DL-NEXT: s_lshr_b32 s1, s0, 12		; GFX9-DL-NEXT: s_lshr_b32 s1, s0, 12
; GFX9-DL-NEXT: s_bfe_i32 s5, s0, 0x40000		; GFX9-DL-NEXT: s_bfe_i32 s5, s0, 0x40000
; GFX9-DL-NEXT: v_mov_b32_e32 v6, s7		; GFX9-DL-NEXT: v_mov_b32_e32 v6, s6
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v4, 12, s1		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v4, 12, s1
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v5, 12, s4		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v5, 12, s4
; GFX9-DL-NEXT: s_bfe_i32 s8, s0, 0x40004		; GFX9-DL-NEXT: s_bfe_i32 s7, s0, 0x40004
; GFX9-DL-NEXT: s_bfe_i32 s10, s0, 0x40008		; GFX9-DL-NEXT: s_bfe_i32 s9, s0, 0x40008
; GFX9-DL-NEXT: v_mov_b32_e32 v3, s11		; GFX9-DL-NEXT: v_mov_b32_e32 v3, s10
; GFX9-DL-NEXT: v_mov_b32_e32 v7, s9		; GFX9-DL-NEXT: v_mov_b32_e32 v7, s8
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v4, 12, v4		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v4, 12, v4
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v5, 12, v5		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v5, 12, v5
; GFX9-DL-NEXT: v_mul_i32_i24_e32 v3, s10, v3		; GFX9-DL-NEXT: v_mul_i32_i24_e32 v3, s9, v3
; GFX9-DL-NEXT: s_bfe_i32 s13, s6, 0x40010		; GFX9-DL-NEXT: s_bfe_i32 s12, s3, 0x40010
; GFX9-DL-NEXT: v_and_b32_e32 v4, s2, v4		; GFX9-DL-NEXT: v_and_b32_e32 v4, s2, v4
; GFX9-DL-NEXT: v_and_b32_e32 v5, s2, v5		; GFX9-DL-NEXT: v_and_b32_e32 v5, s2, v5
; GFX9-DL-NEXT: s_bfe_i32 s15, s6, 0x40014		; GFX9-DL-NEXT: s_bfe_i32 s14, s3, 0x40014
; GFX9-DL-NEXT: s_bfe_i32 s12, s0, 0x40010		; GFX9-DL-NEXT: s_bfe_i32 s11, s0, 0x40010
; GFX9-DL-NEXT: v_mov_b32_e32 v8, s13		; GFX9-DL-NEXT: v_mov_b32_e32 v8, s12
; GFX9-DL-NEXT: s_bfe_i32 s17, s6, 0x40018		; GFX9-DL-NEXT: s_bfe_i32 s16, s3, 0x40018
; GFX9-DL-NEXT: s_bfe_i32 s14, s0, 0x40014		; GFX9-DL-NEXT: s_bfe_i32 s13, s0, 0x40014
; GFX9-DL-NEXT: v_mov_b32_e32 v9, s15		; GFX9-DL-NEXT: v_mov_b32_e32 v9, s14
; GFX9-DL-NEXT: s_bfe_i32 s16, s0, 0x40018		; GFX9-DL-NEXT: s_bfe_i32 s15, s0, 0x40018
; GFX9-DL-NEXT: s_ashr_i32 s6, s6, 28		; GFX9-DL-NEXT: s_ashr_i32 s3, s3, 28
; GFX9-DL-NEXT: v_mov_b32_e32 v10, s17		; GFX9-DL-NEXT: v_mov_b32_e32 v10, s16
; GFX9-DL-NEXT: s_ashr_i32 s0, s0, 28		; GFX9-DL-NEXT: s_ashr_i32 s0, s0, 28
; GFX9-DL-NEXT: s_waitcnt vmcnt(0)		; GFX9-DL-NEXT: s_waitcnt vmcnt(0)
; GFX9-DL-NEXT: v_mad_i32_i24 v2, s5, v6, v2		; GFX9-DL-NEXT: v_mad_i32_i24 v2, s5, v6, v2
; GFX9-DL-NEXT: v_mad_i32_i24 v2, s8, v7, v2		; GFX9-DL-NEXT: v_mad_i32_i24 v2, s7, v7, v2
; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_0		; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_0
; GFX9-DL-NEXT: v_mad_u32_u24 v2, v4, v5, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, v4, v5, v2
; GFX9-DL-NEXT: v_mad_i32_i24 v2, s12, v8, v2		; GFX9-DL-NEXT: v_mad_i32_i24 v2, s11, v8, v2
; GFX9-DL-NEXT: v_mad_i32_i24 v2, s14, v9, v2		; GFX9-DL-NEXT: v_mad_i32_i24 v2, s13, v9, v2
; GFX9-DL-NEXT: v_mad_i32_i24 v2, s16, v10, v2		; GFX9-DL-NEXT: v_mad_i32_i24 v2, s15, v10, v2
; GFX9-DL-NEXT: v_mov_b32_e32 v3, s6		; GFX9-DL-NEXT: v_mov_b32_e32 v3, s3
; GFX9-DL-NEXT: v_mad_i32_i24 v2, s0, v3, v2		; GFX9-DL-NEXT: v_mad_i32_i24 v2, s0, v3, v2
; GFX9-DL-NEXT: global_store_byte v[0:1], v2, off		; GFX9-DL-NEXT: global_store_byte v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: idot8_acc8:		; GFX10-DL-LABEL: idot8_acc8:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s4		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s2
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s5		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s3
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off		; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_lshr_b32 s2, s0, 12		; GFX10-DL-NEXT: s_lshr_b32 s2, s0, 12
; GFX10-DL-NEXT: s_lshr_b32 s4, s1, 12		; GFX10-DL-NEXT: s_lshr_b32 s3, s1, 12
; GFX10-DL-NEXT: s_bfe_i32 s5, s0, 0x40000		; GFX10-DL-NEXT: s_bfe_i32 s4, s0, 0x40000
; GFX10-DL-NEXT: s_bfe_i32 s6, s1, 0x40000		; GFX10-DL-NEXT: s_bfe_i32 s5, s1, 0x40000
; GFX10-DL-NEXT: s_bfe_i32 s7, s0, 0x40004		; GFX10-DL-NEXT: s_bfe_i32 s6, s0, 0x40004
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v3, 12, s2		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v3, 12, s2
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v4, 12, s4		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v4, 12, s3
; GFX10-DL-NEXT: s_bfe_i32 s8, s0, 0x40008		; GFX10-DL-NEXT: s_bfe_i32 s7, s0, 0x40008
; GFX10-DL-NEXT: s_bfe_i32 s9, s1, 0x40008		; GFX10-DL-NEXT: s_bfe_i32 s8, s1, 0x40008
; GFX10-DL-NEXT: s_bfe_i32 s2, s1, 0x40004		; GFX10-DL-NEXT: s_bfe_i32 s2, s1, 0x40004
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v3, 12, v3		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v3, 12, v3
; GFX10-DL-NEXT: s_movk_i32 s4, 0xff		; GFX10-DL-NEXT: s_movk_i32 s3, 0xff
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v4, 12, v4		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v4, 12, v4
; GFX10-DL-NEXT: v_mul_i32_i24_e64 v5, s8, s9		; GFX10-DL-NEXT: v_mul_i32_i24_e64 v5, s7, s8
; GFX10-DL-NEXT: v_and_b32_e32 v3, s4, v3		; GFX10-DL-NEXT: v_and_b32_e32 v3, s3, v3
; GFX10-DL-NEXT: v_and_b32_e32 v4, s4, v4		; GFX10-DL-NEXT: v_and_b32_e32 v4, s3, v4
; GFX10-DL-NEXT: s_bfe_i32 s4, s1, 0x40010		; GFX10-DL-NEXT: s_bfe_i32 s3, s1, 0x40010
; GFX10-DL-NEXT: s_waitcnt vmcnt(0)		; GFX10-DL-NEXT: s_waitcnt vmcnt(0)
; GFX10-DL-NEXT: v_mad_i32_i24 v2, s5, s6, v2		; GFX10-DL-NEXT: v_mad_i32_i24 v2, s4, s5, v2
; GFX10-DL-NEXT: s_bfe_i32 s5, s0, 0x40014		; GFX10-DL-NEXT: s_bfe_i32 s4, s0, 0x40014
; GFX10-DL-NEXT: s_bfe_i32 s6, s1, 0x40014		; GFX10-DL-NEXT: s_bfe_i32 s5, s1, 0x40014
; GFX10-DL-NEXT: v_mad_i32_i24 v2, s7, s2, v2		; GFX10-DL-NEXT: v_mad_i32_i24 v2, s6, s2, v2
; GFX10-DL-NEXT: s_bfe_i32 s2, s0, 0x40010		; GFX10-DL-NEXT: s_bfe_i32 s2, s0, 0x40010
; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_0		; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_0
; GFX10-DL-NEXT: v_mad_u32_u24 v2, v3, v4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, v3, v4, v2
; GFX10-DL-NEXT: v_mad_i32_i24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_i32_i24 v2, s2, s3, v2
; GFX10-DL-NEXT: s_bfe_i32 s2, s0, 0x40018		; GFX10-DL-NEXT: s_bfe_i32 s2, s0, 0x40018
; GFX10-DL-NEXT: s_bfe_i32 s4, s1, 0x40018		; GFX10-DL-NEXT: s_bfe_i32 s3, s1, 0x40018
; GFX10-DL-NEXT: s_ashr_i32 s0, s0, 28		; GFX10-DL-NEXT: s_ashr_i32 s0, s0, 28
; GFX10-DL-NEXT: s_ashr_i32 s1, s1, 28		; GFX10-DL-NEXT: s_ashr_i32 s1, s1, 28
; GFX10-DL-NEXT: v_mad_i32_i24 v2, s5, s6, v2		; GFX10-DL-NEXT: v_mad_i32_i24 v2, s4, s5, v2
; GFX10-DL-NEXT: v_mad_i32_i24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_i32_i24 v2, s2, s3, v2
; GFX10-DL-NEXT: v_mad_i32_i24 v2, s0, s1, v2		; GFX10-DL-NEXT: v_mad_i32_i24 v2, s0, s1, v2
; GFX10-DL-NEXT: global_store_byte v[0:1], v2, off		; GFX10-DL-NEXT: global_store_byte v[0:1], v2, off
; GFX10-DL-NEXT: s_endpgm		; GFX10-DL-NEXT: s_endpgm
<8 x i4> addrspace(1)* %src2,		<8 x i4> addrspace(1)* %src2,
i8 addrspace(1)* nocapture %dst) {		i8 addrspace(1)* nocapture %dst) {
entry:		entry:
%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1		%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1
%vec2 = load <8 x i4>, <8 x i4> addrspace(1)* %src2		%vec2 = load <8 x i4>, <8 x i4> addrspace(1)* %src2
[... 60 lines not shown ...]
ret void		ret void
}		}

; Make sure the pattern is not recognized if there are multiple uses of the		; Make sure the pattern is not recognized if there are multiple uses of the
; intermediate multiplications.		; intermediate multiplications.
define amdgpu_kernel void @idot8_multiuses_mul1(<8 x i4> addrspace(1)* %src1,		define amdgpu_kernel void @idot8_multiuses_mul1(<8 x i4> addrspace(1)* %src1,
; GFX7-LABEL: idot8_multiuses_mul1:		; GFX7-LABEL: idot8_multiuses_mul1:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s7, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s6, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0		; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX7-NEXT: s_load_dword s1, s[10:11], 0x0		; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0
; GFX7-NEXT: s_load_dword s21, s[4:5], 0x0		; GFX7-NEXT: s_load_dword s20, s[0:1], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_bfe_i32 s2, s0, 0x40000		; GFX7-NEXT: s_bfe_i32 s6, s4, 0x40000
; GFX7-NEXT: s_bfe_i32 s8, s1, 0x40000		; GFX7-NEXT: s_bfe_i32 s7, s5, 0x40000
; GFX7-NEXT: v_mov_b32_e32 v0, s8		; GFX7-NEXT: v_mov_b32_e32 v0, s7
; GFX7-NEXT: v_mov_b32_e32 v1, s21		; GFX7-NEXT: v_mov_b32_e32 v1, s20
; GFX7-NEXT: v_mad_i32_i24 v1, s2, v0, v1		; GFX7-NEXT: v_mad_i32_i24 v1, s6, v0, v1
; GFX7-NEXT: s_bfe_i32 s10, s1, 0x40004		; GFX7-NEXT: s_bfe_i32 s9, s5, 0x40004
; GFX7-NEXT: s_bfe_i32 s9, s0, 0x40004		; GFX7-NEXT: s_bfe_i32 s8, s4, 0x40004
; GFX7-NEXT: s_bfe_i32 s12, s1, 0x40008		; GFX7-NEXT: s_bfe_i32 s11, s5, 0x40008
; GFX7-NEXT: v_mad_i32_i24 v0, s2, v0, v1		; GFX7-NEXT: v_mad_i32_i24 v0, s6, v0, v1
; GFX7-NEXT: v_mov_b32_e32 v2, s10		; GFX7-NEXT: v_mov_b32_e32 v2, s9
; GFX7-NEXT: v_mad_i32_i24 v0, s9, v2, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s8, v2, v0
; GFX7-NEXT: s_bfe_i32 s11, s0, 0x40008		; GFX7-NEXT: s_bfe_i32 s10, s4, 0x40008
; GFX7-NEXT: v_mov_b32_e32 v2, s12		; GFX7-NEXT: v_mov_b32_e32 v2, s11
; GFX7-NEXT: s_bfe_i32 s14, s1, 0x4000c		; GFX7-NEXT: s_bfe_i32 s13, s5, 0x4000c
; GFX7-NEXT: v_mad_i32_i24 v0, s11, v2, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s10, v2, v0
; GFX7-NEXT: s_bfe_i32 s13, s0, 0x4000c		; GFX7-NEXT: s_bfe_i32 s12, s4, 0x4000c
; GFX7-NEXT: v_mov_b32_e32 v2, s14		; GFX7-NEXT: v_mov_b32_e32 v2, s13
; GFX7-NEXT: s_bfe_i32 s16, s1, 0x40010		; GFX7-NEXT: s_bfe_i32 s15, s5, 0x40010
; GFX7-NEXT: v_mad_i32_i24 v0, s13, v2, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s12, v2, v0
; GFX7-NEXT: s_bfe_i32 s15, s0, 0x40010		; GFX7-NEXT: s_bfe_i32 s14, s4, 0x40010
; GFX7-NEXT: v_mov_b32_e32 v2, s16		; GFX7-NEXT: v_mov_b32_e32 v2, s15
; GFX7-NEXT: s_bfe_i32 s18, s1, 0x40014		; GFX7-NEXT: s_bfe_i32 s17, s5, 0x40014
; GFX7-NEXT: s_bfe_i32 s20, s1, 0x40018		; GFX7-NEXT: s_bfe_i32 s19, s5, 0x40018
; GFX7-NEXT: v_mad_i32_i24 v0, s15, v2, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s14, v2, v0
; GFX7-NEXT: s_bfe_i32 s17, s0, 0x40014		; GFX7-NEXT: s_bfe_i32 s16, s4, 0x40014
; GFX7-NEXT: v_mov_b32_e32 v2, s18		; GFX7-NEXT: v_mov_b32_e32 v2, s17
; GFX7-NEXT: s_bfe_i32 s19, s0, 0x40018		; GFX7-NEXT: s_bfe_i32 s18, s4, 0x40018
; GFX7-NEXT: v_mad_i32_i24 v0, s17, v2, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s16, v2, v0
; GFX7-NEXT: v_mov_b32_e32 v2, s20		; GFX7-NEXT: v_mov_b32_e32 v2, s19
; GFX7-NEXT: s_ashr_i32 s1, s1, 28		; GFX7-NEXT: s_ashr_i32 s5, s5, 28
; GFX7-NEXT: v_mad_i32_i24 v0, s19, v2, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s18, v2, v0
; GFX7-NEXT: s_ashr_i32 s0, s0, 28		; GFX7-NEXT: s_ashr_i32 s4, s4, 28
; GFX7-NEXT: v_mov_b32_e32 v2, s1		; GFX7-NEXT: v_mov_b32_e32 v2, s5
; GFX7-NEXT: v_mad_i32_i24 v0, s0, v2, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s4, v2, v0
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v1		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v1
; GFX7-NEXT: buffer_store_dword v0, off, s[4:7], 0		; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: idot8_multiuses_mul1:		; GFX8-LABEL: idot8_multiuses_mul1:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX8-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX8-NEXT: s_load_dword s4, s[6:7], 0x0		; GFX8-NEXT: s_load_dword s3, s[6:7], 0x0
; GFX8-NEXT: s_load_dword s19, s[0:1], 0x0		; GFX8-NEXT: s_load_dword s18, s[0:1], 0x0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_bfe_i32 s5, s2, 0x40000		; GFX8-NEXT: s_bfe_i32 s4, s2, 0x40000
; GFX8-NEXT: s_bfe_i32 s6, s4, 0x40000		; GFX8-NEXT: s_bfe_i32 s5, s3, 0x40000
; GFX8-NEXT: v_mov_b32_e32 v0, s6		; GFX8-NEXT: v_mov_b32_e32 v0, s5
; GFX8-NEXT: v_mov_b32_e32 v1, s19		; GFX8-NEXT: v_mov_b32_e32 v1, s18
; GFX8-NEXT: v_mad_i32_i24 v1, s5, v0, v1		; GFX8-NEXT: v_mad_i32_i24 v1, s4, v0, v1
; GFX8-NEXT: s_bfe_i32 s8, s4, 0x40004		; GFX8-NEXT: s_bfe_i32 s7, s3, 0x40004
; GFX8-NEXT: s_bfe_i32 s7, s2, 0x40004		; GFX8-NEXT: s_bfe_i32 s6, s2, 0x40004
; GFX8-NEXT: s_bfe_i32 s10, s4, 0x40008		; GFX8-NEXT: s_bfe_i32 s9, s3, 0x40008
; GFX8-NEXT: v_mad_i32_i24 v0, s5, v0, v1		; GFX8-NEXT: v_mad_i32_i24 v0, s4, v0, v1
; GFX8-NEXT: v_mov_b32_e32 v2, s8		; GFX8-NEXT: v_mov_b32_e32 v2, s7
; GFX8-NEXT: v_mad_i32_i24 v0, s7, v2, v0		; GFX8-NEXT: v_mad_i32_i24 v0, s6, v2, v0
; GFX8-NEXT: s_bfe_i32 s9, s2, 0x40008		; GFX8-NEXT: s_bfe_i32 s8, s2, 0x40008
; GFX8-NEXT: v_mov_b32_e32 v2, s10		; GFX8-NEXT: v_mov_b32_e32 v2, s9
; GFX8-NEXT: s_bfe_i32 s12, s4, 0x4000c		; GFX8-NEXT: s_bfe_i32 s11, s3, 0x4000c
; GFX8-NEXT: v_mad_i32_i24 v0, s9, v2, v0		; GFX8-NEXT: v_mad_i32_i24 v0, s8, v2, v0
; GFX8-NEXT: s_bfe_i32 s11, s2, 0x4000c		; GFX8-NEXT: s_bfe_i32 s10, s2, 0x4000c
; GFX8-NEXT: v_mov_b32_e32 v2, s12		; GFX8-NEXT: v_mov_b32_e32 v2, s11
; GFX8-NEXT: s_bfe_i32 s14, s4, 0x40010		; GFX8-NEXT: s_bfe_i32 s13, s3, 0x40010
; GFX8-NEXT: v_mad_i32_i24 v0, s11, v2, v0		; GFX8-NEXT: v_mad_i32_i24 v0, s10, v2, v0
; GFX8-NEXT: s_bfe_i32 s13, s2, 0x40010		; GFX8-NEXT: s_bfe_i32 s12, s2, 0x40010
; GFX8-NEXT: v_mov_b32_e32 v2, s14		; GFX8-NEXT: v_mov_b32_e32 v2, s13
; GFX8-NEXT: s_bfe_i32 s16, s4, 0x40014		; GFX8-NEXT: s_bfe_i32 s15, s3, 0x40014
; GFX8-NEXT: s_bfe_i32 s18, s4, 0x40018		; GFX8-NEXT: s_bfe_i32 s17, s3, 0x40018
; GFX8-NEXT: v_mad_i32_i24 v0, s13, v2, v0		; GFX8-NEXT: v_mad_i32_i24 v0, s12, v2, v0
; GFX8-NEXT: s_bfe_i32 s15, s2, 0x40014		; GFX8-NEXT: s_bfe_i32 s14, s2, 0x40014
; GFX8-NEXT: v_mov_b32_e32 v2, s16		; GFX8-NEXT: v_mov_b32_e32 v2, s15
; GFX8-NEXT: s_bfe_i32 s17, s2, 0x40018		; GFX8-NEXT: s_bfe_i32 s16, s2, 0x40018
; GFX8-NEXT: v_mad_i32_i24 v0, s15, v2, v0		; GFX8-NEXT: v_mad_i32_i24 v0, s14, v2, v0
; GFX8-NEXT: v_mov_b32_e32 v2, s18		; GFX8-NEXT: v_mov_b32_e32 v2, s17
; GFX8-NEXT: s_ashr_i32 s4, s4, 28		; GFX8-NEXT: s_ashr_i32 s3, s3, 28
; GFX8-NEXT: v_mad_i32_i24 v0, s17, v2, v0		; GFX8-NEXT: v_mad_i32_i24 v0, s16, v2, v0
; GFX8-NEXT: s_ashr_i32 s2, s2, 28		; GFX8-NEXT: s_ashr_i32 s2, s2, 28
; GFX8-NEXT: v_mov_b32_e32 v2, s4		; GFX8-NEXT: v_mov_b32_e32 v2, s3
; GFX8-NEXT: v_mad_i32_i24 v0, s2, v2, v0		; GFX8-NEXT: v_mad_i32_i24 v0, s2, v2, v0
; GFX8-NEXT: v_add_u32_e32 v2, vcc, v0, v1		; GFX8-NEXT: v_add_u32_e32 v2, vcc, v0, v1
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_store_dword v[0:1], v2		; GFX8-NEXT: flat_store_dword v[0:1], v2
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: idot8_multiuses_mul1:		; GFX9-LABEL: idot8_multiuses_mul1:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX9-NEXT: s_load_dword s4, s[6:7], 0x0		; GFX9-NEXT: s_load_dword s3, s[6:7], 0x0
; GFX9-NEXT: s_load_dword s19, s[0:1], 0x0		; GFX9-NEXT: s_load_dword s18, s[0:1], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_bfe_i32 s5, s2, 0x40000		; GFX9-NEXT: s_bfe_i32 s4, s2, 0x40000
; GFX9-NEXT: s_bfe_i32 s6, s4, 0x40000		; GFX9-NEXT: s_bfe_i32 s5, s3, 0x40000
; GFX9-NEXT: v_mov_b32_e32 v0, s6		; GFX9-NEXT: v_mov_b32_e32 v0, s5
; GFX9-NEXT: v_mov_b32_e32 v1, s19		; GFX9-NEXT: v_mov_b32_e32 v1, s18
; GFX9-NEXT: v_mad_i32_i24 v1, s5, v0, v1		; GFX9-NEXT: v_mad_i32_i24 v1, s4, v0, v1
; GFX9-NEXT: s_bfe_i32 s8, s4, 0x40004		; GFX9-NEXT: s_bfe_i32 s7, s3, 0x40004
; GFX9-NEXT: s_bfe_i32 s7, s2, 0x40004		; GFX9-NEXT: s_bfe_i32 s6, s2, 0x40004
; GFX9-NEXT: s_bfe_i32 s10, s4, 0x40008		; GFX9-NEXT: s_bfe_i32 s9, s3, 0x40008
; GFX9-NEXT: v_mad_i32_i24 v0, s5, v0, v1		; GFX9-NEXT: v_mad_i32_i24 v0, s4, v0, v1
; GFX9-NEXT: v_mov_b32_e32 v2, s8		; GFX9-NEXT: v_mov_b32_e32 v2, s7
; GFX9-NEXT: v_mad_i32_i24 v0, s7, v2, v0		; GFX9-NEXT: v_mad_i32_i24 v0, s6, v2, v0
; GFX9-NEXT: s_bfe_i32 s9, s2, 0x40008		; GFX9-NEXT: s_bfe_i32 s8, s2, 0x40008
; GFX9-NEXT: v_mov_b32_e32 v2, s10		; GFX9-NEXT: v_mov_b32_e32 v2, s9
; GFX9-NEXT: s_bfe_i32 s12, s4, 0x4000c		; GFX9-NEXT: s_bfe_i32 s11, s3, 0x4000c
; GFX9-NEXT: v_mad_i32_i24 v0, s9, v2, v0		; GFX9-NEXT: v_mad_i32_i24 v0, s8, v2, v0
; GFX9-NEXT: s_bfe_i32 s11, s2, 0x4000c		; GFX9-NEXT: s_bfe_i32 s10, s2, 0x4000c
; GFX9-NEXT: v_mov_b32_e32 v2, s12		; GFX9-NEXT: v_mov_b32_e32 v2, s11
; GFX9-NEXT: s_bfe_i32 s14, s4, 0x40010		; GFX9-NEXT: s_bfe_i32 s13, s3, 0x40010
; GFX9-NEXT: v_mad_i32_i24 v0, s11, v2, v0		; GFX9-NEXT: v_mad_i32_i24 v0, s10, v2, v0
; GFX9-NEXT: s_bfe_i32 s13, s2, 0x40010		; GFX9-NEXT: s_bfe_i32 s12, s2, 0x40010
; GFX9-NEXT: v_mov_b32_e32 v2, s14		; GFX9-NEXT: v_mov_b32_e32 v2, s13
; GFX9-NEXT: s_bfe_i32 s16, s4, 0x40014		; GFX9-NEXT: s_bfe_i32 s15, s3, 0x40014
; GFX9-NEXT: s_bfe_i32 s18, s4, 0x40018		; GFX9-NEXT: s_bfe_i32 s17, s3, 0x40018
; GFX9-NEXT: v_mad_i32_i24 v0, s13, v2, v0		; GFX9-NEXT: v_mad_i32_i24 v0, s12, v2, v0
; GFX9-NEXT: s_bfe_i32 s15, s2, 0x40014		; GFX9-NEXT: s_bfe_i32 s14, s2, 0x40014
; GFX9-NEXT: v_mov_b32_e32 v2, s16		; GFX9-NEXT: v_mov_b32_e32 v2, s15
; GFX9-NEXT: s_bfe_i32 s17, s2, 0x40018		; GFX9-NEXT: s_bfe_i32 s16, s2, 0x40018
; GFX9-NEXT: v_mad_i32_i24 v0, s15, v2, v0		; GFX9-NEXT: v_mad_i32_i24 v0, s14, v2, v0
; GFX9-NEXT: v_mov_b32_e32 v2, s18		; GFX9-NEXT: v_mov_b32_e32 v2, s17
; GFX9-NEXT: s_ashr_i32 s4, s4, 28		; GFX9-NEXT: s_ashr_i32 s3, s3, 28
; GFX9-NEXT: v_mad_i32_i24 v0, s17, v2, v0		; GFX9-NEXT: v_mad_i32_i24 v0, s16, v2, v0
; GFX9-NEXT: s_ashr_i32 s2, s2, 28		; GFX9-NEXT: s_ashr_i32 s2, s2, 28
; GFX9-NEXT: v_mov_b32_e32 v2, s4		; GFX9-NEXT: v_mov_b32_e32 v2, s3
; GFX9-NEXT: v_mad_i32_i24 v0, s2, v2, v0		; GFX9-NEXT: v_mad_i32_i24 v0, s2, v2, v0
; GFX9-NEXT: v_add_u32_e32 v2, v1, v0		; GFX9-NEXT: v_add_u32_e32 v2, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: global_store_dword v[0:1], v2, off		; GFX9-NEXT: global_store_dword v[0:1], v2, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX9-DL-LABEL: idot8_multiuses_mul1:		; GFX9-DL-LABEL: idot8_multiuses_mul1:
; GFX9-DL: ; %bb.0: ; %entry		; GFX9-DL: ; %bb.0: ; %entry
; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX9-DL-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX9-DL-NEXT: s_load_dword s4, s[6:7], 0x0		; GFX9-DL-NEXT: s_load_dword s3, s[6:7], 0x0
; GFX9-DL-NEXT: s_load_dword s19, s[0:1], 0x0		; GFX9-DL-NEXT: s_load_dword s18, s[0:1], 0x0
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_bfe_i32 s5, s2, 0x40000		; GFX9-DL-NEXT: s_bfe_i32 s4, s2, 0x40000
; GFX9-DL-NEXT: s_bfe_i32 s6, s4, 0x40000		; GFX9-DL-NEXT: s_bfe_i32 s5, s3, 0x40000
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s6		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s5
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s19		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s18
; GFX9-DL-NEXT: v_mad_i32_i24 v1, s5, v0, v1		; GFX9-DL-NEXT: v_mad_i32_i24 v1, s4, v0, v1
; GFX9-DL-NEXT: s_bfe_i32 s8, s4, 0x40004		; GFX9-DL-NEXT: s_bfe_i32 s7, s3, 0x40004
; GFX9-DL-NEXT: s_bfe_i32 s7, s2, 0x40004		; GFX9-DL-NEXT: s_bfe_i32 s6, s2, 0x40004
; GFX9-DL-NEXT: s_bfe_i32 s10, s4, 0x40008		; GFX9-DL-NEXT: s_bfe_i32 s9, s3, 0x40008
; GFX9-DL-NEXT: v_mad_i32_i24 v0, s5, v0, v1		; GFX9-DL-NEXT: v_mad_i32_i24 v0, s4, v0, v1
; GFX9-DL-NEXT: v_mov_b32_e32 v2, s8		; GFX9-DL-NEXT: v_mov_b32_e32 v2, s7
; GFX9-DL-NEXT: v_mad_i32_i24 v0, s7, v2, v0		; GFX9-DL-NEXT: v_mad_i32_i24 v0, s6, v2, v0
; GFX9-DL-NEXT: s_bfe_i32 s9, s2, 0x40008		; GFX9-DL-NEXT: s_bfe_i32 s8, s2, 0x40008
; GFX9-DL-NEXT: v_mov_b32_e32 v2, s10		; GFX9-DL-NEXT: v_mov_b32_e32 v2, s9
; GFX9-DL-NEXT: s_bfe_i32 s12, s4, 0x4000c		; GFX9-DL-NEXT: s_bfe_i32 s11, s3, 0x4000c
; GFX9-DL-NEXT: v_mad_i32_i24 v0, s9, v2, v0		; GFX9-DL-NEXT: v_mad_i32_i24 v0, s8, v2, v0
; GFX9-DL-NEXT: s_bfe_i32 s11, s2, 0x4000c		; GFX9-DL-NEXT: s_bfe_i32 s10, s2, 0x4000c
; GFX9-DL-NEXT: v_mov_b32_e32 v2, s12		; GFX9-DL-NEXT: v_mov_b32_e32 v2, s11
; GFX9-DL-NEXT: s_bfe_i32 s14, s4, 0x40010		; GFX9-DL-NEXT: s_bfe_i32 s13, s3, 0x40010
; GFX9-DL-NEXT: v_mad_i32_i24 v0, s11, v2, v0		; GFX9-DL-NEXT: v_mad_i32_i24 v0, s10, v2, v0
; GFX9-DL-NEXT: s_bfe_i32 s13, s2, 0x40010		; GFX9-DL-NEXT: s_bfe_i32 s12, s2, 0x40010
; GFX9-DL-NEXT: v_mov_b32_e32 v2, s14		; GFX9-DL-NEXT: v_mov_b32_e32 v2, s13
; GFX9-DL-NEXT: s_bfe_i32 s16, s4, 0x40014		; GFX9-DL-NEXT: s_bfe_i32 s15, s3, 0x40014
; GFX9-DL-NEXT: s_bfe_i32 s18, s4, 0x40018		; GFX9-DL-NEXT: s_bfe_i32 s17, s3, 0x40018
; GFX9-DL-NEXT: v_mad_i32_i24 v0, s13, v2, v0		; GFX9-DL-NEXT: v_mad_i32_i24 v0, s12, v2, v0
; GFX9-DL-NEXT: s_bfe_i32 s15, s2, 0x40014		; GFX9-DL-NEXT: s_bfe_i32 s14, s2, 0x40014
; GFX9-DL-NEXT: v_mov_b32_e32 v2, s16		; GFX9-DL-NEXT: v_mov_b32_e32 v2, s15
; GFX9-DL-NEXT: s_bfe_i32 s17, s2, 0x40018		; GFX9-DL-NEXT: s_bfe_i32 s16, s2, 0x40018
; GFX9-DL-NEXT: v_mad_i32_i24 v0, s15, v2, v0		; GFX9-DL-NEXT: v_mad_i32_i24 v0, s14, v2, v0
; GFX9-DL-NEXT: v_mov_b32_e32 v2, s18		; GFX9-DL-NEXT: v_mov_b32_e32 v2, s17
; GFX9-DL-NEXT: s_ashr_i32 s4, s4, 28		; GFX9-DL-NEXT: s_ashr_i32 s3, s3, 28
; GFX9-DL-NEXT: v_mad_i32_i24 v0, s17, v2, v0		; GFX9-DL-NEXT: v_mad_i32_i24 v0, s16, v2, v0
; GFX9-DL-NEXT: s_ashr_i32 s2, s2, 28		; GFX9-DL-NEXT: s_ashr_i32 s2, s2, 28
; GFX9-DL-NEXT: v_mov_b32_e32 v2, s4		; GFX9-DL-NEXT: v_mov_b32_e32 v2, s3
; GFX9-DL-NEXT: v_mad_i32_i24 v0, s2, v2, v0		; GFX9-DL-NEXT: v_mad_i32_i24 v0, s2, v2, v0
; GFX9-DL-NEXT: v_add_u32_e32 v2, v1, v0		; GFX9-DL-NEXT: v_add_u32_e32 v2, v1, v0
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off		; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: idot8_multiuses_mul1:		; GFX10-DL-LABEL: idot8_multiuses_mul1:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX10-DL-NEXT: s_load_dword s4, s[6:7], 0x0		; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0
; GFX10-DL-NEXT: s_load_dword s5, s[0:1], 0x0		; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_bfe_i32 s6, s2, 0x40000		; GFX10-DL-NEXT: s_bfe_i32 s5, s2, 0x40000
; GFX10-DL-NEXT: s_bfe_i32 s7, s4, 0x40000		; GFX10-DL-NEXT: s_bfe_i32 s6, s3, 0x40000
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s5		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s4
; GFX10-DL-NEXT: s_bfe_i32 s5, s2, 0x40004		; GFX10-DL-NEXT: s_bfe_i32 s4, s2, 0x40004
; GFX10-DL-NEXT: s_bfe_i32 s8, s4, 0x40004		; GFX10-DL-NEXT: s_bfe_i32 s7, s3, 0x40004
; GFX10-DL-NEXT: v_mad_i32_i24 v0, s6, s7, v0		; GFX10-DL-NEXT: v_mad_i32_i24 v0, s5, s6, v0
; GFX10-DL-NEXT: v_mad_i32_i24 v1, s6, s7, v0		; GFX10-DL-NEXT: v_mad_i32_i24 v1, s5, s6, v0
; GFX10-DL-NEXT: s_bfe_i32 s6, s2, 0x40008		; GFX10-DL-NEXT: s_bfe_i32 s5, s2, 0x40008
; GFX10-DL-NEXT: s_bfe_i32 s7, s4, 0x40008		; GFX10-DL-NEXT: s_bfe_i32 s6, s3, 0x40008
; GFX10-DL-NEXT: v_mad_i32_i24 v1, s5, s8, v1		; GFX10-DL-NEXT: v_mad_i32_i24 v1, s4, s7, v1
; GFX10-DL-NEXT: s_bfe_i32 s5, s2, 0x4000c		; GFX10-DL-NEXT: s_bfe_i32 s4, s2, 0x4000c
; GFX10-DL-NEXT: s_bfe_i32 s8, s4, 0x4000c		; GFX10-DL-NEXT: s_bfe_i32 s7, s3, 0x4000c
; GFX10-DL-NEXT: v_mad_i32_i24 v1, s6, s7, v1		; GFX10-DL-NEXT: v_mad_i32_i24 v1, s5, s6, v1
; GFX10-DL-NEXT: s_bfe_i32 s6, s2, 0x40010		; GFX10-DL-NEXT: s_bfe_i32 s5, s2, 0x40010
; GFX10-DL-NEXT: s_bfe_i32 s7, s4, 0x40010		; GFX10-DL-NEXT: s_bfe_i32 s6, s3, 0x40010
; GFX10-DL-NEXT: v_mad_i32_i24 v1, s5, s8, v1		; GFX10-DL-NEXT: v_mad_i32_i24 v1, s4, s7, v1
; GFX10-DL-NEXT: s_bfe_i32 s5, s2, 0x40014		; GFX10-DL-NEXT: s_bfe_i32 s4, s2, 0x40014
; GFX10-DL-NEXT: s_bfe_i32 s8, s4, 0x40014		; GFX10-DL-NEXT: s_bfe_i32 s7, s3, 0x40014
; GFX10-DL-NEXT: v_mad_i32_i24 v1, s6, s7, v1		; GFX10-DL-NEXT: v_mad_i32_i24 v1, s5, s6, v1
; GFX10-DL-NEXT: s_bfe_i32 s6, s2, 0x40018		; GFX10-DL-NEXT: s_bfe_i32 s5, s2, 0x40018
; GFX10-DL-NEXT: s_bfe_i32 s7, s4, 0x40018		; GFX10-DL-NEXT: s_bfe_i32 s6, s3, 0x40018
; GFX10-DL-NEXT: s_ashr_i32 s2, s2, 28		; GFX10-DL-NEXT: s_ashr_i32 s2, s2, 28
; GFX10-DL-NEXT: s_ashr_i32 s4, s4, 28		; GFX10-DL-NEXT: s_ashr_i32 s3, s3, 28
; GFX10-DL-NEXT: v_mad_i32_i24 v1, s5, s8, v1		; GFX10-DL-NEXT: v_mad_i32_i24 v1, s4, s7, v1
; GFX10-DL-NEXT: v_mad_i32_i24 v1, s6, s7, v1		; GFX10-DL-NEXT: v_mad_i32_i24 v1, s5, s6, v1
; GFX10-DL-NEXT: v_mad_i32_i24 v1, s2, s4, v1		; GFX10-DL-NEXT: v_mad_i32_i24 v1, s2, s3, v1
; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v0, v1		; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v0, v1
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX10-DL-NEXT: global_store_dword v[0:1], v2, off		; GFX10-DL-NEXT: global_store_dword v[0:1], v2, off
; GFX10-DL-NEXT: s_endpgm		; GFX10-DL-NEXT: s_endpgm
<8 x i4> addrspace(1)* %src2,		<8 x i4> addrspace(1)* %src2,
i32 addrspace(1)* nocapture %dst) {		i32 addrspace(1)* nocapture %dst) {
entry:		entry:
store i32 %res, i32 addrspace(1)* %dst, align 4		store i32 %res, i32 addrspace(1)* %dst, align 4
ret void		ret void
}		}

; TODO: Support this pattern.		; TODO: Support this pattern.
define amdgpu_kernel void @idot8_acc32_vecMul(<8 x i4> addrspace(1)* %src1,		define amdgpu_kernel void @idot8_acc32_vecMul(<8 x i4> addrspace(1)* %src1,
; GFX7-LABEL: idot8_acc32_vecMul:		; GFX7-LABEL: idot8_acc32_vecMul:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s7, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s6, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_load_dword s1, s[8:9], 0x0		; GFX7-NEXT: s_load_dword s5, s[4:5], 0x0
; GFX7-NEXT: s_load_dword s9, s[10:11], 0x0		; GFX7-NEXT: s_load_dword s7, s[6:7], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_ashr_i64 s[10:11], s[0:1], 60		; GFX7-NEXT: s_ashr_i64 s[8:9], s[4:5], 60
; GFX7-NEXT: s_lshl_b32 s11, s1, 4		; GFX7-NEXT: s_lshl_b32 s9, s5, 4
; GFX7-NEXT: s_ashr_i64 s[16:17], s[10:11], 60		; GFX7-NEXT: s_ashr_i64 s[14:15], s[8:9], 60
; GFX7-NEXT: s_lshl_b32 s11, s1, 16		; GFX7-NEXT: s_lshl_b32 s9, s5, 16
; GFX7-NEXT: s_ashr_i64 s[18:19], s[10:11], 60		; GFX7-NEXT: s_ashr_i64 s[16:17], s[8:9], 60
; GFX7-NEXT: s_lshl_b32 s11, s1, 20		; GFX7-NEXT: s_lshl_b32 s9, s5, 20
; GFX7-NEXT: s_lshl_b32 s13, s1, 8		; GFX7-NEXT: s_lshl_b32 s11, s5, 8
; GFX7-NEXT: s_lshl_b32 s15, s1, 12		; GFX7-NEXT: s_lshl_b32 s13, s5, 12
; GFX7-NEXT: s_ashr_i64 s[20:21], s[10:11], 60		; GFX7-NEXT: s_ashr_i64 s[18:19], s[8:9], 60
; GFX7-NEXT: s_lshl_b32 s11, s1, 24		; GFX7-NEXT: s_lshl_b32 s9, s5, 24
; GFX7-NEXT: s_lshl_b32 s1, s1, 28		; GFX7-NEXT: s_lshl_b32 s5, s5, 28
; GFX7-NEXT: s_ashr_i64 s[0:1], s[0:1], 60		; GFX7-NEXT: s_ashr_i64 s[4:5], s[4:5], 60
; GFX7-NEXT: s_lshl_b32 s1, s9, 4		; GFX7-NEXT: s_lshl_b32 s5, s7, 4
; GFX7-NEXT: s_ashr_i64 s[26:27], s[0:1], 60		; GFX7-NEXT: s_ashr_i64 s[24:25], s[4:5], 60
; GFX7-NEXT: s_lshl_b32 s1, s9, 8		; GFX7-NEXT: s_lshl_b32 s5, s7, 8
; GFX7-NEXT: s_ashr_i64 s[28:29], s[0:1], 60		; GFX7-NEXT: s_ashr_i64 s[26:27], s[4:5], 60
; GFX7-NEXT: s_lshl_b32 s1, s9, 12		; GFX7-NEXT: s_lshl_b32 s5, s7, 12
; GFX7-NEXT: s_ashr_i64 s[30:31], s[0:1], 60		; GFX7-NEXT: s_ashr_i64 s[28:29], s[4:5], 60
; GFX7-NEXT: s_lshl_b32 s1, s9, 16		; GFX7-NEXT: s_lshl_b32 s5, s7, 16
; GFX7-NEXT: s_ashr_i64 s[32:33], s[0:1], 60		; GFX7-NEXT: s_ashr_i64 s[30:31], s[4:5], 60
; GFX7-NEXT: s_lshl_b32 s1, s9, 20		; GFX7-NEXT: s_lshl_b32 s5, s7, 20
; GFX7-NEXT: s_ashr_i64 s[34:35], s[0:1], 60		; GFX7-NEXT: s_ashr_i64 s[34:35], s[4:5], 60
; GFX7-NEXT: s_lshl_b32 s1, s9, 24		; GFX7-NEXT: s_lshl_b32 s5, s7, 24
; GFX7-NEXT: s_ashr_i64 s[36:37], s[0:1], 60		; GFX7-NEXT: s_ashr_i64 s[36:37], s[4:5], 60
; GFX7-NEXT: s_lshl_b32 s1, s9, 28		; GFX7-NEXT: s_lshl_b32 s5, s7, 28
; GFX7-NEXT: s_ashr_i64 s[24:25], s[8:9], 60		; GFX7-NEXT: s_ashr_i64 s[22:23], s[6:7], 60
; GFX7-NEXT: s_ashr_i64 s[8:9], s[0:1], 60		; GFX7-NEXT: s_ashr_i64 s[6:7], s[4:5], 60
; GFX7-NEXT: s_load_dword s1, s[4:5], 0x0		; GFX7-NEXT: s_load_dword s5, s[0:1], 0x0
; GFX7-NEXT: v_mov_b32_e32 v0, s8		; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: s_ashr_i64 s[22:23], s[10:11], 60		; GFX7-NEXT: s_ashr_i64 s[20:21], s[8:9], 60
; GFX7-NEXT: s_ashr_i64 s[14:15], s[14:15], 60
; GFX7-NEXT: s_ashr_i64 s[12:13], s[12:13], 60		; GFX7-NEXT: s_ashr_i64 s[12:13], s[12:13], 60
		; GFX7-NEXT: s_ashr_i64 s[10:11], s[10:11], 60
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mov_b32_e32 v1, s1		; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mad_i32_i24 v0, s0, v0, v1		; GFX7-NEXT: v_mad_i32_i24 v0, s4, v0, v1
; GFX7-NEXT: v_mov_b32_e32 v1, s36		; GFX7-NEXT: v_mov_b32_e32 v1, s36
; GFX7-NEXT: v_mad_i32_i24 v0, s22, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s34
; GFX7-NEXT: v_mad_i32_i24 v0, s20, v1, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s20, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s32		; GFX7-NEXT: v_mov_b32_e32 v1, s34
; GFX7-NEXT: v_mad_i32_i24 v0, s18, v1, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s18, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s30		; GFX7-NEXT: v_mov_b32_e32 v1, s30
; GFX7-NEXT: v_mad_i32_i24 v0, s14, v1, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s16, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s28		; GFX7-NEXT: v_mov_b32_e32 v1, s28
; GFX7-NEXT: v_mad_i32_i24 v0, s12, v1, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s12, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s26		; GFX7-NEXT: v_mov_b32_e32 v1, s26
; GFX7-NEXT: v_mad_i32_i24 v0, s16, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s24
; GFX7-NEXT: v_mad_i32_i24 v0, s10, v1, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s10, v1, v0
; GFX7-NEXT: buffer_store_dword v0, off, s[4:7], 0		; GFX7-NEXT: v_mov_b32_e32 v1, s24
		; GFX7-NEXT: v_mad_i32_i24 v0, s14, v1, v0
		; GFX7-NEXT: v_mov_b32_e32 v1, s22
		; GFX7-NEXT: v_mad_i32_i24 v0, s8, v1, v0
		; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: idot8_acc32_vecMul:		; GFX8-LABEL: idot8_acc32_vecMul:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_load_dword s5, s[4:5], 0x0		; GFX8-NEXT: s_load_dword s3, s[4:5], 0x0
; GFX8-NEXT: s_load_dword s7, s[6:7], 0x0		; GFX8-NEXT: s_load_dword s5, s[6:7], 0x0
; GFX8-NEXT: s_load_dword s2, s[0:1], 0x0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_ashr_i64 s[8:9], s[4:5], 60		; GFX8-NEXT: s_ashr_i64 s[6:7], s[2:3], 60
; GFX8-NEXT: s_lshl_b32 s9, s5, 4		; GFX8-NEXT: s_lshl_b32 s7, s3, 4
; GFX8-NEXT: s_ashr_i64 s[16:17], s[8:9], 60		; GFX8-NEXT: s_ashr_i64 s[14:15], s[6:7], 60
; GFX8-NEXT: s_lshl_b32 s9, s5, 20		; GFX8-NEXT: s_lshl_b32 s7, s3, 20
; GFX8-NEXT: s_lshl_b32 s11, s5, 8		; GFX8-NEXT: s_lshl_b32 s9, s3, 8
; GFX8-NEXT: s_lshl_b32 s13, s5, 12		; GFX8-NEXT: s_lshl_b32 s11, s3, 12
; GFX8-NEXT: s_lshl_b32 s15, s5, 16		; GFX8-NEXT: s_lshl_b32 s13, s3, 16
; GFX8-NEXT: s_ashr_i64 s[18:19], s[8:9], 60		; GFX8-NEXT: s_ashr_i64 s[16:17], s[6:7], 60
; GFX8-NEXT: s_lshl_b32 s9, s5, 24		; GFX8-NEXT: s_lshl_b32 s7, s3, 24
; GFX8-NEXT: s_lshl_b32 s5, s5, 28		; GFX8-NEXT: s_lshl_b32 s3, s3, 28
; GFX8-NEXT: s_ashr_i64 s[4:5], s[4:5], 60		; GFX8-NEXT: s_ashr_i64 s[2:3], s[2:3], 60
; GFX8-NEXT: s_lshl_b32 s5, s7, 4		; GFX8-NEXT: s_lshl_b32 s3, s5, 4
; GFX8-NEXT: s_ashr_i64 s[24:25], s[4:5], 60		; GFX8-NEXT: s_ashr_i64 s[22:23], s[2:3], 60
; GFX8-NEXT: s_lshl_b32 s5, s7, 8		; GFX8-NEXT: s_lshl_b32 s3, s5, 8
; GFX8-NEXT: s_ashr_i64 s[26:27], s[4:5], 60		; GFX8-NEXT: s_ashr_i64 s[24:25], s[2:3], 60
; GFX8-NEXT: s_lshl_b32 s5, s7, 12		; GFX8-NEXT: s_lshl_b32 s3, s5, 12
; GFX8-NEXT: s_ashr_i64 s[28:29], s[4:5], 60		; GFX8-NEXT: s_ashr_i64 s[26:27], s[2:3], 60
; GFX8-NEXT: s_lshl_b32 s5, s7, 16		; GFX8-NEXT: s_lshl_b32 s3, s5, 16
; GFX8-NEXT: s_ashr_i64 s[30:31], s[4:5], 60		; GFX8-NEXT: s_ashr_i64 s[28:29], s[2:3], 60
; GFX8-NEXT: s_lshl_b32 s5, s7, 20		; GFX8-NEXT: s_lshl_b32 s3, s5, 20
; GFX8-NEXT: s_ashr_i64 s[32:33], s[4:5], 60		; GFX8-NEXT: s_ashr_i64 s[30:31], s[2:3], 60
; GFX8-NEXT: s_lshl_b32 s5, s7, 24		; GFX8-NEXT: s_lshl_b32 s3, s5, 24
; GFX8-NEXT: s_ashr_i64 s[34:35], s[4:5], 60		; GFX8-NEXT: s_ashr_i64 s[34:35], s[2:3], 60
; GFX8-NEXT: s_lshl_b32 s5, s7, 28		; GFX8-NEXT: s_lshl_b32 s3, s5, 28
; GFX8-NEXT: s_ashr_i64 s[22:23], s[6:7], 60		; GFX8-NEXT: s_ashr_i64 s[20:21], s[4:5], 60
; GFX8-NEXT: s_ashr_i64 s[6:7], s[4:5], 60		; GFX8-NEXT: s_ashr_i64 s[4:5], s[2:3], 60
; GFX8-NEXT: v_mov_b32_e32 v0, s6		; GFX8-NEXT: s_load_dword s3, s[0:1], 0x0
; GFX8-NEXT: v_mov_b32_e32 v1, s2		; GFX8-NEXT: v_mov_b32_e32 v0, s4
; GFX8-NEXT: v_mad_i32_i24 v0, s4, v0, v1		; GFX8-NEXT: s_ashr_i64 s[18:19], s[6:7], 60
; GFX8-NEXT: s_ashr_i64 s[20:21], s[8:9], 60		; GFX8-NEXT: s_ashr_i64 s[12:13], s[12:13], 60
		; GFX8-NEXT: s_ashr_i64 s[10:11], s[10:11], 60
		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
		; GFX8-NEXT: v_mov_b32_e32 v1, s3
		; GFX8-NEXT: v_mad_i32_i24 v0, s2, v0, v1
; GFX8-NEXT: v_mov_b32_e32 v1, s34		; GFX8-NEXT: v_mov_b32_e32 v1, s34
; GFX8-NEXT: v_mad_i32_i24 v0, s20, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v1, s32
; GFX8-NEXT: v_mad_i32_i24 v0, s18, v1, v0		; GFX8-NEXT: v_mad_i32_i24 v0, s18, v1, v0
; GFX8-NEXT: s_ashr_i64 s[14:15], s[14:15], 60
; GFX8-NEXT: v_mov_b32_e32 v1, s30		; GFX8-NEXT: v_mov_b32_e32 v1, s30
; GFX8-NEXT: v_mad_i32_i24 v0, s14, v1, v0		; GFX8-NEXT: v_mad_i32_i24 v0, s16, v1, v0
; GFX8-NEXT: s_ashr_i64 s[12:13], s[12:13], 60
; GFX8-NEXT: v_mov_b32_e32 v1, s28		; GFX8-NEXT: v_mov_b32_e32 v1, s28
; GFX8-NEXT: v_mad_i32_i24 v0, s12, v1, v0		; GFX8-NEXT: v_mad_i32_i24 v0, s12, v1, v0
; GFX8-NEXT: s_ashr_i64 s[10:11], s[10:11], 60
; GFX8-NEXT: v_mov_b32_e32 v1, s26		; GFX8-NEXT: v_mov_b32_e32 v1, s26
; GFX8-NEXT: v_mad_i32_i24 v0, s10, v1, v0		; GFX8-NEXT: v_mad_i32_i24 v0, s10, v1, v0
		; GFX8-NEXT: s_ashr_i64 s[8:9], s[8:9], 60
; GFX8-NEXT: v_mov_b32_e32 v1, s24		; GFX8-NEXT: v_mov_b32_e32 v1, s24
; GFX8-NEXT: v_mad_i32_i24 v0, s16, v1, v0		; GFX8-NEXT: v_mad_i32_i24 v0, s8, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v1, s22		; GFX8-NEXT: v_mov_b32_e32 v1, s22
; GFX8-NEXT: v_mad_i32_i24 v2, s8, v1, v0		; GFX8-NEXT: v_mad_i32_i24 v0, s14, v1, v0
		; GFX8-NEXT: v_mov_b32_e32 v1, s20
		; GFX8-NEXT: v_mad_i32_i24 v2, s6, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_store_dword v[0:1], v2		; GFX8-NEXT: flat_store_dword v[0:1], v2
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: idot8_acc32_vecMul:		; GFX9-LABEL: idot8_acc32_vecMul:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_load_dword s5, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s3, s[4:5], 0x0
; GFX9-NEXT: s_load_dword s7, s[6:7], 0x0		; GFX9-NEXT: s_load_dword s5, s[6:7], 0x0
; GFX9-NEXT: s_load_dword s2, s[0:1], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_ashr_i64 s[8:9], s[4:5], 60		; GFX9-NEXT: s_ashr_i64 s[6:7], s[2:3], 60
; GFX9-NEXT: s_lshl_b32 s9, s5, 4		; GFX9-NEXT: s_lshl_b32 s7, s3, 4
; GFX9-NEXT: s_ashr_i64 s[16:17], s[8:9], 60		; GFX9-NEXT: s_ashr_i64 s[14:15], s[6:7], 60
; GFX9-NEXT: s_lshl_b32 s9, s5, 20		; GFX9-NEXT: s_lshl_b32 s7, s3, 20
; GFX9-NEXT: s_lshl_b32 s11, s5, 8		; GFX9-NEXT: s_lshl_b32 s9, s3, 8
; GFX9-NEXT: s_lshl_b32 s13, s5, 12		; GFX9-NEXT: s_lshl_b32 s11, s3, 12
; GFX9-NEXT: s_lshl_b32 s15, s5, 16		; GFX9-NEXT: s_lshl_b32 s13, s3, 16
; GFX9-NEXT: s_ashr_i64 s[18:19], s[8:9], 60		; GFX9-NEXT: s_ashr_i64 s[16:17], s[6:7], 60
; GFX9-NEXT: s_lshl_b32 s9, s5, 24		; GFX9-NEXT: s_lshl_b32 s7, s3, 24
; GFX9-NEXT: s_lshl_b32 s5, s5, 28		; GFX9-NEXT: s_lshl_b32 s3, s3, 28
; GFX9-NEXT: s_ashr_i64 s[4:5], s[4:5], 60		; GFX9-NEXT: s_ashr_i64 s[2:3], s[2:3], 60
; GFX9-NEXT: s_lshl_b32 s5, s7, 4		; GFX9-NEXT: s_lshl_b32 s3, s5, 4
; GFX9-NEXT: s_ashr_i64 s[24:25], s[4:5], 60		; GFX9-NEXT: s_ashr_i64 s[22:23], s[2:3], 60
; GFX9-NEXT: s_lshl_b32 s5, s7, 8		; GFX9-NEXT: s_lshl_b32 s3, s5, 8
; GFX9-NEXT: s_ashr_i64 s[26:27], s[4:5], 60		; GFX9-NEXT: s_ashr_i64 s[24:25], s[2:3], 60
; GFX9-NEXT: s_lshl_b32 s5, s7, 12		; GFX9-NEXT: s_lshl_b32 s3, s5, 12
; GFX9-NEXT: s_ashr_i64 s[28:29], s[4:5], 60		; GFX9-NEXT: s_ashr_i64 s[26:27], s[2:3], 60
; GFX9-NEXT: s_lshl_b32 s5, s7, 16		; GFX9-NEXT: s_lshl_b32 s3, s5, 16
; GFX9-NEXT: s_ashr_i64 s[30:31], s[4:5], 60		; GFX9-NEXT: s_ashr_i64 s[28:29], s[2:3], 60
; GFX9-NEXT: s_lshl_b32 s5, s7, 20		; GFX9-NEXT: s_lshl_b32 s3, s5, 20
; GFX9-NEXT: s_ashr_i64 s[32:33], s[4:5], 60		; GFX9-NEXT: s_ashr_i64 s[30:31], s[2:3], 60
; GFX9-NEXT: s_lshl_b32 s5, s7, 24		; GFX9-NEXT: s_lshl_b32 s3, s5, 24
; GFX9-NEXT: s_ashr_i64 s[34:35], s[4:5], 60		; GFX9-NEXT: s_ashr_i64 s[34:35], s[2:3], 60
; GFX9-NEXT: s_lshl_b32 s5, s7, 28		; GFX9-NEXT: s_lshl_b32 s3, s5, 28
; GFX9-NEXT: s_ashr_i64 s[22:23], s[6:7], 60		; GFX9-NEXT: s_ashr_i64 s[20:21], s[4:5], 60
; GFX9-NEXT: s_ashr_i64 s[6:7], s[4:5], 60		; GFX9-NEXT: s_ashr_i64 s[4:5], s[2:3], 60
; GFX9-NEXT: v_mov_b32_e32 v0, s6		; GFX9-NEXT: s_load_dword s3, s[0:1], 0x0
; GFX9-NEXT: v_mov_b32_e32 v1, s2		; GFX9-NEXT: v_mov_b32_e32 v0, s4
; GFX9-NEXT: v_mad_i32_i24 v0, s4, v0, v1		; GFX9-NEXT: s_ashr_i64 s[18:19], s[6:7], 60
; GFX9-NEXT: s_ashr_i64 s[20:21], s[8:9], 60		; GFX9-NEXT: s_ashr_i64 s[12:13], s[12:13], 60
		; GFX9-NEXT: s_ashr_i64 s[10:11], s[10:11], 60
		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
		; GFX9-NEXT: v_mov_b32_e32 v1, s3
		; GFX9-NEXT: v_mad_i32_i24 v0, s2, v0, v1
; GFX9-NEXT: v_mov_b32_e32 v1, s34		; GFX9-NEXT: v_mov_b32_e32 v1, s34
; GFX9-NEXT: v_mad_i32_i24 v0, s20, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v1, s32
; GFX9-NEXT: v_mad_i32_i24 v0, s18, v1, v0		; GFX9-NEXT: v_mad_i32_i24 v0, s18, v1, v0
; GFX9-NEXT: s_ashr_i64 s[14:15], s[14:15], 60
; GFX9-NEXT: v_mov_b32_e32 v1, s30		; GFX9-NEXT: v_mov_b32_e32 v1, s30
; GFX9-NEXT: v_mad_i32_i24 v0, s14, v1, v0		; GFX9-NEXT: v_mad_i32_i24 v0, s16, v1, v0
; GFX9-NEXT: s_ashr_i64 s[12:13], s[12:13], 60
; GFX9-NEXT: v_mov_b32_e32 v1, s28		; GFX9-NEXT: v_mov_b32_e32 v1, s28
; GFX9-NEXT: v_mad_i32_i24 v0, s12, v1, v0		; GFX9-NEXT: v_mad_i32_i24 v0, s12, v1, v0
; GFX9-NEXT: s_ashr_i64 s[10:11], s[10:11], 60
; GFX9-NEXT: v_mov_b32_e32 v1, s26		; GFX9-NEXT: v_mov_b32_e32 v1, s26
; GFX9-NEXT: v_mad_i32_i24 v0, s10, v1, v0		; GFX9-NEXT: v_mad_i32_i24 v0, s10, v1, v0
		; GFX9-NEXT: s_ashr_i64 s[8:9], s[8:9], 60
; GFX9-NEXT: v_mov_b32_e32 v1, s24		; GFX9-NEXT: v_mov_b32_e32 v1, s24
; GFX9-NEXT: v_mad_i32_i24 v0, s16, v1, v0		; GFX9-NEXT: v_mad_i32_i24 v0, s8, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v1, s22		; GFX9-NEXT: v_mov_b32_e32 v1, s22
; GFX9-NEXT: v_mad_i32_i24 v2, s8, v1, v0		; GFX9-NEXT: v_mad_i32_i24 v0, s14, v1, v0
		; GFX9-NEXT: v_mov_b32_e32 v1, s20
		; GFX9-NEXT: v_mad_i32_i24 v2, s6, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: global_store_dword v[0:1], v2, off		; GFX9-NEXT: global_store_dword v[0:1], v2, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX9-DL-LABEL: idot8_acc32_vecMul:		; GFX9-DL-LABEL: idot8_acc32_vecMul:
; GFX9-DL: ; %bb.0: ; %entry		; GFX9-DL: ; %bb.0: ; %entry
; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_load_dword s5, s[4:5], 0x0		; GFX9-DL-NEXT: s_load_dword s3, s[4:5], 0x0
; GFX9-DL-NEXT: s_load_dword s7, s[6:7], 0x0		; GFX9-DL-NEXT: s_load_dword s5, s[6:7], 0x0
; GFX9-DL-NEXT: s_load_dword s2, s[0:1], 0x0
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_ashr_i64 s[8:9], s[4:5], 60		; GFX9-DL-NEXT: s_ashr_i64 s[6:7], s[2:3], 60
; GFX9-DL-NEXT: s_lshl_b32 s9, s5, 4		; GFX9-DL-NEXT: s_lshl_b32 s7, s3, 4
; GFX9-DL-NEXT: s_ashr_i64 s[16:17], s[8:9], 60		; GFX9-DL-NEXT: s_ashr_i64 s[14:15], s[6:7], 60
; GFX9-DL-NEXT: s_lshl_b32 s9, s5, 20		; GFX9-DL-NEXT: s_lshl_b32 s7, s3, 20
; GFX9-DL-NEXT: s_lshl_b32 s11, s5, 8		; GFX9-DL-NEXT: s_lshl_b32 s9, s3, 8
; GFX9-DL-NEXT: s_lshl_b32 s13, s5, 12		; GFX9-DL-NEXT: s_lshl_b32 s11, s3, 12
; GFX9-DL-NEXT: s_lshl_b32 s15, s5, 16		; GFX9-DL-NEXT: s_lshl_b32 s13, s3, 16
; GFX9-DL-NEXT: s_ashr_i64 s[18:19], s[8:9], 60		; GFX9-DL-NEXT: s_ashr_i64 s[16:17], s[6:7], 60
; GFX9-DL-NEXT: s_lshl_b32 s9, s5, 24		; GFX9-DL-NEXT: s_lshl_b32 s7, s3, 24
; GFX9-DL-NEXT: s_lshl_b32 s5, s5, 28		; GFX9-DL-NEXT: s_lshl_b32 s3, s3, 28
; GFX9-DL-NEXT: s_ashr_i64 s[4:5], s[4:5], 60		; GFX9-DL-NEXT: s_ashr_i64 s[2:3], s[2:3], 60
; GFX9-DL-NEXT: s_lshl_b32 s5, s7, 4		; GFX9-DL-NEXT: s_lshl_b32 s3, s5, 4
; GFX9-DL-NEXT: s_ashr_i64 s[24:25], s[4:5], 60		; GFX9-DL-NEXT: s_ashr_i64 s[22:23], s[2:3], 60
; GFX9-DL-NEXT: s_lshl_b32 s5, s7, 8		; GFX9-DL-NEXT: s_lshl_b32 s3, s5, 8
; GFX9-DL-NEXT: s_ashr_i64 s[26:27], s[4:5], 60		; GFX9-DL-NEXT: s_ashr_i64 s[24:25], s[2:3], 60
; GFX9-DL-NEXT: s_lshl_b32 s5, s7, 12		; GFX9-DL-NEXT: s_lshl_b32 s3, s5, 12
; GFX9-DL-NEXT: s_ashr_i64 s[28:29], s[4:5], 60		; GFX9-DL-NEXT: s_ashr_i64 s[26:27], s[2:3], 60
; GFX9-DL-NEXT: s_lshl_b32 s5, s7, 16		; GFX9-DL-NEXT: s_lshl_b32 s3, s5, 16
; GFX9-DL-NEXT: s_ashr_i64 s[30:31], s[4:5], 60		; GFX9-DL-NEXT: s_ashr_i64 s[28:29], s[2:3], 60
; GFX9-DL-NEXT: s_lshl_b32 s5, s7, 20		; GFX9-DL-NEXT: s_lshl_b32 s3, s5, 20
; GFX9-DL-NEXT: s_ashr_i64 s[32:33], s[4:5], 60		; GFX9-DL-NEXT: s_ashr_i64 s[30:31], s[2:3], 60
; GFX9-DL-NEXT: s_lshl_b32 s5, s7, 24		; GFX9-DL-NEXT: s_lshl_b32 s3, s5, 24
; GFX9-DL-NEXT: s_ashr_i64 s[34:35], s[4:5], 60		; GFX9-DL-NEXT: s_ashr_i64 s[34:35], s[2:3], 60
; GFX9-DL-NEXT: s_lshl_b32 s5, s7, 28		; GFX9-DL-NEXT: s_lshl_b32 s3, s5, 28
; GFX9-DL-NEXT: s_ashr_i64 s[22:23], s[6:7], 60		; GFX9-DL-NEXT: s_ashr_i64 s[20:21], s[4:5], 60
; GFX9-DL-NEXT: s_ashr_i64 s[6:7], s[4:5], 60		; GFX9-DL-NEXT: s_ashr_i64 s[4:5], s[2:3], 60
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s6		; GFX9-DL-NEXT: s_load_dword s3, s[0:1], 0x0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s2		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s4
; GFX9-DL-NEXT: v_mad_i32_i24 v0, s4, v0, v1		; GFX9-DL-NEXT: s_ashr_i64 s[18:19], s[6:7], 60
; GFX9-DL-NEXT: s_ashr_i64 s[20:21], s[8:9], 60		; GFX9-DL-NEXT: s_ashr_i64 s[12:13], s[12:13], 60
		; GFX9-DL-NEXT: s_ashr_i64 s[10:11], s[10:11], 60
		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s3
		; GFX9-DL-NEXT: v_mad_i32_i24 v0, s2, v0, v1
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s34		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s34
; GFX9-DL-NEXT: v_mad_i32_i24 v0, s20, v1, v0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s32
; GFX9-DL-NEXT: v_mad_i32_i24 v0, s18, v1, v0		; GFX9-DL-NEXT: v_mad_i32_i24 v0, s18, v1, v0
; GFX9-DL-NEXT: s_ashr_i64 s[14:15], s[14:15], 60
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s30		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s30
; GFX9-DL-NEXT: v_mad_i32_i24 v0, s14, v1, v0		; GFX9-DL-NEXT: v_mad_i32_i24 v0, s16, v1, v0
; GFX9-DL-NEXT: s_ashr_i64 s[12:13], s[12:13], 60
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s28		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s28
; GFX9-DL-NEXT: v_mad_i32_i24 v0, s12, v1, v0		; GFX9-DL-NEXT: v_mad_i32_i24 v0, s12, v1, v0
; GFX9-DL-NEXT: s_ashr_i64 s[10:11], s[10:11], 60
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s26		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s26
; GFX9-DL-NEXT: v_mad_i32_i24 v0, s10, v1, v0		; GFX9-DL-NEXT: v_mad_i32_i24 v0, s10, v1, v0
		; GFX9-DL-NEXT: s_ashr_i64 s[8:9], s[8:9], 60
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s24		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s24
; GFX9-DL-NEXT: v_mad_i32_i24 v0, s16, v1, v0		; GFX9-DL-NEXT: v_mad_i32_i24 v0, s8, v1, v0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s22		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s22
; GFX9-DL-NEXT: v_mad_i32_i24 v2, s8, v1, v0		; GFX9-DL-NEXT: v_mad_i32_i24 v0, s14, v1, v0
		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s20
		; GFX9-DL-NEXT: v_mad_i32_i24 v2, s6, v1, v0
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off		; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: idot8_acc32_vecMul:		; GFX10-DL-LABEL: idot8_acc32_vecMul:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_load_dword s5, s[4:5], 0x0		; GFX10-DL-NEXT: s_load_dword s3, s[4:5], 0x0
; GFX10-DL-NEXT: s_load_dword s7, s[6:7], 0x0		; GFX10-DL-NEXT: s_load_dword s5, s[6:7], 0x0
; GFX10-DL-NEXT: s_load_dword s2, s[0:1], 0x0		; GFX10-DL-NEXT: s_load_dword s2, s[0:1], 0x0
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
		; GFX10-DL-NEXT: s_lshl_b32 s7, s3, 28
; GFX10-DL-NEXT: s_lshl_b32 s9, s5, 28		; GFX10-DL-NEXT: s_lshl_b32 s9, s5, 28
; GFX10-DL-NEXT: s_lshl_b32 s11, s7, 28		; GFX10-DL-NEXT: s_lshl_b32 s11, s3, 24
; GFX10-DL-NEXT: s_lshl_b32 s13, s5, 24		; GFX10-DL-NEXT: s_lshl_b32 s13, s5, 24
; GFX10-DL-NEXT: s_lshl_b32 s15, s7, 24
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s2		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s2
		; GFX10-DL-NEXT: s_ashr_i64 s[6:7], s[6:7], 60
; GFX10-DL-NEXT: s_ashr_i64 s[8:9], s[8:9], 60		; GFX10-DL-NEXT: s_ashr_i64 s[8:9], s[8:9], 60
; GFX10-DL-NEXT: s_ashr_i64 s[10:11], s[10:11], 60		; GFX10-DL-NEXT: s_ashr_i64 s[10:11], s[10:11], 60
; GFX10-DL-NEXT: s_ashr_i64 s[12:13], s[12:13], 60		; GFX10-DL-NEXT: s_ashr_i64 s[12:13], s[12:13], 60
; GFX10-DL-NEXT: s_ashr_i64 s[14:15], s[14:15], 60		; GFX10-DL-NEXT: s_lshl_b32 s7, s3, 20
; GFX10-DL-NEXT: s_lshl_b32 s9, s5, 20		; GFX10-DL-NEXT: s_lshl_b32 s9, s5, 20
; GFX10-DL-NEXT: s_lshl_b32 s11, s7, 20		; GFX10-DL-NEXT: v_mad_i32_i24 v0, s6, s8, v0
; GFX10-DL-NEXT: v_mad_i32_i24 v0, s8, s10, v0		; GFX10-DL-NEXT: s_lshl_b32 s11, s3, 16
; GFX10-DL-NEXT: s_lshl_b32 s13, s5, 16		; GFX10-DL-NEXT: s_lshl_b32 s13, s5, 16
; GFX10-DL-NEXT: s_lshl_b32 s15, s7, 16		; GFX10-DL-NEXT: s_ashr_i64 s[6:7], s[6:7], 60
; GFX10-DL-NEXT: s_ashr_i64 s[8:9], s[8:9], 60		; GFX10-DL-NEXT: s_ashr_i64 s[8:9], s[8:9], 60
		; GFX10-DL-NEXT: v_mad_i32_i24 v0, s10, s12, v0
; GFX10-DL-NEXT: s_ashr_i64 s[10:11], s[10:11], 60		; GFX10-DL-NEXT: s_ashr_i64 s[10:11], s[10:11], 60
; GFX10-DL-NEXT: v_mad_i32_i24 v0, s12, s14, v0
; GFX10-DL-NEXT: s_ashr_i64 s[12:13], s[12:13], 60		; GFX10-DL-NEXT: s_ashr_i64 s[12:13], s[12:13], 60
; GFX10-DL-NEXT: s_ashr_i64 s[14:15], s[14:15], 60		; GFX10-DL-NEXT: s_lshl_b32 s7, s3, 12
; GFX10-DL-NEXT: s_lshl_b32 s9, s5, 12		; GFX10-DL-NEXT: s_lshl_b32 s9, s5, 12
; GFX10-DL-NEXT: s_lshl_b32 s11, s7, 12		; GFX10-DL-NEXT: v_mad_i32_i24 v0, s6, s8, v0
; GFX10-DL-NEXT: v_mad_i32_i24 v0, s8, s10, v0		; GFX10-DL-NEXT: s_lshl_b32 s11, s3, 8
; GFX10-DL-NEXT: s_lshl_b32 s13, s5, 8		; GFX10-DL-NEXT: s_lshl_b32 s13, s5, 8
; GFX10-DL-NEXT: s_lshl_b32 s15, s7, 8		; GFX10-DL-NEXT: s_ashr_i64 s[6:7], s[6:7], 60
; GFX10-DL-NEXT: s_ashr_i64 s[8:9], s[8:9], 60		; GFX10-DL-NEXT: s_ashr_i64 s[8:9], s[8:9], 60
; GFX10-DL-NEXT: s_ashr_i64 s[10:11], s[10:11], 60		; GFX10-DL-NEXT: v_mad_i32_i24 v0, s10, s12, v0
; GFX10-DL-NEXT: v_mad_i32_i24 v0, s12, s14, v0		; GFX10-DL-NEXT: s_lshl_b32 s7, s3, 4
; GFX10-DL-NEXT: s_lshl_b32 s9, s5, 4		; GFX10-DL-NEXT: s_lshl_b32 s9, s5, 4
; GFX10-DL-NEXT: s_lshl_b32 s11, s7, 4		; GFX10-DL-NEXT: s_ashr_i64 s[10:11], s[10:11], 60
; GFX10-DL-NEXT: s_ashr_i64 s[12:13], s[12:13], 60		; GFX10-DL-NEXT: s_ashr_i64 s[12:13], s[12:13], 60
; GFX10-DL-NEXT: s_ashr_i64 s[14:15], s[14:15], 60		; GFX10-DL-NEXT: v_mad_i32_i24 v0, s6, s8, v0
; GFX10-DL-NEXT: v_mad_i32_i24 v0, s8, s10, v0		; GFX10-DL-NEXT: s_ashr_i64 s[6:7], s[6:7], 60
; GFX10-DL-NEXT: s_ashr_i64 s[8:9], s[8:9], 60		; GFX10-DL-NEXT: s_ashr_i64 s[8:9], s[8:9], 60
; GFX10-DL-NEXT: s_ashr_i64 s[10:11], s[10:11], 60		; GFX10-DL-NEXT: s_ashr_i64 s[2:3], s[2:3], 60
; GFX10-DL-NEXT: s_ashr_i64 s[4:5], s[4:5], 60		; GFX10-DL-NEXT: s_ashr_i64 s[4:5], s[4:5], 60
; GFX10-DL-NEXT: s_ashr_i64 s[6:7], s[6:7], 60		; GFX10-DL-NEXT: v_mad_i32_i24 v0, s10, s12, v0
; GFX10-DL-NEXT: v_mad_i32_i24 v0, s12, s14, v0		; GFX10-DL-NEXT: v_mad_i32_i24 v0, s6, s8, v0
; GFX10-DL-NEXT: v_mad_i32_i24 v0, s8, s10, v0		; GFX10-DL-NEXT: v_mad_i32_i24 v2, s2, s4, v0
; GFX10-DL-NEXT: v_mad_i32_i24 v2, s4, s6, v0
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX10-DL-NEXT: global_store_dword v[0:1], v2, off		; GFX10-DL-NEXT: global_store_dword v[0:1], v2, off
; GFX10-DL-NEXT: s_endpgm		; GFX10-DL-NEXT: s_endpgm
<8 x i4> addrspace(1)* %src2,		<8 x i4> addrspace(1)* %src2,
i32 addrspace(1)* nocapture %dst) {		i32 addrspace(1)* nocapture %dst) {
entry:		entry:
%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1		%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1
; [25 lines collapsed in the diff view]		; [25 lines collapsed in the diff view]
store i32 %add8, i32 addrspace(1)* %dst, align 4		store i32 %add8, i32 addrspace(1)* %dst, align 4
ret void		ret void
}		}

; TODO: Support this pattern.		; TODO: Support this pattern.
define amdgpu_kernel void @idot8_acc16_vecMul(<8 x i4> addrspace(1)* %src1,		define amdgpu_kernel void @idot8_acc16_vecMul(<8 x i4> addrspace(1)* %src1,
; GFX7-LABEL: idot8_acc16_vecMul:		; GFX7-LABEL: idot8_acc16_vecMul:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s7, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s6, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_mov_b32 s2, 0xffff		; GFX7-NEXT: s_mov_b32 s8, 0xffff
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_load_dword s0, s[10:11], 0x0		; GFX7-NEXT: s_load_dword s6, s[6:7], 0x0
; GFX7-NEXT: buffer_load_ushort v0, off, s[4:7], 0		; GFX7-NEXT: buffer_load_ushort v0, off, s[0:3], 0
; GFX7-NEXT: s_load_dword s1, s[8:9], 0x0		; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_bfe_i32 s16, s0, 0x40018		; GFX7-NEXT: s_bfe_i32 s15, s6, 0x40018
; GFX7-NEXT: s_bfe_i32 s17, s0, 0x40014		; GFX7-NEXT: s_bfe_i32 s16, s6, 0x40014
; GFX7-NEXT: s_bfe_i32 s18, s0, 0x40010		; GFX7-NEXT: s_bfe_i32 s17, s6, 0x40010
; GFX7-NEXT: s_bfe_i32 s19, s0, 0x40000		; GFX7-NEXT: s_bfe_i32 s18, s6, 0x40000
; GFX7-NEXT: s_bfe_i32 s20, s0, 0x40004		; GFX7-NEXT: s_bfe_i32 s19, s6, 0x40004
; GFX7-NEXT: s_bfe_i32 s21, s0, 0x40008		; GFX7-NEXT: s_bfe_i32 s20, s6, 0x40008
; GFX7-NEXT: s_ashr_i32 s15, s0, 28		; GFX7-NEXT: s_ashr_i32 s14, s6, 28
; GFX7-NEXT: s_bfe_i32 s0, s0, 0x4000c		; GFX7-NEXT: s_bfe_i32 s6, s6, 0x4000c
; GFX7-NEXT: s_ashr_i32 s8, s1, 28		; GFX7-NEXT: s_ashr_i32 s5, s4, 28
; GFX7-NEXT: s_bfe_i32 s9, s1, 0x40018		; GFX7-NEXT: s_bfe_i32 s7, s4, 0x40018
; GFX7-NEXT: s_bfe_i32 s10, s1, 0x40014		; GFX7-NEXT: s_bfe_i32 s9, s4, 0x40014
; GFX7-NEXT: s_bfe_i32 s11, s1, 0x40010		; GFX7-NEXT: s_bfe_i32 s10, s4, 0x40010
; GFX7-NEXT: s_bfe_i32 s12, s1, 0x40000		; GFX7-NEXT: s_bfe_i32 s11, s4, 0x40000
; GFX7-NEXT: v_mov_b32_e32 v4, s19		; GFX7-NEXT: v_mov_b32_e32 v4, s18
; GFX7-NEXT: s_bfe_i32 s13, s1, 0x40004		; GFX7-NEXT: s_bfe_i32 s12, s4, 0x40004
; GFX7-NEXT: v_mov_b32_e32 v3, s20		; GFX7-NEXT: v_mov_b32_e32 v3, s19
; GFX7-NEXT: s_bfe_i32 s14, s1, 0x40008		; GFX7-NEXT: s_bfe_i32 s13, s4, 0x40008
; GFX7-NEXT: v_mov_b32_e32 v2, s21		; GFX7-NEXT: v_mov_b32_e32 v2, s20
; GFX7-NEXT: s_bfe_i32 s1, s1, 0x4000c		; GFX7-NEXT: s_bfe_i32 s4, s4, 0x4000c
; GFX7-NEXT: v_mov_b32_e32 v1, s0		; GFX7-NEXT: v_mov_b32_e32 v1, s6
; GFX7-NEXT: v_mul_i32_i24_e32 v1, s1, v1		; GFX7-NEXT: v_mul_i32_i24_e32 v1, s4, v1
; GFX7-NEXT: v_mul_i32_i24_e32 v2, s14, v2		; GFX7-NEXT: v_mul_i32_i24_e32 v2, s13, v2
; GFX7-NEXT: v_mul_i32_i24_e32 v3, s13, v3		; GFX7-NEXT: v_mul_i32_i24_e32 v3, s12, v3
; GFX7-NEXT: v_mul_i32_i24_e32 v4, s12, v4		; GFX7-NEXT: v_mul_i32_i24_e32 v4, s11, v4
; GFX7-NEXT: v_lshlrev_b32_e32 v1, 16, v1		; GFX7-NEXT: v_lshlrev_b32_e32 v1, 16, v1
; GFX7-NEXT: v_and_b32_e32 v2, s2, v2		; GFX7-NEXT: v_and_b32_e32 v2, s8, v2
; GFX7-NEXT: v_lshlrev_b32_e32 v3, 16, v3		; GFX7-NEXT: v_lshlrev_b32_e32 v3, 16, v3
; GFX7-NEXT: v_and_b32_e32 v4, s2, v4		; GFX7-NEXT: v_and_b32_e32 v4, s8, v4
; GFX7-NEXT: v_or_b32_e32 v1, v2, v1		; GFX7-NEXT: v_or_b32_e32 v1, v2, v1
; GFX7-NEXT: v_or_b32_e32 v2, v4, v3		; GFX7-NEXT: v_or_b32_e32 v2, v4, v3
; GFX7-NEXT: v_alignbit_b32 v3, v1, v2, 16		; GFX7-NEXT: v_alignbit_b32 v3, v1, v2, 16
; GFX7-NEXT: v_lshrrev_b32_e32 v4, 16, v1		; GFX7-NEXT: v_lshrrev_b32_e32 v4, 16, v1
; GFX7-NEXT: v_mov_b32_e32 v5, s18		; GFX7-NEXT: v_mov_b32_e32 v5, s17
; GFX7-NEXT: v_mov_b32_e32 v6, s17		; GFX7-NEXT: v_mov_b32_e32 v6, s16
; GFX7-NEXT: v_mov_b32_e32 v7, s16		; GFX7-NEXT: v_mov_b32_e32 v7, s15
; GFX7-NEXT: s_waitcnt vmcnt(0)		; GFX7-NEXT: s_waitcnt vmcnt(0)
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v2		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v2
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v3, v0		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v3, v0
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v1		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v1
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v4, v0		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v4, v0
; GFX7-NEXT: v_mad_i32_i24 v0, s11, v5, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s10, v5, v0
; GFX7-NEXT: v_mad_i32_i24 v0, s10, v6, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s9, v6, v0
; GFX7-NEXT: v_mad_i32_i24 v0, s9, v7, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s7, v7, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s15		; GFX7-NEXT: v_mov_b32_e32 v1, s14
; GFX7-NEXT: v_mad_i32_i24 v0, s8, v1, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s5, v1, v0
; GFX7-NEXT: buffer_store_short v0, off, s[4:7], 0		; GFX7-NEXT: buffer_store_short v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: idot8_acc16_vecMul:		; GFX8-LABEL: idot8_acc16_vecMul:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_load_dword s7, s[6:7], 0x0		; GFX8-NEXT: s_load_dword s3, s[6:7], 0x0
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_load_ushort v2, v[0:1]		; GFX8-NEXT: flat_load_ushort v2, v[0:1]
; GFX8-NEXT: s_load_dword s1, s[4:5], 0x0		; GFX8-NEXT: s_load_dword s1, s[4:5], 0x0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_lshl_b32 s29, s7, 28		; GFX8-NEXT: s_lshl_b32 s27, s3, 28
; GFX8-NEXT: s_ashr_i64 s[18:19], s[6:7], 60		; GFX8-NEXT: s_ashr_i64 s[16:17], s[2:3], 60
; GFX8-NEXT: s_lshl_b32 s21, s7, 8		; GFX8-NEXT: s_lshl_b32 s19, s3, 8
; GFX8-NEXT: s_lshl_b32 s23, s7, 12		; GFX8-NEXT: s_lshl_b32 s21, s3, 12
; GFX8-NEXT: s_lshl_b32 s17, s1, 28		; GFX8-NEXT: s_lshl_b32 s15, s1, 28
; GFX8-NEXT: s_lshl_b32 s25, s7, 16		; GFX8-NEXT: s_lshl_b32 s23, s3, 16
; GFX8-NEXT: s_lshl_b32 s27, s7, 24		; GFX8-NEXT: s_lshl_b32 s25, s3, 24
; GFX8-NEXT: s_lshl_b32 s19, s7, 4		; GFX8-NEXT: s_lshl_b32 s17, s3, 4
; GFX8-NEXT: s_lshl_b32 s7, s7, 20		; GFX8-NEXT: s_lshl_b32 s3, s3, 20
; GFX8-NEXT: s_ashr_i64 s[4:5], s[0:1], 60		; GFX8-NEXT: s_ashr_i64 s[4:5], s[0:1], 60
; GFX8-NEXT: s_ashr_i64 s[28:29], s[28:29], 60		; GFX8-NEXT: s_ashr_i64 s[26:27], s[26:27], 60
; GFX8-NEXT: s_lshl_b32 s9, s1, 8		; GFX8-NEXT: s_lshl_b32 s7, s1, 8
; GFX8-NEXT: s_lshl_b32 s11, s1, 12		; GFX8-NEXT: s_lshl_b32 s9, s1, 12
; GFX8-NEXT: s_lshl_b32 s13, s1, 16		; GFX8-NEXT: s_lshl_b32 s11, s1, 16
; GFX8-NEXT: s_lshl_b32 s15, s1, 24		; GFX8-NEXT: s_lshl_b32 s13, s1, 24
; GFX8-NEXT: s_lshl_b32 s5, s1, 4		; GFX8-NEXT: s_lshl_b32 s5, s1, 4
; GFX8-NEXT: s_lshl_b32 s1, s1, 20		; GFX8-NEXT: s_lshl_b32 s1, s1, 20
; GFX8-NEXT: s_ashr_i64 s[26:27], s[26:27], 60		; GFX8-NEXT: s_ashr_i64 s[24:25], s[24:25], 60
; GFX8-NEXT: s_ashr_i64 s[6:7], s[6:7], 60		; GFX8-NEXT: s_ashr_i64 s[2:3], s[2:3], 60
; GFX8-NEXT: s_ashr_i64 s[16:17], s[16:17], 60
; GFX8-NEXT: v_mov_b32_e32 v4, s28
; GFX8-NEXT: s_ashr_i64 s[14:15], s[14:15], 60		; GFX8-NEXT: s_ashr_i64 s[14:15], s[14:15], 60
		; GFX8-NEXT: v_mov_b32_e32 v4, s26
		; GFX8-NEXT: s_ashr_i64 s[12:13], s[12:13], 60
; GFX8-NEXT: s_ashr_i64 s[0:1], s[0:1], 60		; GFX8-NEXT: s_ashr_i64 s[0:1], s[0:1], 60
; GFX8-NEXT: v_mov_b32_e32 v3, s6		; GFX8-NEXT: v_mov_b32_e32 v3, s2
; GFX8-NEXT: v_mov_b32_e32 v5, s26		; GFX8-NEXT: v_mov_b32_e32 v5, s24
; GFX8-NEXT: s_ashr_i64 s[24:25], s[24:25], 60
; GFX8-NEXT: v_mul_i32_i24_e32 v3, s0, v3
; GFX8-NEXT: s_ashr_i64 s[22:23], s[22:23], 60		; GFX8-NEXT: s_ashr_i64 s[22:23], s[22:23], 60
; GFX8-NEXT: s_ashr_i64 s[12:13], s[12:13], 60		; GFX8-NEXT: v_mul_i32_i24_e32 v3, s0, v3
; GFX8-NEXT: v_mov_b32_e32 v6, s24
; GFX8-NEXT: s_ashr_i64 s[20:21], s[20:21], 60		; GFX8-NEXT: s_ashr_i64 s[20:21], s[20:21], 60
; GFX8-NEXT: s_ashr_i64 s[10:11], s[10:11], 60		; GFX8-NEXT: s_ashr_i64 s[10:11], s[10:11], 60
; GFX8-NEXT: v_mov_b32_e32 v7, s22		; GFX8-NEXT: v_mov_b32_e32 v6, s22
; GFX8-NEXT: s_ashr_i64 s[32:33], s[18:19], 60		; GFX8-NEXT: s_ashr_i64 s[18:19], s[18:19], 60
; GFX8-NEXT: s_ashr_i64 s[8:9], s[8:9], 60		; GFX8-NEXT: s_ashr_i64 s[8:9], s[8:9], 60
; GFX8-NEXT: v_mov_b32_e32 v8, s20		; GFX8-NEXT: v_mov_b32_e32 v7, s20
; GFX8-NEXT: s_ashr_i64 s[30:31], s[4:5], 60		; GFX8-NEXT: s_ashr_i64 s[30:31], s[16:17], 60
; GFX8-NEXT: v_mov_b32_e32 v9, s32		; GFX8-NEXT: s_ashr_i64 s[6:7], s[6:7], 60
		; GFX8-NEXT: v_mov_b32_e32 v8, s18
		; GFX8-NEXT: s_ashr_i64 s[28:29], s[4:5], 60
		; GFX8-NEXT: v_mov_b32_e32 v9, s30
; GFX8-NEXT: s_waitcnt vmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0)
; GFX8-NEXT: v_mad_i32_i24 v2, s16, v4, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s14, v4, v2
; GFX8-NEXT: v_mad_i32_i24 v2, s14, v5, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s12, v5, v2
; GFX8-NEXT: v_add_u32_sdwa v2, vcc, v3, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:WORD_0		; GFX8-NEXT: v_add_u32_sdwa v2, vcc, v3, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:WORD_0
; GFX8-NEXT: v_mad_i32_i24 v2, s12, v6, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s10, v6, v2
; GFX8-NEXT: v_mad_i32_i24 v2, s10, v7, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s8, v7, v2
; GFX8-NEXT: v_mad_i32_i24 v2, s8, v8, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s6, v8, v2
; GFX8-NEXT: v_mad_i32_i24 v2, s30, v9, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s28, v9, v2
; GFX8-NEXT: v_mov_b32_e32 v3, s18		; GFX8-NEXT: v_mov_b32_e32 v3, s16
; GFX8-NEXT: v_mad_i32_i24 v2, s4, v3, v2		; GFX8-NEXT: v_mad_i32_i24 v2, s4, v3, v2
; GFX8-NEXT: flat_store_short v[0:1], v2		; GFX8-NEXT: flat_store_short v[0:1], v2
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: idot8_acc16_vecMul:		; GFX9-LABEL: idot8_acc16_vecMul:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX9-NEXT: s_load_dword s6, s[6:7], 0x0		; GFX9-NEXT: s_load_dword s6, s[6:7], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_bfe_u32 s4, s2, 0x40018		; GFX9-NEXT: s_bfe_u32 s3, s2, 0x40018
; GFX9-NEXT: s_lshr_b32 s5, s2, 28		; GFX9-NEXT: s_lshr_b32 s4, s2, 28
; GFX9-NEXT: s_bfe_u32 s8, s2, 0x40010		; GFX9-NEXT: s_bfe_u32 s5, s2, 0x40010
; GFX9-NEXT: s_bfe_u32 s9, s2, 0x40014		; GFX9-NEXT: s_bfe_u32 s8, s2, 0x40014
; GFX9-NEXT: s_bfe_u32 s10, s2, 0x40008		; GFX9-NEXT: s_bfe_u32 s9, s2, 0x40008
; GFX9-NEXT: s_bfe_u32 s11, s2, 0x4000c		; GFX9-NEXT: s_bfe_u32 s10, s2, 0x4000c
; GFX9-NEXT: s_and_b32 s12, s2, 15		; GFX9-NEXT: s_and_b32 s11, s2, 15
; GFX9-NEXT: s_bfe_u32 s2, s2, 0x40004		; GFX9-NEXT: s_bfe_u32 s2, s2, 0x40004
; GFX9-NEXT: s_pack_ll_b32_b16 s2, s12, s2		; GFX9-NEXT: s_pack_ll_b32_b16 s2, s11, s2
; GFX9-NEXT: v_pk_lshlrev_b16 v0, 12, s2 op_sel_hi:[0,1]		; GFX9-NEXT: v_pk_lshlrev_b16 v0, 12, s2 op_sel_hi:[0,1]
; GFX9-NEXT: s_pack_ll_b32_b16 s2, s10, s11		; GFX9-NEXT: s_pack_ll_b32_b16 s2, s9, s10
; GFX9-NEXT: v_pk_lshlrev_b16 v1, 12, s2 op_sel_hi:[0,1]		; GFX9-NEXT: v_pk_lshlrev_b16 v1, 12, s2 op_sel_hi:[0,1]
; GFX9-NEXT: s_pack_ll_b32_b16 s2, s8, s9		; GFX9-NEXT: s_pack_ll_b32_b16 s2, s5, s8
; GFX9-NEXT: v_pk_lshlrev_b16 v2, 12, s2 op_sel_hi:[0,1]		; GFX9-NEXT: v_pk_lshlrev_b16 v2, 12, s2 op_sel_hi:[0,1]
; GFX9-NEXT: s_pack_ll_b32_b16 s2, s4, s5		; GFX9-NEXT: s_pack_ll_b32_b16 s2, s3, s4
; GFX9-NEXT: s_bfe_u32 s7, s6, 0x40018		; GFX9-NEXT: s_bfe_u32 s7, s6, 0x40018
; GFX9-NEXT: s_lshr_b32 s13, s6, 28		; GFX9-NEXT: s_lshr_b32 s12, s6, 28
; GFX9-NEXT: s_bfe_u32 s14, s6, 0x40010		; GFX9-NEXT: s_bfe_u32 s13, s6, 0x40010
; GFX9-NEXT: s_bfe_u32 s15, s6, 0x40014		; GFX9-NEXT: s_bfe_u32 s14, s6, 0x40014
; GFX9-NEXT: s_bfe_u32 s16, s6, 0x40008		; GFX9-NEXT: s_bfe_u32 s15, s6, 0x40008
; GFX9-NEXT: s_bfe_u32 s17, s6, 0x4000c		; GFX9-NEXT: s_bfe_u32 s16, s6, 0x4000c
; GFX9-NEXT: s_and_b32 s18, s6, 15		; GFX9-NEXT: s_and_b32 s17, s6, 15
; GFX9-NEXT: s_bfe_u32 s6, s6, 0x40004		; GFX9-NEXT: s_bfe_u32 s6, s6, 0x40004
; GFX9-NEXT: v_pk_lshlrev_b16 v3, 12, s2 op_sel_hi:[0,1]		; GFX9-NEXT: v_pk_lshlrev_b16 v3, 12, s2 op_sel_hi:[0,1]
; GFX9-NEXT: s_pack_ll_b32_b16 s2, s18, s6		; GFX9-NEXT: s_pack_ll_b32_b16 s2, s17, s6
; GFX9-NEXT: v_pk_lshlrev_b16 v4, 12, s2 op_sel_hi:[0,1]		; GFX9-NEXT: v_pk_lshlrev_b16 v4, 12, s2 op_sel_hi:[0,1]
; GFX9-NEXT: s_pack_ll_b32_b16 s2, s16, s17		; GFX9-NEXT: s_pack_ll_b32_b16 s2, s15, s16
; GFX9-NEXT: v_pk_lshlrev_b16 v5, 12, s2 op_sel_hi:[0,1]		; GFX9-NEXT: v_pk_lshlrev_b16 v5, 12, s2 op_sel_hi:[0,1]
; GFX9-NEXT: s_pack_ll_b32_b16 s2, s14, s15		; GFX9-NEXT: s_pack_ll_b32_b16 s2, s13, s14
; GFX9-NEXT: v_pk_ashrrev_i16 v0, 12, v0 op_sel_hi:[0,1]		; GFX9-NEXT: v_pk_ashrrev_i16 v0, 12, v0 op_sel_hi:[0,1]
; GFX9-NEXT: v_pk_ashrrev_i16 v4, 12, v4 op_sel_hi:[0,1]		; GFX9-NEXT: v_pk_ashrrev_i16 v4, 12, v4 op_sel_hi:[0,1]
; GFX9-NEXT: v_pk_ashrrev_i16 v1, 12, v1 op_sel_hi:[0,1]		; GFX9-NEXT: v_pk_ashrrev_i16 v1, 12, v1 op_sel_hi:[0,1]
; GFX9-NEXT: v_pk_ashrrev_i16 v5, 12, v5 op_sel_hi:[0,1]		; GFX9-NEXT: v_pk_ashrrev_i16 v5, 12, v5 op_sel_hi:[0,1]
; GFX9-NEXT: v_pk_lshlrev_b16 v6, 12, s2 op_sel_hi:[0,1]		; GFX9-NEXT: v_pk_lshlrev_b16 v6, 12, s2 op_sel_hi:[0,1]
; GFX9-NEXT: v_pk_mul_lo_u16 v5, v1, v5		; GFX9-NEXT: v_pk_mul_lo_u16 v5, v1, v5
; GFX9-NEXT: v_pk_mul_lo_u16 v4, v0, v4		; GFX9-NEXT: v_pk_mul_lo_u16 v4, v0, v4
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_pk_ashrrev_i16 v2, 12, v2 op_sel_hi:[0,1]		; GFX9-NEXT: v_pk_ashrrev_i16 v2, 12, v2 op_sel_hi:[0,1]
; GFX9-NEXT: v_pk_ashrrev_i16 v6, 12, v6 op_sel_hi:[0,1]		; GFX9-NEXT: v_pk_ashrrev_i16 v6, 12, v6 op_sel_hi:[0,1]
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: v_pk_mul_lo_u16 v2, v2, v6		; GFX9-NEXT: v_pk_mul_lo_u16 v2, v2, v6
; GFX9-NEXT: global_load_ushort v6, v[0:1], off		; GFX9-NEXT: global_load_ushort v6, v[0:1], off
; GFX9-NEXT: s_pack_ll_b32_b16 s2, s7, s13		; GFX9-NEXT: s_pack_ll_b32_b16 s2, s7, s12
; GFX9-NEXT: v_pk_lshlrev_b16 v7, 12, s2 op_sel_hi:[0,1]		; GFX9-NEXT: v_pk_lshlrev_b16 v7, 12, s2 op_sel_hi:[0,1]
; GFX9-NEXT: v_pk_ashrrev_i16 v3, 12, v3 op_sel_hi:[0,1]		; GFX9-NEXT: v_pk_ashrrev_i16 v3, 12, v3 op_sel_hi:[0,1]
; GFX9-NEXT: v_pk_ashrrev_i16 v7, 12, v7 op_sel_hi:[0,1]		; GFX9-NEXT: v_pk_ashrrev_i16 v7, 12, v7 op_sel_hi:[0,1]
; GFX9-NEXT: v_pk_mul_lo_u16 v3, v3, v7		; GFX9-NEXT: v_pk_mul_lo_u16 v3, v3, v7
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: v_add_u32_e32 v6, v4, v6		; GFX9-NEXT: v_add_u32_e32 v6, v4, v6
; GFX9-NEXT: v_add_u32_sdwa v4, v6, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX9-NEXT: v_add_u32_sdwa v4, v6, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX9-NEXT: v_add_u32_sdwa v4, v4, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:WORD_0		; GFX9-NEXT: v_add_u32_sdwa v4, v4, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:WORD_0
; GFX9-NEXT: v_add_u32_sdwa v4, v4, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX9-NEXT: v_add_u32_sdwa v4, v4, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX9-NEXT: v_add_u32_e32 v4, v4, v2		; GFX9-NEXT: v_add_u32_e32 v4, v4, v2
; GFX9-NEXT: v_add_u32_sdwa v2, v4, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX9-NEXT: v_add_u32_sdwa v2, v4, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX9-NEXT: v_add_u32_e32 v2, v2, v3		; GFX9-NEXT: v_add_u32_e32 v2, v2, v3
; GFX9-NEXT: v_add_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX9-NEXT: v_add_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX9-NEXT: global_store_short v[0:1], v2, off		; GFX9-NEXT: global_store_short v[0:1], v2, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX9-DL-LABEL: idot8_acc16_vecMul:		; GFX9-DL-LABEL: idot8_acc16_vecMul:
; GFX9-DL: ; %bb.0: ; %entry		; GFX9-DL: ; %bb.0: ; %entry
; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX9-DL-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX9-DL-NEXT: s_load_dword s6, s[6:7], 0x0		; GFX9-DL-NEXT: s_load_dword s6, s[6:7], 0x0
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_bfe_u32 s4, s2, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s3, s2, 0x40018
; GFX9-DL-NEXT: s_lshr_b32 s5, s2, 28		; GFX9-DL-NEXT: s_lshr_b32 s4, s2, 28
; GFX9-DL-NEXT: s_bfe_u32 s8, s2, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s5, s2, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s9, s2, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s8, s2, 0x40014
; GFX9-DL-NEXT: s_bfe_u32 s10, s2, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s9, s2, 0x40008
; GFX9-DL-NEXT: s_bfe_u32 s11, s2, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s10, s2, 0x4000c
; GFX9-DL-NEXT: s_and_b32 s12, s2, 15		; GFX9-DL-NEXT: s_and_b32 s11, s2, 15
; GFX9-DL-NEXT: s_bfe_u32 s2, s2, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s2, s2, 0x40004
; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s12, s2		; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s11, s2
; GFX9-DL-NEXT: v_pk_lshlrev_b16 v0, 12, s2 op_sel_hi:[0,1]		; GFX9-DL-NEXT: v_pk_lshlrev_b16 v0, 12, s2 op_sel_hi:[0,1]
; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s10, s11		; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s9, s10
; GFX9-DL-NEXT: v_pk_lshlrev_b16 v1, 12, s2 op_sel_hi:[0,1]		; GFX9-DL-NEXT: v_pk_lshlrev_b16 v1, 12, s2 op_sel_hi:[0,1]
; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s8, s9		; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s5, s8
; GFX9-DL-NEXT: v_pk_lshlrev_b16 v2, 12, s2 op_sel_hi:[0,1]		; GFX9-DL-NEXT: v_pk_lshlrev_b16 v2, 12, s2 op_sel_hi:[0,1]
; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s4, s5		; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s3, s4
; GFX9-DL-NEXT: s_bfe_u32 s7, s6, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s7, s6, 0x40018
; GFX9-DL-NEXT: s_lshr_b32 s13, s6, 28		; GFX9-DL-NEXT: s_lshr_b32 s12, s6, 28
; GFX9-DL-NEXT: s_bfe_u32 s14, s6, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s13, s6, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s15, s6, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s14, s6, 0x40014
; GFX9-DL-NEXT: s_bfe_u32 s16, s6, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s15, s6, 0x40008
; GFX9-DL-NEXT: s_bfe_u32 s17, s6, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s16, s6, 0x4000c
; GFX9-DL-NEXT: s_and_b32 s18, s6, 15		; GFX9-DL-NEXT: s_and_b32 s17, s6, 15
; GFX9-DL-NEXT: s_bfe_u32 s6, s6, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s6, s6, 0x40004
; GFX9-DL-NEXT: v_pk_lshlrev_b16 v3, 12, s2 op_sel_hi:[0,1]		; GFX9-DL-NEXT: v_pk_lshlrev_b16 v3, 12, s2 op_sel_hi:[0,1]
; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s18, s6		; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s17, s6
; GFX9-DL-NEXT: v_pk_lshlrev_b16 v4, 12, s2 op_sel_hi:[0,1]		; GFX9-DL-NEXT: v_pk_lshlrev_b16 v4, 12, s2 op_sel_hi:[0,1]
; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s16, s17		; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s15, s16
; GFX9-DL-NEXT: v_pk_lshlrev_b16 v5, 12, s2 op_sel_hi:[0,1]		; GFX9-DL-NEXT: v_pk_lshlrev_b16 v5, 12, s2 op_sel_hi:[0,1]
; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s14, s15		; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s13, s14
; GFX9-DL-NEXT: v_pk_ashrrev_i16 v0, 12, v0 op_sel_hi:[0,1]		; GFX9-DL-NEXT: v_pk_ashrrev_i16 v0, 12, v0 op_sel_hi:[0,1]
; GFX9-DL-NEXT: v_pk_ashrrev_i16 v4, 12, v4 op_sel_hi:[0,1]		; GFX9-DL-NEXT: v_pk_ashrrev_i16 v4, 12, v4 op_sel_hi:[0,1]
; GFX9-DL-NEXT: v_pk_ashrrev_i16 v1, 12, v1 op_sel_hi:[0,1]		; GFX9-DL-NEXT: v_pk_ashrrev_i16 v1, 12, v1 op_sel_hi:[0,1]
; GFX9-DL-NEXT: v_pk_ashrrev_i16 v5, 12, v5 op_sel_hi:[0,1]		; GFX9-DL-NEXT: v_pk_ashrrev_i16 v5, 12, v5 op_sel_hi:[0,1]
; GFX9-DL-NEXT: v_pk_lshlrev_b16 v6, 12, s2 op_sel_hi:[0,1]		; GFX9-DL-NEXT: v_pk_lshlrev_b16 v6, 12, s2 op_sel_hi:[0,1]
; GFX9-DL-NEXT: v_pk_mul_lo_u16 v5, v1, v5		; GFX9-DL-NEXT: v_pk_mul_lo_u16 v5, v1, v5
; GFX9-DL-NEXT: v_pk_mul_lo_u16 v4, v0, v4		; GFX9-DL-NEXT: v_pk_mul_lo_u16 v4, v0, v4
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_pk_ashrrev_i16 v2, 12, v2 op_sel_hi:[0,1]		; GFX9-DL-NEXT: v_pk_ashrrev_i16 v2, 12, v2 op_sel_hi:[0,1]
; GFX9-DL-NEXT: v_pk_ashrrev_i16 v6, 12, v6 op_sel_hi:[0,1]		; GFX9-DL-NEXT: v_pk_ashrrev_i16 v6, 12, v6 op_sel_hi:[0,1]
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: v_pk_mul_lo_u16 v2, v2, v6		; GFX9-DL-NEXT: v_pk_mul_lo_u16 v2, v2, v6
; GFX9-DL-NEXT: global_load_ushort v6, v[0:1], off		; GFX9-DL-NEXT: global_load_ushort v6, v[0:1], off
; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s7, s13		; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s7, s12
; GFX9-DL-NEXT: v_pk_lshlrev_b16 v7, 12, s2 op_sel_hi:[0,1]		; GFX9-DL-NEXT: v_pk_lshlrev_b16 v7, 12, s2 op_sel_hi:[0,1]
; GFX9-DL-NEXT: v_pk_ashrrev_i16 v3, 12, v3 op_sel_hi:[0,1]		; GFX9-DL-NEXT: v_pk_ashrrev_i16 v3, 12, v3 op_sel_hi:[0,1]
; GFX9-DL-NEXT: v_pk_ashrrev_i16 v7, 12, v7 op_sel_hi:[0,1]		; GFX9-DL-NEXT: v_pk_ashrrev_i16 v7, 12, v7 op_sel_hi:[0,1]
; GFX9-DL-NEXT: v_pk_mul_lo_u16 v3, v3, v7		; GFX9-DL-NEXT: v_pk_mul_lo_u16 v3, v3, v7
; GFX9-DL-NEXT: s_waitcnt vmcnt(0)		; GFX9-DL-NEXT: s_waitcnt vmcnt(0)
; GFX9-DL-NEXT: v_add_u32_e32 v6, v4, v6		; GFX9-DL-NEXT: v_add_u32_e32 v6, v4, v6
; GFX9-DL-NEXT: v_add_u32_sdwa v4, v6, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX9-DL-NEXT: v_add_u32_sdwa v4, v6, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX9-DL-NEXT: v_add_u32_sdwa v4, v4, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:WORD_0		; GFX9-DL-NEXT: v_add_u32_sdwa v4, v4, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:WORD_0
; GFX9-DL-NEXT: v_add_u32_sdwa v4, v4, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX9-DL-NEXT: v_add_u32_sdwa v4, v4, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX9-DL-NEXT: v_add_u32_e32 v4, v4, v2		; GFX9-DL-NEXT: v_add_u32_e32 v4, v4, v2
; GFX9-DL-NEXT: v_add_u32_sdwa v2, v4, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX9-DL-NEXT: v_add_u32_sdwa v2, v4, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX9-DL-NEXT: v_add_u32_e32 v2, v2, v3		; GFX9-DL-NEXT: v_add_u32_e32 v2, v2, v3
; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX9-DL-NEXT: global_store_short v[0:1], v2, off		; GFX9-DL-NEXT: global_store_short v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: idot8_acc16_vecMul:		; GFX10-DL-LABEL: idot8_acc16_vecMul:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s4		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s2
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s5		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s3
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX10-DL-NEXT: global_load_ushort v2, v[0:1], off		; GFX10-DL-NEXT: global_load_ushort v2, v[0:1], off
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_and_b32 s5, s0, 15		; GFX10-DL-NEXT: s_and_b32 s4, s0, 15
; GFX10-DL-NEXT: s_bfe_u32 s6, s0, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40004
; GFX10-DL-NEXT: s_and_b32 s7, s1, 15		; GFX10-DL-NEXT: s_and_b32 s6, s1, 15
; GFX10-DL-NEXT: s_bfe_u32 s8, s1, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s7, s1, 0x40004
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40018
; GFX10-DL-NEXT: s_lshr_b32 s4, s0, 28		; GFX10-DL-NEXT: s_lshr_b32 s3, s0, 28
; GFX10-DL-NEXT: s_pack_ll_b32_b16 s5, s5, s6		; GFX10-DL-NEXT: s_pack_ll_b32_b16 s4, s4, s5
; GFX10-DL-NEXT: s_bfe_u32 s9, s0, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s8, s0, 0x40010
; GFX10-DL-NEXT: s_pack_ll_b32_b16 s7, s7, s8		; GFX10-DL-NEXT: s_pack_ll_b32_b16 s6, s6, s7
; GFX10-DL-NEXT: s_bfe_u32 s10, s0, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s9, s0, 0x40014
; GFX10-DL-NEXT: s_bfe_u32 s6, s0, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40008
; GFX10-DL-NEXT: v_pk_lshlrev_b16 v3, 12, s5 op_sel_hi:[0,1]		; GFX10-DL-NEXT: v_pk_lshlrev_b16 v3, 12, s4 op_sel_hi:[0,1]
; GFX10-DL-NEXT: s_bfe_u32 s0, s0, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s0, s0, 0x4000c
; GFX10-DL-NEXT: v_pk_lshlrev_b16 v4, 12, s7 op_sel_hi:[0,1]		; GFX10-DL-NEXT: v_pk_lshlrev_b16 v4, 12, s6 op_sel_hi:[0,1]
; GFX10-DL-NEXT: s_bfe_u32 s8, s1, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s7, s1, 0x40008
; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x4000c
; GFX10-DL-NEXT: v_pk_ashrrev_i16 v3, 12, v3 op_sel_hi:[0,1]		; GFX10-DL-NEXT: v_pk_ashrrev_i16 v3, 12, v3 op_sel_hi:[0,1]
; GFX10-DL-NEXT: s_pack_ll_b32_b16 s0, s6, s0		; GFX10-DL-NEXT: s_pack_ll_b32_b16 s0, s5, s0
; GFX10-DL-NEXT: v_pk_ashrrev_i16 v4, 12, v4 op_sel_hi:[0,1]		; GFX10-DL-NEXT: v_pk_ashrrev_i16 v4, 12, v4 op_sel_hi:[0,1]
; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x40010
; GFX10-DL-NEXT: s_pack_ll_b32_b16 s5, s8, s5		; GFX10-DL-NEXT: s_pack_ll_b32_b16 s4, s7, s4
; GFX10-DL-NEXT: s_bfe_u32 s7, s1, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40018
; GFX10-DL-NEXT: v_pk_lshlrev_b16 v5, 12, s0 op_sel_hi:[0,1]		; GFX10-DL-NEXT: v_pk_lshlrev_b16 v5, 12, s0 op_sel_hi:[0,1]
; GFX10-DL-NEXT: s_bfe_u32 s0, s1, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s0, s1, 0x40014
; GFX10-DL-NEXT: v_pk_mul_lo_u16 v3, v3, v4		; GFX10-DL-NEXT: v_pk_mul_lo_u16 v3, v3, v4
; GFX10-DL-NEXT: v_pk_lshlrev_b16 v6, 12, s5 op_sel_hi:[0,1]		; GFX10-DL-NEXT: v_pk_lshlrev_b16 v6, 12, s4 op_sel_hi:[0,1]
; GFX10-DL-NEXT: s_pack_ll_b32_b16 s5, s9, s10		; GFX10-DL-NEXT: s_pack_ll_b32_b16 s4, s8, s9
; GFX10-DL-NEXT: v_pk_ashrrev_i16 v4, 12, v5 op_sel_hi:[0,1]		; GFX10-DL-NEXT: v_pk_ashrrev_i16 v4, 12, v5 op_sel_hi:[0,1]
; GFX10-DL-NEXT: s_pack_ll_b32_b16 s0, s6, s0		; GFX10-DL-NEXT: s_pack_ll_b32_b16 s0, s5, s0
; GFX10-DL-NEXT: s_lshr_b32 s1, s1, 28		; GFX10-DL-NEXT: s_lshr_b32 s1, s1, 28
; GFX10-DL-NEXT: v_pk_ashrrev_i16 v5, 12, v6 op_sel_hi:[0,1]		; GFX10-DL-NEXT: v_pk_ashrrev_i16 v5, 12, v6 op_sel_hi:[0,1]
; GFX10-DL-NEXT: v_pk_lshlrev_b16 v6, 12, s5 op_sel_hi:[0,1]		; GFX10-DL-NEXT: v_pk_lshlrev_b16 v6, 12, s4 op_sel_hi:[0,1]
; GFX10-DL-NEXT: v_pk_lshlrev_b16 v7, 12, s0 op_sel_hi:[0,1]		; GFX10-DL-NEXT: v_pk_lshlrev_b16 v7, 12, s0 op_sel_hi:[0,1]
; GFX10-DL-NEXT: s_pack_ll_b32_b16 s0, s2, s4		; GFX10-DL-NEXT: s_pack_ll_b32_b16 s0, s2, s3
; GFX10-DL-NEXT: v_pk_mul_lo_u16 v4, v4, v5		; GFX10-DL-NEXT: v_pk_mul_lo_u16 v4, v4, v5
; GFX10-DL-NEXT: s_pack_ll_b32_b16 s1, s7, s1		; GFX10-DL-NEXT: s_pack_ll_b32_b16 s1, s6, s1
; GFX10-DL-NEXT: v_pk_ashrrev_i16 v5, 12, v7 op_sel_hi:[0,1]		; GFX10-DL-NEXT: v_pk_ashrrev_i16 v5, 12, v7 op_sel_hi:[0,1]
; GFX10-DL-NEXT: v_pk_lshlrev_b16 v7, 12, s1 op_sel_hi:[0,1]		; GFX10-DL-NEXT: v_pk_lshlrev_b16 v7, 12, s1 op_sel_hi:[0,1]
; GFX10-DL-NEXT: s_waitcnt vmcnt(0)		; GFX10-DL-NEXT: s_waitcnt vmcnt(0)
; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v3, v2		; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v3, v2
; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX10-DL-NEXT: v_pk_ashrrev_i16 v3, 12, v6 op_sel_hi:[0,1]		; GFX10-DL-NEXT: v_pk_ashrrev_i16 v3, 12, v6 op_sel_hi:[0,1]
; GFX10-DL-NEXT: v_pk_lshlrev_b16 v6, 12, s0 op_sel_hi:[0,1]		; GFX10-DL-NEXT: v_pk_lshlrev_b16 v6, 12, s0 op_sel_hi:[0,1]
; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:WORD_0		; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:WORD_0
[40 lines not shown; context: entry:]
store i16 %add8, i16 addrspace(1)* %dst, align 4		store i16 %add8, i16 addrspace(1)* %dst, align 4
ret void		ret void
}		}

; TODO: Support this pattern.		; TODO: Support this pattern.
define amdgpu_kernel void @idot8_acc8_vecMul(<8 x i4> addrspace(1)* %src1,		define amdgpu_kernel void @idot8_acc8_vecMul(<8 x i4> addrspace(1)* %src1,
; GFX7-LABEL: idot8_acc8_vecMul:		; GFX7-LABEL: idot8_acc8_vecMul:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s7, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s6, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_movk_i32 s0, 0xff		; GFX7-NEXT: s_movk_i32 s8, 0xff
; GFX7-NEXT: s_mov_b32 s1, 0xffff		; GFX7-NEXT: s_mov_b32 s9, 0xffff
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: buffer_load_ubyte v0, off, s[4:7], 0		; GFX7-NEXT: buffer_load_ubyte v0, off, s[0:3], 0
; GFX7-NEXT: s_load_dword s2, s[8:9], 0x0		; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX7-NEXT: s_load_dword s8, s[10:11], 0x0		; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_bfe_i32 s9, s2, 0x40000		; GFX7-NEXT: s_bfe_i32 s6, s4, 0x40000
; GFX7-NEXT: s_bfe_i32 s16, s8, 0x40000		; GFX7-NEXT: s_bfe_i32 s15, s5, 0x40000
; GFX7-NEXT: s_bfe_i32 s17, s8, 0x40004		; GFX7-NEXT: s_bfe_i32 s16, s5, 0x40004
; GFX7-NEXT: s_bfe_i32 s18, s8, 0x40008		; GFX7-NEXT: s_bfe_i32 s17, s5, 0x40008
; GFX7-NEXT: s_bfe_i32 s19, s8, 0x4000c		; GFX7-NEXT: s_bfe_i32 s18, s5, 0x4000c
; GFX7-NEXT: s_bfe_i32 s20, s8, 0x40010		; GFX7-NEXT: s_bfe_i32 s19, s5, 0x40010
; GFX7-NEXT: s_bfe_i32 s21, s8, 0x40014		; GFX7-NEXT: s_bfe_i32 s20, s5, 0x40014
; GFX7-NEXT: s_bfe_i32 s22, s8, 0x40018		; GFX7-NEXT: s_bfe_i32 s21, s5, 0x40018
; GFX7-NEXT: s_ashr_i32 s8, s8, 28		; GFX7-NEXT: s_ashr_i32 s5, s5, 28
; GFX7-NEXT: v_mov_b32_e32 v8, s16		; GFX7-NEXT: v_mov_b32_e32 v8, s15
; GFX7-NEXT: s_bfe_i32 s10, s2, 0x40004		; GFX7-NEXT: s_bfe_i32 s7, s4, 0x40004
; GFX7-NEXT: v_mov_b32_e32 v7, s17		; GFX7-NEXT: v_mov_b32_e32 v7, s16
; GFX7-NEXT: s_bfe_i32 s11, s2, 0x40008		; GFX7-NEXT: s_bfe_i32 s10, s4, 0x40008
; GFX7-NEXT: v_mov_b32_e32 v6, s18		; GFX7-NEXT: v_mov_b32_e32 v6, s17
; GFX7-NEXT: s_bfe_i32 s12, s2, 0x4000c		; GFX7-NEXT: s_bfe_i32 s11, s4, 0x4000c
; GFX7-NEXT: v_mov_b32_e32 v5, s19		; GFX7-NEXT: v_mov_b32_e32 v5, s18
; GFX7-NEXT: s_bfe_i32 s13, s2, 0x40010		; GFX7-NEXT: s_bfe_i32 s12, s4, 0x40010
; GFX7-NEXT: v_mov_b32_e32 v4, s20		; GFX7-NEXT: v_mov_b32_e32 v4, s19
; GFX7-NEXT: s_bfe_i32 s14, s2, 0x40014		; GFX7-NEXT: s_bfe_i32 s13, s4, 0x40014
; GFX7-NEXT: v_mov_b32_e32 v3, s21		; GFX7-NEXT: v_mov_b32_e32 v3, s20
; GFX7-NEXT: s_bfe_i32 s15, s2, 0x40018		; GFX7-NEXT: s_bfe_i32 s14, s4, 0x40018
; GFX7-NEXT: v_mov_b32_e32 v2, s22		; GFX7-NEXT: v_mov_b32_e32 v2, s21
; GFX7-NEXT: s_ashr_i32 s2, s2, 28		; GFX7-NEXT: s_ashr_i32 s4, s4, 28
; GFX7-NEXT: v_mov_b32_e32 v1, s8		; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mul_i32_i24_e32 v1, s2, v1		; GFX7-NEXT: v_mul_i32_i24_e32 v1, s4, v1
; GFX7-NEXT: v_mul_i32_i24_e32 v2, s15, v2		; GFX7-NEXT: v_mul_i32_i24_e32 v2, s14, v2
; GFX7-NEXT: v_mul_i32_i24_e32 v3, s14, v3		; GFX7-NEXT: v_mul_i32_i24_e32 v3, s13, v3
; GFX7-NEXT: v_mul_i32_i24_e32 v9, s13, v4		; GFX7-NEXT: v_mul_i32_i24_e32 v9, s12, v4
; GFX7-NEXT: v_mul_i32_i24_e32 v5, s12, v5		; GFX7-NEXT: v_mul_i32_i24_e32 v5, s11, v5
; GFX7-NEXT: v_mul_i32_i24_e32 v6, s11, v6		; GFX7-NEXT: v_mul_i32_i24_e32 v6, s10, v6
; GFX7-NEXT: v_mul_i32_i24_e32 v7, s10, v7		; GFX7-NEXT: v_mul_i32_i24_e32 v7, s7, v7
; GFX7-NEXT: v_mul_i32_i24_e32 v8, s9, v8		; GFX7-NEXT: v_mul_i32_i24_e32 v8, s6, v8
; GFX7-NEXT: v_lshlrev_b32_e32 v1, 8, v1		; GFX7-NEXT: v_lshlrev_b32_e32 v1, 8, v1
; GFX7-NEXT: v_and_b32_e32 v2, s0, v2		; GFX7-NEXT: v_and_b32_e32 v2, s8, v2
; GFX7-NEXT: v_lshlrev_b32_e32 v3, 8, v3		; GFX7-NEXT: v_lshlrev_b32_e32 v3, 8, v3
; GFX7-NEXT: v_and_b32_e32 v9, s0, v9		; GFX7-NEXT: v_and_b32_e32 v9, s8, v9
; GFX7-NEXT: v_lshlrev_b32_e32 v5, 8, v5		; GFX7-NEXT: v_lshlrev_b32_e32 v5, 8, v5
; GFX7-NEXT: v_and_b32_e32 v6, s0, v6		; GFX7-NEXT: v_and_b32_e32 v6, s8, v6
; GFX7-NEXT: v_lshlrev_b32_e32 v7, 8, v7		; GFX7-NEXT: v_lshlrev_b32_e32 v7, 8, v7
; GFX7-NEXT: v_and_b32_e32 v8, s0, v8		; GFX7-NEXT: v_and_b32_e32 v8, s8, v8
; GFX7-NEXT: v_or_b32_e32 v1, v2, v1		; GFX7-NEXT: v_or_b32_e32 v1, v2, v1
; GFX7-NEXT: v_or_b32_e32 v2, v9, v3		; GFX7-NEXT: v_or_b32_e32 v2, v9, v3
; GFX7-NEXT: v_or_b32_e32 v3, v6, v5		; GFX7-NEXT: v_or_b32_e32 v3, v6, v5
; GFX7-NEXT: v_or_b32_e32 v5, v8, v7		; GFX7-NEXT: v_or_b32_e32 v5, v8, v7
; GFX7-NEXT: v_lshlrev_b32_e32 v1, 16, v1		; GFX7-NEXT: v_lshlrev_b32_e32 v1, 16, v1
; GFX7-NEXT: v_and_b32_e32 v2, s1, v2		; GFX7-NEXT: v_and_b32_e32 v2, s9, v2
; GFX7-NEXT: v_lshlrev_b32_e32 v3, 16, v3		; GFX7-NEXT: v_lshlrev_b32_e32 v3, 16, v3
; GFX7-NEXT: v_and_b32_e32 v5, s1, v5		; GFX7-NEXT: v_and_b32_e32 v5, s9, v5
; GFX7-NEXT: v_or_b32_e32 v1, v2, v1		; GFX7-NEXT: v_or_b32_e32 v1, v2, v1
; GFX7-NEXT: v_or_b32_e32 v2, v5, v3		; GFX7-NEXT: v_or_b32_e32 v2, v5, v3
; GFX7-NEXT: v_alignbit_b32 v3, v1, v2, 8		; GFX7-NEXT: v_alignbit_b32 v3, v1, v2, 8
; GFX7-NEXT: v_alignbit_b32 v5, v1, v2, 16		; GFX7-NEXT: v_alignbit_b32 v5, v1, v2, 16
; GFX7-NEXT: v_lshrrev_b32_e32 v6, 24, v2		; GFX7-NEXT: v_lshrrev_b32_e32 v6, 24, v2
; GFX7-NEXT: v_lshrrev_b32_e32 v7, 8, v1		; GFX7-NEXT: v_lshrrev_b32_e32 v7, 8, v1
; GFX7-NEXT: v_lshrrev_b32_e32 v8, 16, v1		; GFX7-NEXT: v_lshrrev_b32_e32 v8, 16, v1
; GFX7-NEXT: v_lshrrev_b32_e32 v1, 24, v1		; GFX7-NEXT: v_lshrrev_b32_e32 v1, 24, v1
; GFX7-NEXT: s_waitcnt vmcnt(0)		; GFX7-NEXT: s_waitcnt vmcnt(0)
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v2		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v2
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v3, v0		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v3, v0
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v5, v0		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v5, v0
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v6, v0		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v6, v0
; GFX7-NEXT: v_mad_i32_i24 v0, s13, v4, v0		; GFX7-NEXT: v_mad_i32_i24 v0, s12, v4, v0
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v7		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v7
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v8		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v8
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v1		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v1
; GFX7-NEXT: buffer_store_byte v0, off, s[4:7], 0		; GFX7-NEXT: buffer_store_byte v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: idot8_acc8_vecMul:		; GFX8-LABEL: idot8_acc8_vecMul:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_mov_b32 s2, 0xffff		; GFX8-NEXT: s_mov_b32 s33, 0xffff
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_load_ubyte v2, v[0:1]		; GFX8-NEXT: flat_load_ubyte v2, v[0:1]
; GFX8-NEXT: s_load_dword s1, s[4:5], 0x0		; GFX8-NEXT: s_load_dword s1, s[4:5], 0x0
; GFX8-NEXT: s_load_dword s5, s[6:7], 0x0		; GFX8-NEXT: s_load_dword s3, s[6:7], 0x0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_lshl_b32 s13, s1, 24		; GFX8-NEXT: s_lshl_b32 s11, s1, 24
; GFX8-NEXT: s_lshl_b32 s17, s1, 16		; GFX8-NEXT: s_lshl_b32 s15, s1, 16
; GFX8-NEXT: s_ashr_i64 s[22:23], s[4:5], 60		; GFX8-NEXT: s_ashr_i64 s[20:21], s[2:3], 60
; GFX8-NEXT: s_lshl_b32 s25, s5, 24		; GFX8-NEXT: s_lshl_b32 s23, s3, 24
; GFX8-NEXT: s_lshl_b32 s27, s5, 28		; GFX8-NEXT: s_lshl_b32 s25, s3, 28
; GFX8-NEXT: s_lshl_b32 s29, s5, 16		; GFX8-NEXT: s_lshl_b32 s27, s3, 16
; GFX8-NEXT: s_ashr_i64 s[10:11], s[0:1], 60		; GFX8-NEXT: s_ashr_i64 s[8:9], s[0:1], 60
; GFX8-NEXT: s_lshl_b32 s15, s1, 28		; GFX8-NEXT: s_lshl_b32 s13, s1, 28
; GFX8-NEXT: s_lshl_b32 s19, s5, 8		; GFX8-NEXT: s_lshl_b32 s17, s3, 8
; GFX8-NEXT: s_lshl_b32 s21, s5, 12		; GFX8-NEXT: s_lshl_b32 s19, s3, 12
; GFX8-NEXT: s_lshl_b32 s23, s5, 4		; GFX8-NEXT: s_lshl_b32 s21, s3, 4
; GFX8-NEXT: s_lshl_b32 s5, s5, 20		; GFX8-NEXT: s_lshl_b32 s3, s3, 20
; GFX8-NEXT: s_ashr_i64 s[12:13], s[12:13], 60		; GFX8-NEXT: s_ashr_i64 s[10:11], s[10:11], 60
; GFX8-NEXT: s_ashr_i64 s[16:17], s[16:17], 60		; GFX8-NEXT: s_ashr_i64 s[14:15], s[14:15], 60
		; GFX8-NEXT: s_ashr_i64 s[22:23], s[22:23], 60
; GFX8-NEXT: s_ashr_i64 s[24:25], s[24:25], 60		; GFX8-NEXT: s_ashr_i64 s[24:25], s[24:25], 60
; GFX8-NEXT: s_ashr_i64 s[26:27], s[26:27], 60		; GFX8-NEXT: s_ashr_i64 s[26:27], s[26:27], 60
; GFX8-NEXT: s_ashr_i64 s[28:29], s[28:29], 60		; GFX8-NEXT: s_lshl_b32 s5, s1, 8
; GFX8-NEXT: s_lshl_b32 s7, s1, 8		; GFX8-NEXT: s_lshl_b32 s7, s1, 12
; GFX8-NEXT: s_lshl_b32 s9, s1, 12		; GFX8-NEXT: s_lshl_b32 s9, s1, 4
; GFX8-NEXT: s_lshl_b32 s11, s1, 4
; GFX8-NEXT: s_lshl_b32 s1, s1, 20		; GFX8-NEXT: s_lshl_b32 s1, s1, 20
; GFX8-NEXT: s_ashr_i64 s[4:5], s[4:5], 60		; GFX8-NEXT: s_ashr_i64 s[2:3], s[2:3], 60
; GFX8-NEXT: s_ashr_i64 s[14:15], s[14:15], 60		; GFX8-NEXT: s_ashr_i64 s[12:13], s[12:13], 60
; GFX8-NEXT: v_mov_b32_e32 v6, s28		; GFX8-NEXT: v_mov_b32_e32 v6, s26
; GFX8-NEXT: v_mov_b32_e32 v7, s16		; GFX8-NEXT: v_mov_b32_e32 v7, s14
; GFX8-NEXT: v_mov_b32_e32 v8, s26		; GFX8-NEXT: v_mov_b32_e32 v8, s24
; GFX8-NEXT: v_mov_b32_e32 v9, s24		; GFX8-NEXT: v_mov_b32_e32 v9, s22
; GFX8-NEXT: v_mov_b32_e32 v10, s12		; GFX8-NEXT: v_mov_b32_e32 v10, s10
; GFX8-NEXT: v_mul_i32_i24_sdwa v6, v7, v6 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX8-NEXT: v_mul_i32_i24_sdwa v6, v7, v6 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX8-NEXT: v_mul_i32_i24_e32 v7, s14, v8		; GFX8-NEXT: v_mul_i32_i24_e32 v7, s12, v8
; GFX8-NEXT: v_mul_i32_i24_sdwa v8, v10, v9 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX8-NEXT: v_mul_i32_i24_sdwa v8, v10, v9 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX8-NEXT: s_ashr_i64 s[0:1], s[0:1], 60		; GFX8-NEXT: s_ashr_i64 s[0:1], s[0:1], 60
; GFX8-NEXT: v_mov_b32_e32 v5, s4		; GFX8-NEXT: v_mov_b32_e32 v5, s2
; GFX8-NEXT: v_mul_i32_i24_e32 v5, s0, v5		; GFX8-NEXT: v_mul_i32_i24_e32 v5, s0, v5
; GFX8-NEXT: v_or_b32_sdwa v7, v7, v8 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD		; GFX8-NEXT: v_or_b32_sdwa v7, v7, v8 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
; GFX8-NEXT: s_ashr_i64 s[6:7], s[6:7], 60		; GFX8-NEXT: s_ashr_i64 s[4:5], s[4:5], 60
; GFX8-NEXT: s_ashr_i64 s[18:19], s[18:19], 60		; GFX8-NEXT: s_ashr_i64 s[16:17], s[16:17], 60
; GFX8-NEXT: v_or_b32_sdwa v5, v5, v6 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD		; GFX8-NEXT: v_or_b32_sdwa v5, v5, v6 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
; GFX8-NEXT: v_and_b32_e32 v6, s2, v7		; GFX8-NEXT: v_and_b32_e32 v6, s33, v7
; GFX8-NEXT: s_ashr_i64 s[20:21], s[20:21], 60		; GFX8-NEXT: s_ashr_i64 s[18:19], s[18:19], 60
; GFX8-NEXT: v_mov_b32_e32 v3, s22		; GFX8-NEXT: v_mov_b32_e32 v3, s20
; GFX8-NEXT: v_mov_b32_e32 v4, s10		; GFX8-NEXT: v_mov_b32_e32 v4, s8
; GFX8-NEXT: s_ashr_i64 s[32:33], s[22:23], 60		; GFX8-NEXT: s_ashr_i64 s[30:31], s[20:21], 60
; GFX8-NEXT: v_mul_i32_i24_sdwa v3, v4, v3 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX8-NEXT: v_mul_i32_i24_sdwa v3, v4, v3 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX8-NEXT: v_or_b32_e32 v5, v6, v5		; GFX8-NEXT: v_or_b32_e32 v5, v6, v5
; GFX8-NEXT: s_ashr_i64 s[8:9], s[8:9], 60		; GFX8-NEXT: s_ashr_i64 s[6:7], s[6:7], 60
; GFX8-NEXT: v_mov_b32_e32 v4, s20		; GFX8-NEXT: v_mov_b32_e32 v4, s18
; GFX8-NEXT: v_mov_b32_e32 v12, s18		; GFX8-NEXT: v_mov_b32_e32 v12, s16
; GFX8-NEXT: v_mov_b32_e32 v13, s6		; GFX8-NEXT: v_mov_b32_e32 v13, s4
; GFX8-NEXT: s_ashr_i64 s[30:31], s[10:11], 60		; GFX8-NEXT: s_ashr_i64 s[28:29], s[8:9], 60
; GFX8-NEXT: v_mov_b32_e32 v11, s32		; GFX8-NEXT: v_mov_b32_e32 v11, s30
; GFX8-NEXT: v_mul_i32_i24_e32 v4, s8, v4		; GFX8-NEXT: v_mul_i32_i24_e32 v4, s6, v4
; GFX8-NEXT: v_mul_i32_i24_sdwa v10, v13, v12 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX8-NEXT: v_mul_i32_i24_sdwa v10, v13, v12 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX8-NEXT: v_lshrrev_b32_e32 v7, 8, v5		; GFX8-NEXT: v_lshrrev_b32_e32 v7, 8, v5
; GFX8-NEXT: v_or_b32_sdwa v4, v4, v10 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD		; GFX8-NEXT: v_or_b32_sdwa v4, v4, v10 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
; GFX8-NEXT: v_mul_i32_i24_e32 v9, s30, v11		; GFX8-NEXT: v_mul_i32_i24_e32 v9, s28, v11
; GFX8-NEXT: v_or_b32_sdwa v3, v9, v3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD		; GFX8-NEXT: v_or_b32_sdwa v3, v9, v3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
; GFX8-NEXT: v_and_b32_e32 v4, s2, v4		; GFX8-NEXT: v_and_b32_e32 v4, s33, v4
; GFX8-NEXT: v_or_b32_e32 v3, v4, v3		; GFX8-NEXT: v_or_b32_e32 v3, v4, v3
; GFX8-NEXT: v_lshrrev_b32_e32 v8, 8, v3		; GFX8-NEXT: v_lshrrev_b32_e32 v8, 8, v3
; GFX8-NEXT: s_waitcnt vmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0)
; GFX8-NEXT: v_add_u32_e32 v2, vcc, v2, v6		; GFX8-NEXT: v_add_u32_e32 v2, vcc, v2, v6
; GFX8-NEXT: v_add_u32_e32 v2, vcc, v7, v2		; GFX8-NEXT: v_add_u32_e32 v2, vcc, v7, v2
; GFX8-NEXT: v_add_u32_sdwa v2, vcc, v5, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_2 src1_sel:BYTE_0		; GFX8-NEXT: v_add_u32_sdwa v2, vcc, v5, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_2 src1_sel:BYTE_0
; GFX8-NEXT: v_add_u32_sdwa v2, vcc, v5, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_3 src1_sel:DWORD		; GFX8-NEXT: v_add_u32_sdwa v2, vcc, v5, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_3 src1_sel:DWORD
; GFX8-NEXT: v_add_u32_e32 v2, vcc, v2, v4		; GFX8-NEXT: v_add_u32_e32 v2, vcc, v2, v4
[10 lines not shown]
; GFX9-NEXT: s_mov_b32 s2, 0xffff		; GFX9-NEXT: s_mov_b32 s2, 0xffff
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: global_load_ubyte v2, v[0:1], off		; GFX9-NEXT: global_load_ubyte v2, v[0:1], off
; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX9-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_lshr_b32 s8, s0, 4		; GFX9-NEXT: s_lshr_b32 s7, s0, 4
; GFX9-NEXT: s_lshr_b32 s15, s1, 4		; GFX9-NEXT: s_lshr_b32 s14, s1, 4
; GFX9-NEXT: v_lshlrev_b16_e64 v3, 12, s0		; GFX9-NEXT: v_lshlrev_b16_e64 v3, 12, s0
; GFX9-NEXT: v_lshlrev_b16_e64 v4, 12, s1		; GFX9-NEXT: v_lshlrev_b16_e64 v4, 12, s1
; GFX9-NEXT: v_lshlrev_b16_e64 v7, 12, s8		; GFX9-NEXT: v_lshlrev_b16_e64 v7, 12, s7
; GFX9-NEXT: v_lshlrev_b16_e64 v14, 12, s15		; GFX9-NEXT: v_lshlrev_b16_e64 v14, 12, s14
; GFX9-NEXT: s_lshr_b32 s9, s0, 12		; GFX9-NEXT: s_lshr_b32 s8, s0, 12
; GFX9-NEXT: s_lshr_b32 s10, s0, 8		; GFX9-NEXT: s_lshr_b32 s9, s0, 8
; GFX9-NEXT: s_lshr_b32 s16, s1, 12		; GFX9-NEXT: s_lshr_b32 s15, s1, 12
; GFX9-NEXT: s_lshr_b32 s17, s1, 8		; GFX9-NEXT: s_lshr_b32 s16, s1, 8
; GFX9-NEXT: v_lshlrev_b16_e64 v5, 12, s10		; GFX9-NEXT: v_lshlrev_b16_e64 v5, 12, s9
; GFX9-NEXT: v_lshlrev_b16_e64 v6, 12, s9		; GFX9-NEXT: v_lshlrev_b16_e64 v6, 12, s8
; GFX9-NEXT: v_lshlrev_b16_e64 v12, 12, s17		; GFX9-NEXT: v_lshlrev_b16_e64 v12, 12, s16
; GFX9-NEXT: v_lshlrev_b16_e64 v13, 12, s16		; GFX9-NEXT: v_lshlrev_b16_e64 v13, 12, s15
; GFX9-NEXT: v_ashrrev_i16_e32 v3, 12, v3		; GFX9-NEXT: v_ashrrev_i16_e32 v3, 12, v3
; GFX9-NEXT: v_ashrrev_i16_e32 v4, 12, v4		; GFX9-NEXT: v_ashrrev_i16_e32 v4, 12, v4
; GFX9-NEXT: v_ashrrev_i16_e32 v7, 12, v7		; GFX9-NEXT: v_ashrrev_i16_e32 v7, 12, v7
; GFX9-NEXT: v_ashrrev_i16_e32 v14, 12, v14		; GFX9-NEXT: v_ashrrev_i16_e32 v14, 12, v14
; GFX9-NEXT: v_ashrrev_i16_e32 v5, 12, v5		; GFX9-NEXT: v_ashrrev_i16_e32 v5, 12, v5
; GFX9-NEXT: v_ashrrev_i16_e32 v12, 12, v12		; GFX9-NEXT: v_ashrrev_i16_e32 v12, 12, v12
; GFX9-NEXT: v_ashrrev_i16_e32 v6, 12, v6		; GFX9-NEXT: v_ashrrev_i16_e32 v6, 12, v6
; GFX9-NEXT: v_ashrrev_i16_e32 v13, 12, v13		; GFX9-NEXT: v_ashrrev_i16_e32 v13, 12, v13
; GFX9-NEXT: v_mul_lo_u16_e32 v3, v3, v4		; GFX9-NEXT: v_mul_lo_u16_e32 v3, v3, v4
; GFX9-NEXT: v_mul_lo_u16_sdwa v7, v7, v14 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX9-NEXT: v_mul_lo_u16_sdwa v7, v7, v14 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX9-NEXT: v_or_b32_sdwa v3, v3, v7 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD		; GFX9-NEXT: v_or_b32_sdwa v3, v3, v7 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
; GFX9-NEXT: s_lshr_b32 s4, s0, 20		; GFX9-NEXT: s_lshr_b32 s3, s0, 20
; GFX9-NEXT: s_lshr_b32 s5, s0, 16		; GFX9-NEXT: s_lshr_b32 s4, s0, 16
; GFX9-NEXT: s_lshr_b32 s11, s1, 20		; GFX9-NEXT: s_lshr_b32 s10, s1, 20
; GFX9-NEXT: s_lshr_b32 s12, s1, 16		; GFX9-NEXT: s_lshr_b32 s11, s1, 16
; GFX9-NEXT: v_mul_lo_u16_sdwa v6, v6, v13 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX9-NEXT: v_mul_lo_u16_sdwa v6, v6, v13 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX9-NEXT: v_mul_lo_u16_e32 v5, v5, v12		; GFX9-NEXT: v_mul_lo_u16_e32 v5, v5, v12
; GFX9-NEXT: v_lshlrev_b16_e64 v10, 12, s5		; GFX9-NEXT: v_lshlrev_b16_e64 v10, 12, s4
; GFX9-NEXT: v_lshlrev_b16_e64 v11, 12, s4		; GFX9-NEXT: v_lshlrev_b16_e64 v11, 12, s3
; GFX9-NEXT: v_lshlrev_b16_e64 v17, 12, s12		; GFX9-NEXT: v_lshlrev_b16_e64 v17, 12, s11
; GFX9-NEXT: v_lshlrev_b16_e64 v18, 12, s11		; GFX9-NEXT: v_lshlrev_b16_e64 v18, 12, s10
; GFX9-NEXT: s_lshr_b32 s6, s0, 28		; GFX9-NEXT: s_lshr_b32 s5, s0, 28
; GFX9-NEXT: s_lshr_b32 s7, s0, 24		; GFX9-NEXT: s_lshr_b32 s6, s0, 24
; GFX9-NEXT: s_lshr_b32 s13, s1, 28		; GFX9-NEXT: s_lshr_b32 s12, s1, 28
; GFX9-NEXT: s_lshr_b32 s14, s1, 24		; GFX9-NEXT: s_lshr_b32 s13, s1, 24
; GFX9-NEXT: v_and_b32_e32 v3, s2, v3		; GFX9-NEXT: v_and_b32_e32 v3, s2, v3
; GFX9-NEXT: v_or_b32_sdwa v5, v5, v6 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD		; GFX9-NEXT: v_or_b32_sdwa v5, v5, v6 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
; GFX9-NEXT: v_lshlrev_b16_e64 v8, 12, s7		; GFX9-NEXT: v_lshlrev_b16_e64 v8, 12, s6
; GFX9-NEXT: v_lshlrev_b16_e64 v9, 12, s6		; GFX9-NEXT: v_lshlrev_b16_e64 v9, 12, s5
; GFX9-NEXT: v_lshlrev_b16_e64 v15, 12, s14		; GFX9-NEXT: v_lshlrev_b16_e64 v15, 12, s13
; GFX9-NEXT: v_lshlrev_b16_e64 v16, 12, s13		; GFX9-NEXT: v_lshlrev_b16_e64 v16, 12, s12
; GFX9-NEXT: v_or_b32_e32 v5, v3, v5		; GFX9-NEXT: v_or_b32_e32 v5, v3, v5
; GFX9-NEXT: v_ashrrev_i16_e32 v10, 12, v10		; GFX9-NEXT: v_ashrrev_i16_e32 v10, 12, v10
; GFX9-NEXT: v_ashrrev_i16_e32 v17, 12, v17		; GFX9-NEXT: v_ashrrev_i16_e32 v17, 12, v17
; GFX9-NEXT: v_ashrrev_i16_e32 v11, 12, v11		; GFX9-NEXT: v_ashrrev_i16_e32 v11, 12, v11
; GFX9-NEXT: v_ashrrev_i16_e32 v18, 12, v18		; GFX9-NEXT: v_ashrrev_i16_e32 v18, 12, v18
; GFX9-NEXT: v_ashrrev_i16_e32 v8, 12, v8		; GFX9-NEXT: v_ashrrev_i16_e32 v8, 12, v8
; GFX9-NEXT: v_ashrrev_i16_e32 v15, 12, v15		; GFX9-NEXT: v_ashrrev_i16_e32 v15, 12, v15
; GFX9-NEXT: v_ashrrev_i16_e32 v9, 12, v9		; GFX9-NEXT: v_ashrrev_i16_e32 v9, 12, v9
[27 lines not shown]
; GFX9-DL-NEXT: s_mov_b32 s2, 0xffff		; GFX9-DL-NEXT: s_mov_b32 s2, 0xffff
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: global_load_ubyte v2, v[0:1], off		; GFX9-DL-NEXT: global_load_ubyte v2, v[0:1], off
; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX9-DL-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_lshr_b32 s8, s0, 4		; GFX9-DL-NEXT: s_lshr_b32 s7, s0, 4
; GFX9-DL-NEXT: s_lshr_b32 s15, s1, 4		; GFX9-DL-NEXT: s_lshr_b32 s14, s1, 4
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v3, 12, s0		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v3, 12, s0
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v4, 12, s1		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v4, 12, s1
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v7, 12, s8		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v7, 12, s7
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v14, 12, s15		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v14, 12, s14
; GFX9-DL-NEXT: s_lshr_b32 s9, s0, 12		; GFX9-DL-NEXT: s_lshr_b32 s8, s0, 12
; GFX9-DL-NEXT: s_lshr_b32 s10, s0, 8		; GFX9-DL-NEXT: s_lshr_b32 s9, s0, 8
; GFX9-DL-NEXT: s_lshr_b32 s16, s1, 12		; GFX9-DL-NEXT: s_lshr_b32 s15, s1, 12
; GFX9-DL-NEXT: s_lshr_b32 s17, s1, 8		; GFX9-DL-NEXT: s_lshr_b32 s16, s1, 8
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v5, 12, s10		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v5, 12, s9
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v6, 12, s9		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v6, 12, s8
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v12, 12, s17		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v12, 12, s16
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v13, 12, s16		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v13, 12, s15
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v3, 12, v3		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v3, 12, v3
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v4, 12, v4		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v4, 12, v4
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v7, 12, v7		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v7, 12, v7
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v14, 12, v14		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v14, 12, v14
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v5, 12, v5		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v5, 12, v5
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v12, 12, v12		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v12, 12, v12
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v6, 12, v6		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v6, 12, v6
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v13, 12, v13		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v13, 12, v13
; GFX9-DL-NEXT: v_mul_lo_u16_e32 v3, v3, v4		; GFX9-DL-NEXT: v_mul_lo_u16_e32 v3, v3, v4
; GFX9-DL-NEXT: v_mul_lo_u16_sdwa v7, v7, v14 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX9-DL-NEXT: v_mul_lo_u16_sdwa v7, v7, v14 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX9-DL-NEXT: v_or_b32_sdwa v3, v3, v7 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD		; GFX9-DL-NEXT: v_or_b32_sdwa v3, v3, v7 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
; GFX9-DL-NEXT: s_lshr_b32 s4, s0, 20		; GFX9-DL-NEXT: s_lshr_b32 s3, s0, 20
; GFX9-DL-NEXT: s_lshr_b32 s5, s0, 16		; GFX9-DL-NEXT: s_lshr_b32 s4, s0, 16
; GFX9-DL-NEXT: s_lshr_b32 s11, s1, 20		; GFX9-DL-NEXT: s_lshr_b32 s10, s1, 20
; GFX9-DL-NEXT: s_lshr_b32 s12, s1, 16		; GFX9-DL-NEXT: s_lshr_b32 s11, s1, 16
; GFX9-DL-NEXT: v_mul_lo_u16_sdwa v6, v6, v13 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX9-DL-NEXT: v_mul_lo_u16_sdwa v6, v6, v13 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX9-DL-NEXT: v_mul_lo_u16_e32 v5, v5, v12		; GFX9-DL-NEXT: v_mul_lo_u16_e32 v5, v5, v12
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v10, 12, s5		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v10, 12, s4
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v11, 12, s4		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v11, 12, s3
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v17, 12, s12		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v17, 12, s11
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v18, 12, s11		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v18, 12, s10
; GFX9-DL-NEXT: s_lshr_b32 s6, s0, 28		; GFX9-DL-NEXT: s_lshr_b32 s5, s0, 28
; GFX9-DL-NEXT: s_lshr_b32 s7, s0, 24		; GFX9-DL-NEXT: s_lshr_b32 s6, s0, 24
; GFX9-DL-NEXT: s_lshr_b32 s13, s1, 28		; GFX9-DL-NEXT: s_lshr_b32 s12, s1, 28
; GFX9-DL-NEXT: s_lshr_b32 s14, s1, 24		; GFX9-DL-NEXT: s_lshr_b32 s13, s1, 24
; GFX9-DL-NEXT: v_and_b32_e32 v3, s2, v3		; GFX9-DL-NEXT: v_and_b32_e32 v3, s2, v3
; GFX9-DL-NEXT: v_or_b32_sdwa v5, v5, v6 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD		; GFX9-DL-NEXT: v_or_b32_sdwa v5, v5, v6 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v8, 12, s7		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v8, 12, s6
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v9, 12, s6		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v9, 12, s5
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v15, 12, s14		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v15, 12, s13
; GFX9-DL-NEXT: v_lshlrev_b16_e64 v16, 12, s13		; GFX9-DL-NEXT: v_lshlrev_b16_e64 v16, 12, s12
; GFX9-DL-NEXT: v_or_b32_e32 v5, v3, v5		; GFX9-DL-NEXT: v_or_b32_e32 v5, v3, v5
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v10, 12, v10		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v10, 12, v10
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v17, 12, v17		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v17, 12, v17
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v11, 12, v11		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v11, 12, v11
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v18, 12, v18		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v18, 12, v18
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v8, 12, v8		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v8, 12, v8
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v15, 12, v15		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v15, 12, v15
; GFX9-DL-NEXT: v_ashrrev_i16_e32 v9, 12, v9		; GFX9-DL-NEXT: v_ashrrev_i16_e32 v9, 12, v9
; GFX9-DL-NEXT: v_add_u32_e32 v2, v2, v3		; GFX9-DL-NEXT: v_add_u32_e32 v2, v2, v3
; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v6 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v6 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v6 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_3		; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v6 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_3
; GFX9-DL-NEXT: global_store_byte v[0:1], v2, off		; GFX9-DL-NEXT: global_store_byte v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: idot8_acc8_vecMul:		; GFX10-DL-LABEL: idot8_acc8_vecMul:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
; GFX10-DL-NEXT: s_mov_b32 s2, 0xffff
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s4		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s2
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s5		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s3
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off		; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
		; GFX10-DL-NEXT: s_mov_b32 s2, 0xffff
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_lshr_b32 s8, s0, 4		; GFX10-DL-NEXT: s_lshr_b32 s7, s0, 4
; GFX10-DL-NEXT: s_lshr_b32 s15, s1, 4		; GFX10-DL-NEXT: s_lshr_b32 s14, s1, 4
; GFX10-DL-NEXT: s_lshr_b32 s9, s0, 12		; GFX10-DL-NEXT: s_lshr_b32 s8, s0, 12
; GFX10-DL-NEXT: s_lshr_b32 s16, s1, 12		; GFX10-DL-NEXT: s_lshr_b32 s15, s1, 12
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v3, 12, s0		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v3, 12, s0
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v7, 12, s8		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v7, 12, s7
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v12, 12, s15		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v12, 12, s14
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v4, 12, s1		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v4, 12, s1
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v14, 12, s16		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v14, 12, s15
; GFX10-DL-NEXT: s_lshr_b32 s10, s0, 8		; GFX10-DL-NEXT: s_lshr_b32 s9, s0, 8
; GFX10-DL-NEXT: s_lshr_b32 s17, s1, 8		; GFX10-DL-NEXT: s_lshr_b32 s16, s1, 8
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v6, 12, s9		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v6, 12, s8
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v7, 12, v7		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v7, 12, v7
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v12, 12, v12		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v12, 12, v12
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v5, 12, s10		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v5, 12, s9
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v3, 12, v3		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v3, 12, v3
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v4, 12, v4		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v4, 12, v4
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v13, 12, s17		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v13, 12, s16
; GFX10-DL-NEXT: v_mul_lo_u16_e64 v7, v7, v12		; GFX10-DL-NEXT: v_mul_lo_u16_e64 v7, v7, v12
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v19, 12, v6		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v19, 12, v6
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v14, 12, v14		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v14, 12, v14
; GFX10-DL-NEXT: s_lshr_b32 s4, s0, 20		; GFX10-DL-NEXT: s_lshr_b32 s3, s0, 20
; GFX10-DL-NEXT: s_lshr_b32 s5, s0, 16		; GFX10-DL-NEXT: s_lshr_b32 s4, s0, 16
; GFX10-DL-NEXT: s_lshr_b32 s6, s0, 28		; GFX10-DL-NEXT: s_lshr_b32 s5, s0, 28
; GFX10-DL-NEXT: s_lshr_b32 s7, s0, 24		; GFX10-DL-NEXT: s_lshr_b32 s6, s0, 24
; GFX10-DL-NEXT: v_mul_lo_u16_e64 v3, v3, v4		; GFX10-DL-NEXT: v_mul_lo_u16_e64 v3, v3, v4
; GFX10-DL-NEXT: v_mul_lo_u16_e64 v4, v19, v14		; GFX10-DL-NEXT: v_mul_lo_u16_e64 v4, v19, v14
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v6, 8, v7		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v6, 8, v7
; GFX10-DL-NEXT: s_lshr_b32 s11, s1, 20		; GFX10-DL-NEXT: s_lshr_b32 s10, s1, 20
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v12, 12, v13		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v12, 12, v13
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v5, 12, v5		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v5, 12, v5
; GFX10-DL-NEXT: s_lshr_b32 s12, s1, 16		; GFX10-DL-NEXT: s_lshr_b32 s11, s1, 16
; GFX10-DL-NEXT: v_or_b32_sdwa v3, v3, v6 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD		; GFX10-DL-NEXT: v_or_b32_sdwa v3, v3, v6 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
; GFX10-DL-NEXT: s_lshr_b32 s13, s1, 28		; GFX10-DL-NEXT: s_lshr_b32 s12, s1, 28
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v8, 12, s7		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v8, 12, s6
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v9, 12, s6		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v9, 12, s5
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v10, 12, s5		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v10, 12, s4
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v11, 12, s4		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v11, 12, s3
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v13, 12, s11		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v13, 12, s10
; GFX10-DL-NEXT: v_mul_lo_u16_e64 v5, v5, v12		; GFX10-DL-NEXT: v_mul_lo_u16_e64 v5, v5, v12
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v4, 8, v4		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v4, 8, v4
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v7, 12, s12		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v7, 12, s11
; GFX10-DL-NEXT: s_lshr_b32 s14, s1, 24		; GFX10-DL-NEXT: s_lshr_b32 s13, s1, 24
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v6, 12, v8		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v6, 12, v8
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v8, 12, v9		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v8, 12, v9
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v9, 12, v10		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v9, 12, v10
; GFX10-DL-NEXT: v_or_b32_sdwa v4, v5, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD		; GFX10-DL-NEXT: v_or_b32_sdwa v4, v5, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
; GFX10-DL-NEXT: v_and_b32_e32 v3, s2, v3		; GFX10-DL-NEXT: v_and_b32_e32 v3, s2, v3
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v16, 12, s13		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v16, 12, s12
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v5, 12, v11		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v5, 12, v11
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v10, 12, v13		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v10, 12, v13
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v15, 12, s14		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v15, 12, s13
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v7, 12, v7		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v7, 12, v7
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v11, 12, v16		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v11, 12, v16
; GFX10-DL-NEXT: v_or_b32_e32 v4, v3, v4		; GFX10-DL-NEXT: v_or_b32_e32 v4, v3, v4
; GFX10-DL-NEXT: v_mul_lo_u16_e64 v5, v5, v10		; GFX10-DL-NEXT: v_mul_lo_u16_e64 v5, v5, v10
; GFX10-DL-NEXT: v_ashrrev_i16_e64 v12, 12, v15		; GFX10-DL-NEXT: v_ashrrev_i16_e64 v12, 12, v15
; GFX10-DL-NEXT: v_mul_lo_u16_e64 v10, v9, v7		; GFX10-DL-NEXT: v_mul_lo_u16_e64 v10, v9, v7
; GFX10-DL-NEXT: v_mul_lo_u16_e64 v8, v8, v11		; GFX10-DL-NEXT: v_mul_lo_u16_e64 v8, v8, v11
; GFX10-DL-NEXT: v_lshrrev_b32_e32 v9, 8, v4		; GFX10-DL-NEXT: v_lshrrev_b32_e32 v9, 8, v4

llvm/test/CodeGen/AMDGPU/idot8u.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=amdgcn -mcpu=gfx700 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX7 %s		; RUN: llc -mtriple=amdgcn -mcpu=gfx700 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX7 %s
; RUN: llc -mtriple=amdgcn -mcpu=gfx803 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX8 %s		; RUN: llc -mtriple=amdgcn -mcpu=gfx803 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX8 %s
; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX9 %s		; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX9 %s
; RUN: llc -mtriple=amdgcn -mcpu=gfx906 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX9-DL %s		; RUN: llc -mtriple=amdgcn -mcpu=gfx906 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX9-DL %s
; RUN: llc -mtriple=amdgcn -mcpu=gfx1011 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10-DL %s		; RUN: llc -mtriple=amdgcn -mcpu=gfx1011 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10-DL %s
; RUN: llc -mtriple=amdgcn -mcpu=gfx1012 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10-DL %s		; RUN: llc -mtriple=amdgcn -mcpu=gfx1012 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10-DL %s

define amdgpu_kernel void @udot8_acc32(<8 x i4> addrspace(1)* %src1,		define amdgpu_kernel void @udot8_acc32(<8 x i4> addrspace(1)* %src1,
; GFX7-LABEL: udot8_acc32:		; GFX7-LABEL: udot8_acc32:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s7, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s6, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_load_dword s10, s[10:11], 0x0		; GFX7-NEXT: s_load_dword s6, s[6:7], 0x0
; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0		; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX7-NEXT: s_load_dword s21, s[4:5], 0x0		; GFX7-NEXT: s_load_dword s20, s[0:1], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_lshr_b32 s11, s10, 28		; GFX7-NEXT: s_lshr_b32 s7, s6, 28
; GFX7-NEXT: s_bfe_u32 s15, s10, 0x40018		; GFX7-NEXT: s_bfe_u32 s14, s6, 0x40018
; GFX7-NEXT: s_bfe_u32 s16, s10, 0x40014		; GFX7-NEXT: s_bfe_u32 s15, s6, 0x40014
; GFX7-NEXT: s_bfe_u32 s17, s10, 0x40010		; GFX7-NEXT: s_bfe_u32 s16, s6, 0x40010
; GFX7-NEXT: s_bfe_u32 s18, s10, 0x4000c		; GFX7-NEXT: s_bfe_u32 s17, s6, 0x4000c
; GFX7-NEXT: s_bfe_u32 s19, s10, 0x40008		; GFX7-NEXT: s_bfe_u32 s18, s6, 0x40008
; GFX7-NEXT: s_bfe_u32 s20, s10, 0x40004		; GFX7-NEXT: s_bfe_u32 s19, s6, 0x40004
; GFX7-NEXT: s_and_b32 s10, s10, 15		; GFX7-NEXT: s_and_b32 s6, s6, 15
; GFX7-NEXT: s_lshr_b32 s1, s0, 28		; GFX7-NEXT: s_lshr_b32 s5, s4, 28
; GFX7-NEXT: s_bfe_u32 s2, s0, 0x40018		; GFX7-NEXT: s_bfe_u32 s8, s4, 0x40018
; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40014		; GFX7-NEXT: s_bfe_u32 s9, s4, 0x40014
; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40010		; GFX7-NEXT: s_bfe_u32 s10, s4, 0x40010
; GFX7-NEXT: s_bfe_u32 s12, s0, 0x4000c		; GFX7-NEXT: s_bfe_u32 s11, s4, 0x4000c
; GFX7-NEXT: s_bfe_u32 s13, s0, 0x40008		; GFX7-NEXT: s_bfe_u32 s12, s4, 0x40008
; GFX7-NEXT: s_bfe_u32 s14, s0, 0x40004		; GFX7-NEXT: s_bfe_u32 s13, s4, 0x40004
; GFX7-NEXT: s_and_b32 s0, s0, 15		; GFX7-NEXT: s_and_b32 s4, s4, 15
; GFX7-NEXT: v_mov_b32_e32 v0, s10		; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s21
; GFX7-NEXT: v_mad_u32_u24 v0, s0, v0, v1
; GFX7-NEXT: v_mov_b32_e32 v1, s20		; GFX7-NEXT: v_mov_b32_e32 v1, s20
; GFX7-NEXT: v_mad_u32_u24 v0, s14, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s4, v0, v1
; GFX7-NEXT: v_mov_b32_e32 v1, s19		; GFX7-NEXT: v_mov_b32_e32 v1, s19
; GFX7-NEXT: v_mad_u32_u24 v0, s13, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s13, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s18		; GFX7-NEXT: v_mov_b32_e32 v1, s18
; GFX7-NEXT: v_mad_u32_u24 v0, s12, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s12, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s17		; GFX7-NEXT: v_mov_b32_e32 v1, s17
; GFX7-NEXT: v_mad_u32_u24 v0, s9, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s11, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s16		; GFX7-NEXT: v_mov_b32_e32 v1, s16
; GFX7-NEXT: v_mad_u32_u24 v0, s8, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s10, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s15		; GFX7-NEXT: v_mov_b32_e32 v1, s15
; GFX7-NEXT: v_mad_u32_u24 v0, s2, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s9, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s11		; GFX7-NEXT: v_mov_b32_e32 v1, s14
; GFX7-NEXT: v_mad_u32_u24 v0, s1, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s8, v1, v0
; GFX7-NEXT: buffer_store_dword v0, off, s[4:7], 0		; GFX7-NEXT: v_mov_b32_e32 v1, s7
		; GFX7-NEXT: v_mad_u32_u24 v0, s5, v1, v0
		; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: udot8_acc32:		; GFX8-LABEL: udot8_acc32:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_load_dword s6, s[6:7], 0x0		; GFX8-NEXT: s_load_dword s6, s[6:7], 0x0
; GFX8-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX8-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX8-NEXT: s_load_dword s19, s[0:1], 0x0		; GFX8-NEXT: s_load_dword s18, s[0:1], 0x0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_lshr_b32 s7, s6, 28		; GFX8-NEXT: s_lshr_b32 s7, s6, 28
; GFX8-NEXT: s_bfe_u32 s13, s6, 0x40018		; GFX8-NEXT: s_bfe_u32 s12, s6, 0x40018
; GFX8-NEXT: s_bfe_u32 s14, s6, 0x40014		; GFX8-NEXT: s_bfe_u32 s13, s6, 0x40014
; GFX8-NEXT: s_bfe_u32 s15, s6, 0x40010		; GFX8-NEXT: s_bfe_u32 s14, s6, 0x40010
; GFX8-NEXT: s_bfe_u32 s16, s6, 0x4000c		; GFX8-NEXT: s_bfe_u32 s15, s6, 0x4000c
; GFX8-NEXT: s_bfe_u32 s17, s6, 0x40008		; GFX8-NEXT: s_bfe_u32 s16, s6, 0x40008
; GFX8-NEXT: s_bfe_u32 s18, s6, 0x40004		; GFX8-NEXT: s_bfe_u32 s17, s6, 0x40004
; GFX8-NEXT: s_and_b32 s6, s6, 15		; GFX8-NEXT: s_and_b32 s6, s6, 15
; GFX8-NEXT: s_lshr_b32 s4, s2, 28		; GFX8-NEXT: s_lshr_b32 s3, s2, 28
; GFX8-NEXT: s_bfe_u32 s5, s2, 0x40018		; GFX8-NEXT: s_bfe_u32 s4, s2, 0x40018
; GFX8-NEXT: s_bfe_u32 s8, s2, 0x40014		; GFX8-NEXT: s_bfe_u32 s5, s2, 0x40014
; GFX8-NEXT: s_bfe_u32 s9, s2, 0x40010		; GFX8-NEXT: s_bfe_u32 s8, s2, 0x40010
; GFX8-NEXT: s_bfe_u32 s10, s2, 0x4000c		; GFX8-NEXT: s_bfe_u32 s9, s2, 0x4000c
; GFX8-NEXT: s_bfe_u32 s11, s2, 0x40008		; GFX8-NEXT: s_bfe_u32 s10, s2, 0x40008
; GFX8-NEXT: s_bfe_u32 s12, s2, 0x40004		; GFX8-NEXT: s_bfe_u32 s11, s2, 0x40004
; GFX8-NEXT: s_and_b32 s2, s2, 15		; GFX8-NEXT: s_and_b32 s2, s2, 15
; GFX8-NEXT: v_mov_b32_e32 v0, s6		; GFX8-NEXT: v_mov_b32_e32 v0, s6
; GFX8-NEXT: v_mov_b32_e32 v1, s19
; GFX8-NEXT: v_mad_u32_u24 v0, s2, v0, v1
; GFX8-NEXT: v_mov_b32_e32 v1, s18		; GFX8-NEXT: v_mov_b32_e32 v1, s18
; GFX8-NEXT: v_mad_u32_u24 v0, s12, v1, v0		; GFX8-NEXT: v_mad_u32_u24 v0, s2, v0, v1
; GFX8-NEXT: v_mov_b32_e32 v1, s17		; GFX8-NEXT: v_mov_b32_e32 v1, s17
; GFX8-NEXT: v_mad_u32_u24 v0, s11, v1, v0		; GFX8-NEXT: v_mad_u32_u24 v0, s11, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v1, s16		; GFX8-NEXT: v_mov_b32_e32 v1, s16
; GFX8-NEXT: v_mad_u32_u24 v0, s10, v1, v0		; GFX8-NEXT: v_mad_u32_u24 v0, s10, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v1, s15		; GFX8-NEXT: v_mov_b32_e32 v1, s15
; GFX8-NEXT: v_mad_u32_u24 v0, s9, v1, v0		; GFX8-NEXT: v_mad_u32_u24 v0, s9, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v1, s14		; GFX8-NEXT: v_mov_b32_e32 v1, s14
; GFX8-NEXT: v_mad_u32_u24 v0, s8, v1, v0		; GFX8-NEXT: v_mad_u32_u24 v0, s8, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v1, s13		; GFX8-NEXT: v_mov_b32_e32 v1, s13
; GFX8-NEXT: v_mad_u32_u24 v0, s5, v1, v0		; GFX8-NEXT: v_mad_u32_u24 v0, s5, v1, v0
		; GFX8-NEXT: v_mov_b32_e32 v1, s12
		; GFX8-NEXT: v_mad_u32_u24 v0, s4, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v1, s7		; GFX8-NEXT: v_mov_b32_e32 v1, s7
; GFX8-NEXT: v_mad_u32_u24 v2, s4, v1, v0		; GFX8-NEXT: v_mad_u32_u24 v2, s3, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_store_dword v[0:1], v2		; GFX8-NEXT: flat_store_dword v[0:1], v2
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: udot8_acc32:		; GFX9-LABEL: udot8_acc32:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_load_dword s6, s[6:7], 0x0		; GFX9-NEXT: s_load_dword s6, s[6:7], 0x0
; GFX9-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX9-NEXT: s_load_dword s19, s[0:1], 0x0		; GFX9-NEXT: s_load_dword s18, s[0:1], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_lshr_b32 s7, s6, 28		; GFX9-NEXT: s_lshr_b32 s7, s6, 28
; GFX9-NEXT: s_bfe_u32 s13, s6, 0x40018		; GFX9-NEXT: s_bfe_u32 s12, s6, 0x40018
; GFX9-NEXT: s_bfe_u32 s14, s6, 0x40014		; GFX9-NEXT: s_bfe_u32 s13, s6, 0x40014
; GFX9-NEXT: s_bfe_u32 s15, s6, 0x40010		; GFX9-NEXT: s_bfe_u32 s14, s6, 0x40010
; GFX9-NEXT: s_bfe_u32 s16, s6, 0x4000c		; GFX9-NEXT: s_bfe_u32 s15, s6, 0x4000c
; GFX9-NEXT: s_bfe_u32 s17, s6, 0x40008		; GFX9-NEXT: s_bfe_u32 s16, s6, 0x40008
; GFX9-NEXT: s_bfe_u32 s18, s6, 0x40004		; GFX9-NEXT: s_bfe_u32 s17, s6, 0x40004
; GFX9-NEXT: s_and_b32 s6, s6, 15		; GFX9-NEXT: s_and_b32 s6, s6, 15
; GFX9-NEXT: s_lshr_b32 s4, s2, 28		; GFX9-NEXT: s_lshr_b32 s3, s2, 28
; GFX9-NEXT: s_bfe_u32 s5, s2, 0x40018		; GFX9-NEXT: s_bfe_u32 s4, s2, 0x40018
; GFX9-NEXT: s_bfe_u32 s8, s2, 0x40014		; GFX9-NEXT: s_bfe_u32 s5, s2, 0x40014
; GFX9-NEXT: s_bfe_u32 s9, s2, 0x40010		; GFX9-NEXT: s_bfe_u32 s8, s2, 0x40010
; GFX9-NEXT: s_bfe_u32 s10, s2, 0x4000c		; GFX9-NEXT: s_bfe_u32 s9, s2, 0x4000c
; GFX9-NEXT: s_bfe_u32 s11, s2, 0x40008		; GFX9-NEXT: s_bfe_u32 s10, s2, 0x40008
; GFX9-NEXT: s_bfe_u32 s12, s2, 0x40004		; GFX9-NEXT: s_bfe_u32 s11, s2, 0x40004
; GFX9-NEXT: s_and_b32 s2, s2, 15		; GFX9-NEXT: s_and_b32 s2, s2, 15
; GFX9-NEXT: v_mov_b32_e32 v0, s6		; GFX9-NEXT: v_mov_b32_e32 v0, s6
; GFX9-NEXT: v_mov_b32_e32 v1, s19
; GFX9-NEXT: v_mad_u32_u24 v0, s2, v0, v1
; GFX9-NEXT: v_mov_b32_e32 v1, s18		; GFX9-NEXT: v_mov_b32_e32 v1, s18
; GFX9-NEXT: v_mad_u32_u24 v0, s12, v1, v0		; GFX9-NEXT: v_mad_u32_u24 v0, s2, v0, v1
; GFX9-NEXT: v_mov_b32_e32 v1, s17		; GFX9-NEXT: v_mov_b32_e32 v1, s17
; GFX9-NEXT: v_mad_u32_u24 v0, s11, v1, v0		; GFX9-NEXT: v_mad_u32_u24 v0, s11, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v1, s16		; GFX9-NEXT: v_mov_b32_e32 v1, s16
; GFX9-NEXT: v_mad_u32_u24 v0, s10, v1, v0		; GFX9-NEXT: v_mad_u32_u24 v0, s10, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v1, s15		; GFX9-NEXT: v_mov_b32_e32 v1, s15
; GFX9-NEXT: v_mad_u32_u24 v0, s9, v1, v0		; GFX9-NEXT: v_mad_u32_u24 v0, s9, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v1, s14		; GFX9-NEXT: v_mov_b32_e32 v1, s14
; GFX9-NEXT: v_mad_u32_u24 v0, s8, v1, v0		; GFX9-NEXT: v_mad_u32_u24 v0, s8, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v1, s13		; GFX9-NEXT: v_mov_b32_e32 v1, s13
; GFX9-NEXT: v_mad_u32_u24 v0, s5, v1, v0		; GFX9-NEXT: v_mad_u32_u24 v0, s5, v1, v0
		; GFX9-NEXT: v_mov_b32_e32 v1, s12
		; GFX9-NEXT: v_mad_u32_u24 v0, s4, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v1, s7		; GFX9-NEXT: v_mov_b32_e32 v1, s7
; GFX9-NEXT: v_mad_u32_u24 v2, s4, v1, v0		; GFX9-NEXT: v_mad_u32_u24 v2, s3, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: global_store_dword v[0:1], v2, off		; GFX9-NEXT: global_store_dword v[0:1], v2, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX9-DL-LABEL: udot8_acc32:		; GFX9-DL-LABEL: udot8_acc32:
; GFX9-DL: ; %bb.0: ; %entry		; GFX9-DL: ; %bb.0: ; %entry
; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_load_dword s2, s[6:7], 0x0		; GFX9-DL-NEXT: s_load_dword s2, s[6:7], 0x0
; GFX9-DL-NEXT: s_load_dword s6, s[0:1], 0x0		; GFX9-DL-NEXT: s_load_dword s3, s[0:1], 0x0
; GFX9-DL-NEXT: s_load_dword s4, s[4:5], 0x0		; GFX9-DL-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s2		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s2
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s6		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s3
; GFX9-DL-NEXT: v_dot8_u32_u4 v2, s4, v0, v1		; GFX9-DL-NEXT: v_dot8_u32_u4 v2, s4, v0, v1
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off		; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: udot8_acc32:		; GFX10-DL-LABEL: udot8_acc32:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_load_dword s0, s[8:9], 0x0		; GFX10-DL-NEXT: s_load_dword s6, s[4:5], 0x0
; GFX10-DL-NEXT: s_load_dword s1, s[4:5], 0x0		; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
; GFX10-DL-NEXT: s_load_dword s2, s[6:7], 0x0		; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s6
; GFX10-DL-NEXT: v_dot8_u32_u4 v2, s1, s2, v0		; GFX10-DL-NEXT: v_dot8_u32_u4 v2, s0, s1, v0
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s8		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s4
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s9		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s5
; GFX10-DL-NEXT: global_store_dword v[0:1], v2, off		; GFX10-DL-NEXT: global_store_dword v[0:1], v2, off
; GFX10-DL-NEXT: s_endpgm		; GFX10-DL-NEXT: s_endpgm
<8 x i4> addrspace(1)* %src2,		<8 x i4> addrspace(1)* %src2,
i32 addrspace(1)* nocapture %dst) {		i32 addrspace(1)* nocapture %dst) {
entry:		entry:
%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1		%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1
%vec2 = load <8 x i4>, <8 x i4> addrspace(1)* %src2		%vec2 = load <8 x i4>, <8 x i4> addrspace(1)* %src2

ret void		ret void
}		}

; TODO: Remove the unnecessary instruction(that is zero-extending the		; TODO: Remove the unnecessary instruction(that is zero-extending the
; 2nd MAD) to have the pattern-recognizer to kick in.		; 2nd MAD) to have the pattern-recognizer to kick in.
define amdgpu_kernel void @udot8_acc16(<8 x i4> addrspace(1)* %src1,		define amdgpu_kernel void @udot8_acc16(<8 x i4> addrspace(1)* %src1,
; GFX7-LABEL: udot8_acc16:		; GFX7-LABEL: udot8_acc16:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s7, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s6, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: buffer_load_ushort v0, off, s[4:7], 0		; GFX7-NEXT: buffer_load_ushort v0, off, s[0:3], 0
; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0		; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX7-NEXT: s_load_dword s1, s[10:11], 0x0		; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_lshr_b32 s2, s0, 28		; GFX7-NEXT: s_lshr_b32 s6, s4, 28
; GFX7-NEXT: s_bfe_u32 s15, s1, 0x40018		; GFX7-NEXT: s_bfe_u32 s14, s5, 0x40018
; GFX7-NEXT: s_bfe_u32 s16, s1, 0x40014		; GFX7-NEXT: s_bfe_u32 s15, s5, 0x40014
; GFX7-NEXT: s_bfe_u32 s17, s1, 0x40010		; GFX7-NEXT: s_bfe_u32 s16, s5, 0x40010
; GFX7-NEXT: s_bfe_u32 s18, s1, 0x4000c		; GFX7-NEXT: s_bfe_u32 s17, s5, 0x4000c
; GFX7-NEXT: s_bfe_u32 s19, s1, 0x40008		; GFX7-NEXT: s_bfe_u32 s18, s5, 0x40008
; GFX7-NEXT: s_bfe_u32 s20, s1, 0x40004		; GFX7-NEXT: s_bfe_u32 s19, s5, 0x40004
; GFX7-NEXT: s_lshr_b32 s14, s1, 28		; GFX7-NEXT: s_lshr_b32 s13, s5, 28
; GFX7-NEXT: s_and_b32 s1, s1, 15		; GFX7-NEXT: s_and_b32 s5, s5, 15
; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40018		; GFX7-NEXT: s_bfe_u32 s7, s4, 0x40018
; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40014		; GFX7-NEXT: s_bfe_u32 s8, s4, 0x40014
; GFX7-NEXT: s_bfe_u32 s10, s0, 0x40010		; GFX7-NEXT: s_bfe_u32 s9, s4, 0x40010
; GFX7-NEXT: s_bfe_u32 s11, s0, 0x4000c		; GFX7-NEXT: s_bfe_u32 s10, s4, 0x4000c
; GFX7-NEXT: s_bfe_u32 s12, s0, 0x40008		; GFX7-NEXT: s_bfe_u32 s11, s4, 0x40008
; GFX7-NEXT: s_bfe_u32 s13, s0, 0x40004		; GFX7-NEXT: s_bfe_u32 s12, s4, 0x40004
; GFX7-NEXT: s_and_b32 s0, s0, 15		; GFX7-NEXT: s_and_b32 s4, s4, 15
; GFX7-NEXT: v_mov_b32_e32 v1, s1		; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s20		; GFX7-NEXT: v_mov_b32_e32 v2, s19
; GFX7-NEXT: v_mov_b32_e32 v3, s19		; GFX7-NEXT: v_mov_b32_e32 v3, s18
; GFX7-NEXT: v_mov_b32_e32 v4, s18		; GFX7-NEXT: v_mov_b32_e32 v4, s17
; GFX7-NEXT: v_mov_b32_e32 v5, s17		; GFX7-NEXT: v_mov_b32_e32 v5, s16
; GFX7-NEXT: v_mov_b32_e32 v6, s16		; GFX7-NEXT: v_mov_b32_e32 v6, s15
; GFX7-NEXT: v_mov_b32_e32 v7, s15		; GFX7-NEXT: v_mov_b32_e32 v7, s14
; GFX7-NEXT: s_waitcnt vmcnt(0)		; GFX7-NEXT: s_waitcnt vmcnt(0)
; GFX7-NEXT: v_mad_u32_u24 v0, s0, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s4, v1, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s13, v2, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s12, v2, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s12, v3, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s11, v3, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s11, v4, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s10, v4, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s10, v5, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s9, v5, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s9, v6, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s8, v6, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s8, v7, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s7, v7, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s14		; GFX7-NEXT: v_mov_b32_e32 v1, s13
; GFX7-NEXT: v_mad_u32_u24 v0, s2, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s6, v1, v0
; GFX7-NEXT: buffer_store_short v0, off, s[4:7], 0		; GFX7-NEXT: buffer_store_short v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: udot8_acc16:		; GFX8-LABEL: udot8_acc16:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_load_ushort v2, v[0:1]		; GFX8-NEXT: flat_load_ushort v2, v[0:1]
; GFX8-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX8-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX8-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX8-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_lshr_b32 s2, s0, 28		; GFX8-NEXT: s_lshr_b32 s2, s0, 28
; GFX8-NEXT: s_bfe_u32 s11, s1, 0x40018		; GFX8-NEXT: s_bfe_u32 s10, s1, 0x40018
; GFX8-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX8-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX8-NEXT: s_bfe_u32 s13, s1, 0x40010		; GFX8-NEXT: s_bfe_u32 s12, s1, 0x40010
; GFX8-NEXT: s_bfe_u32 s14, s1, 0x4000c		; GFX8-NEXT: s_bfe_u32 s13, s1, 0x4000c
; GFX8-NEXT: s_bfe_u32 s15, s1, 0x40008		; GFX8-NEXT: s_bfe_u32 s14, s1, 0x40008
; GFX8-NEXT: s_bfe_u32 s16, s1, 0x40004		; GFX8-NEXT: s_bfe_u32 s15, s1, 0x40004
; GFX8-NEXT: s_lshr_b32 s10, s1, 28		; GFX8-NEXT: s_lshr_b32 s9, s1, 28
; GFX8-NEXT: s_and_b32 s1, s1, 15		; GFX8-NEXT: s_and_b32 s1, s1, 15
; GFX8-NEXT: s_bfe_u32 s4, s0, 0x40018		; GFX8-NEXT: s_bfe_u32 s3, s0, 0x40018
; GFX8-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX8-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX8-NEXT: s_bfe_u32 s6, s0, 0x40010		; GFX8-NEXT: s_bfe_u32 s5, s0, 0x40010
; GFX8-NEXT: s_bfe_u32 s7, s0, 0x4000c		; GFX8-NEXT: s_bfe_u32 s6, s0, 0x4000c
; GFX8-NEXT: s_bfe_u32 s8, s0, 0x40008		; GFX8-NEXT: s_bfe_u32 s7, s0, 0x40008
; GFX8-NEXT: s_bfe_u32 s9, s0, 0x40004		; GFX8-NEXT: s_bfe_u32 s8, s0, 0x40004
; GFX8-NEXT: s_and_b32 s0, s0, 15		; GFX8-NEXT: s_and_b32 s0, s0, 15
; GFX8-NEXT: v_mov_b32_e32 v3, s1		; GFX8-NEXT: v_mov_b32_e32 v3, s1
; GFX8-NEXT: v_mov_b32_e32 v4, s16		; GFX8-NEXT: v_mov_b32_e32 v4, s15
; GFX8-NEXT: v_mov_b32_e32 v5, s15		; GFX8-NEXT: v_mov_b32_e32 v5, s14
; GFX8-NEXT: v_mov_b32_e32 v6, s14		; GFX8-NEXT: v_mov_b32_e32 v6, s13
; GFX8-NEXT: v_mov_b32_e32 v7, s13		; GFX8-NEXT: v_mov_b32_e32 v7, s12
; GFX8-NEXT: v_mov_b32_e32 v8, s12		; GFX8-NEXT: v_mov_b32_e32 v8, s11
; GFX8-NEXT: v_mov_b32_e32 v9, s11		; GFX8-NEXT: v_mov_b32_e32 v9, s10
; GFX8-NEXT: s_waitcnt vmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0)
; GFX8-NEXT: v_mad_u32_u24 v2, s0, v3, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s0, v3, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s9, v4, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s8, v4, v2
; GFX8-NEXT: v_and_b32_e32 v2, 0xffff, v2		; GFX8-NEXT: v_and_b32_e32 v2, 0xffff, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s8, v5, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s7, v5, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s7, v6, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s6, v6, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s6, v7, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s5, v7, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s5, v8, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s4, v8, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s4, v9, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s3, v9, v2
; GFX8-NEXT: v_mov_b32_e32 v3, s10		; GFX8-NEXT: v_mov_b32_e32 v3, s9
; GFX8-NEXT: v_mad_u32_u24 v2, s2, v3, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s2, v3, v2
; GFX8-NEXT: flat_store_short v[0:1], v2		; GFX8-NEXT: flat_store_short v[0:1], v2
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: udot8_acc16:		; GFX9-LABEL: udot8_acc16:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: global_load_ushort v2, v[0:1], off		; GFX9-NEXT: global_load_ushort v2, v[0:1], off
; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX9-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_lshr_b32 s2, s0, 28		; GFX9-NEXT: s_lshr_b32 s2, s0, 28
; GFX9-NEXT: s_bfe_u32 s11, s1, 0x40018		; GFX9-NEXT: s_bfe_u32 s10, s1, 0x40018
; GFX9-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX9-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX9-NEXT: s_bfe_u32 s13, s1, 0x40010		; GFX9-NEXT: s_bfe_u32 s12, s1, 0x40010
; GFX9-NEXT: s_bfe_u32 s14, s1, 0x4000c		; GFX9-NEXT: s_bfe_u32 s13, s1, 0x4000c
; GFX9-NEXT: s_bfe_u32 s15, s1, 0x40008		; GFX9-NEXT: s_bfe_u32 s14, s1, 0x40008
; GFX9-NEXT: s_bfe_u32 s16, s1, 0x40004		; GFX9-NEXT: s_bfe_u32 s15, s1, 0x40004
; GFX9-NEXT: s_lshr_b32 s10, s1, 28		; GFX9-NEXT: s_lshr_b32 s9, s1, 28
; GFX9-NEXT: s_and_b32 s1, s1, 15		; GFX9-NEXT: s_and_b32 s1, s1, 15
; GFX9-NEXT: s_bfe_u32 s4, s0, 0x40018		; GFX9-NEXT: s_bfe_u32 s3, s0, 0x40018
; GFX9-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX9-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX9-NEXT: s_bfe_u32 s6, s0, 0x40010		; GFX9-NEXT: s_bfe_u32 s5, s0, 0x40010
; GFX9-NEXT: s_bfe_u32 s7, s0, 0x4000c		; GFX9-NEXT: s_bfe_u32 s6, s0, 0x4000c
; GFX9-NEXT: s_bfe_u32 s8, s0, 0x40008		; GFX9-NEXT: s_bfe_u32 s7, s0, 0x40008
; GFX9-NEXT: s_bfe_u32 s9, s0, 0x40004		; GFX9-NEXT: s_bfe_u32 s8, s0, 0x40004
; GFX9-NEXT: s_and_b32 s0, s0, 15		; GFX9-NEXT: s_and_b32 s0, s0, 15
; GFX9-NEXT: v_mov_b32_e32 v3, s1		; GFX9-NEXT: v_mov_b32_e32 v3, s1
; GFX9-NEXT: v_mov_b32_e32 v4, s16		; GFX9-NEXT: v_mov_b32_e32 v4, s15
; GFX9-NEXT: v_mov_b32_e32 v5, s15		; GFX9-NEXT: v_mov_b32_e32 v5, s14
; GFX9-NEXT: v_mov_b32_e32 v6, s14		; GFX9-NEXT: v_mov_b32_e32 v6, s13
; GFX9-NEXT: v_mov_b32_e32 v7, s13		; GFX9-NEXT: v_mov_b32_e32 v7, s12
; GFX9-NEXT: v_mov_b32_e32 v8, s12		; GFX9-NEXT: v_mov_b32_e32 v8, s11
; GFX9-NEXT: v_mov_b32_e32 v9, s11		; GFX9-NEXT: v_mov_b32_e32 v9, s10
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: v_mad_u32_u24 v2, s0, v3, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s0, v3, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s9, v4, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s8, v4, v2
; GFX9-NEXT: v_and_b32_e32 v2, 0xffff, v2		; GFX9-NEXT: v_and_b32_e32 v2, 0xffff, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s8, v5, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s7, v5, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s7, v6, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s6, v6, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s6, v7, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s5, v7, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s5, v8, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s4, v8, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s4, v9, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s3, v9, v2
; GFX9-NEXT: v_mov_b32_e32 v3, s10		; GFX9-NEXT: v_mov_b32_e32 v3, s9
; GFX9-NEXT: v_mad_u32_u24 v2, s2, v3, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s2, v3, v2
; GFX9-NEXT: global_store_short v[0:1], v2, off		; GFX9-NEXT: global_store_short v[0:1], v2, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX9-DL-LABEL: udot8_acc16:		; GFX9-DL-LABEL: udot8_acc16:
; GFX9-DL: ; %bb.0: ; %entry		; GFX9-DL: ; %bb.0: ; %entry
; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: global_load_ushort v2, v[0:1], off		; GFX9-DL-NEXT: global_load_ushort v2, v[0:1], off
; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX9-DL-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_lshr_b32 s2, s0, 28		; GFX9-DL-NEXT: s_lshr_b32 s2, s0, 28
; GFX9-DL-NEXT: s_bfe_u32 s11, s1, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s10, s1, 0x40018
; GFX9-DL-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX9-DL-NEXT: s_bfe_u32 s13, s1, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s12, s1, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s14, s1, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s13, s1, 0x4000c
; GFX9-DL-NEXT: s_bfe_u32 s15, s1, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s14, s1, 0x40008
; GFX9-DL-NEXT: s_bfe_u32 s16, s1, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s15, s1, 0x40004
; GFX9-DL-NEXT: s_lshr_b32 s10, s1, 28		; GFX9-DL-NEXT: s_lshr_b32 s9, s1, 28
; GFX9-DL-NEXT: s_and_b32 s1, s1, 15		; GFX9-DL-NEXT: s_and_b32 s1, s1, 15
; GFX9-DL-NEXT: s_bfe_u32 s4, s0, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s3, s0, 0x40018
; GFX9-DL-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX9-DL-NEXT: s_bfe_u32 s6, s0, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s5, s0, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s7, s0, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s6, s0, 0x4000c
; GFX9-DL-NEXT: s_bfe_u32 s8, s0, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s7, s0, 0x40008
; GFX9-DL-NEXT: s_bfe_u32 s9, s0, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s8, s0, 0x40004
; GFX9-DL-NEXT: s_and_b32 s0, s0, 15		; GFX9-DL-NEXT: s_and_b32 s0, s0, 15
; GFX9-DL-NEXT: v_mov_b32_e32 v3, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v3, s1
; GFX9-DL-NEXT: v_mov_b32_e32 v4, s16		; GFX9-DL-NEXT: v_mov_b32_e32 v4, s15
; GFX9-DL-NEXT: v_mov_b32_e32 v5, s15		; GFX9-DL-NEXT: v_mov_b32_e32 v5, s14
; GFX9-DL-NEXT: v_mov_b32_e32 v6, s14		; GFX9-DL-NEXT: v_mov_b32_e32 v6, s13
; GFX9-DL-NEXT: v_mov_b32_e32 v7, s13		; GFX9-DL-NEXT: v_mov_b32_e32 v7, s12
; GFX9-DL-NEXT: v_mov_b32_e32 v8, s12		; GFX9-DL-NEXT: v_mov_b32_e32 v8, s11
; GFX9-DL-NEXT: v_mov_b32_e32 v9, s11		; GFX9-DL-NEXT: v_mov_b32_e32 v9, s10
; GFX9-DL-NEXT: s_waitcnt vmcnt(0)		; GFX9-DL-NEXT: s_waitcnt vmcnt(0)
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s0, v3, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s0, v3, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s9, v4, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s8, v4, v2
; GFX9-DL-NEXT: v_and_b32_e32 v2, 0xffff, v2		; GFX9-DL-NEXT: v_and_b32_e32 v2, 0xffff, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s8, v5, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s7, v5, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s7, v6, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s6, v6, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s6, v7, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s5, v7, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s5, v8, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s4, v8, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s4, v9, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s3, v9, v2
; GFX9-DL-NEXT: v_mov_b32_e32 v3, s10		; GFX9-DL-NEXT: v_mov_b32_e32 v3, s9
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s2, v3, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s2, v3, v2
; GFX9-DL-NEXT: global_store_short v[0:1], v2, off		; GFX9-DL-NEXT: global_store_short v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: udot8_acc16:		; GFX10-DL-LABEL: udot8_acc16:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s4		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s2
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s5		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s3
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX10-DL-NEXT: global_load_ushort v2, v[0:1], off		; GFX10-DL-NEXT: global_load_ushort v2, v[0:1], off
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_and_b32 s2, s0, 15		; GFX10-DL-NEXT: s_and_b32 s2, s0, 15
; GFX10-DL-NEXT: s_and_b32 s4, s1, 15		; GFX10-DL-NEXT: s_and_b32 s3, s1, 15
; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x40004
; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x40004
; GFX10-DL-NEXT: s_waitcnt vmcnt(0)		; GFX10-DL-NEXT: s_waitcnt vmcnt(0)
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40008
; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s3, s1, 0x40008
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s5, s6, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s4, s5, v2
; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x4000c
; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x4000c
; GFX10-DL-NEXT: v_and_b32_e32 v2, 0xffff, v2		; GFX10-DL-NEXT: v_and_b32_e32 v2, 0xffff, v2
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40010
; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s3, s1, 0x40010
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s5, s6, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s4, s5, v2
; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x40014
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40018
; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s3, s1, 0x40018
; GFX10-DL-NEXT: s_lshr_b32 s0, s0, 28		; GFX10-DL-NEXT: s_lshr_b32 s0, s0, 28
; GFX10-DL-NEXT: s_lshr_b32 s1, s1, 28		; GFX10-DL-NEXT: s_lshr_b32 s1, s1, 28
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s5, s6, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s4, s5, v2
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s0, s1, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s0, s1, v2
; GFX10-DL-NEXT: global_store_short v[0:1], v2, off		; GFX10-DL-NEXT: global_store_short v[0:1], v2, off
; GFX10-DL-NEXT: s_endpgm		; GFX10-DL-NEXT: s_endpgm
<8 x i4> addrspace(1)* %src2,		<8 x i4> addrspace(1)* %src2,
i16 addrspace(1)* nocapture %dst) {		i16 addrspace(1)* nocapture %dst) {
entry:		entry:
%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1		%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1
%vec2 = load <8 x i4>, <8 x i4> addrspace(1)* %src2		%vec2 = load <8 x i4>, <8 x i4> addrspace(1)* %src2
ret void		ret void
}		}

; TODO: Remove the unnecessary instruction(that is zero-extending the		; TODO: Remove the unnecessary instruction(that is zero-extending the
; 2nd MAD) to have the pattern-recognizer to kick in.		; 2nd MAD) to have the pattern-recognizer to kick in.
define amdgpu_kernel void @udot8_acc8(<8 x i4> addrspace(1)* %src1,		define amdgpu_kernel void @udot8_acc8(<8 x i4> addrspace(1)* %src1,
; GFX7-LABEL: udot8_acc8:		; GFX7-LABEL: udot8_acc8:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s7, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s6, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: buffer_load_ubyte v0, off, s[4:7], 0		; GFX7-NEXT: buffer_load_ubyte v0, off, s[0:3], 0
; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0		; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX7-NEXT: s_load_dword s1, s[10:11], 0x0		; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_lshr_b32 s2, s0, 28		; GFX7-NEXT: s_lshr_b32 s6, s4, 28
; GFX7-NEXT: s_bfe_u32 s15, s1, 0x40018		; GFX7-NEXT: s_bfe_u32 s14, s5, 0x40018
; GFX7-NEXT: s_bfe_u32 s16, s1, 0x40014		; GFX7-NEXT: s_bfe_u32 s15, s5, 0x40014
; GFX7-NEXT: s_bfe_u32 s17, s1, 0x40010		; GFX7-NEXT: s_bfe_u32 s16, s5, 0x40010
; GFX7-NEXT: s_bfe_u32 s18, s1, 0x4000c		; GFX7-NEXT: s_bfe_u32 s17, s5, 0x4000c
; GFX7-NEXT: s_bfe_u32 s19, s1, 0x40008		; GFX7-NEXT: s_bfe_u32 s18, s5, 0x40008
; GFX7-NEXT: s_bfe_u32 s20, s1, 0x40004		; GFX7-NEXT: s_bfe_u32 s19, s5, 0x40004
; GFX7-NEXT: s_lshr_b32 s14, s1, 28		; GFX7-NEXT: s_lshr_b32 s13, s5, 28
; GFX7-NEXT: s_and_b32 s1, s1, 15		; GFX7-NEXT: s_and_b32 s5, s5, 15
; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40018		; GFX7-NEXT: s_bfe_u32 s7, s4, 0x40018
; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40014		; GFX7-NEXT: s_bfe_u32 s8, s4, 0x40014
; GFX7-NEXT: s_bfe_u32 s10, s0, 0x40010		; GFX7-NEXT: s_bfe_u32 s9, s4, 0x40010
; GFX7-NEXT: s_bfe_u32 s11, s0, 0x4000c		; GFX7-NEXT: s_bfe_u32 s10, s4, 0x4000c
; GFX7-NEXT: s_bfe_u32 s12, s0, 0x40008		; GFX7-NEXT: s_bfe_u32 s11, s4, 0x40008
; GFX7-NEXT: s_bfe_u32 s13, s0, 0x40004		; GFX7-NEXT: s_bfe_u32 s12, s4, 0x40004
; GFX7-NEXT: s_and_b32 s0, s0, 15		; GFX7-NEXT: s_and_b32 s4, s4, 15
; GFX7-NEXT: v_mov_b32_e32 v1, s1		; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s20		; GFX7-NEXT: v_mov_b32_e32 v2, s19
; GFX7-NEXT: v_mov_b32_e32 v3, s19		; GFX7-NEXT: v_mov_b32_e32 v3, s18
; GFX7-NEXT: v_mov_b32_e32 v4, s18		; GFX7-NEXT: v_mov_b32_e32 v4, s17
; GFX7-NEXT: v_mov_b32_e32 v5, s17		; GFX7-NEXT: v_mov_b32_e32 v5, s16
; GFX7-NEXT: v_mov_b32_e32 v6, s16		; GFX7-NEXT: v_mov_b32_e32 v6, s15
; GFX7-NEXT: v_mov_b32_e32 v7, s15		; GFX7-NEXT: v_mov_b32_e32 v7, s14
; GFX7-NEXT: s_waitcnt vmcnt(0)		; GFX7-NEXT: s_waitcnt vmcnt(0)
; GFX7-NEXT: v_mad_u32_u24 v0, s0, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s4, v1, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s13, v2, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s12, v2, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s12, v3, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s11, v3, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s11, v4, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s10, v4, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s10, v5, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s9, v5, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s9, v6, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s8, v6, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s8, v7, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s7, v7, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s14		; GFX7-NEXT: v_mov_b32_e32 v1, s13
; GFX7-NEXT: v_mad_u32_u24 v0, s2, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s6, v1, v0
; GFX7-NEXT: buffer_store_byte v0, off, s[4:7], 0		; GFX7-NEXT: buffer_store_byte v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: udot8_acc8:		; GFX8-LABEL: udot8_acc8:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_load_ubyte v2, v[0:1]		; GFX8-NEXT: flat_load_ubyte v2, v[0:1]
; GFX8-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX8-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX8-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX8-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_lshr_b32 s2, s0, 28		; GFX8-NEXT: s_lshr_b32 s2, s0, 28
; GFX8-NEXT: s_bfe_u32 s11, s1, 0x40018		; GFX8-NEXT: s_bfe_u32 s10, s1, 0x40018
; GFX8-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX8-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX8-NEXT: s_bfe_u32 s13, s1, 0x40010		; GFX8-NEXT: s_bfe_u32 s12, s1, 0x40010
; GFX8-NEXT: s_bfe_u32 s14, s1, 0x4000c		; GFX8-NEXT: s_bfe_u32 s13, s1, 0x4000c
; GFX8-NEXT: s_bfe_u32 s15, s1, 0x40008		; GFX8-NEXT: s_bfe_u32 s14, s1, 0x40008
; GFX8-NEXT: s_bfe_u32 s16, s1, 0x40004		; GFX8-NEXT: s_bfe_u32 s15, s1, 0x40004
; GFX8-NEXT: s_lshr_b32 s10, s1, 28		; GFX8-NEXT: s_lshr_b32 s9, s1, 28
; GFX8-NEXT: s_and_b32 s1, s1, 15		; GFX8-NEXT: s_and_b32 s1, s1, 15
; GFX8-NEXT: s_bfe_u32 s4, s0, 0x40018		; GFX8-NEXT: s_bfe_u32 s3, s0, 0x40018
; GFX8-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX8-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX8-NEXT: s_bfe_u32 s6, s0, 0x40010		; GFX8-NEXT: s_bfe_u32 s5, s0, 0x40010
; GFX8-NEXT: s_bfe_u32 s7, s0, 0x4000c		; GFX8-NEXT: s_bfe_u32 s6, s0, 0x4000c
; GFX8-NEXT: s_bfe_u32 s8, s0, 0x40008		; GFX8-NEXT: s_bfe_u32 s7, s0, 0x40008
; GFX8-NEXT: s_bfe_u32 s9, s0, 0x40004		; GFX8-NEXT: s_bfe_u32 s8, s0, 0x40004
; GFX8-NEXT: s_and_b32 s0, s0, 15		; GFX8-NEXT: s_and_b32 s0, s0, 15
; GFX8-NEXT: v_mov_b32_e32 v3, s1		; GFX8-NEXT: v_mov_b32_e32 v3, s1
; GFX8-NEXT: v_mov_b32_e32 v4, s16		; GFX8-NEXT: v_mov_b32_e32 v4, s15
; GFX8-NEXT: v_mov_b32_e32 v5, s15		; GFX8-NEXT: v_mov_b32_e32 v5, s14
; GFX8-NEXT: v_mov_b32_e32 v6, s14		; GFX8-NEXT: v_mov_b32_e32 v6, s13
; GFX8-NEXT: v_mov_b32_e32 v7, s13		; GFX8-NEXT: v_mov_b32_e32 v7, s12
; GFX8-NEXT: v_mov_b32_e32 v8, s12		; GFX8-NEXT: v_mov_b32_e32 v8, s11
; GFX8-NEXT: v_mov_b32_e32 v9, s11		; GFX8-NEXT: v_mov_b32_e32 v9, s10
; GFX8-NEXT: s_waitcnt vmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0)
; GFX8-NEXT: v_mad_u32_u24 v2, s0, v3, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s0, v3, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s9, v4, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s8, v4, v2
; GFX8-NEXT: v_and_b32_e32 v2, 0xff, v2		; GFX8-NEXT: v_and_b32_e32 v2, 0xff, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s8, v5, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s7, v5, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s7, v6, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s6, v6, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s6, v7, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s5, v7, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s5, v8, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s4, v8, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s4, v9, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s3, v9, v2
; GFX8-NEXT: v_mov_b32_e32 v3, s10		; GFX8-NEXT: v_mov_b32_e32 v3, s9
; GFX8-NEXT: v_mad_u32_u24 v2, s2, v3, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s2, v3, v2
; GFX8-NEXT: flat_store_byte v[0:1], v2		; GFX8-NEXT: flat_store_byte v[0:1], v2
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: udot8_acc8:		; GFX9-LABEL: udot8_acc8:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: global_load_ubyte v2, v[0:1], off		; GFX9-NEXT: global_load_ubyte v2, v[0:1], off
; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX9-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_lshr_b32 s2, s0, 28		; GFX9-NEXT: s_lshr_b32 s2, s0, 28
; GFX9-NEXT: s_bfe_u32 s11, s1, 0x40018		; GFX9-NEXT: s_bfe_u32 s10, s1, 0x40018
; GFX9-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX9-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX9-NEXT: s_bfe_u32 s13, s1, 0x40010		; GFX9-NEXT: s_bfe_u32 s12, s1, 0x40010
; GFX9-NEXT: s_bfe_u32 s14, s1, 0x4000c		; GFX9-NEXT: s_bfe_u32 s13, s1, 0x4000c
; GFX9-NEXT: s_bfe_u32 s15, s1, 0x40008		; GFX9-NEXT: s_bfe_u32 s14, s1, 0x40008
; GFX9-NEXT: s_bfe_u32 s16, s1, 0x40004		; GFX9-NEXT: s_bfe_u32 s15, s1, 0x40004
; GFX9-NEXT: s_lshr_b32 s10, s1, 28		; GFX9-NEXT: s_lshr_b32 s9, s1, 28
; GFX9-NEXT: s_and_b32 s1, s1, 15		; GFX9-NEXT: s_and_b32 s1, s1, 15
; GFX9-NEXT: s_bfe_u32 s4, s0, 0x40018		; GFX9-NEXT: s_bfe_u32 s3, s0, 0x40018
; GFX9-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX9-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX9-NEXT: s_bfe_u32 s6, s0, 0x40010		; GFX9-NEXT: s_bfe_u32 s5, s0, 0x40010
; GFX9-NEXT: s_bfe_u32 s7, s0, 0x4000c		; GFX9-NEXT: s_bfe_u32 s6, s0, 0x4000c
; GFX9-NEXT: s_bfe_u32 s8, s0, 0x40008		; GFX9-NEXT: s_bfe_u32 s7, s0, 0x40008
; GFX9-NEXT: s_bfe_u32 s9, s0, 0x40004		; GFX9-NEXT: s_bfe_u32 s8, s0, 0x40004
; GFX9-NEXT: s_and_b32 s0, s0, 15		; GFX9-NEXT: s_and_b32 s0, s0, 15
; GFX9-NEXT: v_mov_b32_e32 v3, s1		; GFX9-NEXT: v_mov_b32_e32 v3, s1
; GFX9-NEXT: v_mov_b32_e32 v4, s16		; GFX9-NEXT: v_mov_b32_e32 v4, s15
; GFX9-NEXT: v_mov_b32_e32 v5, s15		; GFX9-NEXT: v_mov_b32_e32 v5, s14
; GFX9-NEXT: v_mov_b32_e32 v6, s14		; GFX9-NEXT: v_mov_b32_e32 v6, s13
; GFX9-NEXT: v_mov_b32_e32 v7, s13		; GFX9-NEXT: v_mov_b32_e32 v7, s12
; GFX9-NEXT: v_mov_b32_e32 v8, s12		; GFX9-NEXT: v_mov_b32_e32 v8, s11
; GFX9-NEXT: v_mov_b32_e32 v9, s11		; GFX9-NEXT: v_mov_b32_e32 v9, s10
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: v_mad_u32_u24 v2, s0, v3, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s0, v3, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s9, v4, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s8, v4, v2
; GFX9-NEXT: v_and_b32_e32 v2, 0xff, v2		; GFX9-NEXT: v_and_b32_e32 v2, 0xff, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s8, v5, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s7, v5, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s7, v6, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s6, v6, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s6, v7, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s5, v7, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s5, v8, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s4, v8, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s4, v9, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s3, v9, v2
; GFX9-NEXT: v_mov_b32_e32 v3, s10		; GFX9-NEXT: v_mov_b32_e32 v3, s9
; GFX9-NEXT: v_mad_u32_u24 v2, s2, v3, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s2, v3, v2
; GFX9-NEXT: global_store_byte v[0:1], v2, off		; GFX9-NEXT: global_store_byte v[0:1], v2, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX9-DL-LABEL: udot8_acc8:		; GFX9-DL-LABEL: udot8_acc8:
; GFX9-DL: ; %bb.0: ; %entry		; GFX9-DL: ; %bb.0: ; %entry
; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: global_load_ubyte v2, v[0:1], off		; GFX9-DL-NEXT: global_load_ubyte v2, v[0:1], off
; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX9-DL-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_lshr_b32 s2, s0, 28		; GFX9-DL-NEXT: s_lshr_b32 s2, s0, 28
; GFX9-DL-NEXT: s_bfe_u32 s11, s1, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s10, s1, 0x40018
; GFX9-DL-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX9-DL-NEXT: s_bfe_u32 s13, s1, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s12, s1, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s14, s1, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s13, s1, 0x4000c
; GFX9-DL-NEXT: s_bfe_u32 s15, s1, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s14, s1, 0x40008
; GFX9-DL-NEXT: s_bfe_u32 s16, s1, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s15, s1, 0x40004
; GFX9-DL-NEXT: s_lshr_b32 s10, s1, 28		; GFX9-DL-NEXT: s_lshr_b32 s9, s1, 28
; GFX9-DL-NEXT: s_and_b32 s1, s1, 15		; GFX9-DL-NEXT: s_and_b32 s1, s1, 15
; GFX9-DL-NEXT: s_bfe_u32 s4, s0, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s3, s0, 0x40018
; GFX9-DL-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX9-DL-NEXT: s_bfe_u32 s6, s0, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s5, s0, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s7, s0, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s6, s0, 0x4000c
; GFX9-DL-NEXT: s_bfe_u32 s8, s0, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s7, s0, 0x40008
; GFX9-DL-NEXT: s_bfe_u32 s9, s0, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s8, s0, 0x40004
; GFX9-DL-NEXT: s_and_b32 s0, s0, 15		; GFX9-DL-NEXT: s_and_b32 s0, s0, 15
; GFX9-DL-NEXT: v_mov_b32_e32 v3, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v3, s1
; GFX9-DL-NEXT: v_mov_b32_e32 v4, s16		; GFX9-DL-NEXT: v_mov_b32_e32 v4, s15
; GFX9-DL-NEXT: v_mov_b32_e32 v5, s15		; GFX9-DL-NEXT: v_mov_b32_e32 v5, s14
; GFX9-DL-NEXT: v_mov_b32_e32 v6, s14		; GFX9-DL-NEXT: v_mov_b32_e32 v6, s13
; GFX9-DL-NEXT: v_mov_b32_e32 v7, s13		; GFX9-DL-NEXT: v_mov_b32_e32 v7, s12
; GFX9-DL-NEXT: v_mov_b32_e32 v8, s12		; GFX9-DL-NEXT: v_mov_b32_e32 v8, s11
; GFX9-DL-NEXT: v_mov_b32_e32 v9, s11		; GFX9-DL-NEXT: v_mov_b32_e32 v9, s10
; GFX9-DL-NEXT: s_waitcnt vmcnt(0)		; GFX9-DL-NEXT: s_waitcnt vmcnt(0)
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s0, v3, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s0, v3, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s9, v4, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s8, v4, v2
; GFX9-DL-NEXT: v_and_b32_e32 v2, 0xff, v2		; GFX9-DL-NEXT: v_and_b32_e32 v2, 0xff, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s8, v5, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s7, v5, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s7, v6, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s6, v6, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s6, v7, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s5, v7, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s5, v8, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s4, v8, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s4, v9, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s3, v9, v2
; GFX9-DL-NEXT: v_mov_b32_e32 v3, s10		; GFX9-DL-NEXT: v_mov_b32_e32 v3, s9
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s2, v3, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s2, v3, v2
; GFX9-DL-NEXT: global_store_byte v[0:1], v2, off		; GFX9-DL-NEXT: global_store_byte v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: udot8_acc8:		; GFX10-DL-LABEL: udot8_acc8:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s4		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s2
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s5		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s3
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off		; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_and_b32 s2, s0, 15		; GFX10-DL-NEXT: s_and_b32 s2, s0, 15
; GFX10-DL-NEXT: s_and_b32 s4, s1, 15		; GFX10-DL-NEXT: s_and_b32 s3, s1, 15
; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x40004
; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x40004
; GFX10-DL-NEXT: s_waitcnt vmcnt(0)		; GFX10-DL-NEXT: s_waitcnt vmcnt(0)
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40008
; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s3, s1, 0x40008
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s5, s6, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s4, s5, v2
; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x4000c
; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x4000c
; GFX10-DL-NEXT: v_and_b32_e32 v2, 0xff, v2		; GFX10-DL-NEXT: v_and_b32_e32 v2, 0xff, v2
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40010
; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s3, s1, 0x40010
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s5, s6, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s4, s5, v2
; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x40014
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40018
; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s3, s1, 0x40018
; GFX10-DL-NEXT: s_lshr_b32 s0, s0, 28		; GFX10-DL-NEXT: s_lshr_b32 s0, s0, 28
; GFX10-DL-NEXT: s_lshr_b32 s1, s1, 28		; GFX10-DL-NEXT: s_lshr_b32 s1, s1, 28
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s5, s6, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s4, s5, v2
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s0, s1, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s0, s1, v2
; GFX10-DL-NEXT: global_store_byte v[0:1], v2, off		; GFX10-DL-NEXT: global_store_byte v[0:1], v2, off
; GFX10-DL-NEXT: s_endpgm		; GFX10-DL-NEXT: s_endpgm
<8 x i4> addrspace(1)* %src2,		<8 x i4> addrspace(1)* %src2,
i8 addrspace(1)* nocapture %dst) {		i8 addrspace(1)* nocapture %dst) {
entry:		entry:
%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1		%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1
%vec2 = load <8 x i4>, <8 x i4> addrspace(1)* %src2		%vec2 = load <8 x i4>, <8 x i4> addrspace(1)* %src2
ret void		ret void
}		}

; TODO: Remove the two unnecessary instructions(and+add after 2nd MAD)		; TODO: Remove the two unnecessary instructions(and+add after 2nd MAD)
; to have the pattern-recognizer to kick in.		; to have the pattern-recognizer to kick in.
define amdgpu_kernel void @udot8_acc4(<8 x i4> addrspace(1)* %src1,		define amdgpu_kernel void @udot8_acc4(<8 x i4> addrspace(1)* %src1,
; GFX7-LABEL: udot8_acc4:		; GFX7-LABEL: udot8_acc4:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s7, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s6, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: buffer_load_ubyte v0, off, s[4:7], 0		; GFX7-NEXT: buffer_load_ubyte v0, off, s[0:3], 0
; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0		; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX7-NEXT: s_load_dword s1, s[10:11], 0x0		; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_lshr_b32 s2, s0, 28		; GFX7-NEXT: s_lshr_b32 s6, s4, 28
; GFX7-NEXT: s_bfe_u32 s15, s1, 0x40018		; GFX7-NEXT: s_bfe_u32 s14, s5, 0x40018
; GFX7-NEXT: s_bfe_u32 s16, s1, 0x40014		; GFX7-NEXT: s_bfe_u32 s15, s5, 0x40014
; GFX7-NEXT: s_bfe_u32 s17, s1, 0x40010		; GFX7-NEXT: s_bfe_u32 s16, s5, 0x40010
; GFX7-NEXT: s_bfe_u32 s18, s1, 0x4000c		; GFX7-NEXT: s_bfe_u32 s17, s5, 0x4000c
; GFX7-NEXT: s_bfe_u32 s19, s1, 0x40008		; GFX7-NEXT: s_bfe_u32 s18, s5, 0x40008
; GFX7-NEXT: s_bfe_u32 s20, s1, 0x40004		; GFX7-NEXT: s_bfe_u32 s19, s5, 0x40004
; GFX7-NEXT: s_lshr_b32 s14, s1, 28		; GFX7-NEXT: s_lshr_b32 s13, s5, 28
; GFX7-NEXT: s_and_b32 s1, s1, 15		; GFX7-NEXT: s_and_b32 s5, s5, 15
; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40018		; GFX7-NEXT: s_bfe_u32 s7, s4, 0x40018
; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40014		; GFX7-NEXT: s_bfe_u32 s8, s4, 0x40014
; GFX7-NEXT: s_bfe_u32 s10, s0, 0x40010		; GFX7-NEXT: s_bfe_u32 s9, s4, 0x40010
; GFX7-NEXT: s_bfe_u32 s11, s0, 0x4000c		; GFX7-NEXT: s_bfe_u32 s10, s4, 0x4000c
; GFX7-NEXT: s_bfe_u32 s12, s0, 0x40008		; GFX7-NEXT: s_bfe_u32 s11, s4, 0x40008
; GFX7-NEXT: s_bfe_u32 s13, s0, 0x40004		; GFX7-NEXT: s_bfe_u32 s12, s4, 0x40004
; GFX7-NEXT: s_and_b32 s0, s0, 15		; GFX7-NEXT: s_and_b32 s4, s4, 15
; GFX7-NEXT: v_mov_b32_e32 v1, s1		; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s20		; GFX7-NEXT: v_mov_b32_e32 v2, s19
; GFX7-NEXT: v_mov_b32_e32 v3, s19		; GFX7-NEXT: v_mov_b32_e32 v3, s18
; GFX7-NEXT: v_mov_b32_e32 v4, s18		; GFX7-NEXT: v_mov_b32_e32 v4, s17
; GFX7-NEXT: v_mov_b32_e32 v5, s17		; GFX7-NEXT: v_mov_b32_e32 v5, s16
; GFX7-NEXT: v_mov_b32_e32 v6, s16		; GFX7-NEXT: v_mov_b32_e32 v6, s15
; GFX7-NEXT: v_mov_b32_e32 v7, s15		; GFX7-NEXT: v_mov_b32_e32 v7, s14
; GFX7-NEXT: s_waitcnt vmcnt(0)		; GFX7-NEXT: s_waitcnt vmcnt(0)
; GFX7-NEXT: v_mad_u32_u24 v0, s0, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s4, v1, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s13, v2, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s12, v2, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s12, v3, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s11, v3, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s11, v4, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s10, v4, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s10, v5, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s9, v5, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s9, v6, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s8, v6, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s8, v7, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s7, v7, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s14		; GFX7-NEXT: v_mov_b32_e32 v1, s13
; GFX7-NEXT: v_mad_u32_u24 v0, s2, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s6, v1, v0
; GFX7-NEXT: v_and_b32_e32 v0, 15, v0		; GFX7-NEXT: v_and_b32_e32 v0, 15, v0
; GFX7-NEXT: buffer_store_byte v0, off, s[4:7], 0		; GFX7-NEXT: buffer_store_byte v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: udot8_acc4:		; GFX8-LABEL: udot8_acc4:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_load_ubyte v2, v[0:1]		; GFX8-NEXT: flat_load_ubyte v2, v[0:1]
; GFX8-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX8-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX8-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX8-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_and_b32 s9, s0, 15		; GFX8-NEXT: s_and_b32 s8, s0, 15
; GFX8-NEXT: s_and_b32 s16, s1, 15		; GFX8-NEXT: s_and_b32 s15, s1, 15
; GFX8-NEXT: s_bfe_u32 s15, s1, 0x40004		; GFX8-NEXT: s_bfe_u32 s14, s1, 0x40004
; GFX8-NEXT: v_mov_b32_e32 v4, s16		; GFX8-NEXT: v_mov_b32_e32 v4, s15
; GFX8-NEXT: s_bfe_u32 s11, s1, 0x40018		; GFX8-NEXT: s_bfe_u32 s10, s1, 0x40018
; GFX8-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX8-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX8-NEXT: s_bfe_u32 s13, s1, 0x40010		; GFX8-NEXT: s_bfe_u32 s12, s1, 0x40010
; GFX8-NEXT: s_bfe_u32 s14, s1, 0x40008		; GFX8-NEXT: s_bfe_u32 s13, s1, 0x40008
; GFX8-NEXT: s_lshr_b32 s10, s1, 28		; GFX8-NEXT: s_lshr_b32 s9, s1, 28
; GFX8-NEXT: s_bfe_u32 s1, s1, 0x4000c		; GFX8-NEXT: s_bfe_u32 s1, s1, 0x4000c
; GFX8-NEXT: s_bfe_u32 s8, s0, 0x40004		; GFX8-NEXT: s_bfe_u32 s7, s0, 0x40004
; GFX8-NEXT: v_mov_b32_e32 v5, s15		; GFX8-NEXT: v_mov_b32_e32 v5, s14
; GFX8-NEXT: s_lshr_b32 s2, s0, 28		; GFX8-NEXT: s_lshr_b32 s2, s0, 28
; GFX8-NEXT: s_bfe_u32 s4, s0, 0x40018		; GFX8-NEXT: s_bfe_u32 s3, s0, 0x40018
; GFX8-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX8-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX8-NEXT: s_bfe_u32 s6, s0, 0x40010		; GFX8-NEXT: s_bfe_u32 s5, s0, 0x40010
; GFX8-NEXT: s_bfe_u32 s7, s0, 0x40008		; GFX8-NEXT: s_bfe_u32 s6, s0, 0x40008
; GFX8-NEXT: s_bfe_u32 s0, s0, 0x4000c		; GFX8-NEXT: s_bfe_u32 s0, s0, 0x4000c
; GFX8-NEXT: v_mov_b32_e32 v3, s1		; GFX8-NEXT: v_mov_b32_e32 v3, s1
; GFX8-NEXT: v_mov_b32_e32 v6, s14		; GFX8-NEXT: v_mov_b32_e32 v6, s13
; GFX8-NEXT: v_mul_u32_u24_e32 v3, s0, v3		; GFX8-NEXT: v_mul_u32_u24_e32 v3, s0, v3
; GFX8-NEXT: v_and_b32_e32 v3, 15, v3		; GFX8-NEXT: v_and_b32_e32 v3, 15, v3
; GFX8-NEXT: v_mov_b32_e32 v7, s13		; GFX8-NEXT: v_mov_b32_e32 v7, s12
; GFX8-NEXT: v_mov_b32_e32 v8, s12		; GFX8-NEXT: v_mov_b32_e32 v8, s11
; GFX8-NEXT: v_mov_b32_e32 v9, s11		; GFX8-NEXT: v_mov_b32_e32 v9, s10
; GFX8-NEXT: s_waitcnt vmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0)
; GFX8-NEXT: v_mad_u32_u24 v2, s9, v4, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s8, v4, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s8, v5, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s7, v5, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s7, v6, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s6, v6, v2
; GFX8-NEXT: v_and_b32_e32 v2, 15, v2		; GFX8-NEXT: v_and_b32_e32 v2, 15, v2
; GFX8-NEXT: v_add_u32_e32 v2, vcc, v3, v2		; GFX8-NEXT: v_add_u32_e32 v2, vcc, v3, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s6, v7, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s5, v7, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s5, v8, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s4, v8, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s4, v9, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s3, v9, v2
; GFX8-NEXT: v_mov_b32_e32 v3, s10		; GFX8-NEXT: v_mov_b32_e32 v3, s9
; GFX8-NEXT: v_mad_u32_u24 v2, s2, v3, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s2, v3, v2
; GFX8-NEXT: v_and_b32_e32 v2, 15, v2		; GFX8-NEXT: v_and_b32_e32 v2, 15, v2
; GFX8-NEXT: flat_store_byte v[0:1], v2		; GFX8-NEXT: flat_store_byte v[0:1], v2
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: udot8_acc4:		; GFX9-LABEL: udot8_acc4:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: global_load_ubyte v2, v[0:1], off		; GFX9-NEXT: global_load_ubyte v2, v[0:1], off
; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX9-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_and_b32 s9, s0, 15		; GFX9-NEXT: s_and_b32 s8, s0, 15
; GFX9-NEXT: s_and_b32 s16, s1, 15		; GFX9-NEXT: s_and_b32 s15, s1, 15
; GFX9-NEXT: s_bfe_u32 s15, s1, 0x40004		; GFX9-NEXT: s_bfe_u32 s14, s1, 0x40004
; GFX9-NEXT: v_mov_b32_e32 v4, s16		; GFX9-NEXT: v_mov_b32_e32 v4, s15
; GFX9-NEXT: s_bfe_u32 s11, s1, 0x40018		; GFX9-NEXT: s_bfe_u32 s10, s1, 0x40018
; GFX9-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX9-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX9-NEXT: s_bfe_u32 s13, s1, 0x40010		; GFX9-NEXT: s_bfe_u32 s12, s1, 0x40010
; GFX9-NEXT: s_bfe_u32 s14, s1, 0x40008		; GFX9-NEXT: s_bfe_u32 s13, s1, 0x40008
; GFX9-NEXT: s_lshr_b32 s10, s1, 28		; GFX9-NEXT: s_lshr_b32 s9, s1, 28
; GFX9-NEXT: s_bfe_u32 s1, s1, 0x4000c		; GFX9-NEXT: s_bfe_u32 s1, s1, 0x4000c
; GFX9-NEXT: s_bfe_u32 s8, s0, 0x40004		; GFX9-NEXT: s_bfe_u32 s7, s0, 0x40004
; GFX9-NEXT: v_mov_b32_e32 v5, s15		; GFX9-NEXT: v_mov_b32_e32 v5, s14
; GFX9-NEXT: s_lshr_b32 s2, s0, 28		; GFX9-NEXT: s_lshr_b32 s2, s0, 28
; GFX9-NEXT: s_bfe_u32 s4, s0, 0x40018		; GFX9-NEXT: s_bfe_u32 s3, s0, 0x40018
; GFX9-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX9-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX9-NEXT: s_bfe_u32 s6, s0, 0x40010		; GFX9-NEXT: s_bfe_u32 s5, s0, 0x40010
; GFX9-NEXT: s_bfe_u32 s7, s0, 0x40008		; GFX9-NEXT: s_bfe_u32 s6, s0, 0x40008
; GFX9-NEXT: s_bfe_u32 s0, s0, 0x4000c		; GFX9-NEXT: s_bfe_u32 s0, s0, 0x4000c
; GFX9-NEXT: v_mov_b32_e32 v3, s1		; GFX9-NEXT: v_mov_b32_e32 v3, s1
; GFX9-NEXT: v_mov_b32_e32 v6, s14		; GFX9-NEXT: v_mov_b32_e32 v6, s13
; GFX9-NEXT: v_mul_u32_u24_e32 v3, s0, v3		; GFX9-NEXT: v_mul_u32_u24_e32 v3, s0, v3
; GFX9-NEXT: v_and_b32_e32 v3, 15, v3		; GFX9-NEXT: v_and_b32_e32 v3, 15, v3
; GFX9-NEXT: v_mov_b32_e32 v7, s13		; GFX9-NEXT: v_mov_b32_e32 v7, s12
; GFX9-NEXT: v_mov_b32_e32 v8, s12		; GFX9-NEXT: v_mov_b32_e32 v8, s11
; GFX9-NEXT: v_mov_b32_e32 v9, s11		; GFX9-NEXT: v_mov_b32_e32 v9, s10
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: v_mad_u32_u24 v2, s9, v4, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s8, v4, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s8, v5, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s7, v5, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s7, v6, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s6, v6, v2
; GFX9-NEXT: v_and_b32_e32 v2, 15, v2		; GFX9-NEXT: v_and_b32_e32 v2, 15, v2
; GFX9-NEXT: v_add_u32_e32 v2, v2, v3		; GFX9-NEXT: v_add_u32_e32 v2, v2, v3
; GFX9-NEXT: v_mad_u32_u24 v2, s6, v7, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s5, v7, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s5, v8, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s4, v8, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s4, v9, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s3, v9, v2
; GFX9-NEXT: v_mov_b32_e32 v3, s10		; GFX9-NEXT: v_mov_b32_e32 v3, s9
; GFX9-NEXT: v_mad_u32_u24 v2, s2, v3, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s2, v3, v2
; GFX9-NEXT: v_and_b32_e32 v2, 15, v2		; GFX9-NEXT: v_and_b32_e32 v2, 15, v2
; GFX9-NEXT: global_store_byte v[0:1], v2, off		; GFX9-NEXT: global_store_byte v[0:1], v2, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX9-DL-LABEL: udot8_acc4:		; GFX9-DL-LABEL: udot8_acc4:
; GFX9-DL: ; %bb.0: ; %entry		; GFX9-DL: ; %bb.0: ; %entry
; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: global_load_ubyte v2, v[0:1], off		; GFX9-DL-NEXT: global_load_ubyte v2, v[0:1], off
; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX9-DL-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_and_b32 s9, s0, 15		; GFX9-DL-NEXT: s_and_b32 s8, s0, 15
; GFX9-DL-NEXT: s_and_b32 s16, s1, 15		; GFX9-DL-NEXT: s_and_b32 s15, s1, 15
; GFX9-DL-NEXT: s_bfe_u32 s15, s1, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s14, s1, 0x40004
; GFX9-DL-NEXT: v_mov_b32_e32 v4, s16		; GFX9-DL-NEXT: v_mov_b32_e32 v4, s15
; GFX9-DL-NEXT: s_bfe_u32 s11, s1, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s10, s1, 0x40018
; GFX9-DL-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX9-DL-NEXT: s_bfe_u32 s13, s1, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s12, s1, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s14, s1, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s13, s1, 0x40008
; GFX9-DL-NEXT: s_lshr_b32 s10, s1, 28		; GFX9-DL-NEXT: s_lshr_b32 s9, s1, 28
; GFX9-DL-NEXT: s_bfe_u32 s1, s1, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s1, s1, 0x4000c
; GFX9-DL-NEXT: s_bfe_u32 s8, s0, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s7, s0, 0x40004
; GFX9-DL-NEXT: v_mov_b32_e32 v5, s15		; GFX9-DL-NEXT: v_mov_b32_e32 v5, s14
; GFX9-DL-NEXT: s_lshr_b32 s2, s0, 28		; GFX9-DL-NEXT: s_lshr_b32 s2, s0, 28
; GFX9-DL-NEXT: s_bfe_u32 s4, s0, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s3, s0, 0x40018
; GFX9-DL-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX9-DL-NEXT: s_bfe_u32 s6, s0, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s5, s0, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s7, s0, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s6, s0, 0x40008
; GFX9-DL-NEXT: s_bfe_u32 s0, s0, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s0, s0, 0x4000c
; GFX9-DL-NEXT: v_mov_b32_e32 v3, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v3, s1
; GFX9-DL-NEXT: v_mov_b32_e32 v6, s14		; GFX9-DL-NEXT: v_mov_b32_e32 v6, s13
; GFX9-DL-NEXT: v_mul_u32_u24_e32 v3, s0, v3		; GFX9-DL-NEXT: v_mul_u32_u24_e32 v3, s0, v3
; GFX9-DL-NEXT: v_and_b32_e32 v3, 15, v3		; GFX9-DL-NEXT: v_and_b32_e32 v3, 15, v3
; GFX9-DL-NEXT: v_mov_b32_e32 v7, s13		; GFX9-DL-NEXT: v_mov_b32_e32 v7, s12
; GFX9-DL-NEXT: v_mov_b32_e32 v8, s12		; GFX9-DL-NEXT: v_mov_b32_e32 v8, s11
; GFX9-DL-NEXT: v_mov_b32_e32 v9, s11		; GFX9-DL-NEXT: v_mov_b32_e32 v9, s10
; GFX9-DL-NEXT: s_waitcnt vmcnt(0)		; GFX9-DL-NEXT: s_waitcnt vmcnt(0)
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s9, v4, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s8, v4, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s8, v5, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s7, v5, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s7, v6, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s6, v6, v2
; GFX9-DL-NEXT: v_and_b32_e32 v2, 15, v2		; GFX9-DL-NEXT: v_and_b32_e32 v2, 15, v2
; GFX9-DL-NEXT: v_add_u32_e32 v2, v2, v3		; GFX9-DL-NEXT: v_add_u32_e32 v2, v2, v3
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s6, v7, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s5, v7, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s5, v8, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s4, v8, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s4, v9, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s3, v9, v2
; GFX9-DL-NEXT: v_mov_b32_e32 v3, s10		; GFX9-DL-NEXT: v_mov_b32_e32 v3, s9
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s2, v3, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s2, v3, v2
; GFX9-DL-NEXT: v_and_b32_e32 v2, 15, v2		; GFX9-DL-NEXT: v_and_b32_e32 v2, 15, v2
; GFX9-DL-NEXT: global_store_byte v[0:1], v2, off		; GFX9-DL-NEXT: global_store_byte v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: udot8_acc4:		; GFX10-DL-LABEL: udot8_acc4:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s4		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s2
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s5		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s3
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off		; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_and_b32 s2, s0, 15		; GFX10-DL-NEXT: s_and_b32 s2, s0, 15
; GFX10-DL-NEXT: s_and_b32 s4, s1, 15		; GFX10-DL-NEXT: s_and_b32 s3, s1, 15
; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x40004
; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x40004
; GFX10-DL-NEXT: s_bfe_u32 s7, s1, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40008
; GFX10-DL-NEXT: s_waitcnt vmcnt(0)		; GFX10-DL-NEXT: s_waitcnt vmcnt(0)
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40008
; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s3, s0, 0x4000c
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s5, s6, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s4, s5, v2
; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x4000c
; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x40014
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s7, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s6, v2
; GFX10-DL-NEXT: v_mul_u32_u24_e64 v3, s4, s5		; GFX10-DL-NEXT: v_mul_u32_u24_e64 v3, s3, s4
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40010
; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s3, s1, 0x40010
; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX10-DL-NEXT: v_and_b32_e32 v2, 15, v2		; GFX10-DL-NEXT: v_and_b32_e32 v2, 15, v2
; GFX10-DL-NEXT: v_and_b32_e32 v3, 15, v3		; GFX10-DL-NEXT: v_and_b32_e32 v3, 15, v3
; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v2, v3		; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v2, v3
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40018
; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s3, s1, 0x40018
; GFX10-DL-NEXT: s_lshr_b32 s0, s0, 28		; GFX10-DL-NEXT: s_lshr_b32 s0, s0, 28
; GFX10-DL-NEXT: s_lshr_b32 s1, s1, 28		; GFX10-DL-NEXT: s_lshr_b32 s1, s1, 28
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s5, s6, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s4, s5, v2
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s0, s1, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s0, s1, v2
; GFX10-DL-NEXT: v_and_b32_e32 v2, 15, v2		; GFX10-DL-NEXT: v_and_b32_e32 v2, 15, v2
; GFX10-DL-NEXT: global_store_byte v[0:1], v2, off		; GFX10-DL-NEXT: global_store_byte v[0:1], v2, off
; GFX10-DL-NEXT: s_endpgm		; GFX10-DL-NEXT: s_endpgm
<8 x i4> addrspace(1)* %src2,		<8 x i4> addrspace(1)* %src2,
i4 addrspace(1)* nocapture %dst) {		i4 addrspace(1)* nocapture %dst) {
entry:		entry:
%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1		%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1
; ... 45 unchanged lines hidden in this diff view ...		; ... 45 unchanged lines hidden in this diff view ...
ret void		ret void
}		}

; TODO: Currently, permutation of udot8 is turned off due to a huge increase		; TODO: Currently, permutation of udot8 is turned off due to a huge increase
; in the compile time.		; in the compile time.
define amdgpu_kernel void @udot8_CommutationInsideMAD(<8 x i4> addrspace(1)* %src1,		define amdgpu_kernel void @udot8_CommutationInsideMAD(<8 x i4> addrspace(1)* %src1,
; GFX7-LABEL: udot8_CommutationInsideMAD:		; GFX7-LABEL: udot8_CommutationInsideMAD:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s7, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s6, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: buffer_load_ubyte v0, off, s[4:7], 0		; GFX7-NEXT: buffer_load_ubyte v0, off, s[0:3], 0
; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0		; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX7-NEXT: s_load_dword s1, s[10:11], 0x0		; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_lshr_b32 s2, s0, 28		; GFX7-NEXT: s_lshr_b32 s6, s4, 28
; GFX7-NEXT: s_bfe_u32 s15, s1, 0x40018		; GFX7-NEXT: s_bfe_u32 s14, s5, 0x40018
; GFX7-NEXT: s_bfe_u32 s16, s1, 0x40014		; GFX7-NEXT: s_bfe_u32 s15, s5, 0x40014
; GFX7-NEXT: s_bfe_u32 s17, s1, 0x40010		; GFX7-NEXT: s_bfe_u32 s16, s5, 0x40010
; GFX7-NEXT: s_bfe_u32 s18, s1, 0x4000c		; GFX7-NEXT: s_bfe_u32 s17, s5, 0x4000c
; GFX7-NEXT: s_bfe_u32 s19, s1, 0x40008		; GFX7-NEXT: s_bfe_u32 s18, s5, 0x40008
; GFX7-NEXT: s_bfe_u32 s20, s1, 0x40004		; GFX7-NEXT: s_bfe_u32 s19, s5, 0x40004
; GFX7-NEXT: s_lshr_b32 s14, s1, 28		; GFX7-NEXT: s_lshr_b32 s13, s5, 28
; GFX7-NEXT: s_and_b32 s1, s1, 15		; GFX7-NEXT: s_and_b32 s5, s5, 15
; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40018		; GFX7-NEXT: s_bfe_u32 s7, s4, 0x40018
; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40014		; GFX7-NEXT: s_bfe_u32 s8, s4, 0x40014
; GFX7-NEXT: s_bfe_u32 s10, s0, 0x40010		; GFX7-NEXT: s_bfe_u32 s9, s4, 0x40010
; GFX7-NEXT: s_bfe_u32 s11, s0, 0x4000c		; GFX7-NEXT: s_bfe_u32 s10, s4, 0x4000c
; GFX7-NEXT: s_bfe_u32 s12, s0, 0x40008		; GFX7-NEXT: s_bfe_u32 s11, s4, 0x40008
; GFX7-NEXT: s_bfe_u32 s13, s0, 0x40004		; GFX7-NEXT: s_bfe_u32 s12, s4, 0x40004
; GFX7-NEXT: s_and_b32 s0, s0, 15		; GFX7-NEXT: s_and_b32 s4, s4, 15
; GFX7-NEXT: v_mov_b32_e32 v1, s1		; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s20		; GFX7-NEXT: v_mov_b32_e32 v2, s19
; GFX7-NEXT: v_mov_b32_e32 v3, s19		; GFX7-NEXT: v_mov_b32_e32 v3, s18
; GFX7-NEXT: v_mov_b32_e32 v4, s18		; GFX7-NEXT: v_mov_b32_e32 v4, s17
; GFX7-NEXT: v_mov_b32_e32 v5, s17		; GFX7-NEXT: v_mov_b32_e32 v5, s16
; GFX7-NEXT: v_mov_b32_e32 v6, s16		; GFX7-NEXT: v_mov_b32_e32 v6, s15
; GFX7-NEXT: v_mov_b32_e32 v7, s15		; GFX7-NEXT: v_mov_b32_e32 v7, s14
; GFX7-NEXT: s_waitcnt vmcnt(0)		; GFX7-NEXT: s_waitcnt vmcnt(0)
; GFX7-NEXT: v_mad_u32_u24 v0, s0, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s4, v1, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s13, v2, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s12, v2, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s12, v3, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s11, v3, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s11, v4, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s10, v4, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s10, v5, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s9, v5, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s9, v6, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s8, v6, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s8, v7, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s7, v7, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s14		; GFX7-NEXT: v_mov_b32_e32 v1, s13
; GFX7-NEXT: v_mad_u32_u24 v0, s2, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s6, v1, v0
; GFX7-NEXT: v_and_b32_e32 v0, 15, v0		; GFX7-NEXT: v_and_b32_e32 v0, 15, v0
; GFX7-NEXT: buffer_store_byte v0, off, s[4:7], 0		; GFX7-NEXT: buffer_store_byte v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: udot8_CommutationInsideMAD:		; GFX8-LABEL: udot8_CommutationInsideMAD:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_load_ubyte v2, v[0:1]		; GFX8-NEXT: flat_load_ubyte v2, v[0:1]
; GFX8-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX8-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX8-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX8-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_and_b32 s9, s0, 15		; GFX8-NEXT: s_and_b32 s8, s0, 15
; GFX8-NEXT: s_and_b32 s16, s1, 15		; GFX8-NEXT: s_and_b32 s15, s1, 15
; GFX8-NEXT: s_bfe_u32 s15, s1, 0x40004		; GFX8-NEXT: s_bfe_u32 s14, s1, 0x40004
; GFX8-NEXT: v_mov_b32_e32 v4, s16		; GFX8-NEXT: v_mov_b32_e32 v4, s15
; GFX8-NEXT: s_bfe_u32 s11, s1, 0x40018		; GFX8-NEXT: s_bfe_u32 s10, s1, 0x40018
; GFX8-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX8-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX8-NEXT: s_bfe_u32 s13, s1, 0x40010		; GFX8-NEXT: s_bfe_u32 s12, s1, 0x40010
; GFX8-NEXT: s_bfe_u32 s14, s1, 0x40008		; GFX8-NEXT: s_bfe_u32 s13, s1, 0x40008
; GFX8-NEXT: s_lshr_b32 s10, s1, 28		; GFX8-NEXT: s_lshr_b32 s9, s1, 28
; GFX8-NEXT: s_bfe_u32 s1, s1, 0x4000c		; GFX8-NEXT: s_bfe_u32 s1, s1, 0x4000c
; GFX8-NEXT: s_bfe_u32 s8, s0, 0x40004		; GFX8-NEXT: s_bfe_u32 s7, s0, 0x40004
; GFX8-NEXT: v_mov_b32_e32 v5, s15		; GFX8-NEXT: v_mov_b32_e32 v5, s14
; GFX8-NEXT: s_lshr_b32 s2, s0, 28		; GFX8-NEXT: s_lshr_b32 s2, s0, 28
; GFX8-NEXT: s_bfe_u32 s4, s0, 0x40018		; GFX8-NEXT: s_bfe_u32 s3, s0, 0x40018
; GFX8-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX8-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX8-NEXT: s_bfe_u32 s6, s0, 0x40010		; GFX8-NEXT: s_bfe_u32 s5, s0, 0x40010
; GFX8-NEXT: s_bfe_u32 s7, s0, 0x40008		; GFX8-NEXT: s_bfe_u32 s6, s0, 0x40008
; GFX8-NEXT: s_bfe_u32 s0, s0, 0x4000c		; GFX8-NEXT: s_bfe_u32 s0, s0, 0x4000c
; GFX8-NEXT: v_mov_b32_e32 v3, s1		; GFX8-NEXT: v_mov_b32_e32 v3, s1
; GFX8-NEXT: v_mov_b32_e32 v6, s14		; GFX8-NEXT: v_mov_b32_e32 v6, s13
; GFX8-NEXT: v_mul_u32_u24_e32 v3, s0, v3		; GFX8-NEXT: v_mul_u32_u24_e32 v3, s0, v3
; GFX8-NEXT: v_and_b32_e32 v3, 15, v3		; GFX8-NEXT: v_and_b32_e32 v3, 15, v3
; GFX8-NEXT: v_mov_b32_e32 v7, s13		; GFX8-NEXT: v_mov_b32_e32 v7, s12
; GFX8-NEXT: v_mov_b32_e32 v8, s12		; GFX8-NEXT: v_mov_b32_e32 v8, s11
; GFX8-NEXT: v_mov_b32_e32 v9, s11		; GFX8-NEXT: v_mov_b32_e32 v9, s10
; GFX8-NEXT: s_waitcnt vmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0)
; GFX8-NEXT: v_mad_u32_u24 v2, s9, v4, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s8, v4, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s8, v5, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s7, v5, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s7, v6, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s6, v6, v2
; GFX8-NEXT: v_and_b32_e32 v2, 15, v2		; GFX8-NEXT: v_and_b32_e32 v2, 15, v2
; GFX8-NEXT: v_add_u32_e32 v2, vcc, v2, v3		; GFX8-NEXT: v_add_u32_e32 v2, vcc, v2, v3
; GFX8-NEXT: v_mad_u32_u24 v2, s6, v7, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s5, v7, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s5, v8, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s4, v8, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s4, v9, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s3, v9, v2
; GFX8-NEXT: v_mov_b32_e32 v3, s10		; GFX8-NEXT: v_mov_b32_e32 v3, s9
; GFX8-NEXT: v_mad_u32_u24 v2, s2, v3, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s2, v3, v2
; GFX8-NEXT: v_and_b32_e32 v2, 15, v2		; GFX8-NEXT: v_and_b32_e32 v2, 15, v2
; GFX8-NEXT: flat_store_byte v[0:1], v2		; GFX8-NEXT: flat_store_byte v[0:1], v2
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: udot8_CommutationInsideMAD:		; GFX9-LABEL: udot8_CommutationInsideMAD:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: global_load_ubyte v2, v[0:1], off		; GFX9-NEXT: global_load_ubyte v2, v[0:1], off
; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX9-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_and_b32 s9, s0, 15		; GFX9-NEXT: s_and_b32 s8, s0, 15
; GFX9-NEXT: s_and_b32 s16, s1, 15		; GFX9-NEXT: s_and_b32 s15, s1, 15
; GFX9-NEXT: s_bfe_u32 s15, s1, 0x40004		; GFX9-NEXT: s_bfe_u32 s14, s1, 0x40004
; GFX9-NEXT: v_mov_b32_e32 v4, s16		; GFX9-NEXT: v_mov_b32_e32 v4, s15
; GFX9-NEXT: s_bfe_u32 s11, s1, 0x40018		; GFX9-NEXT: s_bfe_u32 s10, s1, 0x40018
; GFX9-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX9-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX9-NEXT: s_bfe_u32 s13, s1, 0x40010		; GFX9-NEXT: s_bfe_u32 s12, s1, 0x40010
; GFX9-NEXT: s_bfe_u32 s14, s1, 0x40008		; GFX9-NEXT: s_bfe_u32 s13, s1, 0x40008
; GFX9-NEXT: s_lshr_b32 s10, s1, 28		; GFX9-NEXT: s_lshr_b32 s9, s1, 28
; GFX9-NEXT: s_bfe_u32 s1, s1, 0x4000c		; GFX9-NEXT: s_bfe_u32 s1, s1, 0x4000c
; GFX9-NEXT: s_bfe_u32 s8, s0, 0x40004		; GFX9-NEXT: s_bfe_u32 s7, s0, 0x40004
; GFX9-NEXT: v_mov_b32_e32 v5, s15		; GFX9-NEXT: v_mov_b32_e32 v5, s14
; GFX9-NEXT: s_lshr_b32 s2, s0, 28		; GFX9-NEXT: s_lshr_b32 s2, s0, 28
; GFX9-NEXT: s_bfe_u32 s4, s0, 0x40018		; GFX9-NEXT: s_bfe_u32 s3, s0, 0x40018
; GFX9-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX9-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX9-NEXT: s_bfe_u32 s6, s0, 0x40010		; GFX9-NEXT: s_bfe_u32 s5, s0, 0x40010
; GFX9-NEXT: s_bfe_u32 s7, s0, 0x40008		; GFX9-NEXT: s_bfe_u32 s6, s0, 0x40008
; GFX9-NEXT: s_bfe_u32 s0, s0, 0x4000c		; GFX9-NEXT: s_bfe_u32 s0, s0, 0x4000c
; GFX9-NEXT: v_mov_b32_e32 v3, s1		; GFX9-NEXT: v_mov_b32_e32 v3, s1
; GFX9-NEXT: v_mov_b32_e32 v6, s14		; GFX9-NEXT: v_mov_b32_e32 v6, s13
; GFX9-NEXT: v_mul_u32_u24_e32 v3, s0, v3		; GFX9-NEXT: v_mul_u32_u24_e32 v3, s0, v3
; GFX9-NEXT: v_and_b32_e32 v3, 15, v3		; GFX9-NEXT: v_and_b32_e32 v3, 15, v3
; GFX9-NEXT: v_mov_b32_e32 v7, s13		; GFX9-NEXT: v_mov_b32_e32 v7, s12
; GFX9-NEXT: v_mov_b32_e32 v8, s12		; GFX9-NEXT: v_mov_b32_e32 v8, s11
; GFX9-NEXT: v_mov_b32_e32 v9, s11		; GFX9-NEXT: v_mov_b32_e32 v9, s10
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: v_mad_u32_u24 v2, s9, v4, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s8, v4, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s8, v5, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s7, v5, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s7, v6, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s6, v6, v2
; GFX9-NEXT: v_and_b32_e32 v2, 15, v2		; GFX9-NEXT: v_and_b32_e32 v2, 15, v2
; GFX9-NEXT: v_add_u32_e32 v2, v3, v2		; GFX9-NEXT: v_add_u32_e32 v2, v3, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s6, v7, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s5, v7, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s5, v8, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s4, v8, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s4, v9, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s3, v9, v2
; GFX9-NEXT: v_mov_b32_e32 v3, s10		; GFX9-NEXT: v_mov_b32_e32 v3, s9
; GFX9-NEXT: v_mad_u32_u24 v2, s2, v3, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s2, v3, v2
; GFX9-NEXT: v_and_b32_e32 v2, 15, v2		; GFX9-NEXT: v_and_b32_e32 v2, 15, v2
; GFX9-NEXT: global_store_byte v[0:1], v2, off		; GFX9-NEXT: global_store_byte v[0:1], v2, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX9-DL-LABEL: udot8_CommutationInsideMAD:		; GFX9-DL-LABEL: udot8_CommutationInsideMAD:
; GFX9-DL: ; %bb.0: ; %entry		; GFX9-DL: ; %bb.0: ; %entry
; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: global_load_ubyte v2, v[0:1], off		; GFX9-DL-NEXT: global_load_ubyte v2, v[0:1], off
; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX9-DL-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_and_b32 s9, s0, 15		; GFX9-DL-NEXT: s_and_b32 s8, s0, 15
; GFX9-DL-NEXT: s_and_b32 s16, s1, 15		; GFX9-DL-NEXT: s_and_b32 s15, s1, 15
; GFX9-DL-NEXT: s_bfe_u32 s15, s1, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s14, s1, 0x40004
; GFX9-DL-NEXT: v_mov_b32_e32 v4, s16		; GFX9-DL-NEXT: v_mov_b32_e32 v4, s15
; GFX9-DL-NEXT: s_bfe_u32 s11, s1, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s10, s1, 0x40018
; GFX9-DL-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX9-DL-NEXT: s_bfe_u32 s13, s1, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s12, s1, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s14, s1, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s13, s1, 0x40008
; GFX9-DL-NEXT: s_lshr_b32 s10, s1, 28		; GFX9-DL-NEXT: s_lshr_b32 s9, s1, 28
; GFX9-DL-NEXT: s_bfe_u32 s1, s1, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s1, s1, 0x4000c
; GFX9-DL-NEXT: s_bfe_u32 s8, s0, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s7, s0, 0x40004
; GFX9-DL-NEXT: v_mov_b32_e32 v5, s15		; GFX9-DL-NEXT: v_mov_b32_e32 v5, s14
; GFX9-DL-NEXT: s_lshr_b32 s2, s0, 28		; GFX9-DL-NEXT: s_lshr_b32 s2, s0, 28
; GFX9-DL-NEXT: s_bfe_u32 s4, s0, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s3, s0, 0x40018
; GFX9-DL-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX9-DL-NEXT: s_bfe_u32 s6, s0, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s5, s0, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s7, s0, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s6, s0, 0x40008
; GFX9-DL-NEXT: s_bfe_u32 s0, s0, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s0, s0, 0x4000c
; GFX9-DL-NEXT: v_mov_b32_e32 v3, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v3, s1
; GFX9-DL-NEXT: v_mov_b32_e32 v6, s14		; GFX9-DL-NEXT: v_mov_b32_e32 v6, s13
; GFX9-DL-NEXT: v_mul_u32_u24_e32 v3, s0, v3		; GFX9-DL-NEXT: v_mul_u32_u24_e32 v3, s0, v3
; GFX9-DL-NEXT: v_and_b32_e32 v3, 15, v3		; GFX9-DL-NEXT: v_and_b32_e32 v3, 15, v3
; GFX9-DL-NEXT: v_mov_b32_e32 v7, s13		; GFX9-DL-NEXT: v_mov_b32_e32 v7, s12
; GFX9-DL-NEXT: v_mov_b32_e32 v8, s12		; GFX9-DL-NEXT: v_mov_b32_e32 v8, s11
; GFX9-DL-NEXT: v_mov_b32_e32 v9, s11		; GFX9-DL-NEXT: v_mov_b32_e32 v9, s10
; GFX9-DL-NEXT: s_waitcnt vmcnt(0)		; GFX9-DL-NEXT: s_waitcnt vmcnt(0)
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s9, v4, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s8, v4, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s8, v5, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s7, v5, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s7, v6, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s6, v6, v2
; GFX9-DL-NEXT: v_and_b32_e32 v2, 15, v2		; GFX9-DL-NEXT: v_and_b32_e32 v2, 15, v2
; GFX9-DL-NEXT: v_add_u32_e32 v2, v3, v2		; GFX9-DL-NEXT: v_add_u32_e32 v2, v3, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s6, v7, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s5, v7, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s5, v8, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s4, v8, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s4, v9, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s3, v9, v2
; GFX9-DL-NEXT: v_mov_b32_e32 v3, s10		; GFX9-DL-NEXT: v_mov_b32_e32 v3, s9
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s2, v3, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s2, v3, v2
; GFX9-DL-NEXT: v_and_b32_e32 v2, 15, v2		; GFX9-DL-NEXT: v_and_b32_e32 v2, 15, v2
; GFX9-DL-NEXT: global_store_byte v[0:1], v2, off		; GFX9-DL-NEXT: global_store_byte v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: udot8_CommutationInsideMAD:		; GFX10-DL-LABEL: udot8_CommutationInsideMAD:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s4		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s2
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s5		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s3
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off		; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_and_b32 s2, s0, 15		; GFX10-DL-NEXT: s_and_b32 s2, s0, 15
; GFX10-DL-NEXT: s_and_b32 s4, s1, 15		; GFX10-DL-NEXT: s_and_b32 s3, s1, 15
; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x40004
; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x40004
; GFX10-DL-NEXT: s_bfe_u32 s7, s1, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40008
; GFX10-DL-NEXT: s_bfe_u32 s8, s1, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s7, s1, 0x4000c
; GFX10-DL-NEXT: s_waitcnt vmcnt(0)		; GFX10-DL-NEXT: s_waitcnt vmcnt(0)
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s3, s0, 0x4000c
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40008
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s5, s6, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s4, s5, v2
; GFX10-DL-NEXT: v_mul_u32_u24_e64 v3, s4, s8		; GFX10-DL-NEXT: v_mul_u32_u24_e64 v3, s3, s7
; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s3, s1, 0x40010
; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x40014
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s7, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s6, v2
; GFX10-DL-NEXT: v_and_b32_e32 v3, 15, v3		; GFX10-DL-NEXT: v_and_b32_e32 v3, 15, v3
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40010
; GFX10-DL-NEXT: v_and_b32_e32 v2, 15, v2		; GFX10-DL-NEXT: v_and_b32_e32 v2, 15, v2
; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v3, v2		; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v3, v2
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40018
; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s3, s1, 0x40018
; GFX10-DL-NEXT: s_lshr_b32 s0, s0, 28		; GFX10-DL-NEXT: s_lshr_b32 s0, s0, 28
; GFX10-DL-NEXT: s_lshr_b32 s1, s1, 28		; GFX10-DL-NEXT: s_lshr_b32 s1, s1, 28
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s5, s6, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s4, s5, v2
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s0, s1, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s0, s1, v2
; GFX10-DL-NEXT: v_and_b32_e32 v2, 15, v2		; GFX10-DL-NEXT: v_and_b32_e32 v2, 15, v2
; GFX10-DL-NEXT: global_store_byte v[0:1], v2, off		; GFX10-DL-NEXT: global_store_byte v[0:1], v2, off
; GFX10-DL-NEXT: s_endpgm		; GFX10-DL-NEXT: s_endpgm
<8 x i4> addrspace(1)* %src2,		<8 x i4> addrspace(1)* %src2,
i4 addrspace(1)* nocapture %dst) {		i4 addrspace(1)* nocapture %dst) {
entry:		entry:
%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1		%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1
[... 43 lines not shown ...]

store i4 %add8, i4 addrspace(1)* %dst, align 4		store i4 %add8, i4 addrspace(1)* %dst, align 4
ret void		ret void
}		}

define amdgpu_kernel void @udot8_multiuses_mul1(<8 x i4> addrspace(1)* %src1,		define amdgpu_kernel void @udot8_multiuses_mul1(<8 x i4> addrspace(1)* %src1,
; GFX7-LABEL: udot8_multiuses_mul1:		; GFX7-LABEL: udot8_multiuses_mul1:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s7, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s6, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_load_dword s10, s[10:11], 0x0		; GFX7-NEXT: s_load_dword s6, s[6:7], 0x0
; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0		; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX7-NEXT: s_load_dword s21, s[4:5], 0x0		; GFX7-NEXT: s_load_dword s20, s[0:1], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_bfe_u32 s20, s10, 0x40004		; GFX7-NEXT: s_bfe_u32 s19, s6, 0x40004
; GFX7-NEXT: s_lshr_b32 s11, s10, 28		; GFX7-NEXT: s_lshr_b32 s7, s6, 28
; GFX7-NEXT: s_bfe_u32 s15, s10, 0x40018		; GFX7-NEXT: s_bfe_u32 s14, s6, 0x40018
; GFX7-NEXT: s_bfe_u32 s16, s10, 0x40014		; GFX7-NEXT: s_bfe_u32 s15, s6, 0x40014
; GFX7-NEXT: s_bfe_u32 s17, s10, 0x40010		; GFX7-NEXT: s_bfe_u32 s16, s6, 0x40010
; GFX7-NEXT: s_bfe_u32 s18, s10, 0x4000c		; GFX7-NEXT: s_bfe_u32 s17, s6, 0x4000c
; GFX7-NEXT: s_bfe_u32 s19, s10, 0x40008		; GFX7-NEXT: s_bfe_u32 s18, s6, 0x40008
; GFX7-NEXT: s_and_b32 s10, s10, 15		; GFX7-NEXT: s_and_b32 s6, s6, 15
; GFX7-NEXT: s_lshr_b32 s1, s0, 28		; GFX7-NEXT: s_lshr_b32 s5, s4, 28
; GFX7-NEXT: s_bfe_u32 s2, s0, 0x40018		; GFX7-NEXT: s_bfe_u32 s8, s4, 0x40018
; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40014		; GFX7-NEXT: s_bfe_u32 s9, s4, 0x40014
; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40010		; GFX7-NEXT: s_bfe_u32 s10, s4, 0x40010
; GFX7-NEXT: s_bfe_u32 s12, s0, 0x4000c		; GFX7-NEXT: s_bfe_u32 s11, s4, 0x4000c
; GFX7-NEXT: s_bfe_u32 s13, s0, 0x40008		; GFX7-NEXT: s_bfe_u32 s12, s4, 0x40008
; GFX7-NEXT: s_bfe_u32 s14, s0, 0x40004		; GFX7-NEXT: s_bfe_u32 s13, s4, 0x40004
; GFX7-NEXT: s_and_b32 s0, s0, 15		; GFX7-NEXT: s_and_b32 s4, s4, 15
; GFX7-NEXT: v_mov_b32_e32 v0, s10		; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s21		; GFX7-NEXT: v_mov_b32_e32 v1, s20
; GFX7-NEXT: v_mad_u32_u24 v1, s0, v0, v1		; GFX7-NEXT: v_mad_u32_u24 v1, s4, v0, v1
; GFX7-NEXT: v_mov_b32_e32 v2, s20
; GFX7-NEXT: v_mad_u32_u24 v0, s0, v0, v1
; GFX7-NEXT: v_mad_u32_u24 v1, s14, v2, v1
; GFX7-NEXT: v_mov_b32_e32 v2, s19		; GFX7-NEXT: v_mov_b32_e32 v2, s19
		; GFX7-NEXT: v_mad_u32_u24 v0, s4, v0, v1
; GFX7-NEXT: v_mad_u32_u24 v1, s13, v2, v1		; GFX7-NEXT: v_mad_u32_u24 v1, s13, v2, v1
; GFX7-NEXT: v_mov_b32_e32 v2, s18		; GFX7-NEXT: v_mov_b32_e32 v2, s18
; GFX7-NEXT: v_mad_u32_u24 v1, s12, v2, v1		; GFX7-NEXT: v_mad_u32_u24 v1, s12, v2, v1
; GFX7-NEXT: v_mov_b32_e32 v2, s17		; GFX7-NEXT: v_mov_b32_e32 v2, s17
; GFX7-NEXT: v_mad_u32_u24 v1, s9, v2, v1		; GFX7-NEXT: v_mad_u32_u24 v1, s11, v2, v1
; GFX7-NEXT: v_mov_b32_e32 v2, s16		; GFX7-NEXT: v_mov_b32_e32 v2, s16
; GFX7-NEXT: v_mad_u32_u24 v1, s8, v2, v1		; GFX7-NEXT: v_mad_u32_u24 v1, s10, v2, v1
; GFX7-NEXT: v_mov_b32_e32 v2, s15		; GFX7-NEXT: v_mov_b32_e32 v2, s15
; GFX7-NEXT: v_mad_u32_u24 v1, s2, v2, v1		; GFX7-NEXT: v_mad_u32_u24 v1, s9, v2, v1
; GFX7-NEXT: v_mov_b32_e32 v2, s11		; GFX7-NEXT: v_mov_b32_e32 v2, s14
; GFX7-NEXT: v_mad_u32_u24 v1, s1, v2, v1		; GFX7-NEXT: v_mad_u32_u24 v1, s8, v2, v1
		; GFX7-NEXT: v_mov_b32_e32 v2, s7
		; GFX7-NEXT: v_mad_u32_u24 v1, s5, v2, v1
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v1, v0		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v1, v0
; GFX7-NEXT: buffer_store_dword v0, off, s[4:7], 0		; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: udot8_multiuses_mul1:		; GFX8-LABEL: udot8_multiuses_mul1:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_load_dword s6, s[6:7], 0x0		; GFX8-NEXT: s_load_dword s6, s[6:7], 0x0
; GFX8-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX8-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX8-NEXT: s_load_dword s19, s[0:1], 0x0		; GFX8-NEXT: s_load_dword s18, s[0:1], 0x0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_bfe_u32 s18, s6, 0x40004		; GFX8-NEXT: s_bfe_u32 s17, s6, 0x40004
; GFX8-NEXT: s_lshr_b32 s7, s6, 28		; GFX8-NEXT: s_lshr_b32 s7, s6, 28
; GFX8-NEXT: s_bfe_u32 s13, s6, 0x40018		; GFX8-NEXT: s_bfe_u32 s12, s6, 0x40018
; GFX8-NEXT: s_bfe_u32 s14, s6, 0x40014		; GFX8-NEXT: s_bfe_u32 s13, s6, 0x40014
; GFX8-NEXT: s_bfe_u32 s15, s6, 0x40010		; GFX8-NEXT: s_bfe_u32 s14, s6, 0x40010
; GFX8-NEXT: s_bfe_u32 s16, s6, 0x4000c		; GFX8-NEXT: s_bfe_u32 s15, s6, 0x4000c
; GFX8-NEXT: s_bfe_u32 s17, s6, 0x40008		; GFX8-NEXT: s_bfe_u32 s16, s6, 0x40008
; GFX8-NEXT: s_and_b32 s6, s6, 15		; GFX8-NEXT: s_and_b32 s6, s6, 15
; GFX8-NEXT: s_lshr_b32 s4, s2, 28		; GFX8-NEXT: s_lshr_b32 s3, s2, 28
; GFX8-NEXT: s_bfe_u32 s5, s2, 0x40018		; GFX8-NEXT: s_bfe_u32 s4, s2, 0x40018
; GFX8-NEXT: s_bfe_u32 s8, s2, 0x40014		; GFX8-NEXT: s_bfe_u32 s5, s2, 0x40014
; GFX8-NEXT: s_bfe_u32 s9, s2, 0x40010		; GFX8-NEXT: s_bfe_u32 s8, s2, 0x40010
; GFX8-NEXT: s_bfe_u32 s10, s2, 0x4000c		; GFX8-NEXT: s_bfe_u32 s9, s2, 0x4000c
; GFX8-NEXT: s_bfe_u32 s11, s2, 0x40008		; GFX8-NEXT: s_bfe_u32 s10, s2, 0x40008
; GFX8-NEXT: s_bfe_u32 s12, s2, 0x40004		; GFX8-NEXT: s_bfe_u32 s11, s2, 0x40004
; GFX8-NEXT: s_and_b32 s2, s2, 15		; GFX8-NEXT: s_and_b32 s2, s2, 15
; GFX8-NEXT: v_mov_b32_e32 v0, s6		; GFX8-NEXT: v_mov_b32_e32 v0, s6
; GFX8-NEXT: v_mov_b32_e32 v1, s19		; GFX8-NEXT: v_mov_b32_e32 v1, s18
; GFX8-NEXT: v_mad_u32_u24 v1, s2, v0, v1		; GFX8-NEXT: v_mad_u32_u24 v1, s2, v0, v1
; GFX8-NEXT: v_mov_b32_e32 v2, s18
; GFX8-NEXT: v_mad_u32_u24 v0, s2, v0, v1
; GFX8-NEXT: v_mad_u32_u24 v1, s12, v2, v1
; GFX8-NEXT: v_mov_b32_e32 v2, s17		; GFX8-NEXT: v_mov_b32_e32 v2, s17
		; GFX8-NEXT: v_mad_u32_u24 v0, s2, v0, v1
; GFX8-NEXT: v_mad_u32_u24 v1, s11, v2, v1		; GFX8-NEXT: v_mad_u32_u24 v1, s11, v2, v1
; GFX8-NEXT: v_mov_b32_e32 v2, s16		; GFX8-NEXT: v_mov_b32_e32 v2, s16
; GFX8-NEXT: v_mad_u32_u24 v1, s10, v2, v1		; GFX8-NEXT: v_mad_u32_u24 v1, s10, v2, v1
; GFX8-NEXT: v_mov_b32_e32 v2, s15		; GFX8-NEXT: v_mov_b32_e32 v2, s15
; GFX8-NEXT: v_mad_u32_u24 v1, s9, v2, v1		; GFX8-NEXT: v_mad_u32_u24 v1, s9, v2, v1
; GFX8-NEXT: v_mov_b32_e32 v2, s14		; GFX8-NEXT: v_mov_b32_e32 v2, s14
; GFX8-NEXT: v_mad_u32_u24 v1, s8, v2, v1		; GFX8-NEXT: v_mad_u32_u24 v1, s8, v2, v1
; GFX8-NEXT: v_mov_b32_e32 v2, s13		; GFX8-NEXT: v_mov_b32_e32 v2, s13
; GFX8-NEXT: v_mad_u32_u24 v1, s5, v2, v1		; GFX8-NEXT: v_mad_u32_u24 v1, s5, v2, v1
; GFX8-NEXT: v_mov_b32_e32 v2, s7		; GFX8-NEXT: v_mov_b32_e32 v2, s12
; GFX8-NEXT: v_mad_u32_u24 v1, s4, v2, v1		; GFX8-NEXT: v_mad_u32_u24 v1, s4, v2, v1
		; GFX8-NEXT: v_mov_b32_e32 v2, s7
		; GFX8-NEXT: v_mad_u32_u24 v1, s3, v2, v1
; GFX8-NEXT: v_add_u32_e32 v2, vcc, v1, v0		; GFX8-NEXT: v_add_u32_e32 v2, vcc, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_store_dword v[0:1], v2		; GFX8-NEXT: flat_store_dword v[0:1], v2
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: udot8_multiuses_mul1:		; GFX9-LABEL: udot8_multiuses_mul1:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_load_dword s6, s[6:7], 0x0		; GFX9-NEXT: s_load_dword s6, s[6:7], 0x0
; GFX9-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX9-NEXT: s_load_dword s19, s[0:1], 0x0		; GFX9-NEXT: s_load_dword s18, s[0:1], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_bfe_u32 s18, s6, 0x40004		; GFX9-NEXT: s_bfe_u32 s17, s6, 0x40004
; GFX9-NEXT: s_lshr_b32 s7, s6, 28		; GFX9-NEXT: s_lshr_b32 s7, s6, 28
; GFX9-NEXT: s_bfe_u32 s13, s6, 0x40018		; GFX9-NEXT: s_bfe_u32 s12, s6, 0x40018
; GFX9-NEXT: s_bfe_u32 s14, s6, 0x40014		; GFX9-NEXT: s_bfe_u32 s13, s6, 0x40014
; GFX9-NEXT: s_bfe_u32 s15, s6, 0x40010		; GFX9-NEXT: s_bfe_u32 s14, s6, 0x40010
; GFX9-NEXT: s_bfe_u32 s16, s6, 0x4000c		; GFX9-NEXT: s_bfe_u32 s15, s6, 0x4000c
; GFX9-NEXT: s_bfe_u32 s17, s6, 0x40008		; GFX9-NEXT: s_bfe_u32 s16, s6, 0x40008
; GFX9-NEXT: s_and_b32 s6, s6, 15		; GFX9-NEXT: s_and_b32 s6, s6, 15
; GFX9-NEXT: s_lshr_b32 s4, s2, 28		; GFX9-NEXT: s_lshr_b32 s3, s2, 28
; GFX9-NEXT: s_bfe_u32 s5, s2, 0x40018		; GFX9-NEXT: s_bfe_u32 s4, s2, 0x40018
; GFX9-NEXT: s_bfe_u32 s8, s2, 0x40014		; GFX9-NEXT: s_bfe_u32 s5, s2, 0x40014
; GFX9-NEXT: s_bfe_u32 s9, s2, 0x40010		; GFX9-NEXT: s_bfe_u32 s8, s2, 0x40010
; GFX9-NEXT: s_bfe_u32 s10, s2, 0x4000c		; GFX9-NEXT: s_bfe_u32 s9, s2, 0x4000c
; GFX9-NEXT: s_bfe_u32 s11, s2, 0x40008		; GFX9-NEXT: s_bfe_u32 s10, s2, 0x40008
; GFX9-NEXT: s_bfe_u32 s12, s2, 0x40004		; GFX9-NEXT: s_bfe_u32 s11, s2, 0x40004
; GFX9-NEXT: s_and_b32 s2, s2, 15		; GFX9-NEXT: s_and_b32 s2, s2, 15
; GFX9-NEXT: v_mov_b32_e32 v0, s6		; GFX9-NEXT: v_mov_b32_e32 v0, s6
; GFX9-NEXT: v_mov_b32_e32 v1, s19		; GFX9-NEXT: v_mov_b32_e32 v1, s18
; GFX9-NEXT: v_mad_u32_u24 v1, s2, v0, v1		; GFX9-NEXT: v_mad_u32_u24 v1, s2, v0, v1
; GFX9-NEXT: v_mov_b32_e32 v2, s18
; GFX9-NEXT: v_mad_u32_u24 v0, s2, v0, v1
; GFX9-NEXT: v_mad_u32_u24 v1, s12, v2, v1
; GFX9-NEXT: v_mov_b32_e32 v2, s17		; GFX9-NEXT: v_mov_b32_e32 v2, s17
		; GFX9-NEXT: v_mad_u32_u24 v0, s2, v0, v1
; GFX9-NEXT: v_mad_u32_u24 v1, s11, v2, v1		; GFX9-NEXT: v_mad_u32_u24 v1, s11, v2, v1
; GFX9-NEXT: v_mov_b32_e32 v2, s16		; GFX9-NEXT: v_mov_b32_e32 v2, s16
; GFX9-NEXT: v_mad_u32_u24 v1, s10, v2, v1		; GFX9-NEXT: v_mad_u32_u24 v1, s10, v2, v1
; GFX9-NEXT: v_mov_b32_e32 v2, s15		; GFX9-NEXT: v_mov_b32_e32 v2, s15
; GFX9-NEXT: v_mad_u32_u24 v1, s9, v2, v1		; GFX9-NEXT: v_mad_u32_u24 v1, s9, v2, v1
; GFX9-NEXT: v_mov_b32_e32 v2, s14		; GFX9-NEXT: v_mov_b32_e32 v2, s14
; GFX9-NEXT: v_mad_u32_u24 v1, s8, v2, v1		; GFX9-NEXT: v_mad_u32_u24 v1, s8, v2, v1
; GFX9-NEXT: v_mov_b32_e32 v2, s13		; GFX9-NEXT: v_mov_b32_e32 v2, s13
; GFX9-NEXT: v_mad_u32_u24 v1, s5, v2, v1		; GFX9-NEXT: v_mad_u32_u24 v1, s5, v2, v1
; GFX9-NEXT: v_mov_b32_e32 v2, s7		; GFX9-NEXT: v_mov_b32_e32 v2, s12
; GFX9-NEXT: v_mad_u32_u24 v1, s4, v2, v1		; GFX9-NEXT: v_mad_u32_u24 v1, s4, v2, v1
		; GFX9-NEXT: v_mov_b32_e32 v2, s7
		; GFX9-NEXT: v_mad_u32_u24 v1, s3, v2, v1
; GFX9-NEXT: v_add_u32_e32 v2, v0, v1		; GFX9-NEXT: v_add_u32_e32 v2, v0, v1
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: global_store_dword v[0:1], v2, off		; GFX9-NEXT: global_store_dword v[0:1], v2, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX9-DL-LABEL: udot8_multiuses_mul1:		; GFX9-DL-LABEL: udot8_multiuses_mul1:
; GFX9-DL: ; %bb.0: ; %entry		; GFX9-DL: ; %bb.0: ; %entry
; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_load_dword s6, s[6:7], 0x0		; GFX9-DL-NEXT: s_load_dword s6, s[6:7], 0x0
; GFX9-DL-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX9-DL-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX9-DL-NEXT: s_load_dword s19, s[0:1], 0x0		; GFX9-DL-NEXT: s_load_dword s18, s[0:1], 0x0
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_bfe_u32 s18, s6, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s17, s6, 0x40004
; GFX9-DL-NEXT: s_lshr_b32 s7, s6, 28		; GFX9-DL-NEXT: s_lshr_b32 s7, s6, 28
; GFX9-DL-NEXT: s_bfe_u32 s13, s6, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s12, s6, 0x40018
; GFX9-DL-NEXT: s_bfe_u32 s14, s6, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s13, s6, 0x40014
; GFX9-DL-NEXT: s_bfe_u32 s15, s6, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s14, s6, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s16, s6, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s15, s6, 0x4000c
; GFX9-DL-NEXT: s_bfe_u32 s17, s6, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s16, s6, 0x40008
; GFX9-DL-NEXT: s_and_b32 s6, s6, 15		; GFX9-DL-NEXT: s_and_b32 s6, s6, 15
; GFX9-DL-NEXT: s_lshr_b32 s4, s2, 28		; GFX9-DL-NEXT: s_lshr_b32 s3, s2, 28
; GFX9-DL-NEXT: s_bfe_u32 s5, s2, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s4, s2, 0x40018
; GFX9-DL-NEXT: s_bfe_u32 s8, s2, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s5, s2, 0x40014
; GFX9-DL-NEXT: s_bfe_u32 s9, s2, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s8, s2, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s10, s2, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s9, s2, 0x4000c
; GFX9-DL-NEXT: s_bfe_u32 s11, s2, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s10, s2, 0x40008
; GFX9-DL-NEXT: s_bfe_u32 s12, s2, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s11, s2, 0x40004
; GFX9-DL-NEXT: s_and_b32 s2, s2, 15		; GFX9-DL-NEXT: s_and_b32 s2, s2, 15
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s6		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s6
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s19		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s18
; GFX9-DL-NEXT: v_mad_u32_u24 v1, s2, v0, v1		; GFX9-DL-NEXT: v_mad_u32_u24 v1, s2, v0, v1
; GFX9-DL-NEXT: v_mov_b32_e32 v2, s18
; GFX9-DL-NEXT: v_mad_u32_u24 v0, s2, v0, v1
; GFX9-DL-NEXT: v_mad_u32_u24 v1, s12, v2, v1
; GFX9-DL-NEXT: v_mov_b32_e32 v2, s17		; GFX9-DL-NEXT: v_mov_b32_e32 v2, s17
		; GFX9-DL-NEXT: v_mad_u32_u24 v0, s2, v0, v1
; GFX9-DL-NEXT: v_mad_u32_u24 v1, s11, v2, v1		; GFX9-DL-NEXT: v_mad_u32_u24 v1, s11, v2, v1
; GFX9-DL-NEXT: v_mov_b32_e32 v2, s16		; GFX9-DL-NEXT: v_mov_b32_e32 v2, s16
; GFX9-DL-NEXT: v_mad_u32_u24 v1, s10, v2, v1		; GFX9-DL-NEXT: v_mad_u32_u24 v1, s10, v2, v1
; GFX9-DL-NEXT: v_mov_b32_e32 v2, s15		; GFX9-DL-NEXT: v_mov_b32_e32 v2, s15
; GFX9-DL-NEXT: v_mad_u32_u24 v1, s9, v2, v1		; GFX9-DL-NEXT: v_mad_u32_u24 v1, s9, v2, v1
; GFX9-DL-NEXT: v_mov_b32_e32 v2, s14		; GFX9-DL-NEXT: v_mov_b32_e32 v2, s14
; GFX9-DL-NEXT: v_mad_u32_u24 v1, s8, v2, v1		; GFX9-DL-NEXT: v_mad_u32_u24 v1, s8, v2, v1
; GFX9-DL-NEXT: v_mov_b32_e32 v2, s13		; GFX9-DL-NEXT: v_mov_b32_e32 v2, s13
; GFX9-DL-NEXT: v_mad_u32_u24 v1, s5, v2, v1		; GFX9-DL-NEXT: v_mad_u32_u24 v1, s5, v2, v1
; GFX9-DL-NEXT: v_mov_b32_e32 v2, s7		; GFX9-DL-NEXT: v_mov_b32_e32 v2, s12
; GFX9-DL-NEXT: v_mad_u32_u24 v1, s4, v2, v1		; GFX9-DL-NEXT: v_mad_u32_u24 v1, s4, v2, v1
		; GFX9-DL-NEXT: v_mov_b32_e32 v2, s7
		; GFX9-DL-NEXT: v_mad_u32_u24 v1, s3, v2, v1
; GFX9-DL-NEXT: v_add_u32_e32 v2, v0, v1		; GFX9-DL-NEXT: v_add_u32_e32 v2, v0, v1
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off		; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: udot8_multiuses_mul1:		; GFX10-DL-LABEL: udot8_multiuses_mul1:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX10-DL-NEXT: s_load_dword s4, s[6:7], 0x0		; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0
; GFX10-DL-NEXT: s_load_dword s5, s[0:1], 0x0		; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_and_b32 s6, s2, 15		; GFX10-DL-NEXT: s_and_b32 s5, s2, 15
; GFX10-DL-NEXT: s_and_b32 s7, s4, 15		; GFX10-DL-NEXT: s_and_b32 s6, s3, 15
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s5		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s4
; GFX10-DL-NEXT: s_bfe_u32 s5, s2, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s4, s2, 0x40004
; GFX10-DL-NEXT: s_bfe_u32 s8, s4, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s7, s3, 0x40004
; GFX10-DL-NEXT: s_bfe_u32 s9, s2, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s8, s2, 0x40008
; GFX10-DL-NEXT: s_bfe_u32 s10, s4, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s9, s3, 0x40008
; GFX10-DL-NEXT: v_mad_u32_u24 v0, s6, s7, v0		; GFX10-DL-NEXT: v_mad_u32_u24 v0, s5, s6, v0
; GFX10-DL-NEXT: v_mad_u32_u24 v1, s5, s8, v0		; GFX10-DL-NEXT: v_mad_u32_u24 v1, s4, s7, v0
; GFX10-DL-NEXT: s_bfe_u32 s5, s2, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s4, s2, 0x4000c
; GFX10-DL-NEXT: s_bfe_u32 s8, s4, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s7, s3, 0x4000c
; GFX10-DL-NEXT: v_mad_u32_u24 v0, s6, s7, v0		; GFX10-DL-NEXT: v_mad_u32_u24 v0, s5, s6, v0
; GFX10-DL-NEXT: v_mad_u32_u24 v1, s9, s10, v1		; GFX10-DL-NEXT: v_mad_u32_u24 v1, s8, s9, v1
; GFX10-DL-NEXT: s_bfe_u32 s9, s2, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s8, s2, 0x40010
; GFX10-DL-NEXT: s_bfe_u32 s10, s4, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s9, s3, 0x40010
; GFX10-DL-NEXT: v_mad_u32_u24 v1, s5, s8, v1		; GFX10-DL-NEXT: v_mad_u32_u24 v1, s4, s7, v1
; GFX10-DL-NEXT: s_bfe_u32 s5, s2, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s4, s2, 0x40014
; GFX10-DL-NEXT: s_bfe_u32 s8, s4, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s7, s3, 0x40014
; GFX10-DL-NEXT: v_mad_u32_u24 v1, s9, s10, v1		; GFX10-DL-NEXT: v_mad_u32_u24 v1, s8, s9, v1
; GFX10-DL-NEXT: s_bfe_u32 s9, s2, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s8, s2, 0x40018
; GFX10-DL-NEXT: s_bfe_u32 s10, s4, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s9, s3, 0x40018
; GFX10-DL-NEXT: s_lshr_b32 s2, s2, 28		; GFX10-DL-NEXT: s_lshr_b32 s2, s2, 28
; GFX10-DL-NEXT: s_lshr_b32 s4, s4, 28		; GFX10-DL-NEXT: s_lshr_b32 s3, s3, 28
; GFX10-DL-NEXT: v_mad_u32_u24 v1, s5, s8, v1		; GFX10-DL-NEXT: v_mad_u32_u24 v1, s4, s7, v1
; GFX10-DL-NEXT: v_mad_u32_u24 v1, s9, s10, v1		; GFX10-DL-NEXT: v_mad_u32_u24 v1, s8, s9, v1
; GFX10-DL-NEXT: v_mad_u32_u24 v1, s2, s4, v1		; GFX10-DL-NEXT: v_mad_u32_u24 v1, s2, s3, v1
; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v0, v1		; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v0, v1
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX10-DL-NEXT: global_store_dword v[0:1], v2, off		; GFX10-DL-NEXT: global_store_dword v[0:1], v2, off
; GFX10-DL-NEXT: s_endpgm		; GFX10-DL-NEXT: s_endpgm
<8 x i4> addrspace(1)* %src2,		<8 x i4> addrspace(1)* %src2,
i32 addrspace(1)* nocapture %dst) {		i32 addrspace(1)* nocapture %dst) {
entry:		entry:
[... 62 lines not shown ...]
%res = add i32 %add, %add8		%res = add i32 %add, %add8
store i32 %res, i32 addrspace(1)* %dst, align 4		store i32 %res, i32 addrspace(1)* %dst, align 4
ret void		ret void
}		}

define amdgpu_kernel void @udot8_acc32_vecMul(<8 x i4> addrspace(1)* %src1,		define amdgpu_kernel void @udot8_acc32_vecMul(<8 x i4> addrspace(1)* %src1,
; GFX7-LABEL: udot8_acc32_vecMul:		; GFX7-LABEL: udot8_acc32_vecMul:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s7, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s6, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_load_dword s10, s[10:11], 0x0		; GFX7-NEXT: s_load_dword s6, s[6:7], 0x0
; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0		; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX7-NEXT: s_load_dword s21, s[4:5], 0x0		; GFX7-NEXT: s_load_dword s20, s[0:1], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_lshr_b32 s11, s10, 28		; GFX7-NEXT: s_lshr_b32 s7, s6, 28
; GFX7-NEXT: s_bfe_u32 s15, s10, 0x40018		; GFX7-NEXT: s_bfe_u32 s14, s6, 0x40018
; GFX7-NEXT: s_bfe_u32 s16, s10, 0x40014		; GFX7-NEXT: s_bfe_u32 s15, s6, 0x40014
; GFX7-NEXT: s_bfe_u32 s17, s10, 0x40010		; GFX7-NEXT: s_bfe_u32 s16, s6, 0x40010
; GFX7-NEXT: s_bfe_u32 s18, s10, 0x4000c		; GFX7-NEXT: s_bfe_u32 s17, s6, 0x4000c
; GFX7-NEXT: s_bfe_u32 s19, s10, 0x40008		; GFX7-NEXT: s_bfe_u32 s18, s6, 0x40008
; GFX7-NEXT: s_bfe_u32 s20, s10, 0x40004		; GFX7-NEXT: s_bfe_u32 s19, s6, 0x40004
; GFX7-NEXT: s_and_b32 s10, s10, 15		; GFX7-NEXT: s_and_b32 s6, s6, 15
; GFX7-NEXT: s_lshr_b32 s1, s0, 28		; GFX7-NEXT: s_lshr_b32 s5, s4, 28
; GFX7-NEXT: s_bfe_u32 s2, s0, 0x40018		; GFX7-NEXT: s_bfe_u32 s8, s4, 0x40018
; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40014		; GFX7-NEXT: s_bfe_u32 s9, s4, 0x40014
; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40010		; GFX7-NEXT: s_bfe_u32 s10, s4, 0x40010
; GFX7-NEXT: s_bfe_u32 s12, s0, 0x4000c		; GFX7-NEXT: s_bfe_u32 s11, s4, 0x4000c
; GFX7-NEXT: s_bfe_u32 s13, s0, 0x40008		; GFX7-NEXT: s_bfe_u32 s12, s4, 0x40008
; GFX7-NEXT: s_bfe_u32 s14, s0, 0x40004		; GFX7-NEXT: s_bfe_u32 s13, s4, 0x40004
; GFX7-NEXT: s_and_b32 s0, s0, 15		; GFX7-NEXT: s_and_b32 s4, s4, 15
; GFX7-NEXT: v_mov_b32_e32 v0, s10		; GFX7-NEXT: v_mov_b32_e32 v0, s6
; GFX7-NEXT: v_mov_b32_e32 v1, s21
; GFX7-NEXT: v_mad_u32_u24 v0, s0, v0, v1
; GFX7-NEXT: v_mov_b32_e32 v1, s20		; GFX7-NEXT: v_mov_b32_e32 v1, s20
; GFX7-NEXT: v_mad_u32_u24 v0, s14, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s4, v0, v1
; GFX7-NEXT: v_mov_b32_e32 v1, s19		; GFX7-NEXT: v_mov_b32_e32 v1, s19
; GFX7-NEXT: v_mad_u32_u24 v0, s13, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s13, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s18		; GFX7-NEXT: v_mov_b32_e32 v1, s18
; GFX7-NEXT: v_mad_u32_u24 v0, s12, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s12, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s17		; GFX7-NEXT: v_mov_b32_e32 v1, s17
; GFX7-NEXT: v_mad_u32_u24 v0, s9, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s11, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s16		; GFX7-NEXT: v_mov_b32_e32 v1, s16
; GFX7-NEXT: v_mad_u32_u24 v0, s8, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s10, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s15		; GFX7-NEXT: v_mov_b32_e32 v1, s15
; GFX7-NEXT: v_mad_u32_u24 v0, s2, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s9, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s11		; GFX7-NEXT: v_mov_b32_e32 v1, s14
; GFX7-NEXT: v_mad_u32_u24 v0, s1, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s8, v1, v0
; GFX7-NEXT: buffer_store_dword v0, off, s[4:7], 0		; GFX7-NEXT: v_mov_b32_e32 v1, s7
		; GFX7-NEXT: v_mad_u32_u24 v0, s5, v1, v0
		; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: udot8_acc32_vecMul:		; GFX8-LABEL: udot8_acc32_vecMul:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_load_dword s6, s[6:7], 0x0		; GFX8-NEXT: s_load_dword s6, s[6:7], 0x0
; GFX8-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX8-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX8-NEXT: s_load_dword s19, s[0:1], 0x0		; GFX8-NEXT: s_load_dword s18, s[0:1], 0x0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_lshr_b32 s7, s6, 28		; GFX8-NEXT: s_lshr_b32 s7, s6, 28
; GFX8-NEXT: s_bfe_u32 s13, s6, 0x40018		; GFX8-NEXT: s_bfe_u32 s12, s6, 0x40018
; GFX8-NEXT: s_bfe_u32 s14, s6, 0x40014		; GFX8-NEXT: s_bfe_u32 s13, s6, 0x40014
; GFX8-NEXT: s_bfe_u32 s15, s6, 0x40010		; GFX8-NEXT: s_bfe_u32 s14, s6, 0x40010
; GFX8-NEXT: s_bfe_u32 s16, s6, 0x4000c		; GFX8-NEXT: s_bfe_u32 s15, s6, 0x4000c
; GFX8-NEXT: s_bfe_u32 s17, s6, 0x40008		; GFX8-NEXT: s_bfe_u32 s16, s6, 0x40008
; GFX8-NEXT: s_bfe_u32 s18, s6, 0x40004		; GFX8-NEXT: s_bfe_u32 s17, s6, 0x40004
; GFX8-NEXT: s_and_b32 s6, s6, 15		; GFX8-NEXT: s_and_b32 s6, s6, 15
; GFX8-NEXT: s_lshr_b32 s4, s2, 28		; GFX8-NEXT: s_lshr_b32 s3, s2, 28
; GFX8-NEXT: s_bfe_u32 s5, s2, 0x40018		; GFX8-NEXT: s_bfe_u32 s4, s2, 0x40018
; GFX8-NEXT: s_bfe_u32 s8, s2, 0x40014		; GFX8-NEXT: s_bfe_u32 s5, s2, 0x40014
; GFX8-NEXT: s_bfe_u32 s9, s2, 0x40010		; GFX8-NEXT: s_bfe_u32 s8, s2, 0x40010
; GFX8-NEXT: s_bfe_u32 s10, s2, 0x4000c		; GFX8-NEXT: s_bfe_u32 s9, s2, 0x4000c
; GFX8-NEXT: s_bfe_u32 s11, s2, 0x40008		; GFX8-NEXT: s_bfe_u32 s10, s2, 0x40008
; GFX8-NEXT: s_bfe_u32 s12, s2, 0x40004		; GFX8-NEXT: s_bfe_u32 s11, s2, 0x40004
; GFX8-NEXT: s_and_b32 s2, s2, 15		; GFX8-NEXT: s_and_b32 s2, s2, 15
; GFX8-NEXT: v_mov_b32_e32 v0, s6		; GFX8-NEXT: v_mov_b32_e32 v0, s6
; GFX8-NEXT: v_mov_b32_e32 v1, s19
; GFX8-NEXT: v_mad_u32_u24 v0, s2, v0, v1
; GFX8-NEXT: v_mov_b32_e32 v1, s18		; GFX8-NEXT: v_mov_b32_e32 v1, s18
; GFX8-NEXT: v_mad_u32_u24 v0, s12, v1, v0		; GFX8-NEXT: v_mad_u32_u24 v0, s2, v0, v1
; GFX8-NEXT: v_mov_b32_e32 v1, s17		; GFX8-NEXT: v_mov_b32_e32 v1, s17
; GFX8-NEXT: v_mad_u32_u24 v0, s11, v1, v0		; GFX8-NEXT: v_mad_u32_u24 v0, s11, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v1, s16		; GFX8-NEXT: v_mov_b32_e32 v1, s16
; GFX8-NEXT: v_mad_u32_u24 v0, s10, v1, v0		; GFX8-NEXT: v_mad_u32_u24 v0, s10, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v1, s15		; GFX8-NEXT: v_mov_b32_e32 v1, s15
; GFX8-NEXT: v_mad_u32_u24 v0, s9, v1, v0		; GFX8-NEXT: v_mad_u32_u24 v0, s9, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v1, s14		; GFX8-NEXT: v_mov_b32_e32 v1, s14
; GFX8-NEXT: v_mad_u32_u24 v0, s8, v1, v0		; GFX8-NEXT: v_mad_u32_u24 v0, s8, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v1, s13		; GFX8-NEXT: v_mov_b32_e32 v1, s13
; GFX8-NEXT: v_mad_u32_u24 v0, s5, v1, v0		; GFX8-NEXT: v_mad_u32_u24 v0, s5, v1, v0
		; GFX8-NEXT: v_mov_b32_e32 v1, s12
		; GFX8-NEXT: v_mad_u32_u24 v0, s4, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v1, s7		; GFX8-NEXT: v_mov_b32_e32 v1, s7
; GFX8-NEXT: v_mad_u32_u24 v2, s4, v1, v0		; GFX8-NEXT: v_mad_u32_u24 v2, s3, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_store_dword v[0:1], v2		; GFX8-NEXT: flat_store_dword v[0:1], v2
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: udot8_acc32_vecMul:		; GFX9-LABEL: udot8_acc32_vecMul:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_load_dword s6, s[6:7], 0x0		; GFX9-NEXT: s_load_dword s6, s[6:7], 0x0
; GFX9-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX9-NEXT: s_load_dword s19, s[0:1], 0x0		; GFX9-NEXT: s_load_dword s18, s[0:1], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_lshr_b32 s7, s6, 28		; GFX9-NEXT: s_lshr_b32 s7, s6, 28
; GFX9-NEXT: s_bfe_u32 s13, s6, 0x40018		; GFX9-NEXT: s_bfe_u32 s12, s6, 0x40018
; GFX9-NEXT: s_bfe_u32 s14, s6, 0x40014		; GFX9-NEXT: s_bfe_u32 s13, s6, 0x40014
; GFX9-NEXT: s_bfe_u32 s15, s6, 0x40010		; GFX9-NEXT: s_bfe_u32 s14, s6, 0x40010
; GFX9-NEXT: s_bfe_u32 s16, s6, 0x4000c		; GFX9-NEXT: s_bfe_u32 s15, s6, 0x4000c
; GFX9-NEXT: s_bfe_u32 s17, s6, 0x40008		; GFX9-NEXT: s_bfe_u32 s16, s6, 0x40008
; GFX9-NEXT: s_bfe_u32 s18, s6, 0x40004		; GFX9-NEXT: s_bfe_u32 s17, s6, 0x40004
; GFX9-NEXT: s_and_b32 s6, s6, 15		; GFX9-NEXT: s_and_b32 s6, s6, 15
; GFX9-NEXT: s_lshr_b32 s4, s2, 28		; GFX9-NEXT: s_lshr_b32 s3, s2, 28
; GFX9-NEXT: s_bfe_u32 s5, s2, 0x40018		; GFX9-NEXT: s_bfe_u32 s4, s2, 0x40018
; GFX9-NEXT: s_bfe_u32 s8, s2, 0x40014		; GFX9-NEXT: s_bfe_u32 s5, s2, 0x40014
; GFX9-NEXT: s_bfe_u32 s9, s2, 0x40010		; GFX9-NEXT: s_bfe_u32 s8, s2, 0x40010
; GFX9-NEXT: s_bfe_u32 s10, s2, 0x4000c		; GFX9-NEXT: s_bfe_u32 s9, s2, 0x4000c
; GFX9-NEXT: s_bfe_u32 s11, s2, 0x40008		; GFX9-NEXT: s_bfe_u32 s10, s2, 0x40008
; GFX9-NEXT: s_bfe_u32 s12, s2, 0x40004		; GFX9-NEXT: s_bfe_u32 s11, s2, 0x40004
; GFX9-NEXT: s_and_b32 s2, s2, 15		; GFX9-NEXT: s_and_b32 s2, s2, 15
; GFX9-NEXT: v_mov_b32_e32 v0, s6		; GFX9-NEXT: v_mov_b32_e32 v0, s6
; GFX9-NEXT: v_mov_b32_e32 v1, s19
; GFX9-NEXT: v_mad_u32_u24 v0, s2, v0, v1
; GFX9-NEXT: v_mov_b32_e32 v1, s18		; GFX9-NEXT: v_mov_b32_e32 v1, s18
; GFX9-NEXT: v_mad_u32_u24 v0, s12, v1, v0		; GFX9-NEXT: v_mad_u32_u24 v0, s2, v0, v1
; GFX9-NEXT: v_mov_b32_e32 v1, s17		; GFX9-NEXT: v_mov_b32_e32 v1, s17
; GFX9-NEXT: v_mad_u32_u24 v0, s11, v1, v0		; GFX9-NEXT: v_mad_u32_u24 v0, s11, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v1, s16		; GFX9-NEXT: v_mov_b32_e32 v1, s16
; GFX9-NEXT: v_mad_u32_u24 v0, s10, v1, v0		; GFX9-NEXT: v_mad_u32_u24 v0, s10, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v1, s15		; GFX9-NEXT: v_mov_b32_e32 v1, s15
; GFX9-NEXT: v_mad_u32_u24 v0, s9, v1, v0		; GFX9-NEXT: v_mad_u32_u24 v0, s9, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v1, s14		; GFX9-NEXT: v_mov_b32_e32 v1, s14
; GFX9-NEXT: v_mad_u32_u24 v0, s8, v1, v0		; GFX9-NEXT: v_mad_u32_u24 v0, s8, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v1, s13		; GFX9-NEXT: v_mov_b32_e32 v1, s13
; GFX9-NEXT: v_mad_u32_u24 v0, s5, v1, v0		; GFX9-NEXT: v_mad_u32_u24 v0, s5, v1, v0
		; GFX9-NEXT: v_mov_b32_e32 v1, s12
		; GFX9-NEXT: v_mad_u32_u24 v0, s4, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v1, s7		; GFX9-NEXT: v_mov_b32_e32 v1, s7
; GFX9-NEXT: v_mad_u32_u24 v2, s4, v1, v0		; GFX9-NEXT: v_mad_u32_u24 v2, s3, v1, v0
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: global_store_dword v[0:1], v2, off		; GFX9-NEXT: global_store_dword v[0:1], v2, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX9-DL-LABEL: udot8_acc32_vecMul:		; GFX9-DL-LABEL: udot8_acc32_vecMul:
; GFX9-DL: ; %bb.0: ; %entry		; GFX9-DL: ; %bb.0: ; %entry
; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_load_dword s2, s[6:7], 0x0		; GFX9-DL-NEXT: s_load_dword s2, s[6:7], 0x0
; GFX9-DL-NEXT: s_load_dword s6, s[0:1], 0x0		; GFX9-DL-NEXT: s_load_dword s3, s[0:1], 0x0
; GFX9-DL-NEXT: s_load_dword s4, s[4:5], 0x0		; GFX9-DL-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s2		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s2
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s6		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s3
; GFX9-DL-NEXT: v_dot8_u32_u4 v2, s4, v0, v1		; GFX9-DL-NEXT: v_dot8_u32_u4 v2, s4, v0, v1
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off		; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: udot8_acc32_vecMul:		; GFX10-DL-LABEL: udot8_acc32_vecMul:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_load_dword s0, s[8:9], 0x0		; GFX10-DL-NEXT: s_load_dword s6, s[4:5], 0x0
; GFX10-DL-NEXT: s_load_dword s1, s[4:5], 0x0		; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
; GFX10-DL-NEXT: s_load_dword s2, s[6:7], 0x0		; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s6
; GFX10-DL-NEXT: v_dot8_u32_u4 v2, s1, s2, v0		; GFX10-DL-NEXT: v_dot8_u32_u4 v2, s0, s1, v0
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s8		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s4
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s9		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s5
; GFX10-DL-NEXT: global_store_dword v[0:1], v2, off		; GFX10-DL-NEXT: global_store_dword v[0:1], v2, off
; GFX10-DL-NEXT: s_endpgm		; GFX10-DL-NEXT: s_endpgm
<8 x i4> addrspace(1)* %src2,		<8 x i4> addrspace(1)* %src2,
i32 addrspace(1)* nocapture %dst) {		i32 addrspace(1)* nocapture %dst) {
entry:		entry:
%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1		%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1
%vec2 = load <8 x i4>, <8 x i4> addrspace(1)* %src2		%vec2 = load <8 x i4>, <8 x i4> addrspace(1)* %src2

ret void		ret void
}		}

; TODO: Clean up the code(by default pk_mad_I16 should be generated), then		; TODO: Clean up the code(by default pk_mad_I16 should be generated), then
; support the pattern.		; support the pattern.
define amdgpu_kernel void @udot8_acc16_vecMul(<8 x i4> addrspace(1)* %src1,		define amdgpu_kernel void @udot8_acc16_vecMul(<8 x i4> addrspace(1)* %src1,
; GFX7-LABEL: udot8_acc16_vecMul:		; GFX7-LABEL: udot8_acc16_vecMul:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s7, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s6, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: buffer_load_ushort v0, off, s[4:7], 0		; GFX7-NEXT: buffer_load_ushort v0, off, s[0:3], 0
; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0		; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX7-NEXT: s_load_dword s1, s[10:11], 0x0		; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_bfe_u32 s11, s0, 0x40004		; GFX7-NEXT: s_bfe_u32 s10, s4, 0x40004
; GFX7-NEXT: s_bfe_u32 s18, s1, 0x40004		; GFX7-NEXT: s_bfe_u32 s17, s5, 0x40004
; GFX7-NEXT: s_bfe_u32 s20, s1, 0x4000c		; GFX7-NEXT: s_bfe_u32 s19, s5, 0x4000c
; GFX7-NEXT: v_mov_b32_e32 v4, s18		; GFX7-NEXT: v_mov_b32_e32 v4, s17
; GFX7-NEXT: s_bfe_u32 s15, s1, 0x40018		; GFX7-NEXT: s_bfe_u32 s14, s5, 0x40018
; GFX7-NEXT: s_bfe_u32 s16, s1, 0x40014		; GFX7-NEXT: s_bfe_u32 s15, s5, 0x40014
; GFX7-NEXT: s_bfe_u32 s17, s1, 0x40010		; GFX7-NEXT: s_bfe_u32 s16, s5, 0x40010
; GFX7-NEXT: s_and_b32 s19, s1, 15		; GFX7-NEXT: s_and_b32 s18, s5, 15
; GFX7-NEXT: s_lshr_b32 s14, s1, 28		; GFX7-NEXT: s_lshr_b32 s13, s5, 28
; GFX7-NEXT: s_bfe_u32 s1, s1, 0x40008		; GFX7-NEXT: s_bfe_u32 s5, s5, 0x40008
; GFX7-NEXT: s_bfe_u32 s13, s0, 0x4000c		; GFX7-NEXT: s_bfe_u32 s12, s4, 0x4000c
; GFX7-NEXT: v_mov_b32_e32 v2, s20		; GFX7-NEXT: v_mov_b32_e32 v2, s19
; GFX7-NEXT: v_mul_u32_u24_e32 v2, s13, v2		; GFX7-NEXT: v_mul_u32_u24_e32 v2, s12, v2
; GFX7-NEXT: v_mul_u32_u24_e32 v4, s11, v4		; GFX7-NEXT: v_mul_u32_u24_e32 v4, s10, v4
; GFX7-NEXT: s_lshr_b32 s2, s0, 28		; GFX7-NEXT: s_lshr_b32 s6, s4, 28
; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40018		; GFX7-NEXT: s_bfe_u32 s7, s4, 0x40018
; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40014		; GFX7-NEXT: s_bfe_u32 s8, s4, 0x40014
; GFX7-NEXT: s_bfe_u32 s10, s0, 0x40010		; GFX7-NEXT: s_bfe_u32 s9, s4, 0x40010
; GFX7-NEXT: s_and_b32 s12, s0, 15		; GFX7-NEXT: s_and_b32 s11, s4, 15
; GFX7-NEXT: v_mov_b32_e32 v3, s19		; GFX7-NEXT: v_mov_b32_e32 v3, s18
; GFX7-NEXT: s_bfe_u32 s0, s0, 0x40008		; GFX7-NEXT: s_bfe_u32 s4, s4, 0x40008
; GFX7-NEXT: v_mov_b32_e32 v1, s1		; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mul_u32_u24_e32 v1, s0, v1		; GFX7-NEXT: v_mul_u32_u24_e32 v1, s4, v1
; GFX7-NEXT: v_lshlrev_b32_e32 v2, 16, v2		; GFX7-NEXT: v_lshlrev_b32_e32 v2, 16, v2
; GFX7-NEXT: v_mul_u32_u24_e32 v3, s12, v3		; GFX7-NEXT: v_mul_u32_u24_e32 v3, s11, v3
; GFX7-NEXT: v_lshlrev_b32_e32 v4, 16, v4		; GFX7-NEXT: v_lshlrev_b32_e32 v4, 16, v4
; GFX7-NEXT: v_or_b32_e32 v1, v1, v2		; GFX7-NEXT: v_or_b32_e32 v1, v1, v2
; GFX7-NEXT: v_or_b32_e32 v2, v3, v4		; GFX7-NEXT: v_or_b32_e32 v2, v3, v4
; GFX7-NEXT: v_alignbit_b32 v3, v1, v2, 16		; GFX7-NEXT: v_alignbit_b32 v3, v1, v2, 16
; GFX7-NEXT: v_lshrrev_b32_e32 v4, 16, v1		; GFX7-NEXT: v_lshrrev_b32_e32 v4, 16, v1
; GFX7-NEXT: v_mov_b32_e32 v5, s17		; GFX7-NEXT: v_mov_b32_e32 v5, s16
; GFX7-NEXT: v_mov_b32_e32 v6, s16		; GFX7-NEXT: v_mov_b32_e32 v6, s15
; GFX7-NEXT: v_mov_b32_e32 v7, s15		; GFX7-NEXT: v_mov_b32_e32 v7, s14
; GFX7-NEXT: s_waitcnt vmcnt(0)		; GFX7-NEXT: s_waitcnt vmcnt(0)
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v2		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v2
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v3, v0		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v3, v0
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v1		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v1
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v4, v0		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v4, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s10, v5, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s9, v5, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s9, v6, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s8, v6, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s8, v7, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s7, v7, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s14		; GFX7-NEXT: v_mov_b32_e32 v1, s13
; GFX7-NEXT: v_mad_u32_u24 v0, s2, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s6, v1, v0
; GFX7-NEXT: buffer_store_short v0, off, s[4:7], 0		; GFX7-NEXT: buffer_store_short v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: udot8_acc16_vecMul:		; GFX8-LABEL: udot8_acc16_vecMul:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_load_ushort v2, v[0:1]		; GFX8-NEXT: flat_load_ushort v2, v[0:1]
; GFX8-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX8-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX8-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX8-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_lshr_b32 s2, s0, 28		; GFX8-NEXT: s_lshr_b32 s2, s0, 28
; GFX8-NEXT: s_bfe_u32 s11, s1, 0x40018		; GFX8-NEXT: s_bfe_u32 s10, s1, 0x40018
; GFX8-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX8-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX8-NEXT: s_bfe_u32 s13, s1, 0x40010		; GFX8-NEXT: s_bfe_u32 s12, s1, 0x40010
; GFX8-NEXT: s_bfe_u32 s14, s1, 0x4000c		; GFX8-NEXT: s_bfe_u32 s13, s1, 0x4000c
; GFX8-NEXT: s_bfe_u32 s15, s1, 0x40008		; GFX8-NEXT: s_bfe_u32 s14, s1, 0x40008
; GFX8-NEXT: s_bfe_u32 s16, s1, 0x40004		; GFX8-NEXT: s_bfe_u32 s15, s1, 0x40004
; GFX8-NEXT: s_lshr_b32 s10, s1, 28		; GFX8-NEXT: s_lshr_b32 s9, s1, 28
; GFX8-NEXT: s_and_b32 s1, s1, 15		; GFX8-NEXT: s_and_b32 s1, s1, 15
; GFX8-NEXT: s_bfe_u32 s4, s0, 0x40018		; GFX8-NEXT: s_bfe_u32 s3, s0, 0x40018
; GFX8-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX8-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX8-NEXT: s_bfe_u32 s6, s0, 0x40010		; GFX8-NEXT: s_bfe_u32 s5, s0, 0x40010
; GFX8-NEXT: s_bfe_u32 s7, s0, 0x4000c		; GFX8-NEXT: s_bfe_u32 s6, s0, 0x4000c
; GFX8-NEXT: s_bfe_u32 s8, s0, 0x40008		; GFX8-NEXT: s_bfe_u32 s7, s0, 0x40008
; GFX8-NEXT: s_bfe_u32 s9, s0, 0x40004		; GFX8-NEXT: s_bfe_u32 s8, s0, 0x40004
; GFX8-NEXT: s_and_b32 s0, s0, 15		; GFX8-NEXT: s_and_b32 s0, s0, 15
; GFX8-NEXT: v_mov_b32_e32 v3, s1		; GFX8-NEXT: v_mov_b32_e32 v3, s1
; GFX8-NEXT: v_mov_b32_e32 v4, s16		; GFX8-NEXT: v_mov_b32_e32 v4, s15
; GFX8-NEXT: v_mov_b32_e32 v5, s15		; GFX8-NEXT: v_mov_b32_e32 v5, s14
; GFX8-NEXT: v_mov_b32_e32 v6, s14		; GFX8-NEXT: v_mov_b32_e32 v6, s13
; GFX8-NEXT: v_mov_b32_e32 v7, s13		; GFX8-NEXT: v_mov_b32_e32 v7, s12
; GFX8-NEXT: v_mov_b32_e32 v8, s12		; GFX8-NEXT: v_mov_b32_e32 v8, s11
; GFX8-NEXT: v_mov_b32_e32 v9, s11		; GFX8-NEXT: v_mov_b32_e32 v9, s10
; GFX8-NEXT: s_waitcnt vmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0)
; GFX8-NEXT: v_mad_u32_u24 v2, s0, v3, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s0, v3, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s9, v4, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s8, v4, v2
; GFX8-NEXT: v_and_b32_e32 v2, 0xffff, v2		; GFX8-NEXT: v_and_b32_e32 v2, 0xffff, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s8, v5, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s7, v5, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s7, v6, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s6, v6, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s6, v7, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s5, v7, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s5, v8, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s4, v8, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s4, v9, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s3, v9, v2
; GFX8-NEXT: v_mov_b32_e32 v3, s10		; GFX8-NEXT: v_mov_b32_e32 v3, s9
; GFX8-NEXT: v_mad_u32_u24 v2, s2, v3, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s2, v3, v2
; GFX8-NEXT: flat_store_short v[0:1], v2		; GFX8-NEXT: flat_store_short v[0:1], v2
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: udot8_acc16_vecMul:		; GFX9-LABEL: udot8_acc16_vecMul:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_load_dword s6, s[6:7], 0x0		; GFX9-NEXT: s_load_dword s6, s[6:7], 0x0
; GFX9-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_bfe_u32 s7, s6, 0x40018		; GFX9-NEXT: s_bfe_u32 s7, s6, 0x40018
; GFX9-NEXT: s_lshr_b32 s13, s6, 28		; GFX9-NEXT: s_lshr_b32 s12, s6, 28
; GFX9-NEXT: s_pack_ll_b32_b16 s7, s7, s13		; GFX9-NEXT: s_pack_ll_b32_b16 s7, s7, s12
; GFX9-NEXT: s_bfe_u32 s4, s2, 0x40018		; GFX9-NEXT: s_bfe_u32 s3, s2, 0x40018
; GFX9-NEXT: s_lshr_b32 s5, s2, 28		; GFX9-NEXT: s_lshr_b32 s4, s2, 28
; GFX9-NEXT: s_bfe_u32 s14, s6, 0x40010		; GFX9-NEXT: s_bfe_u32 s13, s6, 0x40010
; GFX9-NEXT: s_bfe_u32 s15, s6, 0x40014		; GFX9-NEXT: s_bfe_u32 s14, s6, 0x40014
; GFX9-NEXT: s_pack_ll_b32_b16 s4, s4, s5		; GFX9-NEXT: s_pack_ll_b32_b16 s3, s3, s4
; GFX9-NEXT: v_mov_b32_e32 v0, s7		; GFX9-NEXT: v_mov_b32_e32 v0, s7
; GFX9-NEXT: v_pk_mul_lo_u16 v2, s4, v0		; GFX9-NEXT: v_pk_mul_lo_u16 v2, s3, v0
; GFX9-NEXT: s_pack_ll_b32_b16 s4, s14, s15		; GFX9-NEXT: s_pack_ll_b32_b16 s3, s13, s14
; GFX9-NEXT: s_bfe_u32 s8, s2, 0x40010		; GFX9-NEXT: s_bfe_u32 s5, s2, 0x40010
; GFX9-NEXT: s_bfe_u32 s9, s2, 0x40014		; GFX9-NEXT: s_bfe_u32 s8, s2, 0x40014
; GFX9-NEXT: s_bfe_u32 s16, s6, 0x40008		; GFX9-NEXT: s_bfe_u32 s15, s6, 0x40008
; GFX9-NEXT: s_bfe_u32 s17, s6, 0x4000c		; GFX9-NEXT: s_bfe_u32 s16, s6, 0x4000c
; GFX9-NEXT: s_and_b32 s18, s6, 15		; GFX9-NEXT: s_and_b32 s17, s6, 15
; GFX9-NEXT: v_mov_b32_e32 v0, s4		; GFX9-NEXT: v_mov_b32_e32 v0, s3
; GFX9-NEXT: s_pack_ll_b32_b16 s5, s8, s9		; GFX9-NEXT: s_pack_ll_b32_b16 s4, s5, s8
; GFX9-NEXT: s_bfe_u32 s10, s2, 0x40008		; GFX9-NEXT: s_bfe_u32 s9, s2, 0x40008
; GFX9-NEXT: s_bfe_u32 s11, s2, 0x4000c		; GFX9-NEXT: s_bfe_u32 s10, s2, 0x4000c
; GFX9-NEXT: s_bfe_u32 s6, s6, 0x40004		; GFX9-NEXT: s_bfe_u32 s6, s6, 0x40004
; GFX9-NEXT: s_pack_ll_b32_b16 s4, s16, s17		; GFX9-NEXT: s_pack_ll_b32_b16 s3, s15, s16
; GFX9-NEXT: v_pk_mul_lo_u16 v3, s5, v0		; GFX9-NEXT: v_pk_mul_lo_u16 v3, s4, v0
; GFX9-NEXT: s_and_b32 s12, s2, 15		; GFX9-NEXT: s_and_b32 s11, s2, 15
; GFX9-NEXT: s_bfe_u32 s2, s2, 0x40004		; GFX9-NEXT: s_bfe_u32 s2, s2, 0x40004
; GFX9-NEXT: v_mov_b32_e32 v0, s4		; GFX9-NEXT: v_mov_b32_e32 v0, s3
; GFX9-NEXT: s_pack_ll_b32_b16 s5, s10, s11		; GFX9-NEXT: s_pack_ll_b32_b16 s4, s9, s10
; GFX9-NEXT: s_pack_ll_b32_b16 s4, s18, s6		; GFX9-NEXT: s_pack_ll_b32_b16 s3, s17, s6
; GFX9-NEXT: v_pk_mul_lo_u16 v4, s5, v0		; GFX9-NEXT: v_pk_mul_lo_u16 v4, s4, v0
; GFX9-NEXT: s_pack_ll_b32_b16 s2, s12, s2		; GFX9-NEXT: s_pack_ll_b32_b16 s2, s11, s2
; GFX9-NEXT: v_mov_b32_e32 v0, s4		; GFX9-NEXT: v_mov_b32_e32 v0, s3
; GFX9-NEXT: v_pk_mul_lo_u16 v5, s2, v0		; GFX9-NEXT: v_pk_mul_lo_u16 v5, s2, v0
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: global_load_ushort v6, v[0:1], off		; GFX9-NEXT: global_load_ushort v6, v[0:1], off
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: v_add_u32_e32 v6, v5, v6		; GFX9-NEXT: v_add_u32_e32 v6, v5, v6
; GFX9-NEXT: v_add_u32_sdwa v5, v6, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX9-NEXT: v_add_u32_sdwa v5, v6, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX9-NEXT: v_add_u32_sdwa v5, v5, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:BYTE_0		; GFX9-NEXT: v_add_u32_sdwa v5, v5, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:BYTE_0
; GFX9-DL: ; %bb.0: ; %entry		; GFX9-DL: ; %bb.0: ; %entry
; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_load_dword s6, s[6:7], 0x0		; GFX9-DL-NEXT: s_load_dword s6, s[6:7], 0x0
; GFX9-DL-NEXT: s_load_dword s2, s[4:5], 0x0		; GFX9-DL-NEXT: s_load_dword s2, s[4:5], 0x0
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_bfe_u32 s7, s6, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s7, s6, 0x40018
; GFX9-DL-NEXT: s_lshr_b32 s13, s6, 28		; GFX9-DL-NEXT: s_lshr_b32 s12, s6, 28
; GFX9-DL-NEXT: s_pack_ll_b32_b16 s7, s7, s13		; GFX9-DL-NEXT: s_pack_ll_b32_b16 s7, s7, s12
; GFX9-DL-NEXT: s_bfe_u32 s4, s2, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s3, s2, 0x40018
; GFX9-DL-NEXT: s_lshr_b32 s5, s2, 28		; GFX9-DL-NEXT: s_lshr_b32 s4, s2, 28
; GFX9-DL-NEXT: s_bfe_u32 s14, s6, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s13, s6, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s15, s6, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s14, s6, 0x40014
; GFX9-DL-NEXT: s_pack_ll_b32_b16 s4, s4, s5		; GFX9-DL-NEXT: s_pack_ll_b32_b16 s3, s3, s4
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s7		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s7
; GFX9-DL-NEXT: v_pk_mul_lo_u16 v2, s4, v0		; GFX9-DL-NEXT: v_pk_mul_lo_u16 v2, s3, v0
; GFX9-DL-NEXT: s_pack_ll_b32_b16 s4, s14, s15		; GFX9-DL-NEXT: s_pack_ll_b32_b16 s3, s13, s14
; GFX9-DL-NEXT: s_bfe_u32 s8, s2, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s5, s2, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s9, s2, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s8, s2, 0x40014
; GFX9-DL-NEXT: s_bfe_u32 s16, s6, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s15, s6, 0x40008
; GFX9-DL-NEXT: s_bfe_u32 s17, s6, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s16, s6, 0x4000c
; GFX9-DL-NEXT: s_and_b32 s18, s6, 15		; GFX9-DL-NEXT: s_and_b32 s17, s6, 15
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s4		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s3
; GFX9-DL-NEXT: s_pack_ll_b32_b16 s5, s8, s9		; GFX9-DL-NEXT: s_pack_ll_b32_b16 s4, s5, s8
; GFX9-DL-NEXT: s_bfe_u32 s10, s2, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s9, s2, 0x40008
; GFX9-DL-NEXT: s_bfe_u32 s11, s2, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s10, s2, 0x4000c
; GFX9-DL-NEXT: s_bfe_u32 s6, s6, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s6, s6, 0x40004
; GFX9-DL-NEXT: s_pack_ll_b32_b16 s4, s16, s17		; GFX9-DL-NEXT: s_pack_ll_b32_b16 s3, s15, s16
; GFX9-DL-NEXT: v_pk_mul_lo_u16 v3, s5, v0		; GFX9-DL-NEXT: v_pk_mul_lo_u16 v3, s4, v0
; GFX9-DL-NEXT: s_and_b32 s12, s2, 15		; GFX9-DL-NEXT: s_and_b32 s11, s2, 15
; GFX9-DL-NEXT: s_bfe_u32 s2, s2, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s2, s2, 0x40004
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s4		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s3
; GFX9-DL-NEXT: s_pack_ll_b32_b16 s5, s10, s11		; GFX9-DL-NEXT: s_pack_ll_b32_b16 s4, s9, s10
; GFX9-DL-NEXT: s_pack_ll_b32_b16 s4, s18, s6		; GFX9-DL-NEXT: s_pack_ll_b32_b16 s3, s17, s6
; GFX9-DL-NEXT: v_pk_mul_lo_u16 v4, s5, v0		; GFX9-DL-NEXT: v_pk_mul_lo_u16 v4, s4, v0
; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s12, s2		; GFX9-DL-NEXT: s_pack_ll_b32_b16 s2, s11, s2
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s4		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s3
; GFX9-DL-NEXT: v_pk_mul_lo_u16 v5, s2, v0		; GFX9-DL-NEXT: v_pk_mul_lo_u16 v5, s2, v0
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: global_load_ushort v6, v[0:1], off		; GFX9-DL-NEXT: global_load_ushort v6, v[0:1], off
; GFX9-DL-NEXT: s_waitcnt vmcnt(0)		; GFX9-DL-NEXT: s_waitcnt vmcnt(0)
; GFX9-DL-NEXT: v_add_u32_e32 v6, v5, v6		; GFX9-DL-NEXT: v_add_u32_e32 v6, v5, v6
; GFX9-DL-NEXT: v_add_u32_sdwa v5, v6, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX9-DL-NEXT: v_add_u32_sdwa v5, v6, v5 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX9-DL-NEXT: v_add_u32_sdwa v5, v5, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:BYTE_0		; GFX9-DL-NEXT: v_add_u32_sdwa v5, v5, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:BYTE_0
; GFX9-DL-NEXT: v_add_u32_sdwa v4, v5, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX9-DL-NEXT: v_add_u32_sdwa v4, v5, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX9-DL-NEXT: v_add_u32_e32 v4, v4, v3		; GFX9-DL-NEXT: v_add_u32_e32 v4, v4, v3
; GFX9-DL-NEXT: v_add_u32_sdwa v3, v4, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX9-DL-NEXT: v_add_u32_sdwa v3, v4, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX9-DL-NEXT: v_add_u32_e32 v3, v3, v2		; GFX9-DL-NEXT: v_add_u32_e32 v3, v3, v2
; GFX9-DL-NEXT: v_add_u32_sdwa v2, v3, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX9-DL-NEXT: v_add_u32_sdwa v2, v3, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX9-DL-NEXT: global_store_short v[0:1], v2, off		; GFX9-DL-NEXT: global_store_short v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: udot8_acc16_vecMul:		; GFX10-DL-LABEL: udot8_acc16_vecMul:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s4		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s2
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s5		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s3
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX10-DL-NEXT: global_load_ushort v2, v[0:1], off		; GFX10-DL-NEXT: global_load_ushort v2, v[0:1], off
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_and_b32 s2, s0, 15		; GFX10-DL-NEXT: s_and_b32 s2, s0, 15
; GFX10-DL-NEXT: s_bfe_u32 s6, s0, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40004
; GFX10-DL-NEXT: s_and_b32 s4, s1, 15		; GFX10-DL-NEXT: s_and_b32 s3, s1, 15
; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x40004
; GFX10-DL-NEXT: s_bfe_u32 s7, s1, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x4000c
; GFX10-DL-NEXT: s_bfe_u32 s8, s0, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s7, s0, 0x4000c
; GFX10-DL-NEXT: s_pack_ll_b32_b16 s2, s2, s6		; GFX10-DL-NEXT: s_pack_ll_b32_b16 s2, s2, s5
; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x40008
; GFX10-DL-NEXT: s_pack_ll_b32_b16 s4, s4, s5		; GFX10-DL-NEXT: s_pack_ll_b32_b16 s3, s3, s4
; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x40008
; GFX10-DL-NEXT: v_pk_mul_lo_u16 v3, s2, s4		; GFX10-DL-NEXT: v_pk_mul_lo_u16 v3, s2, s3
; GFX10-DL-NEXT: s_pack_ll_b32_b16 s4, s6, s7		; GFX10-DL-NEXT: s_pack_ll_b32_b16 s3, s5, s6
; GFX10-DL-NEXT: s_pack_ll_b32_b16 s5, s5, s8		; GFX10-DL-NEXT: s_pack_ll_b32_b16 s4, s4, s7
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40010
; GFX10-DL-NEXT: s_bfe_u32 s6, s0, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40014
; GFX10-DL-NEXT: s_bfe_u32 s7, s1, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40010
; GFX10-DL-NEXT: s_bfe_u32 s8, s1, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s7, s1, 0x40014
; GFX10-DL-NEXT: v_pk_mul_lo_u16 v4, s5, s4		; GFX10-DL-NEXT: v_pk_mul_lo_u16 v4, s4, s3
; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s3, s0, 0x40018
; GFX10-DL-NEXT: s_pack_ll_b32_b16 s2, s2, s6		; GFX10-DL-NEXT: s_pack_ll_b32_b16 s2, s2, s5
; GFX10-DL-NEXT: s_lshr_b32 s0, s0, 28		; GFX10-DL-NEXT: s_lshr_b32 s0, s0, 28
; GFX10-DL-NEXT: s_pack_ll_b32_b16 s5, s7, s8		; GFX10-DL-NEXT: s_pack_ll_b32_b16 s4, s6, s7
; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x40018
; GFX10-DL-NEXT: s_lshr_b32 s1, s1, 28		; GFX10-DL-NEXT: s_lshr_b32 s1, s1, 28
; GFX10-DL-NEXT: s_pack_ll_b32_b16 s0, s4, s0		; GFX10-DL-NEXT: s_pack_ll_b32_b16 s0, s3, s0
; GFX10-DL-NEXT: s_pack_ll_b32_b16 s1, s6, s1		; GFX10-DL-NEXT: s_pack_ll_b32_b16 s1, s5, s1
; GFX10-DL-NEXT: s_waitcnt vmcnt(0)		; GFX10-DL-NEXT: s_waitcnt vmcnt(0)
; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v3, v2		; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v3, v2
; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX10-DL-NEXT: v_pk_mul_lo_u16 v3, s2, s5		; GFX10-DL-NEXT: v_pk_mul_lo_u16 v3, s2, s4
; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:BYTE_0		; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:BYTE_0
; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX10-DL-NEXT: v_pk_mul_lo_u16 v4, s0, s1		; GFX10-DL-NEXT: v_pk_mul_lo_u16 v4, s0, s1
; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v2, v3		; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v2, v3
; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v3 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v2, v4		; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v2, v4
; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX10-DL-NEXT: global_store_short v[0:1], v2, off		; GFX10-DL-NEXT: global_store_short v[0:1], v2, off
store i16 %add8, i16 addrspace(1)* %dst, align 4		store i16 %add8, i16 addrspace(1)* %dst, align 4
ret void		ret void
}		}

; TODO: Cleanup the code to generate MAD; pattern should be recognized then.		; TODO: Cleanup the code to generate MAD; pattern should be recognized then.
define amdgpu_kernel void @udot8_acc8_vecMul(<8 x i4> addrspace(1)* %src1,		define amdgpu_kernel void @udot8_acc8_vecMul(<8 x i4> addrspace(1)* %src1,
; GFX7-LABEL: udot8_acc8_vecMul:		; GFX7-LABEL: udot8_acc8_vecMul:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s7, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s6, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: buffer_load_ubyte v0, off, s[4:7], 0		; GFX7-NEXT: buffer_load_ubyte v0, off, s[0:3], 0
; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0		; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX7-NEXT: s_load_dword s1, s[10:11], 0x0		; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_bfe_u32 s2, s0, 0x4000c		; GFX7-NEXT: s_bfe_u32 s6, s4, 0x4000c
; GFX7-NEXT: s_bfe_u32 s14, s1, 0x4000c		; GFX7-NEXT: s_bfe_u32 s13, s5, 0x4000c
; GFX7-NEXT: s_bfe_u32 s16, s1, 0x40004		; GFX7-NEXT: s_bfe_u32 s15, s5, 0x40004
; GFX7-NEXT: s_lshr_b32 s18, s1, 28		; GFX7-NEXT: s_lshr_b32 s17, s5, 28
; GFX7-NEXT: v_mov_b32_e32 v8, s14		; GFX7-NEXT: v_mov_b32_e32 v8, s13
; GFX7-NEXT: s_bfe_u32 s15, s1, 0x40008		; GFX7-NEXT: s_bfe_u32 s14, s5, 0x40008
; GFX7-NEXT: s_and_b32 s17, s1, 15		; GFX7-NEXT: s_and_b32 s16, s5, 15
; GFX7-NEXT: s_bfe_u32 s19, s1, 0x40018		; GFX7-NEXT: s_bfe_u32 s18, s5, 0x40018
; GFX7-NEXT: s_bfe_u32 s20, s1, 0x40014		; GFX7-NEXT: s_bfe_u32 s19, s5, 0x40014
; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40004		; GFX7-NEXT: s_bfe_u32 s8, s4, 0x40004
; GFX7-NEXT: v_mov_b32_e32 v6, s16		; GFX7-NEXT: v_mov_b32_e32 v6, s15
; GFX7-NEXT: s_lshr_b32 s11, s0, 28		; GFX7-NEXT: s_lshr_b32 s10, s4, 28
; GFX7-NEXT: v_mov_b32_e32 v4, s18		; GFX7-NEXT: v_mov_b32_e32 v4, s17
; GFX7-NEXT: v_mul_u32_u24_e32 v4, s11, v4		; GFX7-NEXT: v_mul_u32_u24_e32 v4, s10, v4
; GFX7-NEXT: v_mul_u32_u24_e32 v6, s9, v6		; GFX7-NEXT: v_mul_u32_u24_e32 v6, s8, v6
; GFX7-NEXT: v_mul_u32_u24_e32 v8, s2, v8		; GFX7-NEXT: v_mul_u32_u24_e32 v8, s6, v8
; GFX7-NEXT: s_bfe_u32 s1, s1, 0x40010		; GFX7-NEXT: s_bfe_u32 s5, s5, 0x40010
; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40008		; GFX7-NEXT: s_bfe_u32 s7, s4, 0x40008
; GFX7-NEXT: v_mov_b32_e32 v7, s15		; GFX7-NEXT: v_mov_b32_e32 v7, s14
; GFX7-NEXT: s_and_b32 s10, s0, 15		; GFX7-NEXT: s_and_b32 s9, s4, 15
; GFX7-NEXT: v_mov_b32_e32 v5, s17		; GFX7-NEXT: v_mov_b32_e32 v5, s16
; GFX7-NEXT: s_bfe_u32 s12, s0, 0x40018		; GFX7-NEXT: s_bfe_u32 s11, s4, 0x40018
; GFX7-NEXT: v_mov_b32_e32 v3, s19		; GFX7-NEXT: v_mov_b32_e32 v3, s18
; GFX7-NEXT: s_bfe_u32 s13, s0, 0x40014		; GFX7-NEXT: s_bfe_u32 s12, s4, 0x40014
; GFX7-NEXT: v_mov_b32_e32 v2, s20		; GFX7-NEXT: v_mov_b32_e32 v2, s19
; GFX7-NEXT: v_mul_u32_u24_e32 v2, s13, v2		; GFX7-NEXT: v_mul_u32_u24_e32 v2, s12, v2
; GFX7-NEXT: s_bfe_u32 s0, s0, 0x40010		; GFX7-NEXT: s_bfe_u32 s4, s4, 0x40010
; GFX7-NEXT: v_mov_b32_e32 v1, s1		; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mul_u32_u24_e32 v3, s12, v3		; GFX7-NEXT: v_mul_u32_u24_e32 v3, s11, v3
; GFX7-NEXT: v_lshlrev_b32_e32 v4, 8, v4		; GFX7-NEXT: v_lshlrev_b32_e32 v4, 8, v4
; GFX7-NEXT: v_mul_u32_u24_e32 v5, s10, v5		; GFX7-NEXT: v_mul_u32_u24_e32 v5, s9, v5
; GFX7-NEXT: v_mul_u32_u24_e32 v7, s8, v7		; GFX7-NEXT: v_mul_u32_u24_e32 v7, s7, v7
; GFX7-NEXT: v_lshlrev_b32_e32 v6, 8, v6		; GFX7-NEXT: v_lshlrev_b32_e32 v6, 8, v6
; GFX7-NEXT: v_lshlrev_b32_e32 v8, 8, v8		; GFX7-NEXT: v_lshlrev_b32_e32 v8, 8, v8
; GFX7-NEXT: v_or_b32_e32 v3, v3, v4		; GFX7-NEXT: v_or_b32_e32 v3, v3, v4
; GFX7-NEXT: v_or_b32_e32 v4, v5, v6		; GFX7-NEXT: v_or_b32_e32 v4, v5, v6
; GFX7-NEXT: v_or_b32_e32 v5, v7, v8		; GFX7-NEXT: v_or_b32_e32 v5, v7, v8
; GFX7-NEXT: v_mul_u32_u24_e32 v9, s0, v1		; GFX7-NEXT: v_mul_u32_u24_e32 v9, s4, v1
; GFX7-NEXT: v_lshlrev_b32_e32 v2, 8, v2		; GFX7-NEXT: v_lshlrev_b32_e32 v2, 8, v2
; GFX7-NEXT: v_or_b32_e32 v2, v9, v2		; GFX7-NEXT: v_or_b32_e32 v2, v9, v2
; GFX7-NEXT: v_lshlrev_b32_e32 v3, 16, v3		; GFX7-NEXT: v_lshlrev_b32_e32 v3, 16, v3
; GFX7-NEXT: v_lshlrev_b32_e32 v5, 16, v5		; GFX7-NEXT: v_lshlrev_b32_e32 v5, 16, v5
; GFX7-NEXT: v_or_b32_e32 v2, v2, v3		; GFX7-NEXT: v_or_b32_e32 v2, v2, v3
; GFX7-NEXT: v_or_b32_e32 v3, v4, v5		; GFX7-NEXT: v_or_b32_e32 v3, v4, v5
; GFX7-NEXT: v_alignbit_b32 v4, v2, v3, 8		; GFX7-NEXT: v_alignbit_b32 v4, v2, v3, 8
; GFX7-NEXT: v_alignbit_b32 v5, v2, v3, 16		; GFX7-NEXT: v_alignbit_b32 v5, v2, v3, 16
; GFX7-NEXT: v_lshrrev_b32_e32 v6, 24, v3		; GFX7-NEXT: v_lshrrev_b32_e32 v6, 24, v3
; GFX7-NEXT: v_lshrrev_b32_e32 v7, 8, v2		; GFX7-NEXT: v_lshrrev_b32_e32 v7, 8, v2
; GFX7-NEXT: v_lshrrev_b32_e32 v8, 16, v2		; GFX7-NEXT: v_lshrrev_b32_e32 v8, 16, v2
; GFX7-NEXT: v_lshrrev_b32_e32 v2, 24, v2		; GFX7-NEXT: v_lshrrev_b32_e32 v2, 24, v2
; GFX7-NEXT: s_waitcnt vmcnt(0)		; GFX7-NEXT: s_waitcnt vmcnt(0)
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v3		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v3
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v4, v0		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v4, v0
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v5, v0		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v5, v0
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v6, v0		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v6, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s0, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s4, v1, v0
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v7		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v7
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v8		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v8
; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v2		; GFX7-NEXT: v_add_i32_e32 v0, vcc, v0, v2
; GFX7-NEXT: buffer_store_byte v0, off, s[4:7], 0		; GFX7-NEXT: buffer_store_byte v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: udot8_acc8_vecMul:		; GFX8-LABEL: udot8_acc8_vecMul:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_load_ubyte v2, v[0:1]		; GFX8-NEXT: flat_load_ubyte v2, v[0:1]
; GFX8-NEXT: s_load_dword s1, s[4:5], 0x0		; GFX8-NEXT: s_load_dword s1, s[4:5], 0x0
; GFX8-NEXT: s_load_dword s2, s[6:7], 0x0		; GFX8-NEXT: s_load_dword s2, s[6:7], 0x0
; GFX8-NEXT: s_mov_b32 s0, 0xffff		; GFX8-NEXT: s_mov_b32 s0, 0xffff
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_bfe_u32 s8, s1, 0x40004		; GFX8-NEXT: s_bfe_u32 s7, s1, 0x40004
; GFX8-NEXT: s_bfe_u32 s10, s1, 0x4000c		; GFX8-NEXT: s_bfe_u32 s9, s1, 0x4000c
; GFX8-NEXT: s_bfe_u32 s15, s2, 0x40004		; GFX8-NEXT: s_bfe_u32 s14, s2, 0x40004
; GFX8-NEXT: s_and_b32 s16, s2, 15		; GFX8-NEXT: s_and_b32 s15, s2, 15
; GFX8-NEXT: s_bfe_u32 s17, s2, 0x4000c		; GFX8-NEXT: s_bfe_u32 s16, s2, 0x4000c
; GFX8-NEXT: s_bfe_u32 s4, s1, 0x40014		; GFX8-NEXT: s_bfe_u32 s3, s1, 0x40014
; GFX8-NEXT: s_lshr_b32 s6, s1, 28		; GFX8-NEXT: s_lshr_b32 s5, s1, 28
; GFX8-NEXT: s_bfe_u32 s11, s2, 0x40014		; GFX8-NEXT: s_bfe_u32 s10, s2, 0x40014
; GFX8-NEXT: s_bfe_u32 s12, s2, 0x40010		; GFX8-NEXT: s_bfe_u32 s11, s2, 0x40010
; GFX8-NEXT: s_lshr_b32 s13, s2, 28		; GFX8-NEXT: s_lshr_b32 s12, s2, 28
; GFX8-NEXT: s_bfe_u32 s14, s2, 0x40018		; GFX8-NEXT: s_bfe_u32 s13, s2, 0x40018
; GFX8-NEXT: s_bfe_u32 s2, s2, 0x40008		; GFX8-NEXT: s_bfe_u32 s2, s2, 0x40008
; GFX8-NEXT: s_and_b32 s9, s1, 15		; GFX8-NEXT: s_and_b32 s8, s1, 15
; GFX8-NEXT: v_mov_b32_e32 v4, s17		; GFX8-NEXT: v_mov_b32_e32 v4, s16
; GFX8-NEXT: v_mov_b32_e32 v5, s10		; GFX8-NEXT: v_mov_b32_e32 v5, s9
; GFX8-NEXT: v_mov_b32_e32 v6, s16		; GFX8-NEXT: v_mov_b32_e32 v6, s15
; GFX8-NEXT: v_mov_b32_e32 v7, s15		; GFX8-NEXT: v_mov_b32_e32 v7, s14
; GFX8-NEXT: v_mov_b32_e32 v8, s8		; GFX8-NEXT: v_mov_b32_e32 v8, s7
; GFX8-NEXT: v_mul_u32_u24_sdwa v4, v5, v4 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX8-NEXT: v_mul_u32_u24_sdwa v4, v5, v4 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX8-NEXT: v_mul_u32_u24_e32 v5, s9, v6		; GFX8-NEXT: v_mul_u32_u24_e32 v5, s8, v6
; GFX8-NEXT: v_mul_u32_u24_sdwa v6, v8, v7 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX8-NEXT: v_mul_u32_u24_sdwa v6, v8, v7 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX8-NEXT: s_bfe_u32 s5, s1, 0x40010		; GFX8-NEXT: s_bfe_u32 s4, s1, 0x40010
; GFX8-NEXT: s_bfe_u32 s7, s1, 0x40018		; GFX8-NEXT: s_bfe_u32 s6, s1, 0x40018
; GFX8-NEXT: v_mov_b32_e32 v9, s14		; GFX8-NEXT: v_mov_b32_e32 v9, s13
; GFX8-NEXT: s_bfe_u32 s1, s1, 0x40008		; GFX8-NEXT: s_bfe_u32 s1, s1, 0x40008
; GFX8-NEXT: v_mov_b32_e32 v3, s2		; GFX8-NEXT: v_mov_b32_e32 v3, s2
; GFX8-NEXT: v_mov_b32_e32 v10, s13		; GFX8-NEXT: v_mov_b32_e32 v10, s12
; GFX8-NEXT: v_mov_b32_e32 v11, s6		; GFX8-NEXT: v_mov_b32_e32 v11, s5
; GFX8-NEXT: v_mov_b32_e32 v12, s12		; GFX8-NEXT: v_mov_b32_e32 v12, s11
; GFX8-NEXT: v_mov_b32_e32 v13, s11		; GFX8-NEXT: v_mov_b32_e32 v13, s10
; GFX8-NEXT: v_mov_b32_e32 v14, s4		; GFX8-NEXT: v_mov_b32_e32 v14, s3
; GFX8-NEXT: v_mul_u32_u24_e32 v3, s1, v3		; GFX8-NEXT: v_mul_u32_u24_e32 v3, s1, v3
; GFX8-NEXT: v_or_b32_e32 v5, v5, v6		; GFX8-NEXT: v_or_b32_e32 v5, v5, v6
; GFX8-NEXT: v_mul_u32_u24_e32 v7, s7, v9		; GFX8-NEXT: v_mul_u32_u24_e32 v7, s6, v9
; GFX8-NEXT: v_mul_u32_u24_sdwa v8, v11, v10 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX8-NEXT: v_mul_u32_u24_sdwa v8, v11, v10 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX8-NEXT: v_mul_u32_u24_e32 v9, s5, v12		; GFX8-NEXT: v_mul_u32_u24_e32 v9, s4, v12
; GFX8-NEXT: v_mul_u32_u24_sdwa v10, v14, v13 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX8-NEXT: v_mul_u32_u24_sdwa v10, v14, v13 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX8-NEXT: v_and_b32_e32 v5, s0, v5		; GFX8-NEXT: v_and_b32_e32 v5, s0, v5
; GFX8-NEXT: v_or_b32_sdwa v3, v3, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX8-NEXT: v_or_b32_sdwa v3, v3, v4 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX8-NEXT: v_or_b32_e32 v9, v9, v10		; GFX8-NEXT: v_or_b32_e32 v9, v9, v10
; GFX8-NEXT: v_or_b32_sdwa v7, v7, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX8-NEXT: v_or_b32_sdwa v7, v7, v8 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX8-NEXT: v_and_b32_e32 v4, s0, v9		; GFX8-NEXT: v_and_b32_e32 v4, s0, v9
; GFX8-NEXT: v_or_b32_e32 v3, v5, v3		; GFX8-NEXT: v_or_b32_e32 v3, v5, v3
; GFX8-NEXT: v_or_b32_e32 v6, v4, v7		; GFX8-NEXT: v_or_b32_e32 v6, v4, v7
; GFX9-NEXT: s_mov_b32 s2, 0xffff		; GFX9-NEXT: s_mov_b32 s2, 0xffff
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: global_load_ubyte v2, v[0:1], off		; GFX9-NEXT: global_load_ubyte v2, v[0:1], off
; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX9-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_bfe_u32 s4, s0, 0x40010		; GFX9-NEXT: s_bfe_u32 s3, s0, 0x40010
; GFX9-NEXT: s_bfe_u32 s11, s1, 0x40010		; GFX9-NEXT: s_bfe_u32 s10, s1, 0x40010
; GFX9-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX9-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX9-NEXT: s_bfe_u32 s13, s1, 0x40018		; GFX9-NEXT: s_bfe_u32 s12, s1, 0x40018
; GFX9-NEXT: s_lshr_b32 s14, s1, 28		; GFX9-NEXT: s_lshr_b32 s13, s1, 28
; GFX9-NEXT: s_and_b32 s15, s1, 15		; GFX9-NEXT: s_and_b32 s14, s1, 15
; GFX9-NEXT: s_bfe_u32 s16, s1, 0x40004		; GFX9-NEXT: s_bfe_u32 s15, s1, 0x40004
; GFX9-NEXT: s_bfe_u32 s17, s1, 0x40008		; GFX9-NEXT: s_bfe_u32 s16, s1, 0x40008
; GFX9-NEXT: v_mov_b32_e32 v3, s11		; GFX9-NEXT: v_mov_b32_e32 v3, s10
; GFX9-NEXT: s_bfe_u32 s1, s1, 0x4000c		; GFX9-NEXT: s_bfe_u32 s1, s1, 0x4000c
; GFX9-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX9-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX9-NEXT: v_mov_b32_e32 v4, s12		; GFX9-NEXT: v_mov_b32_e32 v4, s11
; GFX9-NEXT: s_bfe_u32 s6, s0, 0x40018		; GFX9-NEXT: s_bfe_u32 s5, s0, 0x40018
; GFX9-NEXT: v_mov_b32_e32 v5, s13		; GFX9-NEXT: v_mov_b32_e32 v5, s12
; GFX9-NEXT: s_lshr_b32 s7, s0, 28		; GFX9-NEXT: s_lshr_b32 s6, s0, 28
; GFX9-NEXT: v_mov_b32_e32 v6, s14		; GFX9-NEXT: v_mov_b32_e32 v6, s13
; GFX9-NEXT: s_and_b32 s8, s0, 15		; GFX9-NEXT: s_and_b32 s7, s0, 15
; GFX9-NEXT: v_mov_b32_e32 v7, s15		; GFX9-NEXT: v_mov_b32_e32 v7, s14
; GFX9-NEXT: s_bfe_u32 s9, s0, 0x40004		; GFX9-NEXT: s_bfe_u32 s8, s0, 0x40004
; GFX9-NEXT: v_mov_b32_e32 v8, s16		; GFX9-NEXT: v_mov_b32_e32 v8, s15
; GFX9-NEXT: s_bfe_u32 s10, s0, 0x40008		; GFX9-NEXT: s_bfe_u32 s9, s0, 0x40008
; GFX9-NEXT: v_mov_b32_e32 v9, s17		; GFX9-NEXT: v_mov_b32_e32 v9, s16
; GFX9-NEXT: s_bfe_u32 s0, s0, 0x4000c		; GFX9-NEXT: s_bfe_u32 s0, s0, 0x4000c
; GFX9-NEXT: v_mov_b32_e32 v10, s1		; GFX9-NEXT: v_mov_b32_e32 v10, s1
; GFX9-NEXT: v_mul_lo_u16_e32 v3, s4, v3		; GFX9-NEXT: v_mul_lo_u16_e32 v3, s3, v3
; GFX9-NEXT: v_mul_lo_u16_sdwa v4, s5, v4 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX9-NEXT: v_mul_lo_u16_sdwa v4, s4, v4 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX9-NEXT: v_mul_lo_u16_e32 v5, s6, v5		; GFX9-NEXT: v_mul_lo_u16_e32 v5, s5, v5
; GFX9-NEXT: v_mul_lo_u16_sdwa v6, s7, v6 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX9-NEXT: v_mul_lo_u16_sdwa v6, s6, v6 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX9-NEXT: v_mul_lo_u16_e32 v7, s8, v7		; GFX9-NEXT: v_mul_lo_u16_e32 v7, s7, v7
; GFX9-NEXT: v_mul_lo_u16_sdwa v8, s9, v8 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX9-NEXT: v_mul_lo_u16_sdwa v8, s8, v8 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX9-NEXT: v_or_b32_e32 v3, v3, v4		; GFX9-NEXT: v_or_b32_e32 v3, v3, v4
; GFX9-NEXT: v_or_b32_sdwa v4, v5, v6 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX9-NEXT: v_or_b32_sdwa v4, v5, v6 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX9-NEXT: v_or_b32_e32 v5, v7, v8		; GFX9-NEXT: v_or_b32_e32 v5, v7, v8
; GFX9-NEXT: v_mul_lo_u16_e32 v9, s10, v9		; GFX9-NEXT: v_mul_lo_u16_e32 v9, s9, v9
; GFX9-NEXT: v_mul_lo_u16_sdwa v10, s0, v10 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX9-NEXT: v_mul_lo_u16_sdwa v10, s0, v10 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX9-NEXT: v_and_b32_e32 v5, s2, v5		; GFX9-NEXT: v_and_b32_e32 v5, s2, v5
; GFX9-NEXT: v_or_b32_sdwa v6, v9, v10 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX9-NEXT: v_or_b32_sdwa v6, v9, v10 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX9-NEXT: v_or_b32_e32 v6, v5, v6		; GFX9-NEXT: v_or_b32_e32 v6, v5, v6
; GFX9-NEXT: v_lshrrev_b32_e32 v7, 8, v6		; GFX9-NEXT: v_lshrrev_b32_e32 v7, 8, v6
; GFX9-NEXT: v_and_b32_e32 v3, s2, v3		; GFX9-NEXT: v_and_b32_e32 v3, s2, v3
; GFX9-NEXT: v_or_b32_e32 v4, v3, v4		; GFX9-NEXT: v_or_b32_e32 v4, v3, v4
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-DL-NEXT: s_mov_b32 s2, 0xffff		; GFX9-DL-NEXT: s_mov_b32 s2, 0xffff
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: global_load_ubyte v2, v[0:1], off		; GFX9-DL-NEXT: global_load_ubyte v2, v[0:1], off
; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX9-DL-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_bfe_u32 s4, s0, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s3, s0, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s11, s1, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s10, s1, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX9-DL-NEXT: s_bfe_u32 s13, s1, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s12, s1, 0x40018
; GFX9-DL-NEXT: s_lshr_b32 s14, s1, 28		; GFX9-DL-NEXT: s_lshr_b32 s13, s1, 28
; GFX9-DL-NEXT: s_and_b32 s15, s1, 15		; GFX9-DL-NEXT: s_and_b32 s14, s1, 15
; GFX9-DL-NEXT: s_bfe_u32 s16, s1, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s15, s1, 0x40004
; GFX9-DL-NEXT: s_bfe_u32 s17, s1, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s16, s1, 0x40008
; GFX9-DL-NEXT: v_mov_b32_e32 v3, s11		; GFX9-DL-NEXT: v_mov_b32_e32 v3, s10
; GFX9-DL-NEXT: s_bfe_u32 s1, s1, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s1, s1, 0x4000c
; GFX9-DL-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX9-DL-NEXT: v_mov_b32_e32 v4, s12		; GFX9-DL-NEXT: v_mov_b32_e32 v4, s11
; GFX9-DL-NEXT: s_bfe_u32 s6, s0, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s5, s0, 0x40018
; GFX9-DL-NEXT: v_mov_b32_e32 v5, s13		; GFX9-DL-NEXT: v_mov_b32_e32 v5, s12
; GFX9-DL-NEXT: s_lshr_b32 s7, s0, 28		; GFX9-DL-NEXT: s_lshr_b32 s6, s0, 28
; GFX9-DL-NEXT: v_mov_b32_e32 v6, s14		; GFX9-DL-NEXT: v_mov_b32_e32 v6, s13
; GFX9-DL-NEXT: s_and_b32 s8, s0, 15		; GFX9-DL-NEXT: s_and_b32 s7, s0, 15
; GFX9-DL-NEXT: v_mov_b32_e32 v7, s15		; GFX9-DL-NEXT: v_mov_b32_e32 v7, s14
; GFX9-DL-NEXT: s_bfe_u32 s9, s0, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s8, s0, 0x40004
; GFX9-DL-NEXT: v_mov_b32_e32 v8, s16		; GFX9-DL-NEXT: v_mov_b32_e32 v8, s15
; GFX9-DL-NEXT: s_bfe_u32 s10, s0, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s9, s0, 0x40008
; GFX9-DL-NEXT: v_mov_b32_e32 v9, s17		; GFX9-DL-NEXT: v_mov_b32_e32 v9, s16
; GFX9-DL-NEXT: s_bfe_u32 s0, s0, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s0, s0, 0x4000c
; GFX9-DL-NEXT: v_mov_b32_e32 v10, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v10, s1
; GFX9-DL-NEXT: v_mul_lo_u16_e32 v3, s4, v3		; GFX9-DL-NEXT: v_mul_lo_u16_e32 v3, s3, v3
; GFX9-DL-NEXT: v_mul_lo_u16_sdwa v4, s5, v4 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX9-DL-NEXT: v_mul_lo_u16_sdwa v4, s4, v4 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX9-DL-NEXT: v_mul_lo_u16_e32 v5, s6, v5		; GFX9-DL-NEXT: v_mul_lo_u16_e32 v5, s5, v5
; GFX9-DL-NEXT: v_mul_lo_u16_sdwa v6, s7, v6 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX9-DL-NEXT: v_mul_lo_u16_sdwa v6, s6, v6 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX9-DL-NEXT: v_mul_lo_u16_e32 v7, s8, v7		; GFX9-DL-NEXT: v_mul_lo_u16_e32 v7, s7, v7
; GFX9-DL-NEXT: v_mul_lo_u16_sdwa v8, s9, v8 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX9-DL-NEXT: v_mul_lo_u16_sdwa v8, s8, v8 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX9-DL-NEXT: v_or_b32_e32 v3, v3, v4		; GFX9-DL-NEXT: v_or_b32_e32 v3, v3, v4
; GFX9-DL-NEXT: v_or_b32_sdwa v4, v5, v6 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX9-DL-NEXT: v_or_b32_sdwa v4, v5, v6 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX9-DL-NEXT: v_or_b32_e32 v5, v7, v8		; GFX9-DL-NEXT: v_or_b32_e32 v5, v7, v8
; GFX9-DL-NEXT: v_mul_lo_u16_e32 v9, s10, v9		; GFX9-DL-NEXT: v_mul_lo_u16_e32 v9, s9, v9
; GFX9-DL-NEXT: v_mul_lo_u16_sdwa v10, s0, v10 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX9-DL-NEXT: v_mul_lo_u16_sdwa v10, s0, v10 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX9-DL-NEXT: v_and_b32_e32 v5, s2, v5		; GFX9-DL-NEXT: v_and_b32_e32 v5, s2, v5
; GFX9-DL-NEXT: v_or_b32_sdwa v6, v9, v10 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX9-DL-NEXT: v_or_b32_sdwa v6, v9, v10 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX9-DL-NEXT: v_or_b32_e32 v6, v5, v6		; GFX9-DL-NEXT: v_or_b32_e32 v6, v5, v6
; GFX9-DL-NEXT: v_lshrrev_b32_e32 v7, 8, v6		; GFX9-DL-NEXT: v_lshrrev_b32_e32 v7, 8, v6
; GFX9-DL-NEXT: v_and_b32_e32 v3, s2, v3		; GFX9-DL-NEXT: v_and_b32_e32 v3, s2, v3
; GFX9-DL-NEXT: v_or_b32_e32 v4, v3, v4		; GFX9-DL-NEXT: v_or_b32_e32 v4, v3, v4
; GFX9-DL-NEXT: s_waitcnt vmcnt(0)		; GFX9-DL-NEXT: s_waitcnt vmcnt(0)
; GFX9-DL-NEXT: v_add_u32_e32 v2, v5, v2		; GFX9-DL-NEXT: v_add_u32_e32 v2, v5, v2
; GFX9-DL-NEXT: v_add_u32_e32 v2, v2, v7		; GFX9-DL-NEXT: v_add_u32_e32 v2, v2, v7
; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v6 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_2		; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v6 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_2
; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v6 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_3		; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v6 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_3
; GFX9-DL-NEXT: v_add_u32_e32 v2, v2, v3		; GFX9-DL-NEXT: v_add_u32_e32 v2, v2, v3
; GFX9-DL-NEXT: v_lshrrev_b32_e32 v3, 8, v4		; GFX9-DL-NEXT: v_lshrrev_b32_e32 v3, 8, v4
; GFX9-DL-NEXT: v_add_u32_e32 v2, v2, v3		; GFX9-DL-NEXT: v_add_u32_e32 v2, v2, v3
; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_3		; GFX9-DL-NEXT: v_add_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_3
; GFX9-DL-NEXT: global_store_byte v[0:1], v2, off		; GFX9-DL-NEXT: global_store_byte v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: udot8_acc8_vecMul:		; GFX10-DL-LABEL: udot8_acc8_vecMul:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s4		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s2
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s5		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s3
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off		; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40004
; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s3, s1, 0x40004
; GFX10-DL-NEXT: s_and_b32 s5, s0, 15		; GFX10-DL-NEXT: s_and_b32 s4, s0, 15
; GFX10-DL-NEXT: s_and_b32 s7, s1, 15		; GFX10-DL-NEXT: s_and_b32 s6, s1, 15
; GFX10-DL-NEXT: s_bfe_u32 s6, s0, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x4000c
; GFX10-DL-NEXT: s_bfe_u32 s8, s1, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s7, s1, 0x4000c
; GFX10-DL-NEXT: v_mul_lo_u16_e64 v3, s2, s4		; GFX10-DL-NEXT: v_mul_lo_u16_e64 v3, s2, s3
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40008
; GFX10-DL-NEXT: v_mul_lo_u16_e64 v4, s5, s7		; GFX10-DL-NEXT: v_mul_lo_u16_e64 v4, s4, s6
; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s3, s1, 0x40008
; GFX10-DL-NEXT: v_mul_lo_u16_e64 v5, s6, s8		; GFX10-DL-NEXT: v_mul_lo_u16_e64 v5, s5, s7
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v3, 8, v3		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v3, 8, v3
; GFX10-DL-NEXT: s_mov_b32 s5, 0xffff		; GFX10-DL-NEXT: s_mov_b32 s4, 0xffff
; GFX10-DL-NEXT: s_bfe_u32 s7, s1, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40014
; GFX10-DL-NEXT: v_mul_lo_u16_e64 v6, s2, s4		; GFX10-DL-NEXT: v_mul_lo_u16_e64 v6, s2, s3
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v5, 8, v5		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v5, 8, v5
; GFX10-DL-NEXT: v_or_b32_e32 v3, v4, v3		; GFX10-DL-NEXT: v_or_b32_e32 v3, v4, v3
; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s3, s0, 0x40014
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40010
; GFX10-DL-NEXT: s_bfe_u32 s6, s0, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40018
; GFX10-DL-NEXT: v_or_b32_sdwa v4, v6, v5 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX10-DL-NEXT: v_or_b32_sdwa v4, v6, v5 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX10-DL-NEXT: v_and_b32_e32 v3, s5, v3		; GFX10-DL-NEXT: v_and_b32_e32 v3, s4, v3
; GFX10-DL-NEXT: s_bfe_u32 s8, s1, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s7, s1, 0x40010
; GFX10-DL-NEXT: s_lshr_b32 s9, s1, 28		; GFX10-DL-NEXT: s_lshr_b32 s8, s1, 28
; GFX10-DL-NEXT: s_lshr_b32 s0, s0, 28		; GFX10-DL-NEXT: s_lshr_b32 s0, s0, 28
; GFX10-DL-NEXT: v_mul_lo_u16_e64 v5, s4, s7		; GFX10-DL-NEXT: v_mul_lo_u16_e64 v5, s3, s6
; GFX10-DL-NEXT: v_or_b32_e32 v4, v3, v4		; GFX10-DL-NEXT: v_or_b32_e32 v4, v3, v4
; GFX10-DL-NEXT: s_bfe_u32 s1, s1, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s1, s1, 0x40018
; GFX10-DL-NEXT: v_mul_lo_u16_e64 v6, s2, s8		; GFX10-DL-NEXT: v_mul_lo_u16_e64 v6, s2, s7
; GFX10-DL-NEXT: v_mul_lo_u16_e64 v7, s0, s9		; GFX10-DL-NEXT: v_mul_lo_u16_e64 v7, s0, s8
; GFX10-DL-NEXT: v_lshrrev_b32_e32 v8, 8, v4		; GFX10-DL-NEXT: v_lshrrev_b32_e32 v8, 8, v4
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v7, 8, v7		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v7, 8, v7
; GFX10-DL-NEXT: s_waitcnt vmcnt(0)		; GFX10-DL-NEXT: s_waitcnt vmcnt(0)
; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v3, v2		; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v3, v2
; GFX10-DL-NEXT: v_lshlrev_b16_e64 v3, 8, v5		; GFX10-DL-NEXT: v_lshlrev_b16_e64 v3, 8, v5
; GFX10-DL-NEXT: v_mul_lo_u16_e64 v5, s6, s1		; GFX10-DL-NEXT: v_mul_lo_u16_e64 v5, s5, s1
; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v2, v8		; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v2, v8
; GFX10-DL-NEXT: v_or_b32_e32 v3, v6, v3		; GFX10-DL-NEXT: v_or_b32_e32 v3, v6, v3
; GFX10-DL-NEXT: v_or_b32_sdwa v5, v5, v7 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD		; GFX10-DL-NEXT: v_or_b32_sdwa v5, v5, v7 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_2		; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_2
; GFX10-DL-NEXT: v_and_b32_e32 v3, s5, v3		; GFX10-DL-NEXT: v_and_b32_e32 v3, s4, v3
; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_3		; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_3
; GFX10-DL-NEXT: v_or_b32_e32 v4, v3, v5		; GFX10-DL-NEXT: v_or_b32_e32 v4, v3, v5
; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v2, v3		; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v2, v3
; GFX10-DL-NEXT: v_lshrrev_b32_e32 v3, 8, v4		; GFX10-DL-NEXT: v_lshrrev_b32_e32 v3, 8, v4
; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v2, v3		; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v2, v3
; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1		; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_1
; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_3		; GFX10-DL-NEXT: v_add_nc_u32_sdwa v2, v2, v4 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_3
; GFX10-DL-NEXT: global_store_byte v[0:1], v2, off		; GFX10-DL-NEXT: global_store_byte v[0:1], v2, off
entry:
store i8 %add8, i8 addrspace(1)* %dst, align 4		store i8 %add8, i8 addrspace(1)* %dst, align 4
ret void		ret void
}		}
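
For readers checking the expected assembly by hand, the following is a minimal reference model of what the checks above appear to compute. It is a sketch, not the test's IR: the function names `bfe_u32` and `udot8` and the `acc_bits` parameter are hypothetical, and the semantics are inferred from the checked instructions (two 32-bit loads, eight 4-bit field extracts per source, multiply, accumulate into a byte loaded from `%dst`). The `s_bfe_u32` operands such as `0x4000c` are decoded as offset in bits [4:0] and width in bits [22:16], so `0x4000c` means "4 bits starting at bit 12".

```python
def bfe_u32(value, operand):
    """Model of an unsigned bitfield extract in the style of s_bfe_u32.

    operand bits [4:0]   -> field offset
    operand bits [22:16] -> field width
    """
    offset = operand & 0x1F
    width = (operand >> 16) & 0x7F
    if width == 0:
        return 0
    return ((value & 0xFFFFFFFF) >> offset) & ((1 << width) - 1)


def udot8(src1, src2, acc, acc_bits=8):
    """Dot product of eight unsigned 4-bit lanes plus an accumulator.

    acc_bits is a hypothetical knob: 8 models the acc8 case above, and 4
    models an accumulator that is masked to four bits before the store.
    """
    total = acc
    for lane in range(8):
        # Width 4, offset 4 * lane: 0x40000, 0x40004, ..., 0x4001c.
        operand = (4 << 16) | (4 * lane)
        total += bfe_u32(src1, operand) * bfe_u32(src2, operand)
    # Truncating once at the end is equivalent to truncating after every
    # add, because addition is modular.
    return total & ((1 << acc_bits) - 1)


if __name__ == "__main__":
    print(hex(bfe_u32(0x0000C000, 0x4000C)))
    print(udot8(0x00000021, 0x00000043, 5))
```

Because the final truncation commutes with the additions, intermediate masking such as the extra `v_and_b32_e32 ... 15` instructions in the acc4 variant that follows does not change the stored value; this is the arithmetic reason the TODO comments expect the pattern to be foldable once the extra operations are removed.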

; TODO: Once the additional "and+add" are removed, the pattern will be recognized.		; TODO: Once the additional "and+add" are removed, the pattern will be recognized.
define amdgpu_kernel void @udot8_acc4_vecMul(<8 x i4> addrspace(1)* %src1,		define amdgpu_kernel void @udot8_acc4_vecMul(<8 x i4> addrspace(1)* %src1,
; GFX7-LABEL: udot8_acc4_vecMul:		; GFX7-LABEL: udot8_acc4_vecMul:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s7, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s6, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: buffer_load_ubyte v0, off, s[4:7], 0		; GFX7-NEXT: buffer_load_ubyte v0, off, s[0:3], 0
; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0		; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX7-NEXT: s_load_dword s1, s[10:11], 0x0		; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_lshr_b32 s2, s0, 28		; GFX7-NEXT: s_lshr_b32 s6, s4, 28
; GFX7-NEXT: s_bfe_u32 s15, s1, 0x40018		; GFX7-NEXT: s_bfe_u32 s14, s5, 0x40018
; GFX7-NEXT: s_bfe_u32 s16, s1, 0x40014		; GFX7-NEXT: s_bfe_u32 s15, s5, 0x40014
; GFX7-NEXT: s_bfe_u32 s17, s1, 0x40010		; GFX7-NEXT: s_bfe_u32 s16, s5, 0x40010
; GFX7-NEXT: s_bfe_u32 s18, s1, 0x4000c		; GFX7-NEXT: s_bfe_u32 s17, s5, 0x4000c
; GFX7-NEXT: s_bfe_u32 s19, s1, 0x40008		; GFX7-NEXT: s_bfe_u32 s18, s5, 0x40008
; GFX7-NEXT: s_bfe_u32 s20, s1, 0x40004		; GFX7-NEXT: s_bfe_u32 s19, s5, 0x40004
; GFX7-NEXT: s_lshr_b32 s14, s1, 28		; GFX7-NEXT: s_lshr_b32 s13, s5, 28
; GFX7-NEXT: s_and_b32 s1, s1, 15		; GFX7-NEXT: s_and_b32 s5, s5, 15
; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40018		; GFX7-NEXT: s_bfe_u32 s7, s4, 0x40018
; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40014		; GFX7-NEXT: s_bfe_u32 s8, s4, 0x40014
; GFX7-NEXT: s_bfe_u32 s10, s0, 0x40010		; GFX7-NEXT: s_bfe_u32 s9, s4, 0x40010
; GFX7-NEXT: s_bfe_u32 s11, s0, 0x4000c		; GFX7-NEXT: s_bfe_u32 s10, s4, 0x4000c
; GFX7-NEXT: s_bfe_u32 s12, s0, 0x40008		; GFX7-NEXT: s_bfe_u32 s11, s4, 0x40008
; GFX7-NEXT: s_bfe_u32 s13, s0, 0x40004		; GFX7-NEXT: s_bfe_u32 s12, s4, 0x40004
; GFX7-NEXT: s_and_b32 s0, s0, 15		; GFX7-NEXT: s_and_b32 s4, s4, 15
; GFX7-NEXT: v_mov_b32_e32 v1, s1		; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mov_b32_e32 v2, s20		; GFX7-NEXT: v_mov_b32_e32 v2, s19
; GFX7-NEXT: v_mov_b32_e32 v3, s19		; GFX7-NEXT: v_mov_b32_e32 v3, s18
; GFX7-NEXT: v_mov_b32_e32 v4, s18		; GFX7-NEXT: v_mov_b32_e32 v4, s17
; GFX7-NEXT: v_mov_b32_e32 v5, s17		; GFX7-NEXT: v_mov_b32_e32 v5, s16
; GFX7-NEXT: v_mov_b32_e32 v6, s16		; GFX7-NEXT: v_mov_b32_e32 v6, s15
; GFX7-NEXT: v_mov_b32_e32 v7, s15		; GFX7-NEXT: v_mov_b32_e32 v7, s14
; GFX7-NEXT: s_waitcnt vmcnt(0)		; GFX7-NEXT: s_waitcnt vmcnt(0)
; GFX7-NEXT: v_mad_u32_u24 v0, s0, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s4, v1, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s13, v2, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s12, v2, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s12, v3, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s11, v3, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s11, v4, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s10, v4, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s10, v5, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s9, v5, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s9, v6, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s8, v6, v0
; GFX7-NEXT: v_mad_u32_u24 v0, s8, v7, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s7, v7, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s14		; GFX7-NEXT: v_mov_b32_e32 v1, s13
; GFX7-NEXT: v_mad_u32_u24 v0, s2, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s6, v1, v0
; GFX7-NEXT: v_and_b32_e32 v0, 15, v0		; GFX7-NEXT: v_and_b32_e32 v0, 15, v0
; GFX7-NEXT: buffer_store_byte v0, off, s[4:7], 0		; GFX7-NEXT: buffer_store_byte v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: udot8_acc4_vecMul:		; GFX8-LABEL: udot8_acc4_vecMul:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_load_ubyte v2, v[0:1]		; GFX8-NEXT: flat_load_ubyte v2, v[0:1]
; GFX8-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX8-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX8-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX8-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_and_b32 s9, s0, 15		; GFX8-NEXT: s_and_b32 s8, s0, 15
; GFX8-NEXT: s_and_b32 s16, s1, 15		; GFX8-NEXT: s_and_b32 s15, s1, 15
; GFX8-NEXT: s_bfe_u32 s15, s1, 0x40004		; GFX8-NEXT: s_bfe_u32 s14, s1, 0x40004
; GFX8-NEXT: v_mov_b32_e32 v4, s16		; GFX8-NEXT: v_mov_b32_e32 v4, s15
; GFX8-NEXT: s_bfe_u32 s11, s1, 0x40018		; GFX8-NEXT: s_bfe_u32 s10, s1, 0x40018
; GFX8-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX8-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX8-NEXT: s_bfe_u32 s13, s1, 0x40010		; GFX8-NEXT: s_bfe_u32 s12, s1, 0x40010
; GFX8-NEXT: s_bfe_u32 s14, s1, 0x40008		; GFX8-NEXT: s_bfe_u32 s13, s1, 0x40008
; GFX8-NEXT: s_lshr_b32 s10, s1, 28		; GFX8-NEXT: s_lshr_b32 s9, s1, 28
; GFX8-NEXT: s_bfe_u32 s1, s1, 0x4000c		; GFX8-NEXT: s_bfe_u32 s1, s1, 0x4000c
; GFX8-NEXT: s_bfe_u32 s8, s0, 0x40004		; GFX8-NEXT: s_bfe_u32 s7, s0, 0x40004
; GFX8-NEXT: v_mov_b32_e32 v5, s15		; GFX8-NEXT: v_mov_b32_e32 v5, s14
; GFX8-NEXT: s_lshr_b32 s2, s0, 28		; GFX8-NEXT: s_lshr_b32 s2, s0, 28
; GFX8-NEXT: s_bfe_u32 s4, s0, 0x40018		; GFX8-NEXT: s_bfe_u32 s3, s0, 0x40018
; GFX8-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX8-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX8-NEXT: s_bfe_u32 s6, s0, 0x40010		; GFX8-NEXT: s_bfe_u32 s5, s0, 0x40010
; GFX8-NEXT: s_bfe_u32 s7, s0, 0x40008		; GFX8-NEXT: s_bfe_u32 s6, s0, 0x40008
; GFX8-NEXT: s_bfe_u32 s0, s0, 0x4000c		; GFX8-NEXT: s_bfe_u32 s0, s0, 0x4000c
; GFX8-NEXT: v_mov_b32_e32 v3, s1		; GFX8-NEXT: v_mov_b32_e32 v3, s1
; GFX8-NEXT: v_mov_b32_e32 v6, s14		; GFX8-NEXT: v_mov_b32_e32 v6, s13
; GFX8-NEXT: v_mul_u32_u24_e32 v3, s0, v3		; GFX8-NEXT: v_mul_u32_u24_e32 v3, s0, v3
; GFX8-NEXT: v_and_b32_e32 v3, 15, v3		; GFX8-NEXT: v_and_b32_e32 v3, 15, v3
; GFX8-NEXT: v_mov_b32_e32 v7, s13		; GFX8-NEXT: v_mov_b32_e32 v7, s12
; GFX8-NEXT: v_mov_b32_e32 v8, s12		; GFX8-NEXT: v_mov_b32_e32 v8, s11
; GFX8-NEXT: v_mov_b32_e32 v9, s11		; GFX8-NEXT: v_mov_b32_e32 v9, s10
; GFX8-NEXT: s_waitcnt vmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0)
; GFX8-NEXT: v_mad_u32_u24 v2, s9, v4, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s8, v4, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s8, v5, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s7, v5, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s7, v6, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s6, v6, v2
; GFX8-NEXT: v_and_b32_e32 v2, 15, v2		; GFX8-NEXT: v_and_b32_e32 v2, 15, v2
; GFX8-NEXT: v_add_u32_e32 v2, vcc, v3, v2		; GFX8-NEXT: v_add_u32_e32 v2, vcc, v3, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s6, v7, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s5, v7, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s5, v8, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s4, v8, v2
; GFX8-NEXT: v_mad_u32_u24 v2, s4, v9, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s3, v9, v2
; GFX8-NEXT: v_mov_b32_e32 v3, s10		; GFX8-NEXT: v_mov_b32_e32 v3, s9
; GFX8-NEXT: v_mad_u32_u24 v2, s2, v3, v2		; GFX8-NEXT: v_mad_u32_u24 v2, s2, v3, v2
; GFX8-NEXT: v_and_b32_e32 v2, 15, v2		; GFX8-NEXT: v_and_b32_e32 v2, 15, v2
; GFX8-NEXT: flat_store_byte v[0:1], v2		; GFX8-NEXT: flat_store_byte v[0:1], v2
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-LABEL: udot8_acc4_vecMul:		; GFX9-LABEL: udot8_acc4_vecMul:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: global_load_ubyte v2, v[0:1], off		; GFX9-NEXT: global_load_ubyte v2, v[0:1], off
; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX9-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_and_b32 s9, s0, 15		; GFX9-NEXT: s_and_b32 s8, s0, 15
; GFX9-NEXT: s_and_b32 s16, s1, 15		; GFX9-NEXT: s_and_b32 s15, s1, 15
; GFX9-NEXT: s_bfe_u32 s15, s1, 0x40004		; GFX9-NEXT: s_bfe_u32 s14, s1, 0x40004
; GFX9-NEXT: v_mov_b32_e32 v4, s16		; GFX9-NEXT: v_mov_b32_e32 v4, s15
; GFX9-NEXT: s_bfe_u32 s11, s1, 0x40018		; GFX9-NEXT: s_bfe_u32 s10, s1, 0x40018
; GFX9-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX9-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX9-NEXT: s_bfe_u32 s13, s1, 0x40010		; GFX9-NEXT: s_bfe_u32 s12, s1, 0x40010
; GFX9-NEXT: s_bfe_u32 s14, s1, 0x40008		; GFX9-NEXT: s_bfe_u32 s13, s1, 0x40008
; GFX9-NEXT: s_lshr_b32 s10, s1, 28		; GFX9-NEXT: s_lshr_b32 s9, s1, 28
; GFX9-NEXT: s_bfe_u32 s1, s1, 0x4000c		; GFX9-NEXT: s_bfe_u32 s1, s1, 0x4000c
; GFX9-NEXT: s_bfe_u32 s8, s0, 0x40004		; GFX9-NEXT: s_bfe_u32 s7, s0, 0x40004
; GFX9-NEXT: v_mov_b32_e32 v5, s15		; GFX9-NEXT: v_mov_b32_e32 v5, s14
; GFX9-NEXT: s_lshr_b32 s2, s0, 28		; GFX9-NEXT: s_lshr_b32 s2, s0, 28
; GFX9-NEXT: s_bfe_u32 s4, s0, 0x40018		; GFX9-NEXT: s_bfe_u32 s3, s0, 0x40018
; GFX9-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX9-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX9-NEXT: s_bfe_u32 s6, s0, 0x40010		; GFX9-NEXT: s_bfe_u32 s5, s0, 0x40010
; GFX9-NEXT: s_bfe_u32 s7, s0, 0x40008		; GFX9-NEXT: s_bfe_u32 s6, s0, 0x40008
; GFX9-NEXT: s_bfe_u32 s0, s0, 0x4000c		; GFX9-NEXT: s_bfe_u32 s0, s0, 0x4000c
; GFX9-NEXT: v_mov_b32_e32 v3, s1		; GFX9-NEXT: v_mov_b32_e32 v3, s1
; GFX9-NEXT: v_mov_b32_e32 v6, s14		; GFX9-NEXT: v_mov_b32_e32 v6, s13
; GFX9-NEXT: v_mul_u32_u24_e32 v3, s0, v3		; GFX9-NEXT: v_mul_u32_u24_e32 v3, s0, v3
; GFX9-NEXT: v_and_b32_e32 v3, 15, v3		; GFX9-NEXT: v_and_b32_e32 v3, 15, v3
; GFX9-NEXT: v_mov_b32_e32 v7, s13		; GFX9-NEXT: v_mov_b32_e32 v7, s12
; GFX9-NEXT: v_mov_b32_e32 v8, s12		; GFX9-NEXT: v_mov_b32_e32 v8, s11
; GFX9-NEXT: v_mov_b32_e32 v9, s11		; GFX9-NEXT: v_mov_b32_e32 v9, s10
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: v_mad_u32_u24 v2, s9, v4, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s8, v4, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s8, v5, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s7, v5, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s7, v6, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s6, v6, v2
; GFX9-NEXT: v_and_b32_e32 v2, 15, v2		; GFX9-NEXT: v_and_b32_e32 v2, 15, v2
; GFX9-NEXT: v_add_u32_e32 v2, v2, v3		; GFX9-NEXT: v_add_u32_e32 v2, v2, v3
; GFX9-NEXT: v_mad_u32_u24 v2, s6, v7, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s5, v7, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s5, v8, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s4, v8, v2
; GFX9-NEXT: v_mad_u32_u24 v2, s4, v9, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s3, v9, v2
; GFX9-NEXT: v_mov_b32_e32 v3, s10		; GFX9-NEXT: v_mov_b32_e32 v3, s9
; GFX9-NEXT: v_mad_u32_u24 v2, s2, v3, v2		; GFX9-NEXT: v_mad_u32_u24 v2, s2, v3, v2
; GFX9-NEXT: v_and_b32_e32 v2, 15, v2		; GFX9-NEXT: v_and_b32_e32 v2, 15, v2
; GFX9-NEXT: global_store_byte v[0:1], v2, off		; GFX9-NEXT: global_store_byte v[0:1], v2, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; GFX9-DL-LABEL: udot8_acc4_vecMul:		; GFX9-DL-LABEL: udot8_acc4_vecMul:
; GFX9-DL: ; %bb.0: ; %entry		; GFX9-DL: ; %bb.0: ; %entry
; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: global_load_ubyte v2, v[0:1], off		; GFX9-DL-NEXT: global_load_ubyte v2, v[0:1], off
; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX9-DL-NEXT: s_load_dword s0, s[4:5], 0x0
; GFX9-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX9-DL-NEXT: s_load_dword s1, s[6:7], 0x0
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_and_b32 s9, s0, 15		; GFX9-DL-NEXT: s_and_b32 s8, s0, 15
; GFX9-DL-NEXT: s_and_b32 s16, s1, 15		; GFX9-DL-NEXT: s_and_b32 s15, s1, 15
; GFX9-DL-NEXT: s_bfe_u32 s15, s1, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s14, s1, 0x40004
; GFX9-DL-NEXT: v_mov_b32_e32 v4, s16		; GFX9-DL-NEXT: v_mov_b32_e32 v4, s15
; GFX9-DL-NEXT: s_bfe_u32 s11, s1, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s10, s1, 0x40018
; GFX9-DL-NEXT: s_bfe_u32 s12, s1, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s11, s1, 0x40014
; GFX9-DL-NEXT: s_bfe_u32 s13, s1, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s12, s1, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s14, s1, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s13, s1, 0x40008
; GFX9-DL-NEXT: s_lshr_b32 s10, s1, 28		; GFX9-DL-NEXT: s_lshr_b32 s9, s1, 28
; GFX9-DL-NEXT: s_bfe_u32 s1, s1, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s1, s1, 0x4000c
; GFX9-DL-NEXT: s_bfe_u32 s8, s0, 0x40004		; GFX9-DL-NEXT: s_bfe_u32 s7, s0, 0x40004
; GFX9-DL-NEXT: v_mov_b32_e32 v5, s15		; GFX9-DL-NEXT: v_mov_b32_e32 v5, s14
; GFX9-DL-NEXT: s_lshr_b32 s2, s0, 28		; GFX9-DL-NEXT: s_lshr_b32 s2, s0, 28
; GFX9-DL-NEXT: s_bfe_u32 s4, s0, 0x40018		; GFX9-DL-NEXT: s_bfe_u32 s3, s0, 0x40018
; GFX9-DL-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX9-DL-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX9-DL-NEXT: s_bfe_u32 s6, s0, 0x40010		; GFX9-DL-NEXT: s_bfe_u32 s5, s0, 0x40010
; GFX9-DL-NEXT: s_bfe_u32 s7, s0, 0x40008		; GFX9-DL-NEXT: s_bfe_u32 s6, s0, 0x40008
; GFX9-DL-NEXT: s_bfe_u32 s0, s0, 0x4000c		; GFX9-DL-NEXT: s_bfe_u32 s0, s0, 0x4000c
; GFX9-DL-NEXT: v_mov_b32_e32 v3, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v3, s1
; GFX9-DL-NEXT: v_mov_b32_e32 v6, s14		; GFX9-DL-NEXT: v_mov_b32_e32 v6, s13
; GFX9-DL-NEXT: v_mul_u32_u24_e32 v3, s0, v3		; GFX9-DL-NEXT: v_mul_u32_u24_e32 v3, s0, v3
; GFX9-DL-NEXT: v_and_b32_e32 v3, 15, v3		; GFX9-DL-NEXT: v_and_b32_e32 v3, 15, v3
; GFX9-DL-NEXT: v_mov_b32_e32 v7, s13		; GFX9-DL-NEXT: v_mov_b32_e32 v7, s12
; GFX9-DL-NEXT: v_mov_b32_e32 v8, s12		; GFX9-DL-NEXT: v_mov_b32_e32 v8, s11
; GFX9-DL-NEXT: v_mov_b32_e32 v9, s11		; GFX9-DL-NEXT: v_mov_b32_e32 v9, s10
; GFX9-DL-NEXT: s_waitcnt vmcnt(0)		; GFX9-DL-NEXT: s_waitcnt vmcnt(0)
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s9, v4, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s8, v4, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s8, v5, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s7, v5, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s7, v6, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s6, v6, v2
; GFX9-DL-NEXT: v_and_b32_e32 v2, 15, v2		; GFX9-DL-NEXT: v_and_b32_e32 v2, 15, v2
; GFX9-DL-NEXT: v_add_u32_e32 v2, v2, v3		; GFX9-DL-NEXT: v_add_u32_e32 v2, v2, v3
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s6, v7, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s5, v7, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s5, v8, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s4, v8, v2
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s4, v9, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s3, v9, v2
; GFX9-DL-NEXT: v_mov_b32_e32 v3, s10		; GFX9-DL-NEXT: v_mov_b32_e32 v3, s9
; GFX9-DL-NEXT: v_mad_u32_u24 v2, s2, v3, v2		; GFX9-DL-NEXT: v_mad_u32_u24 v2, s2, v3, v2
; GFX9-DL-NEXT: v_and_b32_e32 v2, 15, v2		; GFX9-DL-NEXT: v_and_b32_e32 v2, 15, v2
; GFX9-DL-NEXT: global_store_byte v[0:1], v2, off		; GFX9-DL-NEXT: global_store_byte v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: udot8_acc4_vecMul:		; GFX10-DL-LABEL: udot8_acc4_vecMul:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s4		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s2
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s5		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s3
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off		; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0		; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0		; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: s_and_b32 s2, s0, 15		; GFX10-DL-NEXT: s_and_b32 s2, s0, 15
; GFX10-DL-NEXT: s_and_b32 s4, s1, 15		; GFX10-DL-NEXT: s_and_b32 s3, s1, 15
; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x40004
; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40004		; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x40004
; GFX10-DL-NEXT: s_bfe_u32 s7, s1, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40008
; GFX10-DL-NEXT: s_waitcnt vmcnt(0)		; GFX10-DL-NEXT: s_waitcnt vmcnt(0)
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40008		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40008
; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s3, s0, 0x4000c
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s5, s6, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s4, s5, v2
; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x4000c		; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x4000c
; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x40014
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s7, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s6, v2
; GFX10-DL-NEXT: v_mul_u32_u24_e64 v3, s4, s5		; GFX10-DL-NEXT: v_mul_u32_u24_e64 v3, s3, s4
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40010
; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x40010		; GFX10-DL-NEXT: s_bfe_u32 s3, s1, 0x40010
; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40014		; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x40014
; GFX10-DL-NEXT: v_and_b32_e32 v2, 15, v2		; GFX10-DL-NEXT: v_and_b32_e32 v2, 15, v2
; GFX10-DL-NEXT: v_and_b32_e32 v3, 15, v3		; GFX10-DL-NEXT: v_and_b32_e32 v3, 15, v3
; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v2, v3		; GFX10-DL-NEXT: v_add_nc_u32_e32 v2, v2, v3
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40018
; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x40018		; GFX10-DL-NEXT: s_bfe_u32 s3, s1, 0x40018
; GFX10-DL-NEXT: s_lshr_b32 s0, s0, 28		; GFX10-DL-NEXT: s_lshr_b32 s0, s0, 28
; GFX10-DL-NEXT: s_lshr_b32 s1, s1, 28		; GFX10-DL-NEXT: s_lshr_b32 s1, s1, 28
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s5, s6, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s4, s5, v2
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
; GFX10-DL-NEXT: v_mad_u32_u24 v2, s0, s1, v2		; GFX10-DL-NEXT: v_mad_u32_u24 v2, s0, s1, v2
; GFX10-DL-NEXT: v_and_b32_e32 v2, 15, v2		; GFX10-DL-NEXT: v_and_b32_e32 v2, 15, v2
; GFX10-DL-NEXT: global_store_byte v[0:1], v2, off		; GFX10-DL-NEXT: global_store_byte v[0:1], v2, off
; GFX10-DL-NEXT: s_endpgm		; GFX10-DL-NEXT: s_endpgm
<8 x i4> addrspace(1)* %src2,		<8 x i4> addrspace(1)* %src2,
i4 addrspace(1)* nocapture %dst) {		i4 addrspace(1)* nocapture %dst) {
entry:		entry:
%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1		%vec1 = load <8 x i4>, <8 x i4> addrspace(1)* %src1

llvm/test/CodeGen/AMDGPU/indirect-addressing-term.ll

define amdgpu_kernel void @extract_w_offset_vgpr(i32 addrspace(1)* %out) {		define amdgpu_kernel void @extract_w_offset_vgpr(i32 addrspace(1)* %out) {
; GCN-LABEL: name: extract_w_offset_vgpr		; GCN-LABEL: name: extract_w_offset_vgpr
; GCN: bb.0.entry:		; GCN: bb.0.entry:
; GCN: successors: %bb.1(0x80000000)		; GCN: successors: %bb.1(0x80000000)
; GCN: liveins: $vgpr0, $sgpr0_sgpr1		; GCN: liveins: $vgpr0, $sgpr0_sgpr1
; GCN: renamable $sgpr0_sgpr1 = S_LOAD_DWORDX2_IMM killed renamable $sgpr0_sgpr1, 36, 0, 0 :: (dereferenceable invariant load 8 from %ir.out.kernarg.offset.cast, align 4, addrspace 4)		; GCN: renamable $sgpr0_sgpr1 = S_LOAD_DWORDX2_IMM killed renamable $sgpr0_sgpr1, 36, 0, 0 :: (dereferenceable invariant load 8 from %ir.out.kernarg.offset.cast, align 4, addrspace 4)
; GCN: renamable $sgpr2 = COPY renamable $sgpr1		; GCN: renamable $sgpr2 = COPY renamable $sgpr1
; GCN: renamable $sgpr0 = COPY renamable $sgpr0, implicit killed $sgpr0_sgpr1		; GCN: renamable $sgpr0 = COPY renamable $sgpr0, implicit killed $sgpr0_sgpr1
; GCN: renamable $sgpr1 = S_MOV_B32 61440		; GCN: renamable $sgpr1 = S_MOV_B32 61440
; GCN: renamable $sgpr4 = S_MOV_B32 -1		; GCN: renamable $sgpr3 = S_MOV_B32 -1
; GCN: undef renamable $sgpr8 = COPY killed renamable $sgpr0, implicit-def $sgpr8_sgpr9_sgpr10_sgpr11		; GCN: undef renamable $sgpr4 = COPY killed renamable $sgpr0, implicit-def $sgpr4_sgpr5_sgpr6_sgpr7
; GCN: renamable $sgpr9 = COPY killed renamable $sgpr2		; GCN: renamable $sgpr5 = COPY killed renamable $sgpr2
; GCN: renamable $sgpr10 = COPY killed renamable $sgpr4		; GCN: renamable $sgpr6 = COPY killed renamable $sgpr3
; GCN: renamable $sgpr11 = COPY killed renamable $sgpr1		; GCN: renamable $sgpr7 = COPY killed renamable $sgpr1
; GCN: renamable $sgpr0 = S_MOV_B32 16		; GCN: renamable $sgpr0 = S_MOV_B32 16
; GCN: renamable $sgpr1 = S_MOV_B32 15		; GCN: renamable $sgpr1 = S_MOV_B32 15
; GCN: renamable $sgpr2 = S_MOV_B32 14		; GCN: renamable $sgpr2 = S_MOV_B32 14
; GCN: renamable $sgpr4 = S_MOV_B32 13		; GCN: renamable $sgpr3 = S_MOV_B32 13
; GCN: renamable $sgpr5 = S_MOV_B32 12		; GCN: renamable $sgpr8 = S_MOV_B32 12
; GCN: renamable $sgpr6 = S_MOV_B32 11		; GCN: renamable $sgpr9 = S_MOV_B32 11
; GCN: renamable $sgpr7 = S_MOV_B32 10		; GCN: renamable $sgpr10 = S_MOV_B32 10
; GCN: renamable $sgpr12 = S_MOV_B32 9		; GCN: renamable $sgpr11 = S_MOV_B32 9
; GCN: renamable $sgpr13 = S_MOV_B32 8		; GCN: renamable $sgpr12 = S_MOV_B32 8
; GCN: renamable $sgpr14 = S_MOV_B32 7		; GCN: renamable $sgpr13 = S_MOV_B32 7
; GCN: renamable $sgpr15 = S_MOV_B32 6		; GCN: renamable $sgpr14 = S_MOV_B32 6
; GCN: renamable $sgpr16 = S_MOV_B32 5		; GCN: renamable $sgpr15 = S_MOV_B32 5
; GCN: renamable $sgpr17 = S_MOV_B32 3		; GCN: renamable $sgpr16 = S_MOV_B32 3
; GCN: renamable $sgpr18 = S_MOV_B32 2		; GCN: renamable $sgpr17 = S_MOV_B32 2
; GCN: renamable $sgpr19 = S_MOV_B32 1		; GCN: renamable $sgpr18 = S_MOV_B32 1
; GCN: renamable $sgpr20 = S_MOV_B32 0		; GCN: renamable $sgpr19 = S_MOV_B32 0
; GCN: renamable $vgpr1 = COPY killed renamable $sgpr20		; GCN: renamable $vgpr1 = COPY killed renamable $sgpr19
; GCN: renamable $vgpr2 = COPY killed renamable $sgpr19		; GCN: renamable $vgpr2 = COPY killed renamable $sgpr18
; GCN: renamable $vgpr3 = COPY killed renamable $sgpr18		; GCN: renamable $vgpr3 = COPY killed renamable $sgpr17
; GCN: renamable $vgpr4 = COPY killed renamable $sgpr17		; GCN: renamable $vgpr4 = COPY killed renamable $sgpr16
; GCN: renamable $vgpr5 = COPY killed renamable $sgpr16		; GCN: renamable $vgpr5 = COPY killed renamable $sgpr15
; GCN: renamable $vgpr6 = COPY killed renamable $sgpr15		; GCN: renamable $vgpr6 = COPY killed renamable $sgpr14
; GCN: renamable $vgpr7 = COPY killed renamable $sgpr14		; GCN: renamable $vgpr7 = COPY killed renamable $sgpr13
; GCN: renamable $vgpr8 = COPY killed renamable $sgpr13		; GCN: renamable $vgpr8 = COPY killed renamable $sgpr12
; GCN: renamable $vgpr9 = COPY killed renamable $sgpr12		; GCN: renamable $vgpr9 = COPY killed renamable $sgpr11
; GCN: renamable $vgpr10 = COPY killed renamable $sgpr7		; GCN: renamable $vgpr10 = COPY killed renamable $sgpr10
; GCN: renamable $vgpr11 = COPY killed renamable $sgpr6		; GCN: renamable $vgpr11 = COPY killed renamable $sgpr9
; GCN: renamable $vgpr12 = COPY killed renamable $sgpr5		; GCN: renamable $vgpr12 = COPY killed renamable $sgpr8
; GCN: renamable $vgpr13 = COPY killed renamable $sgpr4		; GCN: renamable $vgpr13 = COPY killed renamable $sgpr3
; GCN: renamable $vgpr14 = COPY killed renamable $sgpr2		; GCN: renamable $vgpr14 = COPY killed renamable $sgpr2
; GCN: renamable $vgpr15 = COPY killed renamable $sgpr1		; GCN: renamable $vgpr15 = COPY killed renamable $sgpr1
; GCN: renamable $vgpr16 = COPY killed renamable $sgpr0		; GCN: renamable $vgpr16 = COPY killed renamable $sgpr0
; GCN: undef renamable $vgpr17 = COPY killed renamable $vgpr1, implicit-def $vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31_vgpr32		; GCN: undef renamable $vgpr17 = COPY killed renamable $vgpr1, implicit-def $vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31_vgpr32
; GCN: renamable $vgpr18 = COPY killed renamable $vgpr2		; GCN: renamable $vgpr18 = COPY killed renamable $vgpr2
; GCN: renamable $vgpr19 = COPY killed renamable $vgpr3		; GCN: renamable $vgpr19 = COPY killed renamable $vgpr3
; GCN: renamable $vgpr20 = COPY killed renamable $vgpr4		; GCN: renamable $vgpr20 = COPY killed renamable $vgpr4
; GCN: renamable $vgpr21 = COPY killed renamable $vgpr5		; GCN: renamable $vgpr21 = COPY killed renamable $vgpr5
; GCN: renamable $vgpr22 = COPY killed renamable $vgpr6		; GCN: renamable $vgpr22 = COPY killed renamable $vgpr6
; GCN: renamable $vgpr23 = COPY killed renamable $vgpr7		; GCN: renamable $vgpr23 = COPY killed renamable $vgpr7
; GCN: renamable $vgpr24 = COPY killed renamable $vgpr8		; GCN: renamable $vgpr24 = COPY killed renamable $vgpr8
; GCN: renamable $vgpr25 = COPY killed renamable $vgpr9		; GCN: renamable $vgpr25 = COPY killed renamable $vgpr9
; GCN: renamable $vgpr26 = COPY killed renamable $vgpr10		; GCN: renamable $vgpr26 = COPY killed renamable $vgpr10
; GCN: renamable $vgpr27 = COPY killed renamable $vgpr11		; GCN: renamable $vgpr27 = COPY killed renamable $vgpr11
; GCN: renamable $vgpr28 = COPY killed renamable $vgpr12		; GCN: renamable $vgpr28 = COPY killed renamable $vgpr12
; GCN: renamable $vgpr29 = COPY killed renamable $vgpr13		; GCN: renamable $vgpr29 = COPY killed renamable $vgpr13
; GCN: renamable $vgpr30 = COPY killed renamable $vgpr14		; GCN: renamable $vgpr30 = COPY killed renamable $vgpr14
; GCN: renamable $vgpr31 = COPY killed renamable $vgpr15		; GCN: renamable $vgpr31 = COPY killed renamable $vgpr15
; GCN: renamable $vgpr32 = COPY killed renamable $vgpr16		; GCN: renamable $vgpr32 = COPY killed renamable $vgpr16
; GCN: renamable $sgpr22_sgpr23 = S_MOV_B64 $exec		; GCN: renamable $sgpr20_sgpr21 = S_MOV_B64 $exec
; GCN: renamable $vgpr1 = IMPLICIT_DEF		; GCN: renamable $vgpr1 = IMPLICIT_DEF
; GCN: renamable $sgpr24_sgpr25 = IMPLICIT_DEF		; GCN: renamable $sgpr22_sgpr23 = IMPLICIT_DEF
; GCN: SI_SPILL_V32_SAVE killed $vgpr0, %stack.0, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr3, 0, implicit $exec :: (store 4 into %stack.0, addrspace 5)		; GCN: SI_SPILL_V32_SAVE killed $vgpr0, %stack.0, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr32, 0, implicit $exec :: (store 4 into %stack.0, addrspace 5)
; GCN: SI_SPILL_S128_SAVE killed $sgpr8_sgpr9_sgpr10_sgpr11, %stack.1, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr3 :: (store 16 into %stack.1, align 4, addrspace 5)		; GCN: SI_SPILL_S128_SAVE killed $sgpr4_sgpr5_sgpr6_sgpr7, %stack.1, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32 :: (store 16 into %stack.1, align 4, addrspace 5)
; GCN: SI_SPILL_V512_SAVE killed $vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31_vgpr32, %stack.2, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr3, 0, implicit $exec :: (store 64 into %stack.2, align 4, addrspace 5)		; GCN: SI_SPILL_V512_SAVE killed $vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31_vgpr32, %stack.2, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr32, 0, implicit $exec :: (store 64 into %stack.2, align 4, addrspace 5)
; GCN: SI_SPILL_S64_SAVE killed $sgpr22_sgpr23, %stack.3, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr3 :: (store 8 into %stack.3, align 4, addrspace 5)		; GCN: SI_SPILL_S64_SAVE killed $sgpr20_sgpr21, %stack.3, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32 :: (store 8 into %stack.3, align 4, addrspace 5)
; GCN: SI_SPILL_V32_SAVE killed $vgpr1, %stack.4, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr3, 0, implicit $exec :: (store 4 into %stack.4, addrspace 5)		; GCN: SI_SPILL_V32_SAVE killed $vgpr1, %stack.4, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr32, 0, implicit $exec :: (store 4 into %stack.4, addrspace 5)
; GCN: SI_SPILL_S64_SAVE killed $sgpr24_sgpr25, %stack.5, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr3 :: (store 8 into %stack.5, align 4, addrspace 5)		; GCN: SI_SPILL_S64_SAVE killed $sgpr22_sgpr23, %stack.5, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32 :: (store 8 into %stack.5, align 4, addrspace 5)
; GCN: bb.1:		; GCN: bb.1:
; GCN: successors: %bb.1(0x40000000), %bb.3(0x40000000)		; GCN: successors: %bb.1(0x40000000), %bb.3(0x40000000)
; GCN: $sgpr0_sgpr1 = SI_SPILL_S64_RESTORE %stack.5, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr3 :: (load 8 from %stack.5, align 4, addrspace 5)		; GCN: $sgpr0_sgpr1 = SI_SPILL_S64_RESTORE %stack.5, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32 :: (load 8 from %stack.5, align 4, addrspace 5)
; GCN: $vgpr0 = SI_SPILL_V32_RESTORE %stack.4, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr3, 0, implicit $exec :: (load 4 from %stack.4, addrspace 5)		; GCN: $vgpr0 = SI_SPILL_V32_RESTORE %stack.4, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr32, 0, implicit $exec :: (load 4 from %stack.4, addrspace 5)
; GCN: $vgpr1 = SI_SPILL_V32_RESTORE %stack.0, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr3, 0, implicit $exec :: (load 4 from %stack.0, addrspace 5)		; GCN: $vgpr1 = SI_SPILL_V32_RESTORE %stack.0, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr32, 0, implicit $exec :: (load 4 from %stack.0, addrspace 5)
; GCN: renamable $sgpr2 = V_READFIRSTLANE_B32 $vgpr1, implicit $exec		; GCN: renamable $sgpr2 = V_READFIRSTLANE_B32 $vgpr1, implicit $exec
; GCN: renamable $sgpr4_sgpr5 = V_CMP_EQ_U32_e64 $sgpr2, killed $vgpr1, implicit $exec		; GCN: renamable $sgpr4_sgpr5 = V_CMP_EQ_U32_e64 $sgpr2, killed $vgpr1, implicit $exec
; GCN: renamable $sgpr4_sgpr5 = S_AND_SAVEEXEC_B64 killed renamable $sgpr4_sgpr5, implicit-def $exec, implicit-def $scc, implicit $exec		; GCN: renamable $sgpr4_sgpr5 = S_AND_SAVEEXEC_B64 killed renamable $sgpr4_sgpr5, implicit-def $exec, implicit-def $scc, implicit $exec
; GCN: S_SET_GPR_IDX_ON killed renamable $sgpr2, 1, implicit-def $m0, implicit undef $m0		; GCN: S_SET_GPR_IDX_ON killed renamable $sgpr2, 1, implicit-def $m0, implicit undef $m0
; GCN: $vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17 = SI_SPILL_V512_RESTORE %stack.2, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr3, 0, implicit $exec :: (load 64 from %stack.2, align 4, addrspace 5)		; GCN: $vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17 = SI_SPILL_V512_RESTORE %stack.2, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr32, 0, implicit $exec :: (load 64 from %stack.2, align 4, addrspace 5)
; GCN: renamable $vgpr18 = V_MOV_B32_e32 undef $vgpr3, implicit $exec, implicit killed $vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17, implicit $m0		; GCN: renamable $vgpr18 = V_MOV_B32_e32 undef $vgpr3, implicit $exec, implicit killed $vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17, implicit $m0
; GCN: S_SET_GPR_IDX_OFF		; GCN: S_SET_GPR_IDX_OFF
; GCN: renamable $vgpr19 = COPY renamable $vgpr18		; GCN: renamable $vgpr19 = COPY renamable $vgpr18
; GCN: renamable $sgpr6_sgpr7 = COPY renamable $sgpr4_sgpr5		; GCN: renamable $sgpr6_sgpr7 = COPY renamable $sgpr4_sgpr5
; GCN: SI_SPILL_S64_SAVE killed $sgpr6_sgpr7, %stack.5, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr3 :: (store 8 into %stack.5, align 4, addrspace 5)		; GCN: SI_SPILL_S64_SAVE killed $sgpr6_sgpr7, %stack.5, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32 :: (store 8 into %stack.5, align 4, addrspace 5)
; GCN: SI_SPILL_S64_SAVE killed $sgpr0_sgpr1, %stack.6, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr3 :: (store 8 into %stack.6, align 4, addrspace 5)		; GCN: SI_SPILL_S64_SAVE killed $sgpr0_sgpr1, %stack.6, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32 :: (store 8 into %stack.6, align 4, addrspace 5)
; GCN: SI_SPILL_V32_SAVE killed $vgpr19, %stack.4, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr3, 0, implicit $exec :: (store 4 into %stack.4, addrspace 5)		; GCN: SI_SPILL_V32_SAVE killed $vgpr19, %stack.4, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr32, 0, implicit $exec :: (store 4 into %stack.4, addrspace 5)
; GCN: SI_SPILL_V32_SAVE killed $vgpr0, %stack.7, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr3, 0, implicit $exec :: (store 4 into %stack.7, addrspace 5)		; GCN: SI_SPILL_V32_SAVE killed $vgpr0, %stack.7, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr32, 0, implicit $exec :: (store 4 into %stack.7, addrspace 5)
; GCN: SI_SPILL_V32_SAVE killed $vgpr18, %stack.8, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr3, 0, implicit $exec :: (store 4 into %stack.8, addrspace 5)		; GCN: SI_SPILL_V32_SAVE killed $vgpr18, %stack.8, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr32, 0, implicit $exec :: (store 4 into %stack.8, addrspace 5)
; GCN: $exec = S_XOR_B64_term $exec, killed renamable $sgpr4_sgpr5, implicit-def $scc		; GCN: $exec = S_XOR_B64_term $exec, killed renamable $sgpr4_sgpr5, implicit-def $scc
; GCN: S_CBRANCH_EXECNZ %bb.1, implicit $exec		; GCN: S_CBRANCH_EXECNZ %bb.1, implicit $exec
; GCN: bb.3:		; GCN: bb.3:
; GCN: successors: %bb.2(0x80000000)		; GCN: successors: %bb.2(0x80000000)
; GCN: $sgpr0_sgpr1 = SI_SPILL_S64_RESTORE %stack.3, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr3 :: (load 8 from %stack.3, align 4, addrspace 5)		; GCN: $sgpr0_sgpr1 = SI_SPILL_S64_RESTORE %stack.3, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32 :: (load 8 from %stack.3, align 4, addrspace 5)
; GCN: $exec = S_MOV_B64 killed renamable $sgpr0_sgpr1		; GCN: $exec = S_MOV_B64 killed renamable $sgpr0_sgpr1
; GCN: bb.2:		; GCN: bb.2:
; GCN: $vgpr0 = SI_SPILL_V32_RESTORE %stack.8, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr3, 0, implicit $exec :: (load 4 from %stack.8, addrspace 5)		; GCN: $vgpr0 = SI_SPILL_V32_RESTORE %stack.8, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr32, 0, implicit $exec :: (load 4 from %stack.8, addrspace 5)
; GCN: $sgpr4_sgpr5_sgpr6_sgpr7 = SI_SPILL_S128_RESTORE %stack.1, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr3 :: (load 16 from %stack.1, align 4, addrspace 5)		; GCN: $sgpr0_sgpr1_sgpr2_sgpr3 = SI_SPILL_S128_RESTORE %stack.1, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32 :: (load 16 from %stack.1, align 4, addrspace 5)
; GCN: BUFFER_STORE_DWORD_OFFSET renamable $vgpr0, renamable $sgpr4_sgpr5_sgpr6_sgpr7, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4 into %ir.out.load, addrspace 1)		; GCN: BUFFER_STORE_DWORD_OFFSET renamable $vgpr0, renamable $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4 into %ir.out.load, addrspace 1)
; GCN: S_ENDPGM 0		; GCN: S_ENDPGM 0
entry:		entry:
%id = call i32 @llvm.amdgcn.workitem.id.x() #1		%id = call i32 @llvm.amdgcn.workitem.id.x() #1
%index = add i32 %id, 1		%index = add i32 %id, 1
%value = extractelement <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16>, i32 %index		%value = extractelement <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16>, i32 %index
store i32 %value, i32 addrspace(1)* %out		store i32 %value, i32 addrspace(1)* %out
ret void		ret void
}		}

llvm/test/CodeGen/AMDGPU/indirect-call.ll

	; GCN-NEXT: kernarg_segment_alignment = 4			; GCN-NEXT: kernarg_segment_alignment = 4
	; GCN-NEXT: group_segment_alignment = 4			; GCN-NEXT: group_segment_alignment = 4
	; GCN-NEXT: private_segment_alignment = 4			; GCN-NEXT: private_segment_alignment = 4
	; GCN-NEXT: wavefront_size = 6			; GCN-NEXT: wavefront_size = 6
	; GCN-NEXT: call_convention = -1			; GCN-NEXT: call_convention = -1
	; GCN-NEXT: runtime_loader_kernel_symbol = 0			; GCN-NEXT: runtime_loader_kernel_symbol = 0
	; GCN-NEXT: .end_amd_kernel_code_t			; GCN-NEXT: .end_amd_kernel_code_t
	; GCN-NEXT: ; %bb.0:			; GCN-NEXT: ; %bb.0:
	; GCN-NEXT: s_mov_b32 s33, s17			; GCN-NEXT: s_mov_b32 s32, 0
	; GCN-NEXT: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 flat_scratch_lo, s13			; GCN-NEXT: s_mov_b32 flat_scratch_lo, s13
	; GCN-NEXT: s_add_u32 s12, s12, s33			; GCN-NEXT: s_add_u32 s12, s12, s17
	; GCN-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8			; GCN-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8
				; GCN-NEXT: s_add_u32 s0, s0, s17
				; GCN-NEXT: s_addc_u32 s1, s1, 0
	; GCN-NEXT: s_getpc_b64 s[4:5]			; GCN-NEXT: s_getpc_b64 s[4:5]
	; GCN-NEXT: s_add_u32 s4, s4, gv.fptr0@rel32@lo+4			; GCN-NEXT: s_add_u32 s4, s4, gv.fptr0@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s5, s5, gv.fptr0@rel32@hi+4			; GCN-NEXT: s_addc_u32 s5, s5, gv.fptr0@rel32@hi+4
	; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GCN-NEXT: v_lshlrev_b32_e32 v2, 20, v2			; GCN-NEXT: v_lshlrev_b32_e32 v2, 20, v2
	; GCN-NEXT: v_lshlrev_b32_e32 v1, 10, v1			; GCN-NEXT: v_lshlrev_b32_e32 v1, 10, v1
	; GCN-NEXT: v_or_b32_e32 v0, v0, v1			; GCN-NEXT: v_or_b32_e32 v0, v0, v1
	; GCN-NEXT: v_or_b32_e32 v31, v0, v2			; GCN-NEXT: v_or_b32_e32 v31, v0, v2
	; GCN-NEXT: kernarg_segment_alignment = 4			; GCN-NEXT: kernarg_segment_alignment = 4
	; GCN-NEXT: group_segment_alignment = 4			; GCN-NEXT: group_segment_alignment = 4
	; GCN-NEXT: private_segment_alignment = 4			; GCN-NEXT: private_segment_alignment = 4
	; GCN-NEXT: wavefront_size = 6			; GCN-NEXT: wavefront_size = 6
	; GCN-NEXT: call_convention = -1			; GCN-NEXT: call_convention = -1
	; GCN-NEXT: runtime_loader_kernel_symbol = 0			; GCN-NEXT: runtime_loader_kernel_symbol = 0
	; GCN-NEXT: .end_amd_kernel_code_t			; GCN-NEXT: .end_amd_kernel_code_t
	; GCN-NEXT: ; %bb.0:			; GCN-NEXT: ; %bb.0:
	; GCN-NEXT: s_mov_b32 s33, s17			; GCN-NEXT: s_mov_b32 s32, 0
	; GCN-NEXT: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 flat_scratch_lo, s13			; GCN-NEXT: s_mov_b32 flat_scratch_lo, s13
	; GCN-NEXT: s_add_u32 s12, s12, s33			; GCN-NEXT: s_add_u32 s12, s12, s17
	; GCN-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8			; GCN-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8
				; GCN-NEXT: s_add_u32 s0, s0, s17
				; GCN-NEXT: s_addc_u32 s1, s1, 0
	; GCN-NEXT: s_getpc_b64 s[4:5]			; GCN-NEXT: s_getpc_b64 s[4:5]
	; GCN-NEXT: s_add_u32 s4, s4, gv.fptr1@rel32@lo+4			; GCN-NEXT: s_add_u32 s4, s4, gv.fptr1@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s5, s5, gv.fptr1@rel32@hi+4			; GCN-NEXT: s_addc_u32 s5, s5, gv.fptr1@rel32@hi+4
	; GCN-NEXT: v_lshlrev_b32_e32 v2, 20, v2			; GCN-NEXT: v_lshlrev_b32_e32 v2, 20, v2
	; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GCN-NEXT: v_lshlrev_b32_e32 v1, 10, v1			; GCN-NEXT: v_lshlrev_b32_e32 v1, 10, v1
	; GCN-NEXT: v_or_b32_e32 v0, v0, v1			; GCN-NEXT: v_or_b32_e32 v0, v0, v1
	; GCN-NEXT: v_or_b32_e32 v31, v0, v2			; GCN-NEXT: v_or_b32_e32 v31, v0, v2

llvm/test/CodeGen/AMDGPU/insert_vector_elt.ll

	}			}

	define amdgpu_kernel void @dynamic_insertelement_v8f64(<8 x double> addrspace(1)* %out, <8 x double> %a, i32 %b) #0 {			define amdgpu_kernel void @dynamic_insertelement_v8f64(<8 x double> addrspace(1)* %out, <8 x double> %a, i32 %b) #0 {
	; SI-LABEL: dynamic_insertelement_v8f64:			; SI-LABEL: dynamic_insertelement_v8f64:
	; SI: ; %bb.0:			; SI: ; %bb.0:
	; SI-NEXT: s_load_dwordx2 s[8:9], s[4:5], 0x0			; SI-NEXT: s_load_dwordx2 s[8:9], s[4:5], 0x0
	; SI-NEXT: s_load_dwordx16 s[12:27], s[4:5], 0x10			; SI-NEXT: s_load_dwordx16 s[12:27], s[4:5], 0x10
	; SI-NEXT: s_load_dword s4, s[4:5], 0x20			; SI-NEXT: s_load_dword s4, s[4:5], 0x20
				; SI-NEXT: s_add_u32 s0, s0, s7
				; SI-NEXT: s_addc_u32 s1, s1, 0
	; SI-NEXT: v_mov_b32_e32 v16, 64			; SI-NEXT: v_mov_b32_e32 v16, 64
	; SI-NEXT: s_mov_b32 s11, 0x100f000
	; SI-NEXT: s_mov_b32 s10, -1
	; SI-NEXT: s_waitcnt lgkmcnt(0)			; SI-NEXT: s_waitcnt lgkmcnt(0)
	; SI-NEXT: v_mov_b32_e32 v0, s12			; SI-NEXT: v_mov_b32_e32 v0, s12
	; SI-NEXT: s_and_b32 s4, s4, 7			; SI-NEXT: s_and_b32 s4, s4, 7
	; SI-NEXT: s_lshl_b32 s4, s4, 3			; SI-NEXT: s_lshl_b32 s4, s4, 3
	; SI-NEXT: v_mov_b32_e32 v1, s13			; SI-NEXT: v_mov_b32_e32 v1, s13
	; SI-NEXT: v_mov_b32_e32 v12, s24			; SI-NEXT: v_mov_b32_e32 v12, s24
	; SI-NEXT: v_mov_b32_e32 v13, s25			; SI-NEXT: v_mov_b32_e32 v13, s25
	; SI-NEXT: v_mov_b32_e32 v14, s26			; SI-NEXT: v_mov_b32_e32 v14, s26
	; SI-NEXT: v_mov_b32_e32 v15, s27			; SI-NEXT: v_mov_b32_e32 v15, s27
	; SI-NEXT: v_mov_b32_e32 v2, s14			; SI-NEXT: v_mov_b32_e32 v2, s14
	; SI-NEXT: v_mov_b32_e32 v3, s15			; SI-NEXT: v_mov_b32_e32 v3, s15
	; SI-NEXT: v_mov_b32_e32 v4, s16			; SI-NEXT: v_mov_b32_e32 v4, s16
	; SI-NEXT: v_mov_b32_e32 v5, s17			; SI-NEXT: v_mov_b32_e32 v5, s17
	; SI-NEXT: v_mov_b32_e32 v6, s18			; SI-NEXT: v_mov_b32_e32 v6, s18
	; SI-NEXT: v_mov_b32_e32 v7, s19			; SI-NEXT: v_mov_b32_e32 v7, s19
	; SI-NEXT: v_mov_b32_e32 v8, s20			; SI-NEXT: v_mov_b32_e32 v8, s20
	; SI-NEXT: v_mov_b32_e32 v9, s21			; SI-NEXT: v_mov_b32_e32 v9, s21
	; SI-NEXT: v_mov_b32_e32 v10, s22			; SI-NEXT: v_mov_b32_e32 v10, s22
	; SI-NEXT: v_mov_b32_e32 v11, s23			; SI-NEXT: v_mov_b32_e32 v11, s23
	; SI-NEXT: buffer_store_dwordx4 v[12:15], off, s[0:3], s7 offset:112			; SI-NEXT: buffer_store_dwordx4 v[12:15], off, s[0:3], 0 offset:112
	; SI-NEXT: buffer_store_dwordx4 v[8:11], off, s[0:3], s7 offset:96			; SI-NEXT: buffer_store_dwordx4 v[8:11], off, s[0:3], 0 offset:96
	; SI-NEXT: buffer_store_dwordx4 v[4:7], off, s[0:3], s7 offset:80			; SI-NEXT: buffer_store_dwordx4 v[4:7], off, s[0:3], 0 offset:80
	; SI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], s7 offset:64			; SI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:64
	; SI-NEXT: v_or_b32_e32 v16, s4, v16			; SI-NEXT: v_or_b32_e32 v16, s4, v16
	; SI-NEXT: v_mov_b32_e32 v0, 0			; SI-NEXT: v_mov_b32_e32 v0, 0
	; SI-NEXT: v_mov_b32_e32 v1, 0x40200000			; SI-NEXT: v_mov_b32_e32 v1, 0x40200000
	; SI-NEXT: buffer_store_dwordx2 v[0:1], v16, s[0:3], s7 offen			; SI-NEXT: buffer_store_dwordx2 v[0:1], v16, s[0:3], 0 offen
	; SI-NEXT: buffer_load_dwordx4 v[0:3], off, s[0:3], s7 offset:64			; SI-NEXT: buffer_load_dwordx4 v[0:3], off, s[0:3], 0 offset:64
	; SI-NEXT: buffer_load_dwordx4 v[4:7], off, s[0:3], s7 offset:80			; SI-NEXT: buffer_load_dwordx4 v[4:7], off, s[0:3], 0 offset:80
	; SI-NEXT: buffer_load_dwordx4 v[8:11], off, s[0:3], s7 offset:96			; SI-NEXT: buffer_load_dwordx4 v[8:11], off, s[0:3], 0 offset:96
	; SI-NEXT: buffer_load_dwordx4 v[12:15], off, s[0:3], s7 offset:112			; SI-NEXT: buffer_load_dwordx4 v[12:15], off, s[0:3], 0 offset:112
				; SI-NEXT: s_mov_b32 s11, 0x100f000
				; SI-NEXT: s_mov_b32 s10, -1
	; SI-NEXT: s_waitcnt vmcnt(0)			; SI-NEXT: s_waitcnt vmcnt(0)
	; SI-NEXT: buffer_store_dwordx4 v[12:15], off, s[8:11], 0 offset:48			; SI-NEXT: buffer_store_dwordx4 v[12:15], off, s[8:11], 0 offset:48
	; SI-NEXT: buffer_store_dwordx4 v[8:11], off, s[8:11], 0 offset:32			; SI-NEXT: buffer_store_dwordx4 v[8:11], off, s[8:11], 0 offset:32
	; SI-NEXT: buffer_store_dwordx4 v[4:7], off, s[8:11], 0 offset:16			; SI-NEXT: buffer_store_dwordx4 v[4:7], off, s[8:11], 0 offset:16
	; SI-NEXT: buffer_store_dwordx4 v[0:3], off, s[8:11], 0			; SI-NEXT: buffer_store_dwordx4 v[0:3], off, s[8:11], 0
	; SI-NEXT: s_endpgm			; SI-NEXT: s_endpgm
	;			;
	; VI-LABEL: dynamic_insertelement_v8f64:			; VI-LABEL: dynamic_insertelement_v8f64:
	; VI: ; %bb.0:			; VI: ; %bb.0:
	; VI-NEXT: s_load_dwordx2 s[8:9], s[4:5], 0x0			; VI-NEXT: s_load_dwordx2 s[8:9], s[4:5], 0x0
	; VI-NEXT: s_load_dwordx16 s[12:27], s[4:5], 0x40			; VI-NEXT: s_load_dwordx16 s[12:27], s[4:5], 0x40
	; VI-NEXT: s_load_dword s4, s[4:5], 0x80			; VI-NEXT: s_load_dword s4, s[4:5], 0x80
				; VI-NEXT: s_add_u32 s0, s0, s7
				; VI-NEXT: s_addc_u32 s1, s1, 0
	; VI-NEXT: v_mov_b32_e32 v16, 64			; VI-NEXT: v_mov_b32_e32 v16, 64
	; VI-NEXT: s_mov_b32 s11, 0x1100f000
	; VI-NEXT: s_mov_b32 s10, -1
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: v_mov_b32_e32 v0, s12			; VI-NEXT: v_mov_b32_e32 v0, s12
	; VI-NEXT: s_and_b32 s4, s4, 7			; VI-NEXT: s_and_b32 s4, s4, 7
	; VI-NEXT: s_lshl_b32 s4, s4, 3			; VI-NEXT: s_lshl_b32 s4, s4, 3
	; VI-NEXT: v_mov_b32_e32 v1, s13			; VI-NEXT: v_mov_b32_e32 v1, s13
	; VI-NEXT: v_mov_b32_e32 v12, s24			; VI-NEXT: v_mov_b32_e32 v12, s24
	; VI-NEXT: v_mov_b32_e32 v13, s25			; VI-NEXT: v_mov_b32_e32 v13, s25
	; VI-NEXT: v_mov_b32_e32 v14, s26			; VI-NEXT: v_mov_b32_e32 v14, s26
	; VI-NEXT: v_mov_b32_e32 v15, s27			; VI-NEXT: v_mov_b32_e32 v15, s27
	; VI-NEXT: v_mov_b32_e32 v2, s14			; VI-NEXT: v_mov_b32_e32 v2, s14
	; VI-NEXT: v_mov_b32_e32 v3, s15			; VI-NEXT: v_mov_b32_e32 v3, s15
	; VI-NEXT: v_mov_b32_e32 v4, s16			; VI-NEXT: v_mov_b32_e32 v4, s16
	; VI-NEXT: v_mov_b32_e32 v5, s17			; VI-NEXT: v_mov_b32_e32 v5, s17
	; VI-NEXT: v_mov_b32_e32 v6, s18			; VI-NEXT: v_mov_b32_e32 v6, s18
	; VI-NEXT: v_mov_b32_e32 v7, s19			; VI-NEXT: v_mov_b32_e32 v7, s19
	; VI-NEXT: v_mov_b32_e32 v8, s20			; VI-NEXT: v_mov_b32_e32 v8, s20
	; VI-NEXT: v_mov_b32_e32 v9, s21			; VI-NEXT: v_mov_b32_e32 v9, s21
	; VI-NEXT: v_mov_b32_e32 v10, s22			; VI-NEXT: v_mov_b32_e32 v10, s22
	; VI-NEXT: v_mov_b32_e32 v11, s23			; VI-NEXT: v_mov_b32_e32 v11, s23
	; VI-NEXT: buffer_store_dwordx4 v[12:15], off, s[0:3], s7 offset:112			; VI-NEXT: buffer_store_dwordx4 v[12:15], off, s[0:3], 0 offset:112
	; VI-NEXT: buffer_store_dwordx4 v[8:11], off, s[0:3], s7 offset:96			; VI-NEXT: buffer_store_dwordx4 v[8:11], off, s[0:3], 0 offset:96
	; VI-NEXT: buffer_store_dwordx4 v[4:7], off, s[0:3], s7 offset:80			; VI-NEXT: buffer_store_dwordx4 v[4:7], off, s[0:3], 0 offset:80
	; VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], s7 offset:64			; VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:64
	; VI-NEXT: v_or_b32_e32 v16, s4, v16			; VI-NEXT: v_or_b32_e32 v16, s4, v16
	; VI-NEXT: v_mov_b32_e32 v0, 0			; VI-NEXT: v_mov_b32_e32 v0, 0
	; VI-NEXT: v_mov_b32_e32 v1, 0x40200000			; VI-NEXT: v_mov_b32_e32 v1, 0x40200000
	; VI-NEXT: buffer_store_dwordx2 v[0:1], v16, s[0:3], s7 offen			; VI-NEXT: buffer_store_dwordx2 v[0:1], v16, s[0:3], 0 offen
	; VI-NEXT: buffer_load_dwordx4 v[0:3], off, s[0:3], s7 offset:64			; VI-NEXT: buffer_load_dwordx4 v[0:3], off, s[0:3], 0 offset:64
	; VI-NEXT: buffer_load_dwordx4 v[4:7], off, s[0:3], s7 offset:80			; VI-NEXT: buffer_load_dwordx4 v[4:7], off, s[0:3], 0 offset:80
	; VI-NEXT: buffer_load_dwordx4 v[8:11], off, s[0:3], s7 offset:96			; VI-NEXT: buffer_load_dwordx4 v[8:11], off, s[0:3], 0 offset:96
	; VI-NEXT: buffer_load_dwordx4 v[12:15], off, s[0:3], s7 offset:112			; VI-NEXT: buffer_load_dwordx4 v[12:15], off, s[0:3], 0 offset:112
				; VI-NEXT: s_mov_b32 s11, 0x1100f000
				; VI-NEXT: s_mov_b32 s10, -1
	; VI-NEXT: s_waitcnt vmcnt(0)			; VI-NEXT: s_waitcnt vmcnt(0)
	; VI-NEXT: buffer_store_dwordx4 v[12:15], off, s[8:11], 0 offset:48			; VI-NEXT: buffer_store_dwordx4 v[12:15], off, s[8:11], 0 offset:48
	; VI-NEXT: buffer_store_dwordx4 v[8:11], off, s[8:11], 0 offset:32			; VI-NEXT: buffer_store_dwordx4 v[8:11], off, s[8:11], 0 offset:32
	; VI-NEXT: buffer_store_dwordx4 v[4:7], off, s[8:11], 0 offset:16			; VI-NEXT: buffer_store_dwordx4 v[4:7], off, s[8:11], 0 offset:16
	; VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[8:11], 0			; VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[8:11], 0
	; VI-NEXT: s_endpgm			; VI-NEXT: s_endpgm
	%vecins = insertelement <8 x double> %a, double 8.0, i32 %b			%vecins = insertelement <8 x double> %a, double 8.0, i32 %b
	store <8 x double> %vecins, <8 x double> addrspace(1)* %out, align 16			store <8 x double> %vecins, <8 x double> addrspace(1)* %out, align 16
	ret void			ret void
	}			}

	declare <4 x float> @llvm.amdgcn.image.gather4.lz.2d.v4f32.f32(i32, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.gather4.lz.2d.v4f32.f32(i32, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1

	attributes #0 = { nounwind }			attributes #0 = { nounwind }
	attributes #1 = { nounwind readnone }			attributes #1 = { nounwind readnone }

llvm/test/CodeGen/AMDGPU/ipra.ll

	; GCN: flat_load_dword v8			; GCN: flat_load_dword v8
	; GCN: s_swappc_b64			; GCN: s_swappc_b64
	; GCN-NOT: buffer_store			; GCN-NOT: buffer_store
	; GCN-NOT: buffer_load			; GCN-NOT: buffer_load
	; GCN-NOT: readlane			; GCN-NOT: readlane
	; GCN-NOT: writelane			; GCN-NOT: writelane
	; GCN: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, v8			; GCN: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, v8

	; GCN: ; NumSgprs: 38			; GCN: ; NumSgprs: 37
	; GCN: ; NumVgprs: 9			; GCN: ; NumVgprs: 9
	define amdgpu_kernel void @kernel_call() #0 {			define amdgpu_kernel void @kernel_call() #0 {
	%vgpr = load volatile i32, i32 addrspace(1)* undef			%vgpr = load volatile i32, i32 addrspace(1)* undef
	tail call void @func()			tail call void @func()
	store volatile i32 %vgpr, i32 addrspace(1)* undef			store volatile i32 %vgpr, i32 addrspace(1)* undef
	ret void			ret void
	}			}


llvm/test/CodeGen/AMDGPU/large-alloca-compute.ll

	; GCNHSA: private_segment_alignment = 4			; GCNHSA: private_segment_alignment = 4
	; GCNHSA: .end_amd_kernel_code_t			; GCNHSA: .end_amd_kernel_code_t

	; GFX10HSA: s_add_u32 [[FLAT_SCR_LO:s[0-9]+]], s{{[0-9]+}}, s{{[0-9]+}}			; GFX10HSA: s_add_u32 [[FLAT_SCR_LO:s[0-9]+]], s{{[0-9]+}}, s{{[0-9]+}}
	; GFX10HSA-DAG: s_addc_u32 [[FLAT_SCR_HI:s[0-9]+]], s{{[0-9]+}}, 0			; GFX10HSA-DAG: s_addc_u32 [[FLAT_SCR_HI:s[0-9]+]], s{{[0-9]+}}, 0
	; GFX10HSA-DAG: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), [[FLAT_SCR_LO]]			; GFX10HSA-DAG: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), [[FLAT_SCR_LO]]
	; GFX10HSA-DAG: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), [[FLAT_SCR_HI]]			; GFX10HSA-DAG: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), [[FLAT_SCR_HI]]

	; GCNHSA: buffer_store_dword {{v[0-9]+}}, {{v[0-9]+}}, s[0:3], s9 offen			; GCNHSA: buffer_store_dword {{v[0-9]+}}, {{v[0-9]+}}, s[0:3], 0 offen
	; GCNHSA: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, s[0:3], s9 offen			; GCNHSA: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, s[0:3], 0 offen

	; Scratch size = alloca size + emergency stack slot, align {{.*}}, addrspace(5)			; Scratch size = alloca size + emergency stack slot, align {{.*}}, addrspace(5)
	; ALL: ; ScratchSize: 32772			; ALL: ; ScratchSize: 32772
	define amdgpu_kernel void @large_alloca_compute_shader(i32 %x, i32 %y) #0 {			define amdgpu_kernel void @large_alloca_compute_shader(i32 %x, i32 %y) #0 {
	%large = alloca [8192 x i32], align 4, addrspace(5)			%large = alloca [8192 x i32], align 4, addrspace(5)
	%gep = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %large, i32 0, i32 8191			%gep = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %large, i32 0, i32 8191
	store volatile i32 %x, i32 addrspace(5)* %gep			store volatile i32 %x, i32 addrspace(5)* %gep
	%gep1 = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %large, i32 0, i32 %y			%gep1 = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %large, i32 0, i32 %y
	%val = load volatile i32, i32 addrspace(5)* %gep1			%val = load volatile i32, i32 addrspace(5)* %gep1
	store volatile i32 %val, i32 addrspace(1)* undef			store volatile i32 %val, i32 addrspace(1)* undef
	ret void			ret void
	}			}

	attributes #0 = { nounwind }			attributes #0 = { nounwind }

llvm/test/CodeGen/AMDGPU/large-alloca-graphics.ll

	; RUN: llc -march=amdgcn -mcpu=bonaire < %s | FileCheck -check-prefix=GCN -check-prefix=CI -check-prefix=ALL %s			; RUN: llc -march=amdgcn -mcpu=bonaire < %s | FileCheck -check-prefix=GCN -check-prefix=CI -check-prefix=ALL %s
	; RUN: llc -march=amdgcn -mcpu=carrizo -mattr=-flat-for-global < %s | FileCheck -check-prefix=GCN -check-prefix=VI -check-prefix=ALL %s			; RUN: llc -march=amdgcn -mcpu=carrizo -mattr=-flat-for-global < %s | FileCheck -check-prefix=GCN -check-prefix=VI -check-prefix=ALL %s
	; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=-flat-for-global < %s | FileCheck -check-prefix=GCN -check-prefix=GFX9 -check-prefix=ALL %s			; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=-flat-for-global < %s | FileCheck -check-prefix=GCN -check-prefix=GFX9 -check-prefix=ALL %s

	; ALL-LABEL: {{^}}large_alloca_pixel_shader:			; ALL-LABEL: {{^}}large_alloca_pixel_shader:
	; GCN-DAG: s_mov_b32 s8, SCRATCH_RSRC_DWORD0			; GCN-DAG: s_mov_b32 s4, SCRATCH_RSRC_DWORD0
	; GCN-DAG: s_mov_b32 s9, SCRATCH_RSRC_DWORD1			; GCN-DAG: s_mov_b32 s5, SCRATCH_RSRC_DWORD1
	; GCN-DAG: s_mov_b32 s10, -1			; GCN-DAG: s_mov_b32 s6, -1
	; CI-DAG: s_mov_b32 s11, 0xe8f000			; CI-DAG: s_mov_b32 s7, 0xe8f000
	; VI-DAG: s_mov_b32 s11, 0xe80000			; VI-DAG: s_mov_b32 s7, 0xe80000
	; GFX9-DAG: s_mov_b32 s11, 0xe00000			; GFX9-DAG: s_mov_b32 s7, 0xe00000

				; GCN: s_add_u32 s4, s4, s0
				; GCN: s_addc_u32 s5, s5, 0

	; GCN: buffer_store_dword {{v[0-9]+}}, {{v[0-9]+}}, s[8:11], s0 offen			; GCN: buffer_store_dword {{v[0-9]+}}, {{v[0-9]+}}, s[4:7], 0 offen
	; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, s[8:11], s0 offen			; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, s[4:7], 0 offen

	; ALL: ; ScratchSize: 32772			; ALL: ; ScratchSize: 32772
	define amdgpu_ps void @large_alloca_pixel_shader(i32 %x, i32 %y) #0 {			define amdgpu_ps void @large_alloca_pixel_shader(i32 %x, i32 %y) #0 {
	%large = alloca [8192 x i32], align 4, addrspace(5)			%large = alloca [8192 x i32], align 4, addrspace(5)
	%gep = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %large, i32 0, i32 8191			%gep = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %large, i32 0, i32 8191
	store volatile i32 %x, i32 addrspace(5)* %gep			store volatile i32 %x, i32 addrspace(5)* %gep
	%gep1 = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %large, i32 0, i32 %y			%gep1 = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %large, i32 0, i32 %y
	%val = load volatile i32, i32 addrspace(5)* %gep1			%val = load volatile i32, i32 addrspace(5)* %gep1
	store volatile i32 %val, i32 addrspace(1)* undef			store volatile i32 %val, i32 addrspace(1)* undef
	ret void			ret void
	}			}

	; ALL-LABEL: {{^}}large_alloca_pixel_shader_inreg:			; ALL-LABEL: {{^}}large_alloca_pixel_shader_inreg:
	; GCN-DAG: s_mov_b32 s8, SCRATCH_RSRC_DWORD0			; GCN-DAG: s_mov_b32 s4, SCRATCH_RSRC_DWORD0
	; GCN-DAG: s_mov_b32 s9, SCRATCH_RSRC_DWORD1			; GCN-DAG: s_mov_b32 s5, SCRATCH_RSRC_DWORD1
	; GCN-DAG: s_mov_b32 s10, -1			; GCN-DAG: s_mov_b32 s6, -1
	; CI-DAG: s_mov_b32 s11, 0xe8f000			; CI-DAG: s_mov_b32 s7, 0xe8f000
	; VI-DAG: s_mov_b32 s11, 0xe80000			; VI-DAG: s_mov_b32 s7, 0xe80000
	; GFX9-DAG: s_mov_b32 s11, 0xe00000			; GFX9-DAG: s_mov_b32 s7, 0xe00000

				; GCN: s_add_u32 s4, s4, s2
				; GCN: s_addc_u32 s5, s5, 0

	; GCN: buffer_store_dword {{v[0-9]+}}, {{v[0-9]+}}, s[8:11], s2 offen			; GCN: buffer_store_dword {{v[0-9]+}}, {{v[0-9]+}}, s[4:7], 0 offen
	; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, s[8:11], s2 offen			; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, s[4:7], 0 offen

	; ALL: ; ScratchSize: 32772			; ALL: ; ScratchSize: 32772
	define amdgpu_ps void @large_alloca_pixel_shader_inreg(i32 inreg %x, i32 inreg %y) #0 {			define amdgpu_ps void @large_alloca_pixel_shader_inreg(i32 inreg %x, i32 inreg %y) #0 {
	%large = alloca [8192 x i32], align 4, addrspace(5)			%large = alloca [8192 x i32], align 4, addrspace(5)
	%gep = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %large, i32 0, i32 8191			%gep = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %large, i32 0, i32 8191
	store volatile i32 %x, i32 addrspace(5)* %gep			store volatile i32 %x, i32 addrspace(5)* %gep
	%gep1 = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %large, i32 0, i32 %y			%gep1 = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %large, i32 0, i32 %y
	%val = load volatile i32, i32 addrspace(5)* %gep1			%val = load volatile i32, i32 addrspace(5)* %gep1
	store volatile i32 %val, i32 addrspace(1)* undef			store volatile i32 %val, i32 addrspace(1)* undef
	ret void			ret void
	}			}

	attributes #0 = { nounwind }			attributes #0 = { nounwind }

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.implicit.buffer.ptr.ll

	; RUN: llc -mtriple=amdgcn-mesa-mesa3d -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -mtriple=amdgcn-mesa-mesa3d -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

	; FIXME: Requires stack object to not assert			; FIXME: Requires stack object to not assert
	; GCN-LABEL: {{^}}test_ps:			; GCN-LABEL: {{^}}test_ps:
	; GCN: s_load_dwordx2 s[4:5], s[0:1], 0x0			; GCN: s_load_dwordx2 s[4:5], s[0:1], 0x0
	; GCN: buffer_store_dword v0, off, s[4:7], s2 offset:4			; GCN: buffer_store_dword v0, off, s[4:7], 0 offset:4
	; GCN: s_load_dword s{{[0-9]+}}, s[0:1], 0x0			; GCN: s_load_dword s{{[0-9]+}}, s[0:1], 0x0
	; GCN-NEXT: s_waitcnt			; GCN-NEXT: s_waitcnt
	; GCN-NEXT: ; return			; GCN-NEXT: ; return
	define amdgpu_ps i32 @test_ps() #1 {			define amdgpu_ps i32 @test_ps() #1 {
	%alloca = alloca i32, addrspace(5)			%alloca = alloca i32, addrspace(5)
	store volatile i32 0, i32 addrspace(5)* %alloca			store volatile i32 0, i32 addrspace(5)* %alloca
	%implicit_buffer_ptr = call i8 addrspace(4)* @llvm.amdgcn.implicit.buffer.ptr()			%implicit_buffer_ptr = call i8 addrspace(4)* @llvm.amdgcn.implicit.buffer.ptr()
	%buffer_ptr = bitcast i8 addrspace(4)* %implicit_buffer_ptr to i32 addrspace(4)*			%buffer_ptr = bitcast i8 addrspace(4)* %implicit_buffer_ptr to i32 addrspace(4)*
	%value = load volatile i32, i32 addrspace(4)* %buffer_ptr			%value = load volatile i32, i32 addrspace(4)* %buffer_ptr
	ret i32 %value			ret i32 %value
	}			}

	; GCN-LABEL: {{^}}test_cs:			; GCN-LABEL: {{^}}test_cs:
	; GCN: s_mov_b64 s[4:5], s[0:1]			; GCN: s_mov_b64 s[4:5], s[0:1]
	; GCN: buffer_store_dword v{{[0-9]+}}, off, s[4:7], s2 offset:4			; GCN: buffer_store_dword v{{[0-9]+}}, off, s[4:7], 0 offset:4
	; GCN: s_load_dword s0, s[0:1], 0x0			; GCN: s_load_dword s0, s[0:1], 0x0
	define amdgpu_cs i32 @test_cs() #1 {			define amdgpu_cs i32 @test_cs() #1 {
	%alloca = alloca i32, addrspace(5)			%alloca = alloca i32, addrspace(5)
	store volatile i32 0, i32 addrspace(5)* %alloca			store volatile i32 0, i32 addrspace(5)* %alloca
	%implicit_buffer_ptr = call i8 addrspace(4)* @llvm.amdgcn.implicit.buffer.ptr()			%implicit_buffer_ptr = call i8 addrspace(4)* @llvm.amdgcn.implicit.buffer.ptr()
	%buffer_ptr = bitcast i8 addrspace(4)* %implicit_buffer_ptr to i32 addrspace(4)*			%buffer_ptr = bitcast i8 addrspace(4)* %implicit_buffer_ptr to i32 addrspace(4)*
	%value = load volatile i32, i32 addrspace(4)* %buffer_ptr			%value = load volatile i32, i32 addrspace(4)* %buffer_ptr
	ret i32 %value			ret i32 %value
	}			}

	declare i8 addrspace(4)* @llvm.amdgcn.implicit.buffer.ptr() #0			declare i8 addrspace(4)* @llvm.amdgcn.implicit.buffer.ptr() #0

	attributes #0 = { nounwind readnone speculatable }			attributes #0 = { nounwind readnone speculatable }
	attributes #1 = { nounwind }			attributes #1 = { nounwind }

llvm/test/CodeGen/AMDGPU/load-hi16.ll

%build0 = insertelement <2 x half> undef, half %reg, i32 0		%build0 = insertelement <2 x half> undef, half %reg, i32 0
%build1 = insertelement <2 x half> %build0, half %load, i32 1		%build1 = insertelement <2 x half> %build0, half %load, i32 1
store <2 x half> %build1, <2 x half> addrspace(1)* undef		store <2 x half> %build1, <2 x half> addrspace(1)* undef
ret void		ret void
}		}

; GCN-LABEL: {{^}}load_private_hi_v2i16_reglo_vreg_nooff:		; GCN-LABEL: {{^}}load_private_hi_v2i16_reglo_vreg_nooff:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GFX900: buffer_load_short_d16_hi v0, off, s[0:3], s33 offset:4094{{$}}		; GFX900: buffer_load_short_d16_hi v0, off, s[0:3], 0 offset:4094{{$}}
; GFX900: s_waitcnt		; GFX900: s_waitcnt
; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v0		; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v0
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: s_setpc_b64		; GFX900-NEXT: s_setpc_b64

; NO-D16-HI: buffer_load_ushort v{{[0-9]+}}, off, s[0:3], s33 offset:4094{{$}}		; NO-D16-HI: buffer_load_ushort v{{[0-9]+}}, off, s[0:3], 0 offset:4094{{$}}
define void @load_private_hi_v2i16_reglo_vreg_nooff(i16 addrspace(5)* byval %in, i16 %reg) #0 {		define void @load_private_hi_v2i16_reglo_vreg_nooff(i16 addrspace(5)* byval %in, i16 %reg) #0 {
entry:		entry:
%load = load volatile i16, i16 addrspace(5)* inttoptr (i32 4094 to i16 addrspace(5)*)		%load = load volatile i16, i16 addrspace(5)* inttoptr (i32 4094 to i16 addrspace(5)*)
%build0 = insertelement <2 x i16> undef, i16 %reg, i32 0		%build0 = insertelement <2 x i16> undef, i16 %reg, i32 0
%build1 = insertelement <2 x i16> %build0, i16 %load, i32 1		%build1 = insertelement <2 x i16> %build0, i16 %load, i32 1
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

; GCN-LABEL: {{^}}load_private_hi_v2f16_reglo_vreg_nooff:		; GCN-LABEL: {{^}}load_private_hi_v2f16_reglo_vreg_nooff:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GFX900-NEXT: buffer_load_short_d16_hi v1, off, s[0:3], s33 offset:4094{{$}}		; GFX900-NEXT: buffer_load_short_d16_hi v1, off, s[0:3], 0 offset:4094{{$}}
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v1		; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v1
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: s_setpc_b64		; GFX900-NEXT: s_setpc_b64

; NO-D16-HI: buffer_load_ushort v{{[0-9]+}}, off, s[0:3], s33 offset:4094{{$}}		; NO-D16-HI: buffer_load_ushort v{{[0-9]+}}, off, s[0:3], 0 offset:4094{{$}}
define void @load_private_hi_v2f16_reglo_vreg_nooff(half addrspace(5)* %in, half %reg) #0 {		define void @load_private_hi_v2f16_reglo_vreg_nooff(half addrspace(5)* %in, half %reg) #0 {
entry:		entry:
%load = load volatile half, half addrspace(5)* inttoptr (i32 4094 to half addrspace(5)*)		%load = load volatile half, half addrspace(5)* inttoptr (i32 4094 to half addrspace(5)*)
%build0 = insertelement <2 x half> undef, half %reg, i32 0		%build0 = insertelement <2 x half> undef, half %reg, i32 0
%build1 = insertelement <2 x half> %build0, half %load, i32 1		%build1 = insertelement <2 x half> %build0, half %load, i32 1
store <2 x half> %build1, <2 x half> addrspace(1)* undef		store <2 x half> %build1, <2 x half> addrspace(1)* undef
ret void		ret void
}		}
%build0 = insertelement <2 x i16> undef, i16 %reg, i32 0		%build0 = insertelement <2 x i16> undef, i16 %reg, i32 0
%build1 = insertelement <2 x i16> %build0, i16 %ext, i32 1		%build1 = insertelement <2 x i16> %build0, i16 %ext, i32 1
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

; GCN-LABEL: {{^}}load_private_hi_v2i16_reglo_vreg_nooff_zexti8:		; GCN-LABEL: {{^}}load_private_hi_v2i16_reglo_vreg_nooff_zexti8:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GFX900-NEXT: buffer_load_ubyte_d16_hi v1, off, s[0:3], s33 offset:4094{{$}}		; GFX900-NEXT: buffer_load_ubyte_d16_hi v1, off, s[0:3], 0 offset:4094{{$}}
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v1		; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v1
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: s_setpc_b64		; GFX900-NEXT: s_setpc_b64

; NO-D16-HI: buffer_load_ubyte v0, off, s[0:3], s33 offset:4094{{$}}		; NO-D16-HI: buffer_load_ubyte v0, off, s[0:3], 0 offset:4094{{$}}
define void @load_private_hi_v2i16_reglo_vreg_nooff_zexti8(i8 addrspace(5)* %in, i16 %reg) #0 {		define void @load_private_hi_v2i16_reglo_vreg_nooff_zexti8(i8 addrspace(5)* %in, i16 %reg) #0 {
entry:		entry:
%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 4094 to i8 addrspace(5)*)		%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 4094 to i8 addrspace(5)*)
%ext = zext i8 %load to i16		%ext = zext i8 %load to i16
%build0 = insertelement <2 x i16> undef, i16 %reg, i32 0		%build0 = insertelement <2 x i16> undef, i16 %reg, i32 0
%build1 = insertelement <2 x i16> %build0, i16 %ext, i32 1		%build1 = insertelement <2 x i16> %build0, i16 %ext, i32 1
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

; GCN-LABEL: {{^}}load_private_hi_v2i16_reglo_vreg_nooff_sexti8:		; GCN-LABEL: {{^}}load_private_hi_v2i16_reglo_vreg_nooff_sexti8:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GFX900-NEXT: buffer_load_sbyte_d16_hi v1, off, s[0:3], s33 offset:4094{{$}}		; GFX900-NEXT: buffer_load_sbyte_d16_hi v1, off, s[0:3], 0 offset:4094{{$}}
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v1		; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v1
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: s_setpc_b64		; GFX900-NEXT: s_setpc_b64

; NO-D16-HI: buffer_load_sbyte v0, off, s[0:3], s33 offset:4094{{$}}		; NO-D16-HI: buffer_load_sbyte v0, off, s[0:3], 0 offset:4094{{$}}
define void @load_private_hi_v2i16_reglo_vreg_nooff_sexti8(i8 addrspace(5)* %in, i16 %reg) #0 {		define void @load_private_hi_v2i16_reglo_vreg_nooff_sexti8(i8 addrspace(5)* %in, i16 %reg) #0 {
entry:		entry:
%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 4094 to i8 addrspace(5)*)		%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 4094 to i8 addrspace(5)*)
%ext = sext i8 %load to i16		%ext = sext i8 %load to i16
%build0 = insertelement <2 x i16> undef, i16 %reg, i32 0		%build0 = insertelement <2 x i16> undef, i16 %reg, i32 0
%build1 = insertelement <2 x i16> %build0, i16 %ext, i32 1		%build1 = insertelement <2 x i16> %build0, i16 %ext, i32 1
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

; GCN-LABEL: {{^}}load_private_hi_v2f16_reglo_vreg_nooff_zexti8:		; GCN-LABEL: {{^}}load_private_hi_v2f16_reglo_vreg_nooff_zexti8:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GFX900-NEXT: buffer_load_ubyte_d16_hi v1, off, s[0:3], s33 offset:4094{{$}}		; GFX900-NEXT: buffer_load_ubyte_d16_hi v1, off, s[0:3], 0 offset:4094{{$}}
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v1		; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v1
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: s_setpc_b64		; GFX900-NEXT: s_setpc_b64

; NO-D16-HI: buffer_load_ubyte v0, off, s[0:3], s33 offset:4094{{$}}		; NO-D16-HI: buffer_load_ubyte v0, off, s[0:3], 0 offset:4094{{$}}
define void @load_private_hi_v2f16_reglo_vreg_nooff_zexti8(i8 addrspace(5)* %in, half %reg) #0 {		define void @load_private_hi_v2f16_reglo_vreg_nooff_zexti8(i8 addrspace(5)* %in, half %reg) #0 {
entry:		entry:
%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 4094 to i8 addrspace(5)*)		%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 4094 to i8 addrspace(5)*)
%ext = zext i8 %load to i16		%ext = zext i8 %load to i16
%bc.ext = bitcast i16 %ext to half		%bc.ext = bitcast i16 %ext to half
%build0 = insertelement <2 x half> undef, half %reg, i32 0		%build0 = insertelement <2 x half> undef, half %reg, i32 0
%build1 = insertelement <2 x half> %build0, half %bc.ext, i32 1		%build1 = insertelement <2 x half> %build0, half %bc.ext, i32 1
store <2 x half> %build1, <2 x half> addrspace(1)* undef		store <2 x half> %build1, <2 x half> addrspace(1)* undef

llvm/test/CodeGen/AMDGPU/load-lo16.ll

store <2 x half> %build1, <2 x half> addrspace(1)* undef		store <2 x half> %build1, <2 x half> addrspace(1)* undef
ret void		ret void
}		}

define void @load_private_lo_v2i16_reglo_vreg_nooff(i16 addrspace(5)* %in, i32 %reg) #0 {		define void @load_private_lo_v2i16_reglo_vreg_nooff(i16 addrspace(5)* %in, i32 %reg) #0 {
; GFX900-LABEL: load_private_lo_v2i16_reglo_vreg_nooff:		; GFX900-LABEL: load_private_lo_v2i16_reglo_vreg_nooff:
; GFX900: ; %bb.0: ; %entry		; GFX900: ; %bb.0: ; %entry
; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX900-NEXT: buffer_load_short_d16 v1, off, s[0:3], s33 offset:4094		; GFX900-NEXT: buffer_load_short_d16 v1, off, s[0:3], 0 offset:4094
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: global_store_dword v[0:1], v1, off		; GFX900-NEXT: global_store_dword v[0:1], v1, off
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: s_setpc_b64 s[30:31]		; GFX900-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX906-LABEL: load_private_lo_v2i16_reglo_vreg_nooff:		; GFX906-LABEL: load_private_lo_v2i16_reglo_vreg_nooff:
; GFX906: ; %bb.0: ; %entry		; GFX906: ; %bb.0: ; %entry
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX906-NEXT: buffer_load_ushort v0, off, s[0:3], s33 offset:4094		; GFX906-NEXT: buffer_load_ushort v0, off, s[0:3], 0 offset:4094
; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff		; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: v_bfi_b32 v0, v2, v0, v1		; GFX906-NEXT: v_bfi_b32 v0, v2, v0, v1
; GFX906-NEXT: global_store_dword v[0:1], v0, off		; GFX906-NEXT: global_store_dword v[0:1], v0, off
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: s_setpc_b64 s[30:31]		; GFX906-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX803-LABEL: load_private_lo_v2i16_reglo_vreg_nooff:		; GFX803-LABEL: load_private_lo_v2i16_reglo_vreg_nooff:
; GFX803: ; %bb.0: ; %entry		; GFX803: ; %bb.0: ; %entry
; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX803-NEXT: buffer_load_ushort v0, off, s[0:3], s33 offset:4094		; GFX803-NEXT: buffer_load_ushort v0, off, s[0:3], 0 offset:4094
; GFX803-NEXT: v_and_b32_e32 v1, 0xffff0000, v1		; GFX803-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: v_or_b32_e32 v0, v0, v1		; GFX803-NEXT: v_or_b32_e32 v0, v0, v1
; GFX803-NEXT: flat_store_dword v[0:1], v0		; GFX803-NEXT: flat_store_dword v[0:1], v0
; GFX803-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX803-NEXT: s_setpc_b64 s[30:31]		; GFX803-NEXT: s_setpc_b64 s[30:31]
entry:		entry:
%reg.bc = bitcast i32 %reg to <2 x i16>		%reg.bc = bitcast i32 %reg to <2 x i16>
%load = load volatile i16, i16 addrspace(5)* inttoptr (i32 4094 to i16 addrspace(5)*)		%load = load volatile i16, i16 addrspace(5)* inttoptr (i32 4094 to i16 addrspace(5)*)
%build1 = insertelement <2 x i16> %reg.bc, i16 %load, i32 0		%build1 = insertelement <2 x i16> %reg.bc, i16 %load, i32 0
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

define void @load_private_lo_v2i16_reghi_vreg_nooff(i16 addrspace(5)* %in, i32 %reg) #0 {		define void @load_private_lo_v2i16_reghi_vreg_nooff(i16 addrspace(5)* %in, i32 %reg) #0 {
; GFX900-LABEL: load_private_lo_v2i16_reghi_vreg_nooff:		; GFX900-LABEL: load_private_lo_v2i16_reghi_vreg_nooff:
; GFX900: ; %bb.0: ; %entry		; GFX900: ; %bb.0: ; %entry
; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX900-NEXT: buffer_load_short_d16 v1, off, s[0:3], s33 offset:4094		; GFX900-NEXT: buffer_load_short_d16 v1, off, s[0:3], 0 offset:4094
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: global_store_dword v[0:1], v1, off		; GFX900-NEXT: global_store_dword v[0:1], v1, off
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: s_setpc_b64 s[30:31]		; GFX900-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX906-LABEL: load_private_lo_v2i16_reghi_vreg_nooff:		; GFX906-LABEL: load_private_lo_v2i16_reghi_vreg_nooff:
; GFX906: ; %bb.0: ; %entry		; GFX906: ; %bb.0: ; %entry
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX906-NEXT: buffer_load_ushort v0, off, s[0:3], s33 offset:4094		; GFX906-NEXT: buffer_load_ushort v0, off, s[0:3], 0 offset:4094
; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff		; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: v_bfi_b32 v0, v2, v0, v1		; GFX906-NEXT: v_bfi_b32 v0, v2, v0, v1
; GFX906-NEXT: global_store_dword v[0:1], v0, off		; GFX906-NEXT: global_store_dword v[0:1], v0, off
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: s_setpc_b64 s[30:31]		; GFX906-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX803-LABEL: load_private_lo_v2i16_reghi_vreg_nooff:		; GFX803-LABEL: load_private_lo_v2i16_reghi_vreg_nooff:
; GFX803: ; %bb.0: ; %entry		; GFX803: ; %bb.0: ; %entry
; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX803-NEXT: buffer_load_ushort v0, off, s[0:3], s33 offset:4094		; GFX803-NEXT: buffer_load_ushort v0, off, s[0:3], 0 offset:4094
; GFX803-NEXT: v_and_b32_e32 v1, 0xffff0000, v1		; GFX803-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: v_or_b32_e32 v0, v0, v1		; GFX803-NEXT: v_or_b32_e32 v0, v0, v1
; GFX803-NEXT: flat_store_dword v[0:1], v0		; GFX803-NEXT: flat_store_dword v[0:1], v0
; GFX803-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX803-NEXT: s_setpc_b64 s[30:31]		; GFX803-NEXT: s_setpc_b64 s[30:31]
entry:		entry:
%reg.bc = bitcast i32 %reg to <2 x i16>		%reg.bc = bitcast i32 %reg to <2 x i16>
%load = load volatile i16, i16 addrspace(5)* inttoptr (i32 4094 to i16 addrspace(5)*)		%load = load volatile i16, i16 addrspace(5)* inttoptr (i32 4094 to i16 addrspace(5)*)
%build1 = insertelement <2 x i16> %reg.bc, i16 %load, i32 0		%build1 = insertelement <2 x i16> %reg.bc, i16 %load, i32 0
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

define void @load_private_lo_v2f16_reglo_vreg_nooff(half addrspace(5)* %in, i32 %reg) #0 {		define void @load_private_lo_v2f16_reglo_vreg_nooff(half addrspace(5)* %in, i32 %reg) #0 {
; GFX900-LABEL: load_private_lo_v2f16_reglo_vreg_nooff:		; GFX900-LABEL: load_private_lo_v2f16_reglo_vreg_nooff:
; GFX900: ; %bb.0: ; %entry		; GFX900: ; %bb.0: ; %entry
; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX900-NEXT: buffer_load_short_d16 v1, off, s[0:3], s33 offset:4094		; GFX900-NEXT: buffer_load_short_d16 v1, off, s[0:3], 0 offset:4094
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: global_store_dword v[0:1], v1, off		; GFX900-NEXT: global_store_dword v[0:1], v1, off
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: s_setpc_b64 s[30:31]		; GFX900-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX906-LABEL: load_private_lo_v2f16_reglo_vreg_nooff:		; GFX906-LABEL: load_private_lo_v2f16_reglo_vreg_nooff:
; GFX906: ; %bb.0: ; %entry		; GFX906: ; %bb.0: ; %entry
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX906-NEXT: buffer_load_ushort v0, off, s[0:3], s33 offset:4094		; GFX906-NEXT: buffer_load_ushort v0, off, s[0:3], 0 offset:4094
; GFX906-NEXT: v_lshrrev_b32_e32 v1, 16, v1		; GFX906-NEXT: v_lshrrev_b32_e32 v1, 16, v1
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: v_and_b32_e32 v0, 0xffff, v0		; GFX906-NEXT: v_and_b32_e32 v0, 0xffff, v0
; GFX906-NEXT: v_lshl_or_b32 v0, v1, 16, v0		; GFX906-NEXT: v_lshl_or_b32 v0, v1, 16, v0
; GFX906-NEXT: global_store_dword v[0:1], v0, off		; GFX906-NEXT: global_store_dword v[0:1], v0, off
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: s_setpc_b64 s[30:31]		; GFX906-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX803-LABEL: load_private_lo_v2f16_reglo_vreg_nooff:		; GFX803-LABEL: load_private_lo_v2f16_reglo_vreg_nooff:
; GFX803: ; %bb.0: ; %entry		; GFX803: ; %bb.0: ; %entry
; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX803-NEXT: buffer_load_ushort v0, off, s[0:3], s33 offset:4094		; GFX803-NEXT: buffer_load_ushort v0, off, s[0:3], 0 offset:4094
; GFX803-NEXT: v_and_b32_e32 v1, 0xffff0000, v1		; GFX803-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: v_or_b32_e32 v0, v0, v1		; GFX803-NEXT: v_or_b32_e32 v0, v0, v1
; GFX803-NEXT: flat_store_dword v[0:1], v0		; GFX803-NEXT: flat_store_dword v[0:1], v0
; GFX803-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX803-NEXT: s_setpc_b64 s[30:31]		; GFX803-NEXT: s_setpc_b64 s[30:31]
entry:		entry:
%reg.bc = bitcast i32 %reg to <2 x half>		%reg.bc = bitcast i32 %reg to <2 x half>
[85 lines hidden]
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

define void @load_private_lo_v2i16_reglo_vreg_nooff_zexti8(i8 addrspace(5)* %in, i32 %reg) #0 {		define void @load_private_lo_v2i16_reglo_vreg_nooff_zexti8(i8 addrspace(5)* %in, i32 %reg) #0 {
; GFX900-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_zexti8:		; GFX900-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_zexti8:
; GFX900: ; %bb.0: ; %entry		; GFX900: ; %bb.0: ; %entry
; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX900-NEXT: buffer_load_ubyte_d16 v1, off, s[0:3], s33 offset:4094		; GFX900-NEXT: buffer_load_ubyte_d16 v1, off, s[0:3], 0 offset:4094
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: global_store_dword v[0:1], v1, off		; GFX900-NEXT: global_store_dword v[0:1], v1, off
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: s_setpc_b64 s[30:31]		; GFX900-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX906-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_zexti8:		; GFX906-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_zexti8:
; GFX906: ; %bb.0: ; %entry		; GFX906: ; %bb.0: ; %entry
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX906-NEXT: buffer_load_ubyte v0, off, s[0:3], s33 offset:4094		; GFX906-NEXT: buffer_load_ubyte v0, off, s[0:3], 0 offset:4094
; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff		; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: v_bfi_b32 v0, v2, v0, v1		; GFX906-NEXT: v_bfi_b32 v0, v2, v0, v1
; GFX906-NEXT: global_store_dword v[0:1], v0, off		; GFX906-NEXT: global_store_dword v[0:1], v0, off
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: s_setpc_b64 s[30:31]		; GFX906-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX803-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_zexti8:		; GFX803-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_zexti8:
; GFX803: ; %bb.0: ; %entry		; GFX803: ; %bb.0: ; %entry
; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX803-NEXT: v_lshrrev_b32_e32 v0, 16, v1		; GFX803-NEXT: v_lshrrev_b32_e32 v0, 16, v1
; GFX803-NEXT: buffer_load_ubyte v1, off, s[0:3], s33 offset:4094		; GFX803-NEXT: buffer_load_ubyte v1, off, s[0:3], 0 offset:4094
; GFX803-NEXT: s_mov_b32 s4, 0x5040c00		; GFX803-NEXT: s_mov_b32 s4, 0x5040c00
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: v_perm_b32 v0, v0, v1, s4		; GFX803-NEXT: v_perm_b32 v0, v0, v1, s4
; GFX803-NEXT: flat_store_dword v[0:1], v0		; GFX803-NEXT: flat_store_dword v[0:1], v0
; GFX803-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX803-NEXT: s_setpc_b64 s[30:31]		; GFX803-NEXT: s_setpc_b64 s[30:31]
entry:		entry:
%reg.bc = bitcast i32 %reg to <2 x i16>		%reg.bc = bitcast i32 %reg to <2 x i16>
%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 4094 to i8 addrspace(5)*)		%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 4094 to i8 addrspace(5)*)
%ext = zext i8 %load to i16		%ext = zext i8 %load to i16
%build1 = insertelement <2 x i16> %reg.bc, i16 %ext, i32 0		%build1 = insertelement <2 x i16> %reg.bc, i16 %ext, i32 0
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

define void @load_private_lo_v2i16_reglo_vreg_nooff_sexti8(i8 addrspace(5)* %in, i32 %reg) #0 {		define void @load_private_lo_v2i16_reglo_vreg_nooff_sexti8(i8 addrspace(5)* %in, i32 %reg) #0 {
; GFX900-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_sexti8:		; GFX900-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_sexti8:
; GFX900: ; %bb.0: ; %entry		; GFX900: ; %bb.0: ; %entry
; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX900-NEXT: buffer_load_sbyte_d16 v1, off, s[0:3], s33 offset:4094		; GFX900-NEXT: buffer_load_sbyte_d16 v1, off, s[0:3], 0 offset:4094
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: global_store_dword v[0:1], v1, off		; GFX900-NEXT: global_store_dword v[0:1], v1, off
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: s_setpc_b64 s[30:31]		; GFX900-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX906-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_sexti8:		; GFX906-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_sexti8:
; GFX906: ; %bb.0: ; %entry		; GFX906: ; %bb.0: ; %entry
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX906-NEXT: buffer_load_sbyte v0, off, s[0:3], s33 offset:4094		; GFX906-NEXT: buffer_load_sbyte v0, off, s[0:3], 0 offset:4094
; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff		; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: v_bfi_b32 v0, v2, v0, v1		; GFX906-NEXT: v_bfi_b32 v0, v2, v0, v1
; GFX906-NEXT: global_store_dword v[0:1], v0, off		; GFX906-NEXT: global_store_dword v[0:1], v0, off
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: s_setpc_b64 s[30:31]		; GFX906-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX803-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_sexti8:		; GFX803-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_sexti8:
; GFX803: ; %bb.0: ; %entry		; GFX803: ; %bb.0: ; %entry
; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX803-NEXT: buffer_load_sbyte v0, off, s[0:3], s33 offset:4094		; GFX803-NEXT: buffer_load_sbyte v0, off, s[0:3], 0 offset:4094
; GFX803-NEXT: v_and_b32_e32 v1, 0xffff0000, v1		; GFX803-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: v_or_b32_sdwa v0, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD		; GFX803-NEXT: v_or_b32_sdwa v0, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD
; GFX803-NEXT: flat_store_dword v[0:1], v0		; GFX803-NEXT: flat_store_dword v[0:1], v0
; GFX803-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX803-NEXT: s_setpc_b64 s[30:31]		; GFX803-NEXT: s_setpc_b64 s[30:31]
entry:		entry:
%reg.bc = bitcast i32 %reg to <2 x i16>		%reg.bc = bitcast i32 %reg to <2 x i16>
%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 4094 to i8 addrspace(5)*)		%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 4094 to i8 addrspace(5)*)
%ext = sext i8 %load to i16		%ext = sext i8 %load to i16
%build1 = insertelement <2 x i16> %reg.bc, i16 %ext, i32 0		%build1 = insertelement <2 x i16> %reg.bc, i16 %ext, i32 0
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

define void @load_private_lo_v2f16_reglo_vreg_nooff_zexti8(i8 addrspace(5)* %in, i32 %reg) #0 {		define void @load_private_lo_v2f16_reglo_vreg_nooff_zexti8(i8 addrspace(5)* %in, i32 %reg) #0 {
; GFX900-LABEL: load_private_lo_v2f16_reglo_vreg_nooff_zexti8:		; GFX900-LABEL: load_private_lo_v2f16_reglo_vreg_nooff_zexti8:
; GFX900: ; %bb.0: ; %entry		; GFX900: ; %bb.0: ; %entry
; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX900-NEXT: buffer_load_ubyte_d16 v1, off, s[0:3], s33 offset:4094		; GFX900-NEXT: buffer_load_ubyte_d16 v1, off, s[0:3], 0 offset:4094
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: global_store_dword v[0:1], v1, off		; GFX900-NEXT: global_store_dword v[0:1], v1, off
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: s_setpc_b64 s[30:31]		; GFX900-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX906-LABEL: load_private_lo_v2f16_reglo_vreg_nooff_zexti8:		; GFX906-LABEL: load_private_lo_v2f16_reglo_vreg_nooff_zexti8:
; GFX906: ; %bb.0: ; %entry		; GFX906: ; %bb.0: ; %entry
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX906-NEXT: buffer_load_ubyte v0, off, s[0:3], s33 offset:4094		; GFX906-NEXT: buffer_load_ubyte v0, off, s[0:3], 0 offset:4094
; GFX906-NEXT: v_lshrrev_b32_e32 v1, 16, v1		; GFX906-NEXT: v_lshrrev_b32_e32 v1, 16, v1
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: v_and_b32_e32 v0, 0xffff, v0		; GFX906-NEXT: v_and_b32_e32 v0, 0xffff, v0
; GFX906-NEXT: v_lshl_or_b32 v0, v1, 16, v0		; GFX906-NEXT: v_lshl_or_b32 v0, v1, 16, v0
; GFX906-NEXT: global_store_dword v[0:1], v0, off		; GFX906-NEXT: global_store_dword v[0:1], v0, off
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: s_setpc_b64 s[30:31]		; GFX906-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX803-LABEL: load_private_lo_v2f16_reglo_vreg_nooff_zexti8:		; GFX803-LABEL: load_private_lo_v2f16_reglo_vreg_nooff_zexti8:
; GFX803: ; %bb.0: ; %entry		; GFX803: ; %bb.0: ; %entry
; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX803-NEXT: v_lshrrev_b32_e32 v0, 16, v1		; GFX803-NEXT: v_lshrrev_b32_e32 v0, 16, v1
; GFX803-NEXT: buffer_load_ubyte v1, off, s[0:3], s33 offset:4094		; GFX803-NEXT: buffer_load_ubyte v1, off, s[0:3], 0 offset:4094
; GFX803-NEXT: s_mov_b32 s4, 0x5040c00		; GFX803-NEXT: s_mov_b32 s4, 0x5040c00
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: v_perm_b32 v0, v0, v1, s4		; GFX803-NEXT: v_perm_b32 v0, v0, v1, s4
; GFX803-NEXT: flat_store_dword v[0:1], v0		; GFX803-NEXT: flat_store_dword v[0:1], v0
; GFX803-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX803-NEXT: s_setpc_b64 s[30:31]		; GFX803-NEXT: s_setpc_b64 s[30:31]
entry:		entry:
%reg.bc = bitcast i32 %reg to <2 x half>		%reg.bc = bitcast i32 %reg to <2 x half>

llvm/test/CodeGen/AMDGPU/memory-legalizer-load.ll

define amdgpu_kernel void @wavefront_one_as_seq_cst(		define amdgpu_kernel void @wavefront_one_as_seq_cst(
i32* %in, i32* %out) {		i32* %in, i32* %out) {
entry:		entry:
%val = load atomic i32, i32* %in syncscope("wavefront-one-as") seq_cst, align 4		%val = load atomic i32, i32* %in syncscope("wavefront-one-as") seq_cst, align 4
store i32 %val, i32* %out		store i32 %val, i32* %out
ret void		ret void
}		}

; GCN-LABEL: {{^}}nontemporal_private_0:		; GCN-LABEL: {{^}}nontemporal_private_0:
; GFX89: buffer_load_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], s{{[0-9]+}} offen glc slc{{$}}		; GFX89: buffer_load_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], 0 offen glc slc{{$}}
; GFX10: buffer_load_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], s{{[0-9]+}} offen slc{{$}}		; GFX10: buffer_load_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], 0 offen slc{{$}}
; GFX10: .amdhsa_kernel nontemporal_private_0		; GFX10: .amdhsa_kernel nontemporal_private_0
; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0		; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0
; GFX10CU: .amdhsa_workgroup_processor_mode 0		; GFX10CU: .amdhsa_workgroup_processor_mode 0
; GFX10-NOT: .amdhsa_memory_ordered 0		; GFX10-NOT: .amdhsa_memory_ordered 0
define amdgpu_kernel void @nontemporal_private_0(		define amdgpu_kernel void @nontemporal_private_0(
i32 addrspace(5)* %in, i32* %out) {		i32 addrspace(5)* %in, i32* %out) {
entry:		entry:
%val = load i32, i32 addrspace(5)* %in, align 4, !nontemporal !0		%val = load i32, i32 addrspace(5)* %in, align 4, !nontemporal !0
store i32 %val, i32* %out		store i32 %val, i32* %out
ret void		ret void
}		}

; GCN-LABEL: {{^}}nontemporal_private_1:		; GCN-LABEL: {{^}}nontemporal_private_1:
; GFX89: buffer_load_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], s{{[0-9]+}} offen glc slc{{$}}		; GFX89: buffer_load_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], 0 offen glc slc{{$}}
; GFX10: buffer_load_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], s{{[0-9]+}} offen slc{{$}}		; GFX10: buffer_load_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], 0 offen slc{{$}}
; GFX10: .amdhsa_kernel nontemporal_private_1		; GFX10: .amdhsa_kernel nontemporal_private_1
; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0		; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0
; GFX10CU: .amdhsa_workgroup_processor_mode 0		; GFX10CU: .amdhsa_workgroup_processor_mode 0
; GFX10-NOT: .amdhsa_memory_ordered 0		; GFX10-NOT: .amdhsa_memory_ordered 0
define amdgpu_kernel void @nontemporal_private_1(		define amdgpu_kernel void @nontemporal_private_1(
i32 addrspace(5)* %in, i32* %out) {		i32 addrspace(5)* %in, i32* %out) {
entry:		entry:
%tid = call i32 @llvm.amdgcn.workitem.id.x()		%tid = call i32 @llvm.amdgcn.workitem.id.x()

llvm/test/CodeGen/AMDGPU/memory-legalizer-store.ll

	define amdgpu_kernel void @wavefront_one_as_seq_cst(			define amdgpu_kernel void @wavefront_one_as_seq_cst(
	i32 %in, i32* %out) {			i32 %in, i32* %out) {
	entry:			entry:
	store atomic i32 %in, i32* %out syncscope("wavefront-one-as") seq_cst, align 4			store atomic i32 %in, i32* %out syncscope("wavefront-one-as") seq_cst, align 4
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}nontemporal_private_0:			; GCN-LABEL: {{^}}nontemporal_private_0:
	; GFX89: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], s{{[0-9]+}} offen glc slc{{$}}			; GFX89: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], 0 offen glc slc{{$}}
	; GFX10: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], s{{[0-9]+}} offen slc{{$}}			; GFX10: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], 0 offen slc{{$}}
	; GFX10: .amdhsa_kernel nontemporal_private_0			; GFX10: .amdhsa_kernel nontemporal_private_0
	; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0			; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0
	; GFX10CU: .amdhsa_workgroup_processor_mode 0			; GFX10CU: .amdhsa_workgroup_processor_mode 0
	; GFX10-NOT: .amdhsa_memory_ordered 0			; GFX10-NOT: .amdhsa_memory_ordered 0
	define amdgpu_kernel void @nontemporal_private_0(			define amdgpu_kernel void @nontemporal_private_0(
	i32* %in, i32 addrspace(5)* %out) {			i32* %in, i32 addrspace(5)* %out) {
	entry:			entry:
	%val = load i32, i32* %in, align 4			%val = load i32, i32* %in, align 4
	store i32 %val, i32 addrspace(5)* %out, !nontemporal !0			store i32 %val, i32 addrspace(5)* %out, !nontemporal !0
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}nontemporal_private_1:			; GCN-LABEL: {{^}}nontemporal_private_1:
	; GFX89: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], s{{[0-9]+}} offen glc slc{{$}}			; GFX89: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], 0 offen glc slc{{$}}
	; GFX10: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], s{{[0-9]+}} offen slc{{$}}			; GFX10: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], 0 offen slc{{$}}
	; GFX10: .amdhsa_kernel nontemporal_private_1			; GFX10: .amdhsa_kernel nontemporal_private_1
	; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0			; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0
	; GFX10CU: .amdhsa_workgroup_processor_mode 0			; GFX10CU: .amdhsa_workgroup_processor_mode 0
	; GFX10-NOT: .amdhsa_memory_ordered 0			; GFX10-NOT: .amdhsa_memory_ordered 0
	define amdgpu_kernel void @nontemporal_private_1(			define amdgpu_kernel void @nontemporal_private_1(
	i32* %in, i32 addrspace(5)* %out) {			i32* %in, i32 addrspace(5)* %out) {
	entry:			entry:
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()

llvm/test/CodeGen/AMDGPU/memory_clause.ll


	define void @mubuf_clause(<4 x i32> addrspace(5)* noalias nocapture readonly %arg, <4 x i32> addrspace(5)* noalias nocapture %arg1) {			define void @mubuf_clause(<4 x i32> addrspace(5)* noalias nocapture readonly %arg, <4 x i32> addrspace(5)* noalias nocapture %arg1) {
	; GCN-LABEL: mubuf_clause:			; GCN-LABEL: mubuf_clause:
	; GCN: ; %bb.0: ; %bb			; GCN: ; %bb.0: ; %bb
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: v_and_b32_e32 v2, 0x3ff, v2			; GCN-NEXT: v_and_b32_e32 v2, 0x3ff, v2
	; GCN-NEXT: v_lshlrev_b32_e32 v2, 4, v2			; GCN-NEXT: v_lshlrev_b32_e32 v2, 4, v2
	; GCN-NEXT: v_add_u32_e32 v0, v0, v2			; GCN-NEXT: v_add_u32_e32 v0, v0, v2
	; GCN-NEXT: s_nop 0
	; GCN-NEXT: s_nop 0
	; GCN-NEXT: buffer_load_dword v3, v0, s[0:3], s33 offen
	; GCN-NEXT: buffer_load_dword v4, v0, s[0:3], s33 offen offset:4
	; GCN-NEXT: buffer_load_dword v5, v0, s[0:3], s33 offen offset:8
	; GCN-NEXT: buffer_load_dword v6, v0, s[0:3], s33 offen offset:12
	; GCN-NEXT: buffer_load_dword v7, v0, s[0:3], s33 offen offset:16
	; GCN-NEXT: buffer_load_dword v8, v0, s[0:3], s33 offen offset:20
	; GCN-NEXT: buffer_load_dword v9, v0, s[0:3], s33 offen offset:24
	; GCN-NEXT: buffer_load_dword v10, v0, s[0:3], s33 offen offset:28
	; GCN-NEXT: buffer_load_dword v11, v0, s[0:3], s33 offen offset:32
	; GCN-NEXT: buffer_load_dword v12, v0, s[0:3], s33 offen offset:36
	; GCN-NEXT: buffer_load_dword v13, v0, s[0:3], s33 offen offset:40
	; GCN-NEXT: buffer_load_dword v14, v0, s[0:3], s33 offen offset:44
	; GCN-NEXT: buffer_load_dword v15, v0, s[0:3], s33 offen offset:48
	; GCN-NEXT: buffer_load_dword v16, v0, s[0:3], s33 offen offset:52
	; GCN-NEXT: buffer_load_dword v17, v0, s[0:3], s33 offen offset:56
	; GCN-NEXT: v_add_u32_e32 v1, v1, v2			; GCN-NEXT: v_add_u32_e32 v1, v1, v2
	; GCN-NEXT: s_nop 0			; GCN-NEXT: s_nop 0
	; GCN-NEXT: s_nop 0			; GCN-NEXT: s_nop 0
	; GCN-NEXT: buffer_load_dword v0, v0, s[0:3], s33 offen offset:60			; GCN-NEXT: buffer_load_dword v6, v0, s[0:3], 0 offen offset:20
	; GCN-NEXT: s_nop 0			; GCN-NEXT: buffer_load_dword v7, v0, s[0:3], 0 offen offset:24
	; GCN-NEXT: s_waitcnt vmcnt(15)			; GCN-NEXT: buffer_load_dword v8, v0, s[0:3], 0 offen offset:28
	; GCN-NEXT: s_nop 0			; GCN-NEXT: buffer_load_dword v9, v0, s[0:3], 0 offen offset:32
	; GCN-NEXT: buffer_store_dword v3, v1, s[0:3], s33 offen			; GCN-NEXT: buffer_load_dword v10, v0, s[0:3], 0 offen offset:36
	; GCN-NEXT: s_waitcnt vmcnt(15)			; GCN-NEXT: buffer_load_dword v11, v0, s[0:3], 0 offen offset:40
	; GCN-NEXT: buffer_store_dword v4, v1, s[0:3], s33 offen offset:4			; GCN-NEXT: buffer_load_dword v12, v0, s[0:3], 0 offen offset:44
	; GCN-NEXT: s_waitcnt vmcnt(15)			; GCN-NEXT: buffer_load_dword v13, v0, s[0:3], 0 offen offset:48
	; GCN-NEXT: buffer_store_dword v5, v1, s[0:3], s33 offen offset:8			; GCN-NEXT: buffer_load_dword v14, v0, s[0:3], 0 offen offset:52
	; GCN-NEXT: s_waitcnt vmcnt(15)			; GCN-NEXT: buffer_load_dword v15, v0, s[0:3], 0 offen offset:56
	; GCN-NEXT: buffer_store_dword v6, v1, s[0:3], s33 offen offset:12			; GCN-NEXT: buffer_load_dword v16, v0, s[0:3], 0 offen offset:60
	; GCN-NEXT: s_waitcnt vmcnt(15)			; GCN-NEXT: buffer_load_dword v2, v0, s[0:3], 0 offen
	; GCN-NEXT: buffer_store_dword v7, v1, s[0:3], s33 offen offset:16			; GCN-NEXT: buffer_load_dword v3, v0, s[0:3], 0 offen offset:4
	; GCN-NEXT: s_waitcnt vmcnt(15)			; GCN-NEXT: buffer_load_dword v4, v0, s[0:3], 0 offen offset:8
	; GCN-NEXT: buffer_store_dword v8, v1, s[0:3], s33 offen offset:20			; GCN-NEXT: buffer_load_dword v5, v0, s[0:3], 0 offen offset:12
	; GCN-NEXT: s_waitcnt vmcnt(15)			; GCN-NEXT: s_nop 0
	; GCN-NEXT: buffer_store_dword v9, v1, s[0:3], s33 offen offset:24			; GCN-NEXT: s_nop 0
	; GCN-NEXT: s_waitcnt vmcnt(15)			; GCN-NEXT: buffer_load_dword v0, v0, s[0:3], 0 offen offset:16
	; GCN-NEXT: buffer_store_dword v10, v1, s[0:3], s33 offen offset:28			; GCN-NEXT: s_nop 0
	; GCN-NEXT: s_waitcnt vmcnt(15)			; GCN-NEXT: s_waitcnt vmcnt(4)
	; GCN-NEXT: buffer_store_dword v11, v1, s[0:3], s33 offen offset:32			; GCN-NEXT: s_nop 0
	; GCN-NEXT: s_waitcnt vmcnt(15)			; GCN-NEXT: buffer_store_dword v2, v1, s[0:3], 0 offen
	; GCN-NEXT: buffer_store_dword v12, v1, s[0:3], s33 offen offset:36			; GCN-NEXT: s_waitcnt vmcnt(4)
	; GCN-NEXT: s_waitcnt vmcnt(15)			; GCN-NEXT: buffer_store_dword v3, v1, s[0:3], 0 offen offset:4
	; GCN-NEXT: buffer_store_dword v13, v1, s[0:3], s33 offen offset:40			; GCN-NEXT: s_waitcnt vmcnt(4)
	; GCN-NEXT: s_waitcnt vmcnt(15)			; GCN-NEXT: buffer_store_dword v4, v1, s[0:3], 0 offen offset:8
	; GCN-NEXT: buffer_store_dword v14, v1, s[0:3], s33 offen offset:44			; GCN-NEXT: s_waitcnt vmcnt(4)
	; GCN-NEXT: s_waitcnt vmcnt(15)			; GCN-NEXT: buffer_store_dword v5, v1, s[0:3], 0 offen offset:12
	; GCN-NEXT: buffer_store_dword v15, v1, s[0:3], s33 offen offset:48			; GCN-NEXT: s_waitcnt vmcnt(4)
	; GCN-NEXT: s_waitcnt vmcnt(15)			; GCN-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen offset:16
	; GCN-NEXT: buffer_store_dword v16, v1, s[0:3], s33 offen offset:52			; GCN-NEXT: buffer_store_dword v6, v1, s[0:3], 0 offen offset:20
	; GCN-NEXT: s_waitcnt vmcnt(15)			; GCN-NEXT: buffer_store_dword v7, v1, s[0:3], 0 offen offset:24
	; GCN-NEXT: buffer_store_dword v17, v1, s[0:3], s33 offen offset:56			; GCN-NEXT: buffer_store_dword v8, v1, s[0:3], 0 offen offset:28
	; GCN-NEXT: s_waitcnt vmcnt(15)			; GCN-NEXT: buffer_store_dword v9, v1, s[0:3], 0 offen offset:32
	; GCN-NEXT: buffer_store_dword v0, v1, s[0:3], s33 offen offset:60			; GCN-NEXT: buffer_store_dword v10, v1, s[0:3], 0 offen offset:36
				; GCN-NEXT: buffer_store_dword v11, v1, s[0:3], 0 offen offset:40
				; GCN-NEXT: buffer_store_dword v12, v1, s[0:3], 0 offen offset:44
				; GCN-NEXT: buffer_store_dword v13, v1, s[0:3], 0 offen offset:48
				; GCN-NEXT: buffer_store_dword v14, v1, s[0:3], 0 offen offset:52
				; GCN-NEXT: buffer_store_dword v15, v1, s[0:3], 0 offen offset:56
				; GCN-NEXT: buffer_store_dword v16, v1, s[0:3], 0 offen offset:60
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	bb:			bb:
	%tmp = tail call i32 @llvm.amdgcn.workitem.id.x()			%tmp = tail call i32 @llvm.amdgcn.workitem.id.x()
	%tmp2 = getelementptr inbounds <4 x i32>, <4 x i32> addrspace(5)* %arg, i32 %tmp			%tmp2 = getelementptr inbounds <4 x i32>, <4 x i32> addrspace(5)* %arg, i32 %tmp
	%tmp3 = load <4 x i32>, <4 x i32> addrspace(5)* %tmp2, align 16			%tmp3 = load <4 x i32>, <4 x i32> addrspace(5)* %tmp2, align 16
	%tmp4 = getelementptr inbounds <4 x i32>, <4 x i32> addrspace(5)* %arg1, i32 %tmp			%tmp4 = getelementptr inbounds <4 x i32>, <4 x i32> addrspace(5)* %arg1, i32 %tmp
	%tmp5 = add nuw nsw i32 %tmp, 1			%tmp5 = add nuw nsw i32 %tmp, 1

llvm/test/CodeGen/AMDGPU/mesa3d.ll

	; RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=tahiti -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN %s			; RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=tahiti -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN %s

	; GCN-LABEL: {{^}}scratch_ps:			; GCN-LABEL: {{^}}scratch_ps:
	; GCN: s_load_dwordx2 s[4:5], s[0:1], 0x0{{$}}			; GCN: s_load_dwordx2 s[4:5], s[0:1], 0x0{{$}}
	; GCN-DAG: s_mov_b32 s6, -1{{$}}			; GCN-DAG: s_mov_b32 s6, -1{{$}}
	; GCN-DAG: s_mov_b32 s7, 0xe8f000			; GCN-DAG: s_mov_b32 s7, 0xe8f000
	; GCN-DAG: v_mov_b32_e32 [[V:v[0-9]+]], 2			; GCN-DAG: v_mov_b32_e32 [[V:v[0-9]+]], 2
	; GCN: buffer_store_dword [[V]], off, s[4:7], s2 offset:4			; GCN: buffer_store_dword [[V]], off, s[4:7], 0 offset:4
	define amdgpu_ps void @scratch_ps(i32 addrspace(1)* %out, i32 %in) {			define amdgpu_ps void @scratch_ps(i32 addrspace(1)* %out, i32 %in) {
	entry:			entry:
	%alloca = alloca i32, addrspace(5)			%alloca = alloca i32, addrspace(5)
	store volatile i32 2, i32 addrspace(5)* %alloca			store volatile i32 2, i32 addrspace(5)* %alloca
	ret void			ret void
	}			}

llvm/test/CodeGen/AMDGPU/mir-print-dead-csr-fi.mir

	name: csr_sgpr			name: csr_sgpr
	tracksRegLiveness: true			tracksRegLiveness: true
	liveins:			liveins:
	- { reg: '$sgpr30_sgpr31' }			- { reg: '$sgpr30_sgpr31' }
	frameInfo:			frameInfo:
	maxAlignment: 4			maxAlignment: 4
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'			scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
	scratchWaveOffsetReg: '$sgpr4'
	frameOffsetReg: '$sgpr5'			frameOffsetReg: '$sgpr5'
	stackPtrOffsetReg: '$sgpr32'			stackPtrOffsetReg: '$sgpr32'
	body: |			body: |
	bb.0:			bb.0:
	liveins: $sgpr30_sgpr31			liveins: $sgpr30_sgpr31

	INLINEASM &"; clobber s42", 1, 12, implicit-def dead early-clobber $sgpr42			INLINEASM &"; clobber s42", 1, 12, implicit-def dead early-clobber $sgpr42
	S_SETPC_B64_return $sgpr30_sgpr31			S_SETPC_B64_return $sgpr30_sgpr31

	...			...

llvm/test/CodeGen/AMDGPU/misched-killflags.mir

	# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=fiji -verify-machineinstrs -run-pass=post-RA-sched -o - %s | FileCheck %s			# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=fiji -verify-machineinstrs -run-pass=post-RA-sched -o - %s | FileCheck %s
	# Make sure ScheduleDAGInstrs::fixupKills does not produce invalid kill flags.			# Make sure ScheduleDAGInstrs::fixupKills does not produce invalid kill flags.
	---			---
	name: func0			name: func0
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: true			isEntryFunction: true
	scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'			scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
	scratchWaveOffsetReg: '$sgpr7'
	frameOffsetReg: '$sgpr7'			frameOffsetReg: '$sgpr7'
	body: |			body: |
	bb.0:			bb.0:

	$sgpr33 = S_MOV_B32 $sgpr7			$sgpr33 = S_MOV_B32 $sgpr7
	$sgpr32 = S_MOV_B32 $sgpr33			$sgpr32 = S_MOV_B32 $sgpr33
	$sgpr10 = S_MOV_B32 5			$sgpr10 = S_MOV_B32 5
	$sgpr9 = S_MOV_B32 4			$sgpr9 = S_MOV_B32 4

llvm/test/CodeGen/AMDGPU/mubuf-offset-private.ll

	; RUN: llc -march=amdgcn -mattr=+max-private-element-size-16 < %s | FileCheck -enable-var-scope -check-prefixes=GCN,SICIVI %s			; RUN: llc -march=amdgcn -mattr=+max-private-element-size-16 < %s | FileCheck -enable-var-scope -check-prefixes=GCN,SICIVI %s
	; RUN: llc -march=amdgcn -mcpu=fiji -mattr=+max-private-element-size-16 < %s | FileCheck -enable-var-scope -check-prefixes=GCN,SICIVI %s			; RUN: llc -march=amdgcn -mcpu=fiji -mattr=+max-private-element-size-16 < %s | FileCheck -enable-var-scope -check-prefixes=GCN,SICIVI %s
	; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=+max-private-element-size-16 < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX9 %s			; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=+max-private-element-size-16 < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX9 %s

	; Test addressing modes when the scratch base is not a frame index.			; Test addressing modes when the scratch base is not a frame index.

	; GCN-LABEL: {{^}}store_private_offset_i8:			; GCN-LABEL: {{^}}store_private_offset_i8:
	; GCN: buffer_store_byte v{{[0-9]+}}, off, s[4:7], s2 offset:8			; GCN: buffer_store_byte v{{[0-9]+}}, off, s[4:7], 0 offset:8
	define amdgpu_kernel void @store_private_offset_i8() #0 {			define amdgpu_kernel void @store_private_offset_i8() #0 {
	store volatile i8 5, i8 addrspace(5)* inttoptr (i32 8 to i8 addrspace(5)*)			store volatile i8 5, i8 addrspace(5)* inttoptr (i32 8 to i8 addrspace(5)*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_private_offset_i16:			; GCN-LABEL: {{^}}store_private_offset_i16:
	; GCN: buffer_store_short v{{[0-9]+}}, off, s[4:7], s2 offset:8			; GCN: buffer_store_short v{{[0-9]+}}, off, s[4:7], 0 offset:8
	define amdgpu_kernel void @store_private_offset_i16() #0 {			define amdgpu_kernel void @store_private_offset_i16() #0 {
	store volatile i16 5, i16 addrspace(5)* inttoptr (i32 8 to i16 addrspace(5)*)			store volatile i16 5, i16 addrspace(5)* inttoptr (i32 8 to i16 addrspace(5)*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_private_offset_i32:			; GCN-LABEL: {{^}}store_private_offset_i32:
	; GCN: buffer_store_dword v{{[0-9]+}}, off, s[4:7], s2 offset:8			; GCN: buffer_store_dword v{{[0-9]+}}, off, s[4:7], 0 offset:8
	define amdgpu_kernel void @store_private_offset_i32() #0 {			define amdgpu_kernel void @store_private_offset_i32() #0 {
	store volatile i32 5, i32 addrspace(5)* inttoptr (i32 8 to i32 addrspace(5)*)			store volatile i32 5, i32 addrspace(5)* inttoptr (i32 8 to i32 addrspace(5)*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_private_offset_v2i32:			; GCN-LABEL: {{^}}store_private_offset_v2i32:
	; GCN: buffer_store_dwordx2 v{{\[[0-9]+:[0-9]+\]}}, off, s[4:7], s2 offset:8			; GCN: buffer_store_dwordx2 v{{\[[0-9]+:[0-9]+\]}}, off, s[4:7], 0 offset:8
	define amdgpu_kernel void @store_private_offset_v2i32() #0 {			define amdgpu_kernel void @store_private_offset_v2i32() #0 {
	store volatile <2 x i32> <i32 5, i32 10>, <2 x i32> addrspace(5)* inttoptr (i32 8 to <2 x i32> addrspace(5)*)			store volatile <2 x i32> <i32 5, i32 10>, <2 x i32> addrspace(5)* inttoptr (i32 8 to <2 x i32> addrspace(5)*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_private_offset_v4i32:			; GCN-LABEL: {{^}}store_private_offset_v4i32:
	; GCN: buffer_store_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, off, s[4:7], s2 offset:8			; GCN: buffer_store_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, off, s[4:7], 0 offset:8
	define amdgpu_kernel void @store_private_offset_v4i32() #0 {			define amdgpu_kernel void @store_private_offset_v4i32() #0 {
	store volatile <4 x i32> <i32 5, i32 10, i32 15, i32 0>, <4 x i32> addrspace(5)* inttoptr (i32 8 to <4 x i32> addrspace(5)*)			store volatile <4 x i32> <i32 5, i32 10, i32 15, i32 0>, <4 x i32> addrspace(5)* inttoptr (i32 8 to <4 x i32> addrspace(5)*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}load_private_offset_i8:			; GCN-LABEL: {{^}}load_private_offset_i8:
	; GCN: buffer_load_ubyte v{{[0-9]+}}, off, s[4:7], s2 offset:8			; GCN: buffer_load_ubyte v{{[0-9]+}}, off, s[4:7], 0 offset:8
	define amdgpu_kernel void @load_private_offset_i8() #0 {			define amdgpu_kernel void @load_private_offset_i8() #0 {
	%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 8 to i8 addrspace(5)*)			%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 8 to i8 addrspace(5)*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}sextload_private_offset_i8:			; GCN-LABEL: {{^}}sextload_private_offset_i8:
	; GCN: buffer_load_sbyte v{{[0-9]+}}, off, s[4:7], s8 offset:8			; GCN: buffer_load_sbyte v{{[0-9]+}}, off, s[4:7], 0 offset:8
	define amdgpu_kernel void @sextload_private_offset_i8(i32 addrspace(1)* %out) #0 {			define amdgpu_kernel void @sextload_private_offset_i8(i32 addrspace(1)* %out) #0 {
	%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 8 to i8 addrspace(5)*)			%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 8 to i8 addrspace(5)*)
	%sextload = sext i8 %load to i32			%sextload = sext i8 %load to i32
	store i32 %sextload, i32 addrspace(1)* undef			store i32 %sextload, i32 addrspace(1)* undef
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}zextload_private_offset_i8:			; GCN-LABEL: {{^}}zextload_private_offset_i8:
	; GCN: buffer_load_ubyte v{{[0-9]+}}, off, s[4:7], s8 offset:8			; GCN: buffer_load_ubyte v{{[0-9]+}}, off, s[4:7], 0 offset:8
	define amdgpu_kernel void @zextload_private_offset_i8(i32 addrspace(1)* %out) #0 {			define amdgpu_kernel void @zextload_private_offset_i8(i32 addrspace(1)* %out) #0 {
	%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 8 to i8 addrspace(5)*)			%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 8 to i8 addrspace(5)*)
	%zextload = zext i8 %load to i32			%zextload = zext i8 %load to i32
	store i32 %zextload, i32 addrspace(1)* undef			store i32 %zextload, i32 addrspace(1)* undef
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}load_private_offset_i16:			; GCN-LABEL: {{^}}load_private_offset_i16:
	; GCN: buffer_load_ushort v{{[0-9]+}}, off, s[4:7], s2 offset:8			; GCN: buffer_load_ushort v{{[0-9]+}}, off, s[4:7], 0 offset:8
	define amdgpu_kernel void @load_private_offset_i16() #0 {			define amdgpu_kernel void @load_private_offset_i16() #0 {
	%load = load volatile i16, i16 addrspace(5)* inttoptr (i32 8 to i16 addrspace(5)*)			%load = load volatile i16, i16 addrspace(5)* inttoptr (i32 8 to i16 addrspace(5)*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}sextload_private_offset_i16:			; GCN-LABEL: {{^}}sextload_private_offset_i16:
	; GCN: buffer_load_sshort v{{[0-9]+}}, off, s[4:7], s8 offset:8			; GCN: buffer_load_sshort v{{[0-9]+}}, off, s[4:7], 0 offset:8
	define amdgpu_kernel void @sextload_private_offset_i16(i32 addrspace(1)* %out) #0 {			define amdgpu_kernel void @sextload_private_offset_i16(i32 addrspace(1)* %out) #0 {
	%load = load volatile i16, i16 addrspace(5)* inttoptr (i32 8 to i16 addrspace(5)*)			%load = load volatile i16, i16 addrspace(5)* inttoptr (i32 8 to i16 addrspace(5)*)
	%sextload = sext i16 %load to i32			%sextload = sext i16 %load to i32
	store i32 %sextload, i32 addrspace(1)* undef			store i32 %sextload, i32 addrspace(1)* undef
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}zextload_private_offset_i16:			; GCN-LABEL: {{^}}zextload_private_offset_i16:
	; GCN: buffer_load_ushort v{{[0-9]+}}, off, s[4:7], s8 offset:8			; GCN: buffer_load_ushort v{{[0-9]+}}, off, s[4:7], 0 offset:8
	define amdgpu_kernel void @zextload_private_offset_i16(i32 addrspace(1)* %out) #0 {			define amdgpu_kernel void @zextload_private_offset_i16(i32 addrspace(1)* %out) #0 {
	%load = load volatile i16, i16 addrspace(5)* inttoptr (i32 8 to i16 addrspace(5)*)			%load = load volatile i16, i16 addrspace(5)* inttoptr (i32 8 to i16 addrspace(5)*)
	%zextload = zext i16 %load to i32			%zextload = zext i16 %load to i32
	store i32 %zextload, i32 addrspace(1)* undef			store i32 %zextload, i32 addrspace(1)* undef
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}load_private_offset_i32:			; GCN-LABEL: {{^}}load_private_offset_i32:
	; GCN: buffer_load_dword v{{[0-9]+}}, off, s[4:7], s2 offset:8			; GCN: buffer_load_dword v{{[0-9]+}}, off, s[4:7], 0 offset:8
	define amdgpu_kernel void @load_private_offset_i32() #0 {			define amdgpu_kernel void @load_private_offset_i32() #0 {
	%load = load volatile i32, i32 addrspace(5)* inttoptr (i32 8 to i32 addrspace(5)*)			%load = load volatile i32, i32 addrspace(5)* inttoptr (i32 8 to i32 addrspace(5)*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}load_private_offset_v2i32:			; GCN-LABEL: {{^}}load_private_offset_v2i32:
	; GCN: buffer_load_dwordx2 v{{\[[0-9]+:[0-9]+\]}}, off, s[4:7], s2 offset:8			; GCN: buffer_load_dwordx2 v{{\[[0-9]+:[0-9]+\]}}, off, s[4:7], 0 offset:8
	define amdgpu_kernel void @load_private_offset_v2i32() #0 {			define amdgpu_kernel void @load_private_offset_v2i32() #0 {
	%load = load volatile <2 x i32>, <2 x i32> addrspace(5)* inttoptr (i32 8 to <2 x i32> addrspace(5)*)			%load = load volatile <2 x i32>, <2 x i32> addrspace(5)* inttoptr (i32 8 to <2 x i32> addrspace(5)*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}load_private_offset_v4i32:			; GCN-LABEL: {{^}}load_private_offset_v4i32:
	; GCN: buffer_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, off, s[4:7], s2 offset:8			; GCN: buffer_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, off, s[4:7], 0 offset:8
	define amdgpu_kernel void @load_private_offset_v4i32() #0 {			define amdgpu_kernel void @load_private_offset_v4i32() #0 {
	%load = load volatile <4 x i32>, <4 x i32> addrspace(5)* inttoptr (i32 8 to <4 x i32> addrspace(5)*)			%load = load volatile <4 x i32>, <4 x i32> addrspace(5)* inttoptr (i32 8 to <4 x i32> addrspace(5)*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_private_offset_i8_max_offset:			; GCN-LABEL: {{^}}store_private_offset_i8_max_offset:
	; GCN: buffer_store_byte v{{[0-9]+}}, off, s[4:7], s2 offset:4095			; GCN: buffer_store_byte v{{[0-9]+}}, off, s[4:7], 0 offset:4095
	define amdgpu_kernel void @store_private_offset_i8_max_offset() #0 {			define amdgpu_kernel void @store_private_offset_i8_max_offset() #0 {
	store volatile i8 5, i8 addrspace(5)* inttoptr (i32 4095 to i8 addrspace(5)*)			store volatile i8 5, i8 addrspace(5)* inttoptr (i32 4095 to i8 addrspace(5)*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_private_offset_i8_max_offset_plus1:			; GCN-LABEL: {{^}}store_private_offset_i8_max_offset_plus1:
	; GCN: v_mov_b32_e32 [[OFFSET:v[0-9]+]], 0x1000			; GCN: v_mov_b32_e32 [[OFFSET:v[0-9]+]], 0x1000
	; GCN: buffer_store_byte v{{[0-9]+}}, [[OFFSET]], s[4:7], s2 offen{{$}}			; GCN: buffer_store_byte v{{[0-9]+}}, [[OFFSET]], s[4:7], 0 offen{{$}}
	define amdgpu_kernel void @store_private_offset_i8_max_offset_plus1() #0 {			define amdgpu_kernel void @store_private_offset_i8_max_offset_plus1() #0 {
	store volatile i8 5, i8 addrspace(5)* inttoptr (i32 4096 to i8 addrspace(5)*)			store volatile i8 5, i8 addrspace(5)* inttoptr (i32 4096 to i8 addrspace(5)*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_private_offset_i8_max_offset_plus2:			; GCN-LABEL: {{^}}store_private_offset_i8_max_offset_plus2:
	; GCN: v_mov_b32_e32 [[OFFSET:v[0-9]+]], 0x1000			; GCN: v_mov_b32_e32 [[OFFSET:v[0-9]+]], 0x1000
	; GCN: buffer_store_byte v{{[0-9]+}}, [[OFFSET]], s[4:7], s2 offen offset:1{{$}}			; GCN: buffer_store_byte v{{[0-9]+}}, [[OFFSET]], s[4:7], 0 offen offset:1{{$}}
	define amdgpu_kernel void @store_private_offset_i8_max_offset_plus2() #0 {			define amdgpu_kernel void @store_private_offset_i8_max_offset_plus2() #0 {
	store volatile i8 5, i8 addrspace(5)* inttoptr (i32 4097 to i8 addrspace(5)*)			store volatile i8 5, i8 addrspace(5)* inttoptr (i32 4097 to i8 addrspace(5)*)
	ret void			ret void
	}			}

	; MUBUF used for stack access has bounds checking enabled before gfx9,			; MUBUF used for stack access has bounds checking enabled before gfx9,
	; so a possibly negative base index can't be used for the vgpr offset.			; so a possibly negative base index can't be used for the vgpr offset.

	; GCN-LABEL: {{^}}store_private_unknown_bits_vaddr:			; GCN-LABEL: {{^}}store_private_unknown_bits_vaddr:
	; SICIVI: v_add_{{i|u}}32_e32 [[ADDR0:v[0-9]+]], vcc, 4			; SICIVI: v_add_{{i|u}}32_e32 [[ADDR0:v[0-9]+]], vcc, 4
	; SICIVI: v_add_{{i|u}}32_e32 [[ADDR1:v[0-9]+]], vcc, 32, [[ADDR0]]			; SICIVI: v_add_{{i|u}}32_e32 [[ADDR1:v[0-9]+]], vcc, 32, [[ADDR0]]
	; SICIVI: buffer_store_dword v{{[0-9]+}}, [[ADDR1]], s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offen{{$}}			; SICIVI: buffer_store_dword v{{[0-9]+}}, [[ADDR1]], s{{\[[0-9]+:[0-9]+\]}}, 0 offen{{$}}

	; GFX9: v_add_u32_e32 [[ADDR:v[0-9]+]], 4,			; GFX9: v_add_u32_e32 [[ADDR:v[0-9]+]], 4,
	; GFX9: buffer_store_dword v{{[0-9]+}}, [[ADDR]], s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offen offset:32			; GFX9: buffer_store_dword v{{[0-9]+}}, [[ADDR]], s{{\[[0-9]+:[0-9]+\]}}, 0 offen offset:32
	define amdgpu_kernel void @store_private_unknown_bits_vaddr() #0 {			define amdgpu_kernel void @store_private_unknown_bits_vaddr() #0 {
	%alloca = alloca [16 x i32], align 4, addrspace(5)			%alloca = alloca [16 x i32], align 4, addrspace(5)
	%vaddr = load volatile i32, i32 addrspace(1)* undef			%vaddr = load volatile i32, i32 addrspace(1)* undef
	%vaddr.off = add i32 %vaddr, 8			%vaddr.off = add i32 %vaddr, 8
	%gep = getelementptr inbounds [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 0, i32 %vaddr.off			%gep = getelementptr inbounds [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 0, i32 %vaddr.off
	store volatile i32 9, i32 addrspace(5)* %gep			store volatile i32 9, i32 addrspace(5)* %gep
	ret void			ret void
	}			}

	attributes #0 = { nounwind }			attributes #0 = { nounwind }

llvm/test/CodeGen/AMDGPU/optimize-exec-masking-pre-ra.mir

	# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	# RUN: llc -mtriple=amdgcn-mesa-mesa3d -run-pass=si-optimize-exec-masking-pre-ra -verify-machineinstrs %s -o - | FileCheck -check-prefix=GCN %s			# RUN: llc -mtriple=amdgcn-mesa-mesa3d -run-pass=si-optimize-exec-masking-pre-ra -verify-machineinstrs %s -o - | FileCheck -check-prefix=GCN %s

	# Check for regression from assuming an instruction was a copy after			# Check for regression from assuming an instruction was a copy after
	# dropping the opcode check.			# dropping the opcode check.
	---			---
	name: exec_src1_is_not_copy			name: exec_src1_is_not_copy
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: true			isEntryFunction: true
	scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'			scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'
	scratchWaveOffsetReg: '$sgpr101'
	frameOffsetReg: '$sgpr101'			frameOffsetReg: '$sgpr101'
	body: |			body: |
	; GCN-LABEL: name: exec_src1_is_not_copy			; GCN-LABEL: name: exec_src1_is_not_copy
	; GCN: bb.0:			; GCN: bb.0:
	; GCN: successors: %bb.1(0x40000000), %bb.2(0x40000000)			; GCN: successors: %bb.1(0x40000000), %bb.2(0x40000000)
	; GCN: liveins: $vgpr0			; GCN: liveins: $vgpr0
	; GCN: [[COPY:%[0-9]+]]:sreg_64 = COPY $exec			; GCN: [[COPY:%[0-9]+]]:sreg_64 = COPY $exec
	; GCN: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF			; GCN: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF

llvm/test/CodeGen/AMDGPU/partial-sgpr-to-vgpr-spills.ll

; RUN: llc -O0 -march=amdgcn -mcpu=hawaii -verify-machineinstrs < %s | FileCheck -check-prefix=ALL -check-prefix=VGPR -check-prefix=GCN %s		; RUN: llc -O0 -march=amdgcn -mcpu=hawaii -verify-machineinstrs < %s | FileCheck -check-prefix=ALL -check-prefix=VGPR -check-prefix=GCN %s

; FIXME: we should disable sdwa peephole because dead-code elimination, that		; FIXME: we should disable sdwa peephole because dead-code elimination, that
; runs after peephole, ruins this test (different register numbers)		; runs after peephole, ruins this test (different register numbers)

; Spill all SGPRs so multiple VGPRs are required for spilling all of them.		; Spill all SGPRs so multiple VGPRs are required for spilling all of them.

; Ideally we only need 2 VGPRs for all spilling. The VGPRs are		; Ideally we only need 2 VGPRs for all spilling. The VGPRs are
; allocated per-frame index, so it's possible to get up with more.		; allocated per-frame index, so it's possible to get up with more.

; GCN-LABEL: {{^}}spill_sgprs_to_multiple_vgprs:		; GCN-LABEL: {{^}}spill_sgprs_to_multiple_vgprs:

; GCN: def s[4:11]		; GCN: def s[4:11]
; GCN: def s[12:19]		; GCN: def s[12:19]
; GCN: def s[20:27]		; GCN: def s[20:27]
; GCN: def s[28:35]
; GCN: def s[36:43]		; GCN: def s[36:43]
; GCN: def s[44:51]		; GCN: def s[44:51]
; GCN: def s[52:59]		; GCN: def s[52:59]
; GCN: def s[60:67]		; GCN: def s[60:67]
; GCN: def s[68:75]		; GCN: def s[68:75]
; GCN: def s[76:83]		; GCN: def s[76:83]
; GCN: def s[84:91]		; GCN: def s[84:91]

; GCN: v_writelane_b32 v0, s4, 0		; GCN: v_writelane_b32 v0, s4, 0
; GCN-NEXT: v_writelane_b32 v0, s5, 1		; GCN-NEXT: v_writelane_b32 v0, s5, 1
; GCN-NEXT: v_writelane_b32 v0, s6, 2		; GCN-NEXT: v_writelane_b32 v0, s6, 2
; GCN-NEXT: v_writelane_b32 v0, s7, 3		; GCN-NEXT: v_writelane_b32 v0, s7, 3
; GCN-NEXT: v_writelane_b32 v0, s8, 4		; GCN-NEXT: v_writelane_b32 v0, s8, 4
; GCN-NEXT: v_writelane_b32 v0, s9, 5		; GCN-NEXT: v_writelane_b32 v0, s9, 5
; GCN-NEXT: v_writelane_b32 v0, s10, 6		; GCN-NEXT: v_writelane_b32 v0, s10, 6
; GCN-NEXT: v_writelane_b32 v0, s11, 7		; GCN-NEXT: v_writelane_b32 v0, s11, 7

; GCN: def s{{\[}}[[TMP_LO:[0-9]+]]:[[TMP_HI:[0-9]+]]{{\]}}		; GCN: def s{{\[}}[[TMP_LO:[0-9]+]]:[[TMP_HI:[0-9]+]]{{\]}}
; GCN: v_writelane_b32 v0, s[[TMP_LO]], 8		; GCN: v_writelane_b32 v0, s[[TMP_LO]], 8
; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 9		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 9
; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 10		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 10
; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 11		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 11
; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 12		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 12
; GCN-NEXT: v_writelane_b32 v0, s9, 13		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 13
; GCN-NEXT: v_writelane_b32 v0, s10, 14		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 14
; GCN-NEXT: v_writelane_b32 v0, s[[TMP_HI]], 15		; GCN-NEXT: v_writelane_b32 v0, s[[TMP_HI]], 15

; GCN: def s{{\[}}[[TMP_LO]]:[[TMP_HI]]{{\]}}		; GCN: def s{{\[}}[[TMP_LO]]:[[TMP_HI]]{{\]}}
; GCN: v_writelane_b32 v0, s[[TMP_LO]], 16		; GCN: v_writelane_b32 v0, s[[TMP_LO]], 16
; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 17		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 17
; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 18		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 18
; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 19		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 19
; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 20		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 20
; GCN-NEXT: v_writelane_b32 v0, s9, 21		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 21
; GCN-NEXT: v_writelane_b32 v0, s10, 22		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 22
; GCN-NEXT: v_writelane_b32 v0, s[[TMP_HI]], 23		; GCN-NEXT: v_writelane_b32 v0, s[[TMP_HI]], 23

; GCN: def s{{\[}}[[TMP_LO]]:[[TMP_HI]]{{\]}}		; GCN: def s{{\[}}[[TMP_LO]]:[[TMP_HI]]{{\]}}
; GCN: v_writelane_b32 v0, s[[TMP_LO]], 24		; GCN: v_writelane_b32 v0, s[[TMP_LO]], 24
; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 25		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 25
; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 26		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 26
; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 27		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 27
; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 28		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 28
; GCN-NEXT: v_writelane_b32 v0, s9, 29		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 29
; GCN-NEXT: v_writelane_b32 v0, s10, 30		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 30
; GCN-NEXT: v_writelane_b32 v0, s[[TMP_HI]], 31		; GCN-NEXT: v_writelane_b32 v0, s[[TMP_HI]], 31

; GCN: def s{{\[}}[[TMP_LO]]:[[TMP_HI]]{{\]}}		; GCN: def s{{\[}}[[TMP_LO]]:[[TMP_HI]]{{\]}}
; GCN: v_writelane_b32 v0, s[[TMP_LO]], 32		; GCN: v_writelane_b32 v0, s[[TMP_LO]], 32
; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 33		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 33
; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 34		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 34
; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 35		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 35
; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 36		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 36
; GCN-NEXT: v_writelane_b32 v0, s9, 37		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 37
; GCN-NEXT: v_writelane_b32 v0, s10, 38		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 38
; GCN-NEXT: v_writelane_b32 v0, s[[TMP_HI]], 39		; GCN-NEXT: v_writelane_b32 v0, s[[TMP_HI]], 39

; GCN: def s{{\[}}[[TMP_LO]]:[[TMP_HI]]{{\]}}		; GCN: def s{{\[}}[[TMP_LO]]:[[TMP_HI]]{{\]}}
; GCN: v_writelane_b32 v0, s[[TMP_LO]], 40		; GCN: v_writelane_b32 v0, s[[TMP_LO]], 40
; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 41		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 41
; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 42		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 42
; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 43		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 43
; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 44		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 44
; GCN-NEXT: v_writelane_b32 v0, s9, 45		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 45
; GCN-NEXT: v_writelane_b32 v0, s10, 46		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 46
; GCN-NEXT: v_writelane_b32 v0, s[[TMP_HI]], 47		; GCN-NEXT: v_writelane_b32 v0, s[[TMP_HI]], 47

; GCN: def s{{\[}}[[TMP_LO]]:[[TMP_HI]]{{\]}}		; GCN: def s{{\[}}[[TMP_LO]]:[[TMP_HI]]{{\]}}
; GCN: v_writelane_b32 v0, s12, 48		; GCN: v_writelane_b32 v0, s[[TMP_LO]], 48
; GCN-NEXT: v_writelane_b32 v0, s13, 49		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 49
; GCN-NEXT: v_writelane_b32 v0, s14, 50		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 50
; GCN-NEXT: v_writelane_b32 v0, s15, 51		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 51
; GCN-NEXT: v_writelane_b32 v0, s16, 52		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 52
; GCN-NEXT: v_writelane_b32 v0, s17, 53		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 53
; GCN-NEXT: v_writelane_b32 v0, s18, 54		; GCN-NEXT: v_writelane_b32 v0, s{{[0-9]+}}, 54
; GCN-NEXT: v_writelane_b32 v0, s19, 55		; GCN-NEXT: v_writelane_b32 v0, s[[TMP_HI]], 55

; GCN-NEXT: v_writelane_b32 v0, s20, 56		; GCN: def s{{\[}}[[TMP_LO]]:[[TMP_HI]]{{\]}}
; GCN-NEXT: v_writelane_b32 v0, s21, 57		; GCN: v_writelane_b32 v0, s12, 56
; GCN-NEXT: v_writelane_b32 v0, s22, 58		; GCN-NEXT: v_writelane_b32 v0, s13, 57
; GCN-NEXT: v_writelane_b32 v0, s23, 59		; GCN-NEXT: v_writelane_b32 v0, s14, 58
; GCN-NEXT: v_writelane_b32 v0, s24, 60		; GCN-NEXT: v_writelane_b32 v0, s15, 59
; GCN-NEXT: v_writelane_b32 v0, s25, 61		; GCN-NEXT: v_writelane_b32 v0, s16, 60
; GCN-NEXT: v_writelane_b32 v0, s26, 62		; GCN-NEXT: v_writelane_b32 v0, s17, 61
; GCN-NEXT: v_writelane_b32 v0, s27, 63		; GCN-NEXT: v_writelane_b32 v0, s18, 62
; GCN-NEXT: v_writelane_b32 v1, s28, 0		; GCN-NEXT: v_writelane_b32 v0, s19, 63
; GCN-NEXT: v_writelane_b32 v1, s29, 1
; GCN-NEXT: v_writelane_b32 v1, s30, 2		; GCN-NEXT: v_writelane_b32 v1, s20, 0
; GCN-NEXT: v_writelane_b32 v1, s31, 3		; GCN-NEXT: v_writelane_b32 v1, s21, 1
; GCN-NEXT: v_writelane_b32 v1, s32, 4		; GCN-NEXT: v_writelane_b32 v1, s22, 2
; GCN-NEXT: v_writelane_b32 v1, s33, 5		; GCN-NEXT: v_writelane_b32 v1, s23, 3
; GCN-NEXT: v_writelane_b32 v1, s34, 6		; GCN-NEXT: v_writelane_b32 v1, s24, 4
; GCN-NEXT: v_writelane_b32 v1, s35, 7		; GCN-NEXT: v_writelane_b32 v1, s25, 5
		; GCN-NEXT: v_writelane_b32 v1, s26, 6
		; GCN-NEXT: v_writelane_b32 v1, s27, 7
; GCN-NEXT: v_writelane_b32 v1, s36, 8		; GCN-NEXT: v_writelane_b32 v1, s36, 8
; GCN-NEXT: v_writelane_b32 v1, s37, 9		; GCN-NEXT: v_writelane_b32 v1, s37, 9
; GCN-NEXT: v_writelane_b32 v1, s38, 10		; GCN-NEXT: v_writelane_b32 v1, s38, 10
; GCN-NEXT: v_writelane_b32 v1, s39, 11		; GCN-NEXT: v_writelane_b32 v1, s39, 11
; GCN-NEXT: v_writelane_b32 v1, s40, 12		; GCN-NEXT: v_writelane_b32 v1, s40, 12
; GCN-NEXT: v_writelane_b32 v1, s41, 13		; GCN-NEXT: v_writelane_b32 v1, s41, 13
; GCN-NEXT: v_writelane_b32 v1, s42, 14		; GCN-NEXT: v_writelane_b32 v1, s42, 14
; GCN-NEXT: v_writelane_b32 v1, s43, 15		; GCN-NEXT: v_writelane_b32 v1, s43, 15
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v0, 2		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v0, 2
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v0, 3		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v0, 3
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v0, 4		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v0, 4
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v0, 5		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v0, 5
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v0, 6		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v0, 6
; GCN-NEXT: v_readlane_b32 s[[USE_TMP_HI:[0-9]+]], v0, 7		; GCN-NEXT: v_readlane_b32 s[[USE_TMP_HI:[0-9]+]], v0, 7
; GCN: ; use s{{\[}}[[USE_TMP_LO]]:[[USE_TMP_HI]]{{\]}}		; GCN: ; use s{{\[}}[[USE_TMP_LO]]:[[USE_TMP_HI]]{{\]}}

; GCN: v_readlane_b32 s[[USE_TMP_LO:[0-9]+]], v0, 48
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v0, 49
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v0, 50
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v0, 51
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v0, 52
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v0, 53
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v0, 54
; GCN-NEXT: v_readlane_b32 s[[USE_TMP_HI:[0-9]+]], v0, 55
; GCN: ; use s{{\[}}[[USE_TMP_LO]]:[[USE_TMP_HI]]{{\]}}

; GCN: v_readlane_b32 s[[USE_TMP_LO:[0-9]+]], v0, 56		; GCN: v_readlane_b32 s[[USE_TMP_LO:[0-9]+]], v0, 56
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v0, 57		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v0, 57
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v0, 58		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v0, 58
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v0, 59		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v0, 59
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v0, 60		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v0, 60
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v0, 61		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v0, 61
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v0, 62		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v0, 62
; GCN-NEXT: v_readlane_b32 s[[USE_TMP_HI:[0-9]+]], v0, 63		; GCN-NEXT: v_readlane_b32 s[[USE_TMP_HI:[0-9]+]], v0, 63
ret:
ret void		ret void
}		}

; Some of the lanes of an SGPR spill are in one VGPR and some forced		; Some of the lanes of an SGPR spill are in one VGPR and some forced
; into the next available VGPR.		; into the next available VGPR.

; GCN-LABEL: {{^}}split_sgpr_spill_2_vgprs:		; GCN-LABEL: {{^}}split_sgpr_spill_2_vgprs:
; GCN: def s[4:19]		; GCN: def s[4:19]
; GCN: def s[20:35]		; GCN: def s[36:51]

; GCN: v_writelane_b32 v0, s4, 48		; GCN: v_writelane_b32 v0, s4, 48
; GCN-NEXT: v_writelane_b32 v0, s5, 49		; GCN-NEXT: v_writelane_b32 v0, s5, 49
; GCN-NEXT: v_writelane_b32 v0, s6, 50		; GCN-NEXT: v_writelane_b32 v0, s6, 50
; GCN-NEXT: v_writelane_b32 v0, s7, 51		; GCN-NEXT: v_writelane_b32 v0, s7, 51
; GCN-NEXT: v_writelane_b32 v0, s8, 52		; GCN-NEXT: v_writelane_b32 v0, s8, 52
; GCN-NEXT: v_writelane_b32 v0, s9, 53		; GCN-NEXT: v_writelane_b32 v0, s9, 53
; GCN-NEXT: v_writelane_b32 v0, s10, 54		; GCN-NEXT: v_writelane_b32 v0, s10, 54
; GCN-NEXT: v_writelane_b32 v0, s11, 55		; GCN-NEXT: v_writelane_b32 v0, s11, 55
; GCN-NEXT: v_writelane_b32 v0, s12, 56		; GCN-NEXT: v_writelane_b32 v0, s12, 56
; GCN-NEXT: v_writelane_b32 v0, s13, 57		; GCN-NEXT: v_writelane_b32 v0, s13, 57
; GCN-NEXT: v_writelane_b32 v0, s14, 58		; GCN-NEXT: v_writelane_b32 v0, s14, 58
; GCN-NEXT: v_writelane_b32 v0, s15, 59		; GCN-NEXT: v_writelane_b32 v0, s15, 59
; GCN-NEXT: v_writelane_b32 v0, s16, 60		; GCN-NEXT: v_writelane_b32 v0, s16, 60
; GCN-NEXT: v_writelane_b32 v0, s17, 61		; GCN-NEXT: v_writelane_b32 v0, s17, 61
; GCN-NEXT: v_writelane_b32 v0, s18, 62		; GCN-NEXT: v_writelane_b32 v0, s18, 62
; GCN-NEXT: v_writelane_b32 v0, s19, 63		; GCN-NEXT: v_writelane_b32 v0, s19, 63

; GCN: v_readlane_b32 s4, v0, 48		; GCN: v_readlane_b32 s0, v0, 48
; GCN-NEXT: v_readlane_b32 s5, v0, 49		; GCN-NEXT: v_readlane_b32 s1, v0, 49
; GCN-NEXT: v_readlane_b32 s6, v0, 50		; GCN-NEXT: v_readlane_b32 s2, v0, 50
; GCN-NEXT: v_readlane_b32 s7, v0, 51		; GCN-NEXT: v_readlane_b32 s3, v0, 51
; GCN-NEXT: v_readlane_b32 s8, v0, 52		; GCN-NEXT: v_readlane_b32 s4, v0, 52
; GCN-NEXT: v_readlane_b32 s9, v0, 53		; GCN-NEXT: v_readlane_b32 s5, v0, 53
; GCN-NEXT: v_readlane_b32 s10, v0, 54		; GCN-NEXT: v_readlane_b32 s6, v0, 54
; GCN-NEXT: v_readlane_b32 s11, v0, 55		; GCN-NEXT: v_readlane_b32 s7, v0, 55
; GCN-NEXT: v_readlane_b32 s12, v0, 56		; GCN-NEXT: v_readlane_b32 s8, v0, 56
; GCN-NEXT: v_readlane_b32 s13, v0, 57		; GCN-NEXT: v_readlane_b32 s9, v0, 57
; GCN-NEXT: v_readlane_b32 s14, v0, 58		; GCN-NEXT: v_readlane_b32 s10, v0, 58
; GCN-NEXT: v_readlane_b32 s15, v0, 59		; GCN-NEXT: v_readlane_b32 s11, v0, 59
; GCN-NEXT: v_readlane_b32 s16, v0, 60		; GCN-NEXT: v_readlane_b32 s12, v0, 60
; GCN-NEXT: v_readlane_b32 s17, v0, 61		; GCN-NEXT: v_readlane_b32 s13, v0, 61
; GCN-NEXT: v_readlane_b32 s18, v0, 62		; GCN-NEXT: v_readlane_b32 s14, v0, 62
; GCN-NEXT: v_readlane_b32 s19, v0, 63		; GCN-NEXT: v_readlane_b32 s15, v0, 63
define amdgpu_kernel void @split_sgpr_spill_2_vgprs(i32 addrspace(1)* %out, i32 %in) #1 {		define amdgpu_kernel void @split_sgpr_spill_2_vgprs(i32 addrspace(1)* %out, i32 %in) #1 {
%wide.sgpr0 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0		%wide.sgpr0 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0
%wide.sgpr1 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0		%wide.sgpr1 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0
%wide.sgpr2 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0		%wide.sgpr2 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0
%wide.sgpr5 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0		%wide.sgpr5 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0
%wide.sgpr3 = call <8 x i32> asm sideeffect "; def $0", "=s" () #0		%wide.sgpr3 = call <8 x i32> asm sideeffect "; def $0", "=s" () #0
%wide.sgpr4 = call <2 x i32> asm sideeffect "; def $0", "=s" () #0		%wide.sgpr4 = call <2 x i32> asm sideeffect "; def $0", "=s" () #0

Show All 13 Lines	ret:
ret void		ret void
}		}

; The first 64 SGPR spills can go to a VGPR, but there isn't a second		; The first 64 SGPR spills can go to a VGPR, but there isn't a second
; so some spills must be to memory. The last 16 element spill runs out of lanes at the 15th element.		; so some spills must be to memory. The last 16 element spill runs out of lanes at the 15th element.

; GCN-LABEL: {{^}}no_vgprs_last_sgpr_spill:		; GCN-LABEL: {{^}}no_vgprs_last_sgpr_spill:

; GCN: v_writelane_b32 v23, s{{[0-9]+}}, 0		; GCN: v_writelane_b32 v31, s{{[0-9]+}}, 0
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 1		; GCN-NEXT: v_writelane_b32 v31, s{{[0-9]+}}, 1
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 2		; GCN-NEXT: v_writelane_b32 v31, s{{[0-9]+}}, 2
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 3		; GCN-NEXT: v_writelane_b32 v31, s{{[0-9]+}}, 3
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 4		; GCN-NEXT: v_writelane_b32 v31, s{{[0-9]+}}, 4
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 5		; GCN-NEXT: v_writelane_b32 v31, s{{[0-9]+}}, 5
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 6		; GCN-NEXT: v_writelane_b32 v31, s{{[0-9]+}}, 6
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 7		; GCN-NEXT: v_writelane_b32 v31, s{{[0-9]+}}, 7
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 8		; GCN-NEXT: v_writelane_b32 v31, s{{[0-9]+}}, 8
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 9		; GCN-NEXT: v_writelane_b32 v31, s{{[0-9]+}}, 9
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 10		; GCN-NEXT: v_writelane_b32 v31, s{{[0-9]+}}, 10
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 11		; GCN-NEXT: v_writelane_b32 v31, s{{[0-9]+}}, 11
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 12		; GCN-NEXT: v_writelane_b32 v31, s{{[0-9]+}}, 12
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 13		; GCN-NEXT: v_writelane_b32 v31, s{{[0-9]+}}, 13
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 14		; GCN-NEXT: v_writelane_b32 v31, s{{[0-9]+}}, 14
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 15		; GCN-NEXT: v_writelane_b32 v31, s{{[0-9]+}}, 15

; GCN: v_writelane_b32 v23, s{{[0-9]+}}, 16		; GCN: v_writelane_b32 v31, s{{[0-9]+}}, 16
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 17		; GCN-NEXT: v_writelane_b32 v31, s{{[0-9]+}}, 17
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 18		; GCN-NEXT: v_writelane_b32 v31, s{{[0-9]+}}, 18
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 19		; GCN-NEXT: v_writelane_b32 v31, s{{[0-9]+}}, 19
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 20		; GCN-NEXT: v_writelane_b32 v31, s{{[0-9]+}}, 20
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 21		; GCN-NEXT: v_writelane_b32 v31, s{{[0-9]+}}, 21
; GCN-NEXT: v_writelane_b32 v23, s{{[0-9]+}}, 22		; GCN-NEXT: v_writelane_b32 v31, s{{[0-9]+}}, 22
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 23		; GCN-NEXT: v_writelane_b32 v31, s{{[[0-9]+}}, 23
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 24		; GCN-NEXT: v_writelane_b32 v31, s{{[[0-9]+}}, 24
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 25		; GCN-NEXT: v_writelane_b32 v31, s{{[[0-9]+}}, 25
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 26		; GCN-NEXT: v_writelane_b32 v31, s{{[[0-9]+}}, 26
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 27		; GCN-NEXT: v_writelane_b32 v31, s{{[[0-9]+}}, 27
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 28		; GCN-NEXT: v_writelane_b32 v31, s{{[[0-9]+}}, 28
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 29		; GCN-NEXT: v_writelane_b32 v31, s{{[[0-9]+}}, 29
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 30		; GCN-NEXT: v_writelane_b32 v31, s{{[[0-9]+}}, 30
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 31		; GCN-NEXT: v_writelane_b32 v31, s{{[[0-9]+}}, 31

; GCN: def s[0:1]		; GCN: def s[0:1]
; GCN: v_writelane_b32 v23, s20, 32		; GCN: v_writelane_b32 v31, s{{[[0-9]+}}, 32
; GCN-NEXT: v_writelane_b32 v23, s21, 33		; GCN-NEXT: v_writelane_b32 v31, s{{[[0-9]+}}, 33
		; GCN-NEXT: v_writelane_b32 v31, s{{[[0-9]+}}, 34
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 34		; GCN-NEXT: v_writelane_b32 v31, s{{[[0-9]+}}, 35
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 35		; GCN-NEXT: v_writelane_b32 v31, s{{[[0-9]+}}, 36
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 36		; GCN-NEXT: v_writelane_b32 v31, s{{[[0-9]+}}, 37
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 37		; GCN-NEXT: v_writelane_b32 v31, s{{[[0-9]+}}, 38
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 38		; GCN-NEXT: v_writelane_b32 v31, s{{[[0-9]+}}, 39
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 39		; GCN-NEXT: v_writelane_b32 v31, s{{[[0-9]+}}, 40
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 40		; GCN-NEXT: v_writelane_b32 v31, s{{[[0-9]+}}, 41
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 41		; GCN-NEXT: v_writelane_b32 v31, s{{[[0-9]+}}, 42
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 42		; GCN-NEXT: v_writelane_b32 v31, s{{[[0-9]+}}, 43
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 43		; GCN-NEXT: v_writelane_b32 v31, s{{[[0-9]+}}, 44
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 44		; GCN-NEXT: v_writelane_b32 v31, s{{[[0-9]+}}, 45
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 45		; GCN-NEXT: v_writelane_b32 v31, s{{[[0-9]+}}, 46
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 46		; GCN-NEXT: v_writelane_b32 v31, s{{[[0-9]+}}, 47
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 47		; GCN-NEXT: v_writelane_b32 v31, s{{[[0-9]+}}, 48
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 48		; GCN-NEXT: v_writelane_b32 v31, s{{[[0-9]+}}, 49
; GCN-NEXT: v_writelane_b32 v23, s{{[[0-9]+}}, 49

; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}}		; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0
; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}}		; GCN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0
; GCN: s_cbranch_scc1		; GCN: s_cbranch_scc1


; GCN: v_readlane_b32 s[[USE_TMP_LO:[0-9]+]], v23, 0		; GCN: v_readlane_b32 s[[USE_TMP_LO:[0-9]+]], v31, 0
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 1		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 1
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 2		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 2
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 3		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 3
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 4		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 4
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 5		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 5
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 6		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 6
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 7		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 7
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 8		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 8
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 9		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 9
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 10		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 10
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 11		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 11
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 12		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 12
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 13		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 13
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 14		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 14
; GCN-NEXT: v_readlane_b32 s[[USE_TMP_HI:[0-9]+]], v23, 15		; GCN-NEXT: v_readlane_b32 s[[USE_TMP_HI:[0-9]+]], v31, 15
; GCN: ; use s{{\[}}[[USE_TMP_LO]]:[[USE_TMP_HI]]{{\]}}		; GCN: ; use s{{\[}}[[USE_TMP_LO]]:[[USE_TMP_HI]]{{\]}}


; GCN: v_readlane_b32 s[[USE_TMP_LO:[0-9]+]], v23, 32		; GCN: v_readlane_b32 s[[USE_TMP_LO:[0-9]+]], v31, 32
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 33		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 33
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 34		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 34
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 35		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 35
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 36		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 36
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 37		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 37
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 38		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 38
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 39		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 39
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 40		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 40
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 41		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 41
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 42		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 42
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 43		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 43
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 44		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 44
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 45		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 45
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 46		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 46
; GCN-NEXT: v_readlane_b32 s[[USE_TMP_HI:[0-9]+]], v23, 47		; GCN-NEXT: v_readlane_b32 s[[USE_TMP_HI:[0-9]+]], v31, 47
; GCN: ; use s{{\[}}[[USE_TMP_LO]]:[[USE_TMP_HI]]{{\]}}		; GCN: ; use s{{\[}}[[USE_TMP_LO]]:[[USE_TMP_HI]]{{\]}}

; GCN: v_readlane_b32 s[[USE_TMP_LO:[0-9]+]], v23, 16		; GCN: v_readlane_b32 s[[USE_TMP_LO:[0-9]+]], v31, 16
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 17		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 17
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 18		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 18
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 19		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 19
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 20		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 20
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 21		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 21
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 22		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 22
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 23		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 23
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 24		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 24
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 25		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 25
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 26		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 26
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 27		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 27
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 28		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 28
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 29		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 29
; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v23, 30		; GCN-NEXT: v_readlane_b32 s{{[0-9]+}}, v31, 30
; GCN-NEXT: v_readlane_b32 s[[USE_TMP_HI:[0-9]+]], v23, 31		; GCN-NEXT: v_readlane_b32 s[[USE_TMP_HI:[0-9]+]], v31, 31
; GCN: ; use s{{\[}}[[USE_TMP_LO]]:[[USE_TMP_HI]]{{\]}}		; GCN: ; use s{{\[}}[[USE_TMP_LO]]:[[USE_TMP_HI]]{{\]}}

; GCN: buffer_load_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}}		; GCN: buffer_load_dword v[[RESTORE_TMP:[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, 0
; GCN: buffer_load_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}}		; GCN: v_readfirstlane_b32 s[[USE_TMP_LO:[0-9]+]], v[[RESTORE_TMP]]
		; GCN: buffer_load_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0
; GCN: v_readfirstlane_b32 s1, v0		; GCN: v_readfirstlane_b32 s[[USE_TMP_HI:[0-9]+]], v[[RESTORE_TMP]]
; GCN: ;;#ASMSTART		; GCN: ;;#ASMSTART
; GCN: ; use s[0:1]		; GCN: ; use s{{\[}}[[USE_TMP_LO]]:[[USE_TMP_HI]]{{\]}}
define amdgpu_kernel void @no_vgprs_last_sgpr_spill(i32 addrspace(1)* %out, i32 %in) #1 {		define amdgpu_kernel void @no_vgprs_last_sgpr_spill(i32 addrspace(1)* %out, i32 %in) #1 {
call void asm sideeffect "", "~{v[0:7]}" () #0		call void asm sideeffect "", "~{v[0:7]}" () #0
call void asm sideeffect "", "~{v[8:15]}" () #0		call void asm sideeffect "", "~{v[8:15]}" () #0
call void asm sideeffect "", "~{v[16:19]}"() #0		call void asm sideeffect "", "~{v[16:23]}" () #0
call void asm sideeffect "", "~{v[20:21]}"() #0		call void asm sideeffect "", "~{v[24:27]}"() #0
call void asm sideeffect "", "~{v22}"() #0		call void asm sideeffect "", "~{v[28:29]}"() #0
		call void asm sideeffect "", "~{v30}"() #0

%wide.sgpr0 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0		%wide.sgpr0 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0
%wide.sgpr1 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0		%wide.sgpr1 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0
%wide.sgpr2 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0		%wide.sgpr2 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0
%wide.sgpr3 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0		%wide.sgpr3 = call <16 x i32> asm sideeffect "; def $0", "=s" () #0
%wide.sgpr4 = call <2 x i32> asm sideeffect "; def $0", "=s" () #0		%wide.sgpr4 = call <2 x i32> asm sideeffect "; def $0", "=s" () #0
%cmp = icmp eq i32 %in, 0		%cmp = icmp eq i32 %in, 0
br i1 %cmp, label %bb0, label %ret		br i1 %cmp, label %bb0, label %ret

bb0:		bb0:
call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr0) #0		call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr0) #0
call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr1) #0		call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr1) #0
call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr2) #0		call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr2) #0
call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr3) #0		call void asm sideeffect "; use $0", "s"(<16 x i32> %wide.sgpr3) #0
call void asm sideeffect "; use $0", "s"(<2 x i32> %wide.sgpr4) #0		call void asm sideeffect "; use $0", "s"(<2 x i32> %wide.sgpr4) #0
br label %ret		br label %ret

ret:		ret:
ret void		ret void
}		}

attributes #0 = { nounwind }		attributes #0 = { nounwind }
attributes #1 = { nounwind "amdgpu-waves-per-eu"="10,10" }		attributes #1 = { nounwind "amdgpu-waves-per-eu"="8,8" }
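
A reading aid for the comment on no_vgprs_last_sgpr_spill above: in wave64 mode a VGPR has 64 lanes, and each v_writelane_b32 stores one 32-bit SGPR into one lane, so a single VGPR can hold at most 64 spilled SGPRs. The Python sketch below is a simplified model of that capacity limit, not the allocator LLVM uses. The name `assign_lanes` and its in-order assignment policy are hypothetical, and the sketch does not try to reproduce the exact lane numbers in the CHECK lines.

```python
WAVE_SIZE = 64  # lanes per VGPR in wave64 mode


def assign_lanes(spill_sizes, num_vgprs):
    """Give each spilled SGPR a (vgpr_index, lane) slot, in order.

    Hypothetical model: slots are handed out sequentially, and any SGPR
    that does not fit in the available VGPR lanes gets None, meaning it
    would have to be spilled to memory instead.
    """
    capacity = num_vgprs * WAVE_SIZE
    next_slot = 0
    result = []
    for size in spill_sizes:
        group = []
        for _ in range(size):
            if next_slot < capacity:
                # divmod splits the running slot number into (vgpr, lane).
                group.append(divmod(next_slot, WAVE_SIZE))
                next_slot += 1
            else:
                group.append(None)
            # Once capacity is exhausted, every later SGPR goes to memory.
        result.append(group)
    return result
```

Under this model, four 16-element spills exactly fill one VGPR (the fourth lands in lanes 48 to 63), and a following 2-element spill has no lane left when only one VGPR is free.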

llvm/test/CodeGen/AMDGPU/pei-reg-scavenger-position.mir


	# Force a frame larger than the immediate field with a large alignment.			# Force a frame larger than the immediate field with a large alignment.
	stack:			stack:
	- { id: 0, type: default, offset: 4096, size: 4, alignment: 8192 }			- { id: 0, type: default, offset: 4096, size: 4, alignment: 8192 }

	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: true			isEntryFunction: true
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr33
	frameOffsetReg: $sgpr5
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32
				argumentInfo:
				privateSegmentWaveByteOffset: { reg: '$sgpr4' }

	body: \|			body: \|
	; CHECK-LABEL: name: scavenge_register_position			; CHECK-LABEL: name: scavenge_register_position
	; CHECK: bb.0:			; CHECK: bb.0:
	; CHECK: successors: %bb.1(0x80000000)			; CHECK: successors: %bb.1(0x80000000)
	; CHECK: liveins: $sgpr33, $sgpr0_sgpr1_sgpr2_sgpr3			; CHECK: liveins: $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4
	; CHECK: $sgpr4 = S_ADD_U32 $sgpr32, 524288, implicit-def $scc			; CHECK: $sgpr0 = S_ADD_U32 $sgpr0, $sgpr4, implicit-def $scc, implicit-def $sgpr0_sgpr1_sgpr2_sgpr3
	; CHECK: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4 from %stack.0, align 8192, addrspace 5)			; CHECK: $sgpr1 = S_ADDC_U32 $sgpr1, 0, implicit-def $scc, implicit $scc, implicit-def $sgpr0_sgpr1_sgpr2_sgpr3
				; CHECK: $sgpr5 = S_MOV_B32 524288
				; CHECK: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr5, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4 from %stack.0, align 8192, addrspace 5)
	; CHECK: S_BRANCH %bb.1			; CHECK: S_BRANCH %bb.1
	; CHECK: bb.1:			; CHECK: bb.1:
	; CHECK: liveins: $sgpr0_sgpr1_sgpr2_sgpr3			; CHECK: liveins: $sgpr0_sgpr1_sgpr2_sgpr3
	; CHECK: $sgpr4 = S_ADD_U32 $sgpr32, 524288, implicit-def $scc			; CHECK: $sgpr4 = S_MOV_B32 524288
	; CHECK: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4 from %stack.0, align 8192, addrspace 5)			; CHECK: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4 from %stack.0, align 8192, addrspace 5)
	; CHECK: S_ENDPGM 0, implicit $vgpr0			; CHECK: S_ENDPGM 0, implicit $vgpr0
	bb.0:			bb.0:
	$vgpr0 = SI_SPILL_V32_RESTORE %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (load 4 from %stack.0, addrspace 5)			$vgpr0 = SI_SPILL_V32_RESTORE %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (load 4 from %stack.0, addrspace 5)
	S_BRANCH %bb.1			S_BRANCH %bb.1

	bb.1:			bb.1:
	$vgpr0 = SI_SPILL_V32_RESTORE %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (load 4 from %stack.0, addrspace 5)			$vgpr0 = SI_SPILL_V32_RESTORE %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (load 4 from %stack.0, addrspace 5)
	S_ENDPGM 0, implicit $vgpr0			S_ENDPGM 0, implicit $vgpr0
	...			...
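
The constant 524288 in the CHECK lines above is consistent with the 8192-byte alignment scaled by a 64-lane wave (8192 × 64). The shift by 6 and the mask 4294443008 in the pei-scavenge-sgpr-carry-out.mir checks fit the same reading. This is an interpretation rather than something the diff states, and the helper names in the sketch below are hypothetical.

```python
WAVE_SIZE = 64  # assumed wave64: per-lane byte offsets are scaled by 64


def scaled_offset(per_lane_bytes, wave_size=WAVE_SIZE):
    """Scale a per-lane byte offset to a wave-wide scratch offset."""
    return per_lane_bytes * wave_size


def align_mask_u32(alignment_bytes, wave_size=WAVE_SIZE):
    """32-bit AND mask that rounds a scaled offset down to the alignment."""
    return (-(alignment_bytes * wave_size)) & 0xFFFFFFFF


# Values that appear in the CHECK lines of this diff.
OFFSET = scaled_offset(8192)          # 524288
ROUND_UP_ADD = scaled_offset(8191)    # 524224, added before masking
MASK = align_mask_u32(8192)           # 4294443008, i.e. 0xFFF80000
```

Adding `ROUND_UP_ADD` and then applying `MASK` is the usual round-up-to-alignment idiom, and shifting a scaled offset right by 6 recovers the per-lane value.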

llvm/test/CodeGen/AMDGPU/pei-scavenge-sgpr-carry-out.mir


	stack:			stack:
	- { id: 0, type: default, offset: 0, size: 4, alignment: 8192 }			- { id: 0, type: default, offset: 0, size: 4, alignment: 8192 }
	- { id: 1, type: default, offset: 0, size: 4, alignment: 8192 }			- { id: 1, type: default, offset: 0, size: 4, alignment: 8192 }

	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: false			isEntryFunction: false
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr34			frameOffsetReg: $sgpr34
	frameOffsetReg: $sgpr33
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: \|			body: \|
	bb.0:			bb.0:
	liveins: $vgpr1			liveins: $vgpr1

	; CHECK-LABEL: name: scavenge_sgpr_pei_no_sgprs			; CHECK-LABEL: name: scavenge_sgpr_pei_no_sgprs
	; CHECK: liveins: $vgpr1			; CHECK: liveins: $vgpr1
	; CHECK: $sgpr27 = frame-setup COPY $sgpr33			; CHECK: $sgpr27 = frame-setup COPY $sgpr34
	; CHECK: $sgpr4 = frame-setup S_ADD_U32 $sgpr32, 524224, implicit-def $scc			; CHECK: $sgpr4 = frame-setup S_ADD_U32 $sgpr32, 524224, implicit-def $scc
	; CHECK: $sgpr33 = frame-setup S_AND_B32 killed $sgpr4, 4294443008, implicit-def $scc			; CHECK: $sgpr34 = frame-setup S_AND_B32 killed $sgpr4, 4294443008, implicit-def $scc
	; CHECK: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 1572864, implicit-def $scc			; CHECK: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 1572864, implicit-def $scc
	; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc			; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
	; CHECK: $sgpr33 = S_SUB_U32 $sgpr33, $sgpr34, implicit-def $scc			; CHECK: $sgpr34 = S_LSHR_B32 $sgpr34, 6, implicit-def $scc
	; CHECK: $sgpr33 = S_LSHR_B32 killed $sgpr33, 6, implicit-def $scc			; CHECK: $sgpr34 = S_ADD_U32 killed $sgpr34, 8192, implicit-def $scc
	; CHECK: $sgpr33 = S_ADD_U32 killed $sgpr33, 8192, implicit-def $scc			; CHECK: $vgpr2 = COPY killed $sgpr34
	; CHECK: $vgpr2 = COPY killed $sgpr33			; CHECK: $sgpr34 = S_SUB_U32 killed $sgpr34, 8192, implicit-def $scc
	; CHECK: $sgpr33 = S_SUB_U32 killed $sgpr33, 8192, implicit-def $scc			; CHECK: $sgpr34 = S_LSHL_B32 $sgpr34, 6, implicit-def $scc
	; CHECK: $sgpr33 = S_LSHL_B32 killed $sgpr33, 6, implicit-def $scc
	; CHECK: $sgpr33 = S_ADD_U32 $sgpr33, $sgpr34, implicit-def $scc
	; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31			; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31
	; CHECK: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc			; CHECK: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc
	; CHECK: $sgpr33 = frame-setup COPY $sgpr27			; CHECK: $sgpr34 = frame-setup COPY $sgpr27
	; CHECK: S_ENDPGM 0, implicit $vcc			; CHECK: S_ENDPGM 0, implicit $vcc
	S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc			S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
	$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31			$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31
	S_ENDPGM 0, implicit $vcc			S_ENDPGM 0, implicit $vcc
	...			...

	# One 32-bit SGPR is available for the intermediate scale computation,			# One 32-bit SGPR is available for the intermediate scale computation,
	# so only an extra copy to VALU is necessary.			# so only an extra copy to VALU is necessary.

	---			---
	name: scavenge_sgpr_pei_one_sgpr			name: scavenge_sgpr_pei_one_sgpr
	tracksRegLiveness: true			tracksRegLiveness: true

	stack:			stack:
	- { id: 0, type: default, offset: 0, size: 4, alignment: 8192 }			- { id: 0, type: default, offset: 0, size: 4, alignment: 8192 }
	- { id: 1, type: default, offset: 0, size: 4, alignment: 8192 }			- { id: 1, type: default, offset: 0, size: 4, alignment: 8192 }

	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: false			isEntryFunction: false
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr34			frameOffsetReg: $sgpr34
	frameOffsetReg: $sgpr33
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: \|			body: \|
	bb.0:			bb.0:
	liveins: $vgpr1			liveins: $vgpr1

	; CHECK-LABEL: name: scavenge_sgpr_pei_one_sgpr			; CHECK-LABEL: name: scavenge_sgpr_pei_one_sgpr
	; CHECK: liveins: $vgpr1			; CHECK: liveins: $vgpr1
	; CHECK: $sgpr27 = frame-setup COPY $sgpr33			; CHECK: $sgpr27 = frame-setup COPY $sgpr34
	; CHECK: $sgpr4 = frame-setup S_ADD_U32 $sgpr32, 524224, implicit-def $scc			; CHECK: $sgpr4 = frame-setup S_ADD_U32 $sgpr32, 524224, implicit-def $scc
	; CHECK: $sgpr33 = frame-setup S_AND_B32 killed $sgpr4, 4294443008, implicit-def $scc			; CHECK: $sgpr34 = frame-setup S_AND_B32 killed $sgpr4, 4294443008, implicit-def $scc
	; CHECK: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 1572864, implicit-def $scc			; CHECK: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 1572864, implicit-def $scc
	; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc			; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
	; CHECK: $sgpr29 = S_SUB_U32 $sgpr33, $sgpr34, implicit-def $scc			; CHECK: $sgpr29 = S_LSHR_B32 $sgpr34, 6, implicit-def $scc
	; CHECK: $sgpr29 = S_LSHR_B32 killed $sgpr29, 6, implicit-def $scc
	; CHECK: $sgpr29 = S_ADD_U32 killed $sgpr29, 8192, implicit-def $scc			; CHECK: $sgpr29 = S_ADD_U32 killed $sgpr29, 8192, implicit-def $scc
	; CHECK: $vgpr2 = COPY killed $sgpr29			; CHECK: $vgpr2 = COPY killed $sgpr29
	; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr31			; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr31
	; CHECK: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc			; CHECK: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc
	; CHECK: $sgpr33 = frame-setup COPY $sgpr27			; CHECK: $sgpr34 = frame-setup COPY $sgpr27
	; CHECK: S_ENDPGM 0, implicit $vcc			; CHECK: S_ENDPGM 0, implicit $vcc
	S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc			S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
	$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr31			$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr31
	S_ENDPGM 0, implicit $vcc			S_ENDPGM 0, implicit $vcc
	...			...

	# When only one 64-bit SGPR is available for the unused carry out pre gfx9,			# When only one 64-bit SGPR is available for the unused carry out pre gfx9,
	# we must reuse one of the 32-bit SGPR sub-regs to materialize the offset.			# we must reuse one of the 32-bit SGPR sub-regs to materialize the offset.

	---			---
	name: scavenge_sgpr_pei_one_sgpr_64			name: scavenge_sgpr_pei_one_sgpr_64
	tracksRegLiveness: true			tracksRegLiveness: true

	stack:			stack:
	- { id: 0, type: default, offset: 0, size: 4, alignment: 8192 }			- { id: 0, type: default, offset: 0, size: 4, alignment: 8192 }
	- { id: 1, type: default, offset: 0, size: 4, alignment: 8192 }			- { id: 1, type: default, offset: 0, size: 4, alignment: 8192 }

	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: false			isEntryFunction: false
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr34			frameOffsetReg: $sgpr34
	frameOffsetReg: $sgpr33
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: \|			body: \|
	bb.0:			bb.0:
	liveins: $vgpr1			liveins: $vgpr1

	; CHECK-LABEL: name: scavenge_sgpr_pei_one_sgpr_64			; CHECK-LABEL: name: scavenge_sgpr_pei_one_sgpr_64
	; CHECK: liveins: $vgpr1			; CHECK: liveins: $vgpr1
	; CHECK: $sgpr27 = frame-setup COPY $sgpr33			; CHECK: $sgpr27 = frame-setup COPY $sgpr34
	; CHECK: $sgpr4 = frame-setup S_ADD_U32 $sgpr32, 524224, implicit-def $scc			; CHECK: $sgpr4 = frame-setup S_ADD_U32 $sgpr32, 524224, implicit-def $scc
	; CHECK: $sgpr33 = frame-setup S_AND_B32 killed $sgpr4, 4294443008, implicit-def $scc			; CHECK: $sgpr34 = frame-setup S_AND_B32 killed $sgpr4, 4294443008, implicit-def $scc
	; CHECK: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 1572864, implicit-def $scc			; CHECK: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 1572864, implicit-def $scc
	; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc			; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
	; CHECK: $sgpr28 = S_SUB_U32 $sgpr33, $sgpr34, implicit-def $scc			; CHECK: $vgpr3 = V_LSHRREV_B32_e64 6, $sgpr34, implicit $exec
	; CHECK: $vgpr3 = V_LSHRREV_B32_e64 6, killed $sgpr28, implicit $exec
	; CHECK: $sgpr28 = S_MOV_B32 8192			; CHECK: $sgpr28 = S_MOV_B32 8192
	; CHECK: $vgpr2, dead $sgpr28_sgpr29 = V_ADD_I32_e64 killed $sgpr28, killed $vgpr3, 0, implicit $exec			; CHECK: $vgpr2, dead $sgpr28_sgpr29 = V_ADD_I32_e64 killed $sgpr28, killed $vgpr3, 0, implicit $exec
	; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr31			; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr31
	; CHECK: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc			; CHECK: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc
	; CHECK: $sgpr33 = frame-setup COPY $sgpr27			; CHECK: $sgpr34 = frame-setup COPY $sgpr27
	; CHECK: S_ENDPGM 0, implicit $vcc			; CHECK: S_ENDPGM 0, implicit $vcc
	S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc			S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
	$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr31			$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr31
	S_ENDPGM 0, implicit $vcc			S_ENDPGM 0, implicit $vcc
	...			...

	# Prefer to use vcc as unused carry out.			# Prefer to use vcc as unused carry out.

	---			---
	name: scavenge_sgpr_pei_prefer_vcc			name: scavenge_sgpr_pei_prefer_vcc
	tracksRegLiveness: true			tracksRegLiveness: true

	stack:			stack:
	- { id: 0, type: default, offset: 0, size: 4, alignment: 8192 }			- { id: 0, type: default, offset: 0, size: 4, alignment: 8192 }
	- { id: 1, type: default, offset: 0, size: 4, alignment: 8192 }			- { id: 1, type: default, offset: 0, size: 4, alignment: 8192 }

	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: false			isEntryFunction: false
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr34			frameOffsetReg: $sgpr34
	frameOffsetReg: $sgpr33
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: \|			body: \|
	bb.0:			bb.0:
	liveins: $vgpr1			liveins: $vgpr1

	; CHECK-LABEL: name: scavenge_sgpr_pei_prefer_vcc			; CHECK-LABEL: name: scavenge_sgpr_pei_prefer_vcc
	; CHECK: liveins: $vgpr1			; CHECK: liveins: $vgpr1
	; CHECK: $sgpr27 = frame-setup COPY $sgpr33			; CHECK: $sgpr27 = frame-setup COPY $sgpr34
	; CHECK: $sgpr4 = frame-setup S_ADD_U32 $sgpr32, 524224, implicit-def $scc			; CHECK: $sgpr4 = frame-setup S_ADD_U32 $sgpr32, 524224, implicit-def $scc
	; CHECK: $sgpr33 = frame-setup S_AND_B32 killed $sgpr4, 4294443008, implicit-def $scc			; CHECK: $sgpr34 = frame-setup S_AND_B32 killed $sgpr4, 4294443008, implicit-def $scc
	; CHECK: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 1572864, implicit-def $scc			; CHECK: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 1572864, implicit-def $scc
	; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr30, implicit-def $sgpr31			; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr30, implicit-def $sgpr31
	; CHECK: $vcc_hi = S_SUB_U32 $sgpr33, $sgpr34, implicit-def $scc			; CHECK: $vgpr3 = V_LSHRREV_B32_e64 6, $sgpr34, implicit $exec
	; CHECK: $vgpr3 = V_LSHRREV_B32_e64 6, killed $vcc_hi, implicit $exec
	; CHECK: $vcc_lo = S_MOV_B32 8192			; CHECK: $vcc_lo = S_MOV_B32 8192
	; CHECK: $vgpr2, dead $vcc = V_ADD_I32_e64 killed $vcc_lo, killed $vgpr3, 0, implicit $exec			; CHECK: $vgpr2, dead $vcc = V_ADD_I32_e64 killed $vcc_lo, killed $vgpr3, 0, implicit $exec
	; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr31			; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr31
	; CHECK: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc			; CHECK: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc
	; CHECK: $sgpr33 = frame-setup COPY $sgpr27			; CHECK: $sgpr34 = frame-setup COPY $sgpr27
	; CHECK: S_ENDPGM 0			; CHECK: S_ENDPGM 0
	S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr30, implicit-def $sgpr31			S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr30, implicit-def $sgpr31
	$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr31			$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr31
	S_ENDPGM 0			S_ENDPGM 0
	...			...
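The CHECK lines in the tests above differ between the two sides in how the frame register is turned into a per-lane address before the object offset of 8192 is added. The left side subtracts `scratchWaveOffsetReg` first (`S_SUB_U32`); the right side shifts the frame register directly. A minimal Python sketch of that arithmetic follows. It assumes the shift by 6 corresponds to a wave size of 64; the function names are hypothetical and are not part of LLVM.

```python
# Hypothetical model of the address arithmetic visible in the CHECK lines.
# Assumption: the right shift by 6 divides a wave-scaled offset by a
# wave size of 64 to obtain a per-lane offset.

WAVE_SHIFT = 6  # log2(64), taken from "V_LSHRREV_B32_e64 6, ..."


def frame_index_old(fp: int, wave_offset: int, object_offset: int) -> int:
    """Left-hand side: S_SUB_U32 fp, wave_offset; then shift; then add."""
    return (((fp - wave_offset) & 0xFFFFFFFF) >> WAVE_SHIFT) + object_offset


def frame_index_new(fp: int, object_offset: int) -> int:
    """Right-hand side: shift the frame register directly; then add."""
    return (fp >> WAVE_SHIFT) + object_offset


if __name__ == "__main__":
    # Illustrative values only; they do not come from the tests.
    print(frame_index_old(524288 + 1024, 1024, 8192))
    print(frame_index_new(524288, 8192))
```

Under these assumptions the two forms give the same result whenever the old frame register equals the new one plus the wave offset, which is why the right side needs one fewer scavenged SGPR.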

llvm/test/CodeGen/AMDGPU/pei-scavenge-sgpr-gfx9.mir

	# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs -run-pass=prologepilog %s -o - \| FileCheck %s			# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs -run-pass=prologepilog %s -o - \| FileCheck %s

	# Test what happens when an SGPR is unavailable for the unused add. The non-inline constant needs to be folded into the add instruction and not materialized in a register.			# Test what happens when an SGPR is unavailable for the unused add. The non-inline constant needs to be folded into the add instruction and not materialized in a register.

	---			---
	name: scavenge_sgpr_pei_no_sgprs			name: scavenge_sgpr_pei_no_sgprs
	tracksRegLiveness: true			tracksRegLiveness: true

	stack:			stack:
	- { id: 0, type: default, offset: 0, size: 4, alignment: 8192 }			- { id: 0, type: default, offset: 0, size: 4, alignment: 8192 }
	- { id: 1, type: default, offset: 0, size: 4, alignment: 8192 }			- { id: 1, type: default, offset: 0, size: 4, alignment: 8192 }

	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: false			isEntryFunction: false
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr34
	frameOffsetReg: $sgpr33			frameOffsetReg: $sgpr33
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: \|			body: \|
	bb.0:			bb.0:
	liveins: $vgpr1			liveins: $vgpr1

	; CHECK-LABEL: name: scavenge_sgpr_pei_no_sgprs			; CHECK-LABEL: name: scavenge_sgpr_pei_no_sgprs
	; CHECK: liveins: $vgpr1			; CHECK: liveins: $vgpr1
	; CHECK: $sgpr27 = frame-setup COPY $sgpr33			; CHECK: $sgpr27 = frame-setup COPY $sgpr33
	; CHECK: $sgpr4 = frame-setup S_ADD_U32 $sgpr32, 524224, implicit-def $scc			; CHECK: $sgpr4 = frame-setup S_ADD_U32 $sgpr32, 524224, implicit-def $scc
	; CHECK: $sgpr33 = frame-setup S_AND_B32 killed $sgpr4, 4294443008, implicit-def $scc			; CHECK: $sgpr33 = frame-setup S_AND_B32 killed $sgpr4, 4294443008, implicit-def $scc
	; CHECK: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 1572864, implicit-def $scc			; CHECK: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 1572864, implicit-def $scc
	; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc			; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
	; CHECK: $sgpr33 = S_SUB_U32 $sgpr33, $sgpr34, implicit-def $scc			; CHECK: $vgpr3 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec
	; CHECK: $vgpr3 = V_LSHRREV_B32_e64 6, killed $sgpr33, implicit $exec
	; CHECK: $vgpr2 = V_ADD_U32_e32 8192, killed $vgpr3, implicit $exec			; CHECK: $vgpr2 = V_ADD_U32_e32 8192, killed $vgpr3, implicit $exec
	; CHECK: $sgpr33 = S_ADD_U32 $sgpr33, $sgpr34, implicit-def $scc
	; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31			; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31
	; CHECK: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc			; CHECK: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc
	; CHECK: $sgpr33 = frame-setup COPY $sgpr27			; CHECK: $sgpr33 = frame-setup COPY $sgpr27
	; CHECK: S_ENDPGM 0, implicit $vcc			; CHECK: S_ENDPGM 0, implicit $vcc
	S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc			S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
	$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31			$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31
	S_ENDPGM 0, implicit $vcc			S_ENDPGM 0, implicit $vcc
	...			...

llvm/test/CodeGen/AMDGPU/pei-scavenge-sgpr.mir

	# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -verify-machineinstrs -run-pass=prologepilog %s -o - \| FileCheck %s			# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -verify-machineinstrs -run-pass=prologepilog %s -o - \| FileCheck %s

	# Frame virtual SGPRs should not be used, as the register scavenger cannot usefully spill them anymore.			# Frame virtual SGPRs should not be used, as the register scavenger cannot usefully spill them anymore.
	# Spilling is also worse than increment and restore of a frame register. There should be no spills remaining.			# Spilling is also worse than increment and restore of a frame register. There should be no spills remaining.

	---			---
	name: scavenge_sgpr_pei			name: scavenge_sgpr_pei
	tracksRegLiveness: true			tracksRegLiveness: true

	stack:			stack:
	- { id: 0, type: default, size: 4, alignment: 4096 }			- { id: 0, type: default, size: 4, alignment: 4096 }

	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: false			isEntryFunction: false
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr34
	frameOffsetReg: $sgpr33			frameOffsetReg: $sgpr33
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: \|			body: \|
	bb.0:			bb.0:
	liveins: $vgpr1			liveins: $vgpr1

	; CHECK-LABEL: name: scavenge_sgpr_pei			; CHECK-LABEL: name: scavenge_sgpr_pei
	; CHECK: liveins: $vgpr1			; CHECK: liveins: $vgpr1
	; CHECK: $sgpr27 = frame-setup COPY $sgpr33			; CHECK: $sgpr27 = frame-setup COPY $sgpr33
	; CHECK: $sgpr4 = frame-setup S_ADD_U32 $sgpr32, 262080, implicit-def $scc			; CHECK: $sgpr4 = frame-setup S_ADD_U32 $sgpr32, 262080, implicit-def $scc
	; CHECK: $sgpr33 = frame-setup S_AND_B32 killed $sgpr4, 4294705152, implicit-def $scc			; CHECK: $sgpr33 = frame-setup S_AND_B32 killed $sgpr4, 4294705152, implicit-def $scc
	; CHECK: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 524288, implicit-def $scc			; CHECK: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 524288, implicit-def $scc
	; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc			; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
	; CHECK: $sgpr33 = S_SUB_U32 $sgpr33, $sgpr34, implicit-def $scc
	; CHECK: $vgpr2 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec			; CHECK: $vgpr2 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec
	; CHECK: $sgpr33 = S_ADD_U32 $sgpr33, $sgpr34, implicit-def $scc
	; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31			; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31
	; CHECK: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 524288, implicit-def $scc			; CHECK: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 524288, implicit-def $scc
	; CHECK: $sgpr33 = frame-setup COPY $sgpr27			; CHECK: $sgpr33 = frame-setup COPY $sgpr27
	; CHECK: S_ENDPGM 0, implicit $vcc			; CHECK: S_ENDPGM 0, implicit $vcc
	S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc			S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
	$vgpr0 = V_OR_B32_e32 %stack.0, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31			$vgpr0 = V_OR_B32_e32 %stack.0, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31
	S_ENDPGM 0, implicit $vcc			S_ENDPGM 0, implicit $vcc
	...			...
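The frame-setup constants in the CHECK lines above match an ordinary align-up of the stack pointer, scaled by the wave size. A minimal Python sketch that reproduces them from the `alignment` values in the `stack:` entries follows. It assumes a wave size of 64; the helper name is hypothetical and is not part of LLVM.

```python
# Hypothetical helper reproducing the S_ADD_U32 / S_AND_B32 immediates
# seen in the frame-setup CHECK lines.
# Assumption: stack offsets are scaled by a wave size of 64.

WAVE_SIZE = 64


def align_up_constants(alignment: int, wave_size: int = WAVE_SIZE):
    """Return (add_immediate, and_mask) for aligning a scaled stack pointer."""
    add_immediate = (alignment - 1) * wave_size
    and_mask = (-(alignment * wave_size)) & 0xFFFFFFFF  # 32-bit two's complement
    return add_immediate, and_mask


if __name__ == "__main__":
    print(align_up_constants(8192))  # tests with alignment: 8192
    print(align_up_constants(4096))  # test with alignment: 4096
```

For `alignment: 8192` this gives 524224 and 4294443008, and for `alignment: 4096` it gives 262080 and 4294705152, which are the immediates in the `S_ADD_U32` and `S_AND_B32` lines.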

llvm/test/CodeGen/AMDGPU/private-access-no-objects.ll

	; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=fiji -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=VI -check-prefix=OPT %s			; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=fiji -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=VI -check-prefix=OPT %s
	; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=hawaii -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=CI -check-prefix=OPT %s			; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=hawaii -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=CI -check-prefix=OPT %s
	; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=iceland -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=VI -check-prefix=OPT %s			; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=iceland -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=VI -check-prefix=OPT %s
	; RUN: llc -O0 -mtriple=amdgcn--amdhsa -mcpu=fiji -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=OPTNONE %s			; RUN: llc -O0 -mtriple=amdgcn--amdhsa -mcpu=fiji -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=OPTNONE %s

	; There are no stack objects, but still a private memory access. The			; There are no stack objects, but still a private memory access. The
	; private access registers need to be correctly initialized anyway, and			; private access registers need to be correctly initialized anyway, and
	; shifted down to the end of the used registers.			; shifted down to the end of the used registers.

	; GCN-LABEL: {{^}}store_to_undef:			; GCN-LABEL: {{^}}store_to_undef:
	; OPT-DAG: s_mov_b64 s{{\[}}[[RSRC_LO:[0-9]+]]:{{[0-9]+\]}}, s[0:1]			; OPT-DAG: s_mov_b64 s{{\[}}[[RSRC_LO:[0-9]+]]:{{[0-9]+\]}}, s[0:1]
	; OPT-DAG: s_mov_b64 s{{\[[0-9]+}}:[[RSRC_HI:[0-9]+]]{{\]}}, s[2:3]			; OPT-DAG: s_mov_b64 s{{\[[0-9]+}}:[[RSRC_HI:[0-9]+]]{{\]}}, s[2:3]
	; OPT-DAG: s_mov_b32 [[SOFFSET:s[0-9]+]], s5{{$}}			; OPT: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s{{\[}}[[RSRC_LO]]:[[RSRC_HI]]{{\]}}, 0 offen{{$}}
	; OPT: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s{{\[}}[[RSRC_LO]]:[[RSRC_HI]]{{\]}}, [[SOFFSET]] offen{{$}}

	; -O0 should assume spilling, so the input scratch resource descriptor			; -O0 should assume spilling, so the input scratch resource descriptor
	; should be used directly without any copies.			; should be used directly without any copies.

	; OPTNONE-NOT: s_mov_b32			; OPTNONE-NOT: s_mov_b32
	; OPTNONE: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], s5 offen{{$}}			; OPTNONE: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen{{$}}
	define amdgpu_kernel void @store_to_undef() #0 {			define amdgpu_kernel void @store_to_undef() #0 {
	store volatile i32 0, i32 addrspace(5)* undef			store volatile i32 0, i32 addrspace(5)* undef
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}store_to_inttoptr:			; GCN-LABEL: {{^}}store_to_inttoptr:
	; OPT-DAG: s_mov_b64 s{{\[}}[[RSRC_LO:[0-9]+]]:{{[0-9]+\]}}, s[0:1]			; OPT-DAG: s_mov_b64 s{{\[}}[[RSRC_LO:[0-9]+]]:{{[0-9]+\]}}, s[0:1]
	; OPT-DAG: s_mov_b64 s{{\[[0-9]+}}:[[RSRC_HI:[0-9]+]]{{\]}}, s[2:3]			; OPT-DAG: s_mov_b64 s{{\[[0-9]+}}:[[RSRC_HI:[0-9]+]]{{\]}}, s[2:3]
	; OPT-DAG: s_mov_b32 [[SOFFSET:s[0-9]+]], s5{{$}}			; OPT: buffer_store_dword v{{[0-9]+}}, off, s{{\[}}[[RSRC_LO]]:[[RSRC_HI]]{{\]}}, 0 offset:124{{$}}
	; OPT: buffer_store_dword v{{[0-9]+}}, off, s{{\[}}[[RSRC_LO]]:[[RSRC_HI]]{{\]}}, [[SOFFSET]] offset:124{{$}}
	define amdgpu_kernel void @store_to_inttoptr() #0 {			define amdgpu_kernel void @store_to_inttoptr() #0 {
	store volatile i32 0, i32 addrspace(5)* inttoptr (i32 124 to i32 addrspace(5)*)			store volatile i32 0, i32 addrspace(5)* inttoptr (i32 124 to i32 addrspace(5)*)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}load_from_undef:			; GCN-LABEL: {{^}}load_from_undef:
	; OPT-DAG: s_mov_b64 s{{\[}}[[RSRC_LO:[0-9]+]]:{{[0-9]+\]}}, s[0:1]			; OPT-DAG: s_mov_b64 s{{\[}}[[RSRC_LO:[0-9]+]]:{{[0-9]+\]}}, s[0:1]
	; OPT-DAG: s_mov_b64 s{{\[[0-9]+}}:[[RSRC_HI:[0-9]+]]{{\]}}, s[2:3]			; OPT-DAG: s_mov_b64 s{{\[[0-9]+}}:[[RSRC_HI:[0-9]+]]{{\]}}, s[2:3]
	; OPT-DAG: s_mov_b32 [[SOFFSET:s[0-9]+]], s5{{$}}			; OPT: buffer_load_dword v{{[0-9]+}}, v{{[0-9]+}}, s{{\[}}[[RSRC_LO]]:[[RSRC_HI]]{{\]}}, 0 offen{{$}}
	; OPT: buffer_load_dword v{{[0-9]+}}, v{{[0-9]+}}, s{{\[}}[[RSRC_LO]]:[[RSRC_HI]]{{\]}}, [[SOFFSET]] offen{{$}}
	define amdgpu_kernel void @load_from_undef() #0 {			define amdgpu_kernel void @load_from_undef() #0 {
	%ld = load volatile i32, i32 addrspace(5)* undef			%ld = load volatile i32, i32 addrspace(5)* undef
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}load_from_inttoptr:			; GCN-LABEL: {{^}}load_from_inttoptr:
	; OPT-DAG: s_mov_b64 s{{\[}}[[RSRC_LO:[0-9]+]]:{{[0-9]+\]}}, s[0:1]			; OPT-DAG: s_mov_b64 s{{\[}}[[RSRC_LO:[0-9]+]]:{{[0-9]+\]}}, s[0:1]
	; OPT-DAG: s_mov_b64 s{{\[[0-9]+}}:[[RSRC_HI:[0-9]+]]{{\]}}, s[2:3]			; OPT-DAG: s_mov_b64 s{{\[[0-9]+}}:[[RSRC_HI:[0-9]+]]{{\]}}, s[2:3]
	; OPT-DAG: s_mov_b32 [[SOFFSET:s[0-9]+]], s5{{$}}			; OPT: buffer_load_dword v{{[0-9]+}}, off, s{{\[}}[[RSRC_LO]]:[[RSRC_HI]]{{\]}}, 0 offset:124{{$}}
	; OPT: buffer_load_dword v{{[0-9]+}}, off, s{{\[}}[[RSRC_LO]]:[[RSRC_HI]]{{\]}}, [[SOFFSET]] offset:124{{$}}
	define amdgpu_kernel void @load_from_inttoptr() #0 {			define amdgpu_kernel void @load_from_inttoptr() #0 {
	%ld = load volatile i32, i32 addrspace(5)* inttoptr (i32 124 to i32 addrspace(5)*)			%ld = load volatile i32, i32 addrspace(5)* inttoptr (i32 124 to i32 addrspace(5)*)
	ret void			ret void
	}			}

	attributes #0 = { nounwind }			attributes #0 = { nounwind }
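
The OPT checks in this file define FileCheck variables (`[[RSRC_LO:[0-9]+]]`, `[[RSRC_HI:[0-9]+]]`) on the `s_mov_b64` lines and reuse them on the buffer instruction. As a rough illustration of that capture-then-substitute behaviour, here is a minimal Python sketch. It emulates the idea with `re` and is not how FileCheck is implemented. The sample assembly and its register numbers are hypothetical, not output taken from this revision, and the sketch ignores FileCheck's ordering rules for `-DAG` and plain directives.

```python
import re

# Hypothetical llc output for @store_to_undef (illustrative only; the
# real register numbers depend on the compiler).
asm = """\
s_mov_b64 s[4:5], s[0:1]
s_mov_b64 s[6:7], s[2:3]
buffer_store_dword v0, v0, s[4:7], 0 offen
"""

# [[RSRC_LO:[0-9]+]] and [[RSRC_HI:[0-9]+]] act like named capture groups.
lo = re.search(r"s_mov_b64 s\[(?P<lo>[0-9]+):[0-9]+\], s\[0:1\]", asm)
hi = re.search(r"s_mov_b64 s\[[0-9]+:(?P<hi>[0-9]+)\], s\[2:3\]", asm)
assert lo and hi
rsrc_lo, rsrc_hi = lo.group("lo"), hi.group("hi")

# A later [[RSRC_LO]]:[[RSRC_HI]] use substitutes the captured text, so
# the store must use exactly the copied resource descriptor range.
store = re.compile(
    r"buffer_store_dword v[0-9]+, v[0-9]+, "
    rf"s\[{rsrc_lo}:{rsrc_hi}\], 0 offen$",
    re.MULTILINE,
)
matched = store.search(asm) is not None
print(rsrc_lo, rsrc_hi, matched)
```

The final pattern mirrors the updated OPT line, which expects a literal `0` as the scalar offset operand rather than a copied `[[SOFFSET]]` register.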

llvm/test/CodeGen/AMDGPU/private-element-size.ll

	; RUN: llc -march=amdgcn -mtriple=amdgcn-unknown-amdhsa -mattr=-promote-alloca,+max-private-element-size-16 -verify-machineinstrs < %s | FileCheck -allow-deprecated-dag-overlap -check-prefix=ELT16 -check-prefix=HSA -check-prefix=HSA-ELT16 -check-prefix=ALL -check-prefix=HSA_ELTGE8 %s			; RUN: llc -march=amdgcn -mtriple=amdgcn-unknown-amdhsa -mattr=-promote-alloca,+max-private-element-size-16 -verify-machineinstrs < %s | FileCheck -allow-deprecated-dag-overlap -check-prefix=ELT16 -check-prefix=HSA -check-prefix=HSA-ELT16 -check-prefix=ALL -check-prefix=HSA_ELTGE8 %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn-unknown-amdhsa -mattr=-promote-alloca,+max-private-element-size-8 -verify-machineinstrs < %s | FileCheck -allow-deprecated-dag-overlap -check-prefix=ELT8 -check-prefix=HSA -check-prefix=HSA-ELT8 -check-prefix=ALL -check-prefix=HSA-ELTGE8 %s			; RUN: llc -march=amdgcn -mtriple=amdgcn-unknown-amdhsa -mattr=-promote-alloca,+max-private-element-size-8 -verify-machineinstrs < %s | FileCheck -allow-deprecated-dag-overlap -check-prefix=ELT8 -check-prefix=HSA -check-prefix=HSA-ELT8 -check-prefix=ALL -check-prefix=HSA-ELTGE8 %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn-unknown-amdhsa -mattr=-promote-alloca,+max-private-element-size-4 -verify-machineinstrs < %s | FileCheck -allow-deprecated-dag-overlap -check-prefix=ELT4 -check-prefix=HSA -check-prefix=HSA-ELT4 -check-prefix=ALL %s			; RUN: llc -march=amdgcn -mtriple=amdgcn-unknown-amdhsa -mattr=-promote-alloca,+max-private-element-size-4 -verify-machineinstrs < %s | FileCheck -allow-deprecated-dag-overlap -check-prefix=ELT4 -check-prefix=HSA -check-prefix=HSA-ELT4 -check-prefix=ALL %s


	; ALL-LABEL: {{^}}private_elt_size_v4i32:			; ALL-LABEL: {{^}}private_elt_size_v4i32:

	; HSA-ELT16: private_element_size = 3			; HSA-ELT16: private_element_size = 3
	; HSA-ELT8: private_element_size = 2			; HSA-ELT8: private_element_size = 2
	; HSA-ELT4: private_element_size = 1			; HSA-ELT4: private_element_size = 1


	; HSA-ELT16-DAG: buffer_store_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:16			; HSA-ELT16-DAG: buffer_store_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:16
	; HSA-ELT16-DAG: buffer_store_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:32			; HSA-ELT16-DAG: buffer_store_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:32
	; HSA-ELT16-DAG: buffer_load_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], s9 offen{{$}}			; HSA-ELT16-DAG: buffer_load_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], 0 offen{{$}}

	; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:24{{$}}			; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:24{{$}}
	; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:16			; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:16
	; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:32			; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:32
	; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:40			; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:40

	; HSA-ELT8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], s9 offen			; HSA-ELT8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], 0 offen
	; HSA-ELT8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], s9 offen			; HSA-ELT8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], 0 offen


	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:16{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:16{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:20{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:20{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:24{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:24{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:28{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:28{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:32{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:32{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:36{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:36{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:40{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:40{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:44{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:44{{$}}

	; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen{{$}}			; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen{{$}}
	; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:4{{$}}			; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen offset:4{{$}}
	; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:8{{$}}			; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen offset:8{{$}}
	; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:12{{$}}			; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen offset:12{{$}}
	define amdgpu_kernel void @private_elt_size_v4i32(<4 x i32> addrspace(1)* %out, i32 addrspace(1)* %index.array) #0 {			define amdgpu_kernel void @private_elt_size_v4i32(<4 x i32> addrspace(1)* %out, i32 addrspace(1)* %index.array) #0 {
	entry:			entry:
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%idxprom = sext i32 %tid to i64			%idxprom = sext i32 %tid to i64
	%gep.index = getelementptr inbounds i32, i32 addrspace(1)* %index.array, i64 %idxprom			%gep.index = getelementptr inbounds i32, i32 addrspace(1)* %index.array, i64 %idxprom
	%index.load = load i32, i32 addrspace(1)* %gep.index			%index.load = load i32, i32 addrspace(1)* %gep.index
	%index = and i32 %index.load, 2			%index = and i32 %index.load, 2
	%alloca = alloca [2 x <4 x i32>], align 16, addrspace(5)			%alloca = alloca [2 x <4 x i32>], align 16, addrspace(5)
	%gep0 = getelementptr inbounds [2 x <4 x i32>], [2 x <4 x i32>] addrspace(5)* %alloca, i32 0, i32 0			%gep0 = getelementptr inbounds [2 x <4 x i32>], [2 x <4 x i32>] addrspace(5)* %alloca, i32 0, i32 0
	%gep1 = getelementptr inbounds [2 x <4 x i32>], [2 x <4 x i32>] addrspace(5)* %alloca, i32 0, i32 1			%gep1 = getelementptr inbounds [2 x <4 x i32>], [2 x <4 x i32>] addrspace(5)* %alloca, i32 0, i32 1
	store <4 x i32> zeroinitializer, <4 x i32> addrspace(5)* %gep0			store <4 x i32> zeroinitializer, <4 x i32> addrspace(5)* %gep0
	store <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32> addrspace(5)* %gep1			store <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32> addrspace(5)* %gep1
	%gep2 = getelementptr inbounds [2 x <4 x i32>], [2 x <4 x i32>] addrspace(5)* %alloca, i32 0, i32 %index			%gep2 = getelementptr inbounds [2 x <4 x i32>], [2 x <4 x i32>] addrspace(5)* %alloca, i32 0, i32 %index
	%load = load <4 x i32>, <4 x i32> addrspace(5)* %gep2			%load = load <4 x i32>, <4 x i32> addrspace(5)* %gep2
	store <4 x i32> %load, <4 x i32> addrspace(1)* %out			store <4 x i32> %load, <4 x i32> addrspace(1)* %out
	ret void			ret void
	}			}

	; ALL-LABEL: {{^}}private_elt_size_v8i32:			; ALL-LABEL: {{^}}private_elt_size_v8i32:
	; HSA-ELT16: private_element_size = 3			; HSA-ELT16: private_element_size = 3
	; HSA-ELT8: private_element_size = 2			; HSA-ELT8: private_element_size = 2
	; HSA-ELT4: private_element_size = 1			; HSA-ELT4: private_element_size = 1

	; HSA-ELT16-DAG: buffer_store_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:32			; HSA-ELT16-DAG: buffer_store_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:32
	; HSA-ELT16-DAG: buffer_store_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:48			; HSA-ELT16-DAG: buffer_store_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:48
	; HSA-ELT16-DAG: buffer_store_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:64			; HSA-ELT16-DAG: buffer_store_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:64
	; HSA-ELT16-DAG: buffer_store_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:80			; HSA-ELT16-DAG: buffer_store_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:80

	; HSA-ELT16-DAG: buffer_load_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], s9 offen{{$}}			; HSA-ELT16-DAG: buffer_load_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], 0 offen{{$}}
	; HSA-ELT16-DAG: buffer_load_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], s9 offen{{$}}			; HSA-ELT16-DAG: buffer_load_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], 0 offen{{$}}


	; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:32			; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:32
	; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:40			; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:40
	; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:48			; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:48
	; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:56			; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:56
	; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:88			; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:88
	; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:80			; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:80
	; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:72			; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:72
	; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:64			; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:64

	; HSA-ELT8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], s9 offen			; HSA-ELT8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], 0 offen
	; HSA-ELT8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], s9 offen			; HSA-ELT8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], 0 offen


	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:32{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:32{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:36{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:36{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:40{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:40{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:44{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:44{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:48{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:48{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:52{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:52{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:56{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:56{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:60{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:60{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:64{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:64{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:68{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:68{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:72{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:72{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:76{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:76{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:80{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:80{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:84{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:84{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:88{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:88{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:92{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:92{{$}}

	; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen{{$}}			; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen{{$}}
	; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:4{{$}}			; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen offset:4{{$}}
	; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:8{{$}}			; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen offset:8{{$}}
	; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:12{{$}}			; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen offset:12{{$}}
	; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:16{{$}}			; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen offset:16{{$}}
	; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:20{{$}}			; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen offset:20{{$}}
	; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:24{{$}}			; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen offset:24{{$}}
	; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:28{{$}}			; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen offset:28{{$}}
	define amdgpu_kernel void @private_elt_size_v8i32(<8 x i32> addrspace(1)* %out, i32 addrspace(1)* %index.array) #0 {			define amdgpu_kernel void @private_elt_size_v8i32(<8 x i32> addrspace(1)* %out, i32 addrspace(1)* %index.array) #0 {
	entry:			entry:
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%idxprom = sext i32 %tid to i64			%idxprom = sext i32 %tid to i64
	%gep.index = getelementptr inbounds i32, i32 addrspace(1)* %index.array, i64 %idxprom			%gep.index = getelementptr inbounds i32, i32 addrspace(1)* %index.array, i64 %idxprom
	%index.load = load i32, i32 addrspace(1)* %gep.index			%index.load = load i32, i32 addrspace(1)* %gep.index
	%index = and i32 %index.load, 2			%index = and i32 %index.load, 2
	%alloca = alloca [2 x <8 x i32>], align 16, addrspace(5)			%alloca = alloca [2 x <8 x i32>], align 16, addrspace(5)
	%gep0 = getelementptr inbounds [2 x <8 x i32>], [2 x <8 x i32>] addrspace(5)* %alloca, i32 0, i32 0			%gep0 = getelementptr inbounds [2 x <8 x i32>], [2 x <8 x i32>] addrspace(5)* %alloca, i32 0, i32 0
	%gep1 = getelementptr inbounds [2 x <8 x i32>], [2 x <8 x i32>] addrspace(5)* %alloca, i32 0, i32 1			%gep1 = getelementptr inbounds [2 x <8 x i32>], [2 x <8 x i32>] addrspace(5)* %alloca, i32 0, i32 1
	store <8 x i32> zeroinitializer, <8 x i32> addrspace(5)* %gep0			store <8 x i32> zeroinitializer, <8 x i32> addrspace(5)* %gep0
	store <8 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>, <8 x i32> addrspace(5)* %gep1			store <8 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>, <8 x i32> addrspace(5)* %gep1
	%gep2 = getelementptr inbounds [2 x <8 x i32>], [2 x <8 x i32>] addrspace(5)* %alloca, i32 0, i32 %index			%gep2 = getelementptr inbounds [2 x <8 x i32>], [2 x <8 x i32>] addrspace(5)* %alloca, i32 0, i32 %index
	%load = load <8 x i32>, <8 x i32> addrspace(5)* %gep2			%load = load <8 x i32>, <8 x i32> addrspace(5)* %gep2
	store <8 x i32> %load, <8 x i32> addrspace(1)* %out			store <8 x i32> %load, <8 x i32> addrspace(1)* %out
	ret void			ret void
	}			}


	; ALL-LABEL: {{^}}private_elt_size_i64:			; ALL-LABEL: {{^}}private_elt_size_i64:
	; HSA-ELT16: private_element_size = 3			; HSA-ELT16: private_element_size = 3
	; HSA-ELT8: private_element_size = 2			; HSA-ELT8: private_element_size = 2
	; HSA-ELT4: private_element_size = 1			; HSA-ELT4: private_element_size = 1

	; HSA-ELTGE8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, {{off|v[0-9]}}, s[0:3], s9 offset:1			; HSA-ELTGE8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, {{off|v[0-9]}}, s[0:3], 0 offset:1
	; HSA-ELTGE8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, {{off|v[0-9]}}, s[0:3], s9 offset:2			; HSA-ELTGE8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, {{off|v[0-9]}}, s[0:3], 0 offset:2

	; HSA-ELTGE8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], s9 offen			; HSA-ELTGE8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], 0 offen


	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:16{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:16{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:20{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:20{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:24{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:24{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:28{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:28{{$}}

	; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen{{$}}			; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen{{$}}
	; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:4{{$}}			; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen offset:4{{$}}
	define amdgpu_kernel void @private_elt_size_i64(i64 addrspace(1)* %out, i32 addrspace(1)* %index.array) #0 {			define amdgpu_kernel void @private_elt_size_i64(i64 addrspace(1)* %out, i32 addrspace(1)* %index.array) #0 {
	entry:			entry:
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%idxprom = sext i32 %tid to i64			%idxprom = sext i32 %tid to i64
	%gep.index = getelementptr inbounds i32, i32 addrspace(1)* %index.array, i64 %idxprom			%gep.index = getelementptr inbounds i32, i32 addrspace(1)* %index.array, i64 %idxprom
	%index.load = load i32, i32 addrspace(1)* %gep.index			%index.load = load i32, i32 addrspace(1)* %gep.index
	%index = and i32 %index.load, 2			%index = and i32 %index.load, 2
	%alloca = alloca [2 x i64], align 16, addrspace(5)			%alloca = alloca [2 x i64], align 16, addrspace(5)
	%gep0 = getelementptr inbounds [2 x i64], [2 x i64] addrspace(5)* %alloca, i32 0, i32 0			%gep0 = getelementptr inbounds [2 x i64], [2 x i64] addrspace(5)* %alloca, i32 0, i32 0
	%gep1 = getelementptr inbounds [2 x i64], [2 x i64] addrspace(5)* %alloca, i32 0, i32 1			%gep1 = getelementptr inbounds [2 x i64], [2 x i64] addrspace(5)* %alloca, i32 0, i32 1
	store i64 0, i64 addrspace(5)* %gep0			store i64 0, i64 addrspace(5)* %gep0
	store i64 34359738602, i64 addrspace(5)* %gep1			store i64 34359738602, i64 addrspace(5)* %gep1
	%gep2 = getelementptr inbounds [2 x i64], [2 x i64] addrspace(5)* %alloca, i32 0, i32 %index			%gep2 = getelementptr inbounds [2 x i64], [2 x i64] addrspace(5)* %alloca, i32 0, i32 %index
	%load = load i64, i64 addrspace(5)* %gep2			%load = load i64, i64 addrspace(5)* %gep2
	store i64 %load, i64 addrspace(1)* %out			store i64 %load, i64 addrspace(1)* %out
	ret void			ret void
	}			}

	; ALL-LABEL: {{^}}private_elt_size_f64:			; ALL-LABEL: {{^}}private_elt_size_f64:
	; HSA-ELT16: private_element_size = 3			; HSA-ELT16: private_element_size = 3
	; HSA-ELT8: private_element_size = 2			; HSA-ELT8: private_element_size = 2
	; HSA-ELT4: private_element_size = 1			; HSA-ELT4: private_element_size = 1

	; HSA-ELTGE8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:16			; HSA-ELTGE8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:16
	; HSA-ELTGE8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:24			; HSA-ELTGE8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:24

	; HSA-ELTGE8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], s9 offen			; HSA-ELTGE8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], 0 offen


	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:16{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:16{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:20{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:20{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:24{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:24{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:28{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:28{{$}}

	; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen{{$}}			; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen{{$}}
	; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:4{{$}}			; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen offset:4{{$}}
	define amdgpu_kernel void @private_elt_size_f64(double addrspace(1)* %out, i32 addrspace(1)* %index.array) #0 {			define amdgpu_kernel void @private_elt_size_f64(double addrspace(1)* %out, i32 addrspace(1)* %index.array) #0 {
	entry:			entry:
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%idxprom = sext i32 %tid to i64			%idxprom = sext i32 %tid to i64
	%gep.index = getelementptr inbounds i32, i32 addrspace(1)* %index.array, i64 %idxprom			%gep.index = getelementptr inbounds i32, i32 addrspace(1)* %index.array, i64 %idxprom
	%index.load = load i32, i32 addrspace(1)* %gep.index			%index.load = load i32, i32 addrspace(1)* %gep.index
	%index = and i32 %index.load, 2			%index = and i32 %index.load, 2
	%alloca = alloca [2 x double], align 16, addrspace(5)			%alloca = alloca [2 x double], align 16, addrspace(5)
	%gep0 = getelementptr inbounds [2 x double], [2 x double] addrspace(5)* %alloca, i32 0, i32 0			%gep0 = getelementptr inbounds [2 x double], [2 x double] addrspace(5)* %alloca, i32 0, i32 0
	%gep1 = getelementptr inbounds [2 x double], [2 x double] addrspace(5)* %alloca, i32 0, i32 1			%gep1 = getelementptr inbounds [2 x double], [2 x double] addrspace(5)* %alloca, i32 0, i32 1
	store double 0.0, double addrspace(5)* %gep0			store double 0.0, double addrspace(5)* %gep0
	store double 4.0, double addrspace(5)* %gep1			store double 4.0, double addrspace(5)* %gep1
	%gep2 = getelementptr inbounds [2 x double], [2 x double] addrspace(5)* %alloca, i32 0, i32 %index			%gep2 = getelementptr inbounds [2 x double], [2 x double] addrspace(5)* %alloca, i32 0, i32 %index
	%load = load double, double addrspace(5)* %gep2			%load = load double, double addrspace(5)* %gep2
	store double %load, double addrspace(1)* %out			store double %load, double addrspace(1)* %out
	ret void			ret void
	}			}

	; ALL-LABEL: {{^}}private_elt_size_v2i64:			; ALL-LABEL: {{^}}private_elt_size_v2i64:
	; HSA-ELT16: private_element_size = 3			; HSA-ELT16: private_element_size = 3
	; HSA-ELT8: private_element_size = 2			; HSA-ELT8: private_element_size = 2
	; HSA-ELT4: private_element_size = 1			; HSA-ELT4: private_element_size = 1

	; HSA-ELT16-DAG: buffer_store_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:16			; HSA-ELT16-DAG: buffer_store_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:16
	; HSA-ELT16-DAG: buffer_store_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:32			; HSA-ELT16-DAG: buffer_store_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:32
	; HSA-ELT16-DAG: buffer_load_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], s9 offen{{$}}			; HSA-ELT16-DAG: buffer_load_dwordx4 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], 0 offen{{$}}

	; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:16{{$}}			; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:16{{$}}
	; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:24			; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:24
	; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:40			; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:40
	; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:32			; HSA-ELT8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], 0 offset:32

	; HSA-ELT8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], s9 offen			; HSA-ELT8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], 0 offen
	; HSA-ELT8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], s9 offen			; HSA-ELT8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], 0 offen


	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:16{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:16{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:20{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:20{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:24{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:24{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:28{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:28{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:32{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:32{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:36{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:36{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:40{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:40{{$}}
	; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:44{{$}}			; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], 0 offset:44{{$}}

	; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen{{$}}			; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen{{$}}
	; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:4{{$}}			; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen offset:4{{$}}
	; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:8{{$}}			; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen offset:8{{$}}
	; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:12{{$}}			; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen offset:12{{$}}
	define amdgpu_kernel void @private_elt_size_v2i64(<2 x i64> addrspace(1)* %out, i32 addrspace(1)* %index.array) #0 {			define amdgpu_kernel void @private_elt_size_v2i64(<2 x i64> addrspace(1)* %out, i32 addrspace(1)* %index.array) #0 {
	entry:			entry:
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%idxprom = sext i32 %tid to i64			%idxprom = sext i32 %tid to i64
	%gep.index = getelementptr inbounds i32, i32 addrspace(1)* %index.array, i64 %idxprom			%gep.index = getelementptr inbounds i32, i32 addrspace(1)* %index.array, i64 %idxprom
	%index.load = load i32, i32 addrspace(1)* %gep.index			%index.load = load i32, i32 addrspace(1)* %gep.index
	%index = and i32 %index.load, 2			%index = and i32 %index.load, 2
	%alloca = alloca [2 x <2 x i64>], align 16, addrspace(5)			%alloca = alloca [2 x <2 x i64>], align 16, addrspace(5)
	[14 lines not shown]

llvm/test/CodeGen/AMDGPU/rename-independent-subregs-mac-operands.mir

	# RUN: llc -march=amdgcn -verify-machineinstrs -run-pass=simple-register-coalescing,rename-independent-subregs -o - %s | FileCheck -check-prefix=GCN %s			# RUN: llc -march=amdgcn -verify-machineinstrs -run-pass=simple-register-coalescing,rename-independent-subregs -o - %s | FileCheck -check-prefix=GCN %s
	---			---

	# GCN-LABEL: name: mac_invalid_operands			# GCN-LABEL: name: mac_invalid_operands
	# GCN: undef %18.sub0:vreg_128 = V_MAC_F32_e32 undef %3:vgpr_32, undef %9:vgpr_32, undef %18.sub0, implicit $exec			# GCN: undef %18.sub0:vreg_128 = V_MAC_F32_e32 undef %3:vgpr_32, undef %9:vgpr_32, undef %18.sub0, implicit $exec

	name: mac_invalid_operands			name: mac_invalid_operands
	alignment: 1			alignment: 1
	exposesReturnsTwice: false			exposesReturnsTwice: false
	legalized: false			legalized: false
	regBankSelected: false			regBankSelected: false
	selected: false			selected: false
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'			scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
	scratchWaveOffsetReg: '$sgpr4'
	frameOffsetReg: '$sgpr4'			frameOffsetReg: '$sgpr4'

	registers:			registers:
	- { id: 0, class: vreg_128 }			- { id: 0, class: vreg_128 }
	- { id: 1, class: vreg_128 }			- { id: 1, class: vreg_128 }
	- { id: 2, class: sgpr_64 }			- { id: 2, class: sgpr_64 }
	- { id: 3, class: vgpr_32 }			- { id: 3, class: vgpr_32 }
	- { id: 4, class: vgpr_32 }			- { id: 4, class: vgpr_32 }
	[69 lines not shown]
	alignment: 1			alignment: 1
	exposesReturnsTwice: false			exposesReturnsTwice: false
	legalized: false			legalized: false
	regBankSelected: false			regBankSelected: false
	selected: false			selected: false
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'			scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
	scratchWaveOffsetReg: '$sgpr4'
	frameOffsetReg: '$sgpr4'			frameOffsetReg: '$sgpr4'
	registers:			registers:
	- { id: 0, class: vgpr_32, preferred-register: '' }			- { id: 0, class: vgpr_32, preferred-register: '' }
	- { id: 1, class: vgpr_32, preferred-register: '' }			- { id: 1, class: vgpr_32, preferred-register: '' }
	- { id: 2, class: vgpr_32, preferred-register: '' }			- { id: 2, class: vgpr_32, preferred-register: '' }
	- { id: 3, class: vgpr_32, preferred-register: '' }			- { id: 3, class: vgpr_32, preferred-register: '' }
	- { id: 4, class: vgpr_32, preferred-register: '' }			- { id: 4, class: vgpr_32, preferred-register: '' }
	- { id: 5, class: sreg_64, preferred-register: '' }			- { id: 5, class: sreg_64, preferred-register: '' }
	[53 lines not shown]

llvm/test/CodeGen/AMDGPU/sched-assert-dead-def-subreg-use-other-subreg.mir

	# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx906 -verify-machineinstrs -run-pass=machine-scheduler -verify-misched -o - %s | FileCheck %s			# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx906 -verify-machineinstrs -run-pass=machine-scheduler -verify-misched -o - %s | FileCheck %s

	# This would assert that a dead def should have no uses, but the dead			# This would assert that a dead def should have no uses, but the dead
	# def and use have different subreg indices.			# def and use have different subreg indices.

	---			---
	name: multi_def_dead_reg_subreg_check			name: multi_def_dead_reg_subreg_check
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: true			isEntryFunction: true
	scratchRSrcReg: '$sgpr24_sgpr25_sgpr26_sgpr27'			scratchRSrcReg: '$sgpr24_sgpr25_sgpr26_sgpr27'
	scratchWaveOffsetReg: '$sgpr32'
	frameOffsetReg: '$sgpr32'			frameOffsetReg: '$sgpr32'
	stackPtrOffsetReg: '$sgpr32'			stackPtrOffsetReg: '$sgpr32'
	argumentInfo:			argumentInfo:
	privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }			privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	privateSegmentWaveByteOffset: { reg: '$sgpr33' }			privateSegmentWaveByteOffset: { reg: '$sgpr33' }
	body: |			body: |
	; CHECK-LABEL: name: multi_def_dead_reg_subreg_check			; CHECK-LABEL: name: multi_def_dead_reg_subreg_check
	; CHECK: bb.0:			; CHECK: bb.0:
	[49 lines not shown]

llvm/test/CodeGen/AMDGPU/sched-handleMoveUp-subreg-def-across-subreg-def.mir

	# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx906 -verify-machineinstrs -verify-misched -run-pass=machine-scheduler -o - %s | FileCheck %s			# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx906 -verify-machineinstrs -verify-misched -run-pass=machine-scheduler -o - %s | FileCheck %s

	---			---
	name: handleMoveUp_incorrect_interval			name: handleMoveUp_incorrect_interval
	tracksRegLiveness: true			tracksRegLiveness: true
	liveins:			liveins:
	- { reg: '$sgpr4_sgpr5', virtual-reg: '%0' }			- { reg: '$sgpr4_sgpr5', virtual-reg: '%0' }
	frameInfo:			frameInfo:
	maxAlignment: 1			maxAlignment: 1
	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: true			isEntryFunction: true
	scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'			scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'
	scratchWaveOffsetReg: '$sgpr101'
	frameOffsetReg: '$sgpr101'			frameOffsetReg: '$sgpr101'
	stackPtrOffsetReg: '$sgpr101'			stackPtrOffsetReg: '$sgpr101'
	argumentInfo:			argumentInfo:
	privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }			privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	kernargSegmentPtr: { reg: '$sgpr4_sgpr5' }			kernargSegmentPtr: { reg: '$sgpr4_sgpr5' }
	workGroupIDX: { reg: '$sgpr6' }			workGroupIDX: { reg: '$sgpr6' }
	privateSegmentWaveByteOffset: { reg: '$sgpr7' }			privateSegmentWaveByteOffset: { reg: '$sgpr7' }
	workItemIDX: { reg: '$vgpr0' }			workItemIDX: { reg: '$vgpr0' }
	[112 lines not shown]

llvm/test/CodeGen/AMDGPU/scratch-buffer.ll

; RUN: llc -amdgpu-scalarize-global-loads=false -verify-machineinstrs -march=amdgcn < %s | FileCheck -enable-var-scope -check-prefix=GCN %s		; RUN: llc -amdgpu-scalarize-global-loads=false -verify-machineinstrs -march=amdgcn < %s | FileCheck -enable-var-scope -check-prefix=GCN %s
; RUN: llc -amdgpu-scalarize-global-loads=false -verify-machineinstrs -march=amdgcn -mcpu=tonga < %s | FileCheck -enable-var-scope -check-prefix=GCN %s		; RUN: llc -amdgpu-scalarize-global-loads=false -verify-machineinstrs -march=amdgcn -mcpu=tonga < %s | FileCheck -enable-var-scope -check-prefix=GCN %s

; When a frame index offset is more than 12-bits, make sure we don't store		; When a frame index offset is more than 12-bits, make sure we don't store
; it in mubuf's offset field.		; it in mubuf's offset field.

; Also, make sure we use the same register for storing the scratch buffer address		; Also, make sure we use the same register for storing the scratch buffer address
; for both stores. This register is allocated by the register scavenger, so we		; for both stores. This register is allocated by the register scavenger, so we
; should be able to reuse the same register for each scratch buffer access.		; should be able to reuse the same register for each scratch buffer access.

; GCN-LABEL: {{^}}legal_offset_fi:		; GCN-LABEL: {{^}}legal_offset_fi:
; GCN: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+}}:{{[0-9]+}}], s{{[0-9]+}} offset:4{{$}}		; GCN: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+}}:{{[0-9]+}}], 0 offset:4{{$}}
; GCN: v_mov_b32_e32 [[OFFSET:v[0-9]+]], 0x8004		; GCN: v_mov_b32_e32 [[OFFSET:v[0-9]+]], 0x8004
; GCN: buffer_store_dword v{{[0-9]+}}, [[OFFSET]], s[{{[0-9]+}}:{{[0-9]+}}], s{{[0-9]+}} offen{{$}}		; GCN: buffer_store_dword v{{[0-9]+}}, [[OFFSET]], s[{{[0-9]+}}:{{[0-9]+}}], 0 offen{{$}}

define amdgpu_kernel void @legal_offset_fi(i32 addrspace(1)* %out, i32 %cond, i32 %if_offset, i32 %else_offset) {		define amdgpu_kernel void @legal_offset_fi(i32 addrspace(1)* %out, i32 %cond, i32 %if_offset, i32 %else_offset) {
entry:		entry:
%scratch0 = alloca [8192 x i32], addrspace(5)		%scratch0 = alloca [8192 x i32], addrspace(5)
%scratch1 = alloca [8192 x i32], addrspace(5)		%scratch1 = alloca [8192 x i32], addrspace(5)

%scratchptr0 = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %scratch0, i32 0, i32 0		%scratchptr0 = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %scratch0, i32 0, i32 0
store i32 1, i32 addrspace(5)* %scratchptr0		store i32 1, i32 addrspace(5)* %scratchptr0
[19 lines not shown]
done:		done:
store i32 %value, i32 addrspace(1)* %out		store i32 %value, i32 addrspace(1)* %out
ret void		ret void

ret void		ret void

}		}

; GCN-LABEL: {{^}}legal_offset_fi_offset:		; GCN-LABEL: {{^}}legal_offset_fi_offset:
; GCN-DAG: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], s{{[0-9]+}} offen{{$}}		; GCN-DAG: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+}}:{{[0-9]+}}], 0 offen{{$}}
; This constant isn't folded, because it has multiple uses.		; This constant isn't folded, because it has multiple uses.
; GCN-DAG: v_mov_b32_e32 [[K8000:v[0-9]+]], 0x8004		; GCN-DAG: v_mov_b32_e32 [[K8000:v[0-9]+]], 0x8004
; GCN-DAG: v_add_{{[iu]}}32_e32 [[OFFSET:v[0-9]+]], vcc, [[K8000]]		; GCN-DAG: v_add_{{[iu]}}32_e32 [[OFFSET:v[0-9]+]], vcc, [[K8000]]
; GCN: buffer_store_dword v{{[0-9]+}}, [[OFFSET]], s[{{[0-9]+}}:{{[0-9]+}}], s{{[0-9]+}} offen{{$}}		; GCN: buffer_store_dword v{{[0-9]+}}, [[OFFSET]], s[{{[0-9]+}}:{{[0-9]+}}], 0 offen{{$}}

define amdgpu_kernel void @legal_offset_fi_offset(i32 addrspace(1)* %out, i32 %cond, i32 addrspace(1)* %offsets, i32 %if_offset, i32 %else_offset) {		define amdgpu_kernel void @legal_offset_fi_offset(i32 addrspace(1)* %out, i32 %cond, i32 addrspace(1)* %offsets, i32 %if_offset, i32 %else_offset) {
entry:		entry:
%scratch0 = alloca [8192 x i32], addrspace(5)		%scratch0 = alloca [8192 x i32], addrspace(5)
%scratch1 = alloca [8192 x i32], addrspace(5)		%scratch1 = alloca [8192 x i32], addrspace(5)

%offset0 = load i32, i32 addrspace(1)* %offsets		%offset0 = load i32, i32 addrspace(1)* %offsets
%scratchptr0 = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %scratch0, i32 0, i32 %offset0		%scratchptr0 = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %scratch0, i32 0, i32 %offset0
[20 lines not shown]
done:		done:
%value = phi i32 [%if_value, %if], [%else_value, %else]		%value = phi i32 [%if_value, %if], [%else_value, %else]
store i32 %value, i32 addrspace(1)* %out		store i32 %value, i32 addrspace(1)* %out
ret void		ret void
}		}

; GCN-LABEL: {{^}}neg_vaddr_offset_inbounds:		; GCN-LABEL: {{^}}neg_vaddr_offset_inbounds:
; GCN: v_add_{{[iu]}}32_e32 [[ADD:v[0-9]+]], vcc, 16, v{{[0-9]+}}		; GCN: v_add_{{[iu]}}32_e32 [[ADD:v[0-9]+]], vcc, 16, v{{[0-9]+}}
; GCN: buffer_store_dword v{{[0-9]+}}, [[ADD]], s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offen{{$}}		; GCN: buffer_store_dword v{{[0-9]+}}, [[ADD]], s[{{[0-9]+:[0-9]+}}], 0 offen{{$}}
define amdgpu_kernel void @neg_vaddr_offset_inbounds(i32 %offset) {		define amdgpu_kernel void @neg_vaddr_offset_inbounds(i32 %offset) {
entry:		entry:
%array = alloca [8192 x i32], addrspace(5)		%array = alloca [8192 x i32], addrspace(5)
%ptr_offset = add i32 %offset, 4		%ptr_offset = add i32 %offset, 4
%ptr = getelementptr inbounds [8192 x i32], [8192 x i32] addrspace(5)* %array, i32 0, i32 %ptr_offset		%ptr = getelementptr inbounds [8192 x i32], [8192 x i32] addrspace(5)* %array, i32 0, i32 %ptr_offset
store i32 0, i32 addrspace(5)* %ptr		store i32 0, i32 addrspace(5)* %ptr
ret void		ret void
}		}

; GCN-LABEL: {{^}}neg_vaddr_offset:		; GCN-LABEL: {{^}}neg_vaddr_offset:
; GCN: v_add_{{[iu]}}32_e32 [[ADD:v[0-9]+]], vcc, 16, v{{[0-9]+}}		; GCN: v_add_{{[iu]}}32_e32 [[ADD:v[0-9]+]], vcc, 16, v{{[0-9]+}}
; GCN: buffer_store_dword v{{[0-9]+}}, [[ADD]], s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offen{{$}}		; GCN: buffer_store_dword v{{[0-9]+}}, [[ADD]], s[{{[0-9]+:[0-9]+}}], 0 offen{{$}}
define amdgpu_kernel void @neg_vaddr_offset(i32 %offset) {		define amdgpu_kernel void @neg_vaddr_offset(i32 %offset) {
entry:		entry:
%array = alloca [8192 x i32], addrspace(5)		%array = alloca [8192 x i32], addrspace(5)
%ptr_offset = add i32 %offset, 4		%ptr_offset = add i32 %offset, 4
%ptr = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %array, i32 0, i32 %ptr_offset		%ptr = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %array, i32 0, i32 %ptr_offset
store i32 0, i32 addrspace(5)* %ptr		store i32 0, i32 addrspace(5)* %ptr
ret void		ret void
}		}

; GCN-LABEL: {{^}}pos_vaddr_offset:		; GCN-LABEL: {{^}}pos_vaddr_offset:
; GCN: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:20		; GCN: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], 0 offset:20
define amdgpu_kernel void @pos_vaddr_offset(i32 addrspace(1)* %out, i32 %offset) {		define amdgpu_kernel void @pos_vaddr_offset(i32 addrspace(1)* %out, i32 %offset) {
entry:		entry:
%array = alloca [8192 x i32], addrspace(5)		%array = alloca [8192 x i32], addrspace(5)
%ptr = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %array, i32 0, i32 4		%ptr = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %array, i32 0, i32 4
store i32 0, i32 addrspace(5)* %ptr		store i32 0, i32 addrspace(5)* %ptr
%load_ptr = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %array, i32 0, i32 %offset		%load_ptr = getelementptr [8192 x i32], [8192 x i32] addrspace(5)* %array, i32 0, i32 %offset
%val = load i32, i32 addrspace(5)* %load_ptr		%val = load i32, i32 addrspace(5)* %load_ptr
store i32 %val, i32 addrspace(1)* %out		store i32 %val, i32 addrspace(1)* %out
ret void		ret void
}		}

llvm/test/CodeGen/AMDGPU/scratch-simple.ll

	[23 lines not shown]
	; GFX10_W64-DAG: s_mov_b32 s7, 0x31e16000			; GFX10_W64-DAG: s_mov_b32 s7, 0x31e16000
	; GCN-DAG: v_lshlrev_b32_e32 [[BYTES:v[0-9]+]], 2, v0			; GCN-DAG: v_lshlrev_b32_e32 [[BYTES:v[0-9]+]], 2, v0
	; GCN-DAG: v_and_b32_e32 [[CLAMP_IDX:v[0-9]+]], 0x1fc, [[BYTES]]			; GCN-DAG: v_and_b32_e32 [[CLAMP_IDX:v[0-9]+]], 0x1fc, [[BYTES]]
	; GCN-NOT: s_mov_b32 s0			; GCN-NOT: s_mov_b32 s0

	; GCN-DAG: v_or_b32_e32 [[LO_OFF:v[0-9]+]], 0x200, [[CLAMP_IDX]]			; GCN-DAG: v_or_b32_e32 [[LO_OFF:v[0-9]+]], 0x200, [[CLAMP_IDX]]
	; GCN-DAG: v_or_b32_e32 [[HI_OFF:v[0-9]+]], 0x400, [[CLAMP_IDX]]			; GCN-DAG: v_or_b32_e32 [[HI_OFF:v[0-9]+]], 0x400, [[CLAMP_IDX]]

	; GCN: buffer_load_dword {{v[0-9]+}}, [[LO_OFF]], {{s\[[0-9]+:[0-9]+\]}}, s0 offen			; GCN: buffer_load_dword {{v[0-9]+}}, [[LO_OFF]], {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	; GCN: buffer_load_dword {{v[0-9]+}}, [[HI_OFF]], {{s\[[0-9]+:[0-9]+\]}}, s0 offen			; GCN: buffer_load_dword {{v[0-9]+}}, [[HI_OFF]], {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	define amdgpu_ps float @ps_main(i32 %idx) {			define amdgpu_ps float @ps_main(i32 %idx) {
	%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx			%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx
	%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx			%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx
	%r = fadd float %v1, %v2			%r = fadd float %v1, %v2
	ret float %r			ret float %r
	}			}

	; GCN-LABEL: {{^}}vs_main:			; GCN-LABEL: {{^}}vs_main:
	; GCN-DAG: s_mov_b32 s4, SCRATCH_RSRC_DWORD0			; GCN-DAG: s_mov_b32 s4, SCRATCH_RSRC_DWORD0
	; GCN-NOT: s_mov_b32 s0			; GCN-NOT: s_mov_b32 s0
	; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s0 offen			; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s0 offen			; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	define amdgpu_vs float @vs_main(i32 %idx) {			define amdgpu_vs float @vs_main(i32 %idx) {
	%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx			%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx
	%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx			%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx
	%r = fadd float %v1, %v2			%r = fadd float %v1, %v2
	ret float %r			ret float %r
	}			}

	; GCN-LABEL: {{^}}cs_main:			; GCN-LABEL: {{^}}cs_main:
	; GCN-DAG: s_mov_b32 s4, SCRATCH_RSRC_DWORD0			; GCN-DAG: s_mov_b32 s4, SCRATCH_RSRC_DWORD0
	; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s0 offen			; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s0 offen			; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	define amdgpu_cs float @cs_main(i32 %idx) {			define amdgpu_cs float @cs_main(i32 %idx) {
	%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx			%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx
	%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx			%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx
	%r = fadd float %v1, %v2			%r = fadd float %v1, %v2
	ret float %r			ret float %r
	}			}

	; GCN-LABEL: {{^}}hs_main:			; GCN-LABEL: {{^}}hs_main:
	; SIVI: s_mov_b32 s4, SCRATCH_RSRC_DWORD0			; SIVI: s_mov_b32 s4, SCRATCH_RSRC_DWORD0
	; SIVI-NOT: s_mov_b32 s0			; SIVI-NOT: s_mov_b32 s0
	; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s0 offen			; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s0 offen			; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen

	; GFX9_10: s_mov_b32 s0, SCRATCH_RSRC_DWORD0			; GFX9_10: s_mov_b32 s0, SCRATCH_RSRC_DWORD0
	; GFX9_10-NOT: s_mov_b32 s5			; GFX9_10-NOT: s_mov_b32 s5
	; GFX9_10: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s5 offen			; GFX9_10: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	; GFX9_10: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s5 offen			; GFX9_10: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	define amdgpu_hs float @hs_main(i32 %idx) {			define amdgpu_hs float @hs_main(i32 %idx) {
	%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx			%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx
	%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx			%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx
	%r = fadd float %v1, %v2			%r = fadd float %v1, %v2
	ret float %r			ret float %r
	}			}

	; GCN-LABEL: {{^}}gs_main:			; GCN-LABEL: {{^}}gs_main:
	; SIVI: s_mov_b32 s4, SCRATCH_RSRC_DWORD0			; SIVI: s_mov_b32 s4, SCRATCH_RSRC_DWORD0
	; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s0 offen			; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s0 offen			; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen

	; GFX9_10: s_mov_b32 s0, SCRATCH_RSRC_DWORD0			; GFX9_10: s_mov_b32 s0, SCRATCH_RSRC_DWORD0
	; GFX9_10: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s5 offen			; GFX9_10: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	; GFX9_10: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s5 offen			; GFX9_10: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	define amdgpu_gs float @gs_main(i32 %idx) {			define amdgpu_gs float @gs_main(i32 %idx) {
	%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx			%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx
	%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx			%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx
	%r = fadd float %v1, %v2			%r = fadd float %v1, %v2
	ret float %r			ret float %r
	}			}

				; Mesa GS and HS shaders have the preloaded scratch wave offset SGPR fixed at
				; SGPR5, and the inreg implementation is used to reference it in the IR. The
				; following tests confirm the shader and anything inserted after the return
				; (i.e. SI_RETURN_TO_EPILOG) can access the scratch wave offset.

	; GCN-LABEL: {{^}}hs_ir_uses_scratch_offset:			; GCN-LABEL: {{^}}hs_ir_uses_scratch_offset:
				arsenm: Can you add a comment elaborating on what this tests?
				scott.linder (Author): From discussion with @mareko, my understanding is that Mesa GS and HS shaders have the preloaded scratch wave offset SGPR fixed at SGPR5, and the inreg implementation is used to reference it in the IR. So here, the shader snippet inserted after the SI_RETURN_TO_EPILOG wants to use the scratch wave offset, and the IR passes it along by padding out the inreg arguments until it gets to where the scratch wave offset is, and then using it in the return value. I'll add something to that effect in the test.
	; GCN: s_mov_b32 s8, SCRATCH_RSRC_DWORD0			; GCN: s_mov_b32 s8, SCRATCH_RSRC_DWORD0

	; SIVI-NOT: s_mov_b32 s6			; SIVI-NOT: s_mov_b32 s6
	; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s6 offen			; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s6 offen			; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen

	; GFX9_10-NOT: s_mov_b32 s5			; GFX9_10-NOT: s_mov_b32 s5
	; GFX9_10-DAG: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s5 offen			; GFX9_10-DAG: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	; GFX9_10-DAG: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s5 offen			; GFX9_10-DAG: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen

	; GCN-DAG: s_mov_b32 s2, s5			; GCN-DAG: s_mov_b32 s2, s5
	define amdgpu_hs <{i32, i32, i32, float}> @hs_ir_uses_scratch_offset(i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg %swo, i32 %idx) {			define amdgpu_hs <{i32, i32, i32, float}> @hs_ir_uses_scratch_offset(i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg %swo, i32 %idx) {
	%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx			%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx
	%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx			%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx
	%f = fadd float %v1, %v2			%f = fadd float %v1, %v2
	%r1 = insertvalue <{i32, i32, i32, float}> undef, i32 %swo, 2			%r1 = insertvalue <{i32, i32, i32, float}> undef, i32 %swo, 2
	%r2 = insertvalue <{i32, i32, i32, float}> %r1, float %f, 3			%r2 = insertvalue <{i32, i32, i32, float}> %r1, float %f, 3
	ret <{i32, i32, i32, float}> %r2			ret <{i32, i32, i32, float}> %r2
	}			}

	; GCN-LABEL: {{^}}gs_ir_uses_scratch_offset:			; GCN-LABEL: {{^}}gs_ir_uses_scratch_offset:
	; GCN: s_mov_b32 s8, SCRATCH_RSRC_DWORD0			; GCN: s_mov_b32 s8, SCRATCH_RSRC_DWORD0

	; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s6 offen			; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s6 offen			; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen

	; GFX9_10-DAG: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s5 offen			; GFX9_10-DAG: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	; GFX9_10-DAG: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s5 offen			; GFX9_10-DAG: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen

	; GCN-DAG: s_mov_b32 s2, s5			; GCN-DAG: s_mov_b32 s2, s5
	define amdgpu_gs <{i32, i32, i32, float}> @gs_ir_uses_scratch_offset(i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg %swo, i32 %idx) {			define amdgpu_gs <{i32, i32, i32, float}> @gs_ir_uses_scratch_offset(i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg %swo, i32 %idx) {
	%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx			%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx
	%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx			%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx
	%f = fadd float %v1, %v2			%f = fadd float %v1, %v2
	%r1 = insertvalue <{i32, i32, i32, float}> undef, i32 %swo, 2			%r1 = insertvalue <{i32, i32, i32, float}> undef, i32 %swo, 2
	%r2 = insertvalue <{i32, i32, i32, float}> %r1, float %f, 3			%r2 = insertvalue <{i32, i32, i32, float}> %r1, float %f, 3
	ret <{i32, i32, i32, float}> %r2			ret <{i32, i32, i32, float}> %r2
	}			}
				scott.linder (Author): @arsenm @nhaehnle Similar question as above wrt. how `inreg` should work. Is the `%swo` argument in these expected to actually be allowed to coincide with the scratch wave offset?

llvm/test/CodeGen/AMDGPU/sgpr-spill-wrong-stack-id.mir

	# SHARE: stack-id: sgpr-spill, callee-saved-register: '', callee-saved-restored: true,			# SHARE: stack-id: sgpr-spill, callee-saved-register: '', callee-saved-restored: true,
	# SHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }			# SHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
	# SHARE: - { id: 2, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,			# SHARE: - { id: 2, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,
	# SHARE: stack-id: sgpr-spill, callee-saved-register: '', callee-saved-restored: true,			# SHARE: stack-id: sgpr-spill, callee-saved-register: '', callee-saved-restored: true,
	# SHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }			# SHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }

	# SHARE: SI_SPILL_S32_SAVE $sgpr32, %stack.2, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 4 into %stack.2, addrspace 5)			# SHARE: SI_SPILL_S32_SAVE $sgpr32, %stack.2, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 4 into %stack.2, addrspace 5)
	# SHARE: SI_SPILL_V32_SAVE killed $vgpr0, %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (store 4 into %stack.0, addrspace 5)			# SHARE: SI_SPILL_V32_SAVE killed $vgpr0, %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (store 4 into %stack.0, addrspace 5)
	# SHARE: SI_SPILL_S64_SAVE killed renamable $sgpr6_sgpr7, %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 8 into %stack.1, align 4, addrspace 5)			# SHARE: SI_SPILL_S64_SAVE killed renamable $sgpr4_sgpr5, %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 8 into %stack.1, align 4, addrspace 5)
	# SHARE: renamable $sgpr6_sgpr7 = SI_SPILL_S64_RESTORE %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 8 from %stack.1, align 4, addrspace 5)			# SHARE: renamable $sgpr4_sgpr5 = SI_SPILL_S64_RESTORE %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 8 from %stack.1, align 4, addrspace 5)
	# SHARE: dead $sgpr30_sgpr31 = SI_CALL killed renamable $sgpr6_sgpr7, @func, csr_amdgpu_highregs, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr4, implicit undef $vgpr0			# SHARE: dead $sgpr30_sgpr31 = SI_CALL killed renamable $sgpr4_sgpr5, @func, csr_amdgpu_highregs, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit undef $vgpr0
	# SHARE: $sgpr32 = SI_SPILL_S32_RESTORE %stack.2, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 4 from %stack.2, addrspace 5)			# SHARE: $sgpr32 = SI_SPILL_S32_RESTORE %stack.2, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 4 from %stack.2, addrspace 5)
	# SHARE: $vgpr0 = SI_SPILL_V32_RESTORE %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (load 4 from %stack.0, addrspace 5)			# SHARE: $vgpr0 = SI_SPILL_V32_RESTORE %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (load 4 from %stack.0, addrspace 5)
	# SHARE: renamable $sgpr6_sgpr7 = SI_SPILL_S64_RESTORE %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 8 from %stack.1, align 4, addrspace 5)			# SHARE: renamable $sgpr4_sgpr5 = SI_SPILL_S64_RESTORE %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 8 from %stack.1, align 4, addrspace 5)
	# SHARE: dead $sgpr30_sgpr31 = SI_CALL killed renamable $sgpr6_sgpr7, @func, csr_amdgpu_highregs, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr4, implicit $vgpr0			# SHARE: dead $sgpr30_sgpr31 = SI_CALL killed renamable $sgpr4_sgpr5, @func, csr_amdgpu_highregs, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $vgpr0
	# SHARE: $sgpr32 = SI_SPILL_S32_RESTORE %stack.2, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 4 from %stack.2, addrspace 5)			# SHARE: $sgpr32 = SI_SPILL_S32_RESTORE %stack.2, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 4 from %stack.2, addrspace 5)

	# NOSHARE: stack:			# NOSHARE: stack:
	# NOSHARE: - { id: 0, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,			# NOSHARE: - { id: 0, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,
	# NOSHARE: stack-id: default, callee-saved-register: '', callee-saved-restored: true,			# NOSHARE: stack-id: default, callee-saved-register: '', callee-saved-restored: true,
	# NOSHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }			# NOSHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
	# NOSHARE: - { id: 1, name: '', type: spill-slot, offset: 0, size: 8, alignment: 4,			# NOSHARE: - { id: 1, name: '', type: spill-slot, offset: 0, size: 8, alignment: 4,
	# NOSHARE: stack-id: sgpr-spill, callee-saved-register: '', callee-saved-restored: true,			# NOSHARE: stack-id: sgpr-spill, callee-saved-register: '', callee-saved-restored: true,
	# NOSHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }			# NOSHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
	# NOSHARE: - { id: 2, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,			# NOSHARE: - { id: 2, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,
	# NOSHARE: stack-id: sgpr-spill, callee-saved-register: '', callee-saved-restored: true,			# NOSHARE: stack-id: sgpr-spill, callee-saved-register: '', callee-saved-restored: true,
	# NOSHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }			# NOSHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
	# NOSHARE: - { id: 3, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,			# NOSHARE: - { id: 3, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,
	# NOSHARE: stack-id: sgpr-spill, callee-saved-register: '', callee-saved-restored: true,			# NOSHARE: stack-id: sgpr-spill, callee-saved-register: '', callee-saved-restored: true,
	# NOSHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }			# NOSHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }

	# NOSHARE: SI_SPILL_S32_SAVE $sgpr32, %stack.2, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 4 into %stack.2, addrspace 5)			# NOSHARE: SI_SPILL_S32_SAVE $sgpr32, %stack.2, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 4 into %stack.2, addrspace 5)
	# NOSHARE: SI_SPILL_V32_SAVE killed $vgpr0, %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (store 4 into %stack.0, addrspace 5)			# NOSHARE: SI_SPILL_V32_SAVE killed $vgpr0, %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (store 4 into %stack.0, addrspace 5)
	# NOSHARE: SI_SPILL_S64_SAVE killed renamable $sgpr6_sgpr7, %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 8 into %stack.1, align 4, addrspace 5)			# NOSHARE: SI_SPILL_S64_SAVE killed renamable $sgpr4_sgpr5, %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 8 into %stack.1, align 4, addrspace 5)
	# NOSHARE: renamable $sgpr6_sgpr7 = SI_SPILL_S64_RESTORE %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 8 from %stack.1, align 4, addrspace 5)			# NOSHARE: renamable $sgpr4_sgpr5 = SI_SPILL_S64_RESTORE %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 8 from %stack.1, align 4, addrspace 5)
	# NOSHARE: dead $sgpr30_sgpr31 = SI_CALL killed renamable $sgpr6_sgpr7, @func, csr_amdgpu_highregs, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr4, implicit undef $vgpr0			# NOSHARE: dead $sgpr30_sgpr31 = SI_CALL killed renamable $sgpr4_sgpr5, @func, csr_amdgpu_highregs, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit undef $vgpr0
	# NOSHARE: $sgpr32 = SI_SPILL_S32_RESTORE %stack.2, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 4 from %stack.2, addrspace 5)			# NOSHARE: $sgpr32 = SI_SPILL_S32_RESTORE %stack.2, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 4 from %stack.2, addrspace 5)
	# NOSHARE: SI_SPILL_S32_SAVE $sgpr32, %stack.3, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 4 into %stack.3, addrspace 5)			# NOSHARE: SI_SPILL_S32_SAVE $sgpr32, %stack.3, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 4 into %stack.3, addrspace 5)
	# NOSHARE: $vgpr0 = SI_SPILL_V32_RESTORE %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (load 4 from %stack.0, addrspace 5)			# NOSHARE: $vgpr0 = SI_SPILL_V32_RESTORE %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (load 4 from %stack.0, addrspace 5)
	# NOSHARE: renamable $sgpr6_sgpr7 = SI_SPILL_S64_RESTORE %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 8 from %stack.1, align 4, addrspace 5)			# NOSHARE: renamable $sgpr4_sgpr5 = SI_SPILL_S64_RESTORE %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 8 from %stack.1, align 4, addrspace 5)
	# NOSHARE: dead $sgpr30_sgpr31 = SI_CALL killed renamable $sgpr6_sgpr7, @func, csr_amdgpu_highregs, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr4, implicit $vgpr0			# NOSHARE: dead $sgpr30_sgpr31 = SI_CALL killed renamable $sgpr4_sgpr5, @func, csr_amdgpu_highregs, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $vgpr0
	# NOSHARE: $sgpr32 = SI_SPILL_S32_RESTORE %stack.3, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 4 from %stack.3, addrspace 5)			# NOSHARE: $sgpr32 = SI_SPILL_S32_RESTORE %stack.3, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 4 from %stack.3, addrspace 5)

	...			...

	name: sgpr_spill_wrong_stack_id			name: sgpr_spill_wrong_stack_id
	tracksRegLiveness: true			tracksRegLiveness: true
	frameInfo:			frameInfo:
	hasCalls: true			hasCalls: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	frameOffsetReg: $sgpr32			frameOffsetReg: $sgpr32
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32
	body: |			body: |
	bb.0:			bb.0:
	%0:sreg_32_xm0 = COPY $sgpr32			%0:sreg_32_xm0 = COPY $sgpr32
	%1:vreg_64 = IMPLICIT_DEF			%1:vreg_64 = IMPLICIT_DEF
	%2:vgpr_32 = FLAT_LOAD_DWORD %1, 0, 0, 0, 0, implicit $exec, implicit $flat_scr			%2:vgpr_32 = FLAT_LOAD_DWORD %1, 0, 0, 0, 0, implicit $exec, implicit $flat_scr
	%3:sreg_64 = SI_PC_ADD_REL_OFFSET target-flags(amdgpu-rel32-lo) @func + 4, target-flags(amdgpu-rel32-hi) @func + 4, implicit-def dead $scc			%3:sreg_64 = SI_PC_ADD_REL_OFFSET target-flags(amdgpu-rel32-lo) @func + 4, target-flags(amdgpu-rel32-hi) @func + 4, implicit-def dead $scc
	ADJCALLSTACKUP 0, 0, implicit-def $scc, implicit-def $sgpr32, implicit $sgpr32, implicit $sgpr32			ADJCALLSTACKUP 0, 0, implicit-def $scc, implicit-def $sgpr32, implicit $sgpr32, implicit $sgpr32
	dead $sgpr30_sgpr31 = SI_CALL %3, @func, csr_amdgpu_highregs, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr4, implicit undef $vgpr0			dead $sgpr30_sgpr31 = SI_CALL %3, @func, csr_amdgpu_highregs, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit undef $vgpr0
	$sgpr32 = COPY %0			$sgpr32 = COPY %0
	%4:sreg_32_xm0 = COPY $sgpr32			%4:sreg_32_xm0 = COPY $sgpr32
	ADJCALLSTACKDOWN 0, 0, implicit-def $scc, implicit-def $sgpr32, implicit $sgpr32, implicit $sgpr32			ADJCALLSTACKDOWN 0, 0, implicit-def $scc, implicit-def $sgpr32, implicit $sgpr32, implicit $sgpr32
	ADJCALLSTACKUP 0, 0, implicit-def $scc, implicit-def $sgpr32, implicit $sgpr32, implicit $sgpr32			ADJCALLSTACKUP 0, 0, implicit-def $scc, implicit-def $sgpr32, implicit $sgpr32, implicit $sgpr32
	$vgpr0 = COPY %2			$vgpr0 = COPY %2
	dead $sgpr30_sgpr31 = SI_CALL %3, @func, csr_amdgpu_highregs, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr4, implicit killed $vgpr0			dead $sgpr30_sgpr31 = SI_CALL %3, @func, csr_amdgpu_highregs, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit killed $vgpr0
	$sgpr32 = COPY %4			$sgpr32 = COPY %4
	ADJCALLSTACKDOWN 0, 0, implicit-def $scc, implicit-def $sgpr32, implicit $sgpr32, implicit $sgpr32			ADJCALLSTACKDOWN 0, 0, implicit-def $scc, implicit-def $sgpr32, implicit $sgpr32, implicit $sgpr32

	...			...

llvm/test/CodeGen/AMDGPU/shl_add_ptr.ll

Show First 20 Lines • Show All 339 Lines • ▼ Show 20 Lines	define void @shl_add_ptr_combine_2use_both_max_lds_offset(i32 %idx) #0 {
%ptr1 = inttoptr i32 %shl1 to i32 addrspace(3)*		%ptr1 = inttoptr i32 %shl1 to i32 addrspace(3)*
store volatile i32 9, i32 addrspace(3)* %ptr0		store volatile i32 9, i32 addrspace(3)* %ptr0
store volatile i32 10, i32 addrspace(3)* %ptr1		store volatile i32 10, i32 addrspace(3)* %ptr1
ret void		ret void
}		}

; GCN-LABEL: {{^}}shl_add_ptr_combine_2use_private:		; GCN-LABEL: {{^}}shl_add_ptr_combine_2use_private:
; GCN: v_lshlrev_b32_e32 [[SCALE0:v[0-9]+]], 2, v0		; GCN: v_lshlrev_b32_e32 [[SCALE0:v[0-9]+]], 2, v0
; GCN: buffer_store_dword v{{[0-9]+}}, [[SCALE0]], s[0:3], s33 offen offset:16		; GCN: buffer_store_dword v{{[0-9]+}}, [[SCALE0]], s[0:3], 0 offen offset:16

; GCN: v_lshlrev_b32_e32 [[SCALE1:v[0-9]+]], 3, v0		; GCN: v_lshlrev_b32_e32 [[SCALE1:v[0-9]+]], 3, v0
; GCN: buffer_store_dword v{{[0-9]+}}, [[SCALE1]], s[0:3], s33 offen offset:32		; GCN: buffer_store_dword v{{[0-9]+}}, [[SCALE1]], s[0:3], 0 offen offset:32
define void @shl_add_ptr_combine_2use_private(i16 zeroext %idx.arg) #0 {		define void @shl_add_ptr_combine_2use_private(i16 zeroext %idx.arg) #0 {
%idx = zext i16 %idx.arg to i32		%idx = zext i16 %idx.arg to i32
%idx.add = add nuw i32 %idx, 4		%idx.add = add nuw i32 %idx, 4
%shl0 = shl i32 %idx.add, 2		%shl0 = shl i32 %idx.add, 2
%shl1 = shl i32 %idx.add, 3		%shl1 = shl i32 %idx.add, 3
%ptr0 = inttoptr i32 %shl0 to i32 addrspace(5)*		%ptr0 = inttoptr i32 %shl0 to i32 addrspace(5)*
%ptr1 = inttoptr i32 %shl1 to i32 addrspace(5)*		%ptr1 = inttoptr i32 %shl1 to i32 addrspace(5)*
store volatile i32 9, i32 addrspace(5)* %ptr0		store volatile i32 9, i32 addrspace(5)* %ptr0
store volatile i32 10, i32 addrspace(5)* %ptr1		store volatile i32 10, i32 addrspace(5)* %ptr1
ret void		ret void
}		}

; GCN-LABEL: {{^}}shl_add_ptr_combine_2use_max_private_offset:		; GCN-LABEL: {{^}}shl_add_ptr_combine_2use_max_private_offset:
; GCN-DAG: v_lshlrev_b32_e32 [[SCALE0:v[0-9]+]], 3, v0		; GCN-DAG: v_lshlrev_b32_e32 [[SCALE0:v[0-9]+]], 3, v0
; GCN-DAG: v_lshlrev_b32_e32 [[SCALE1:v[0-9]+]], 4, v0		; GCN-DAG: v_lshlrev_b32_e32 [[SCALE1:v[0-9]+]], 4, v0
; GCN-DAG: buffer_store_dword v{{[0-9]+}}, [[SCALE0]], s[0:3], s33 offen offset:4088		; GCN-DAG: buffer_store_dword v{{[0-9]+}}, [[SCALE0]], s[0:3], 0 offen offset:4088
; GCN-DAG: v_add_{{[iu]}}32_e32 [[ADD:v[0-9]+]], vcc, 0x1ff0, [[SCALE1]]		; GCN-DAG: v_add_{{[iu]}}32_e32 [[ADD:v[0-9]+]], vcc, 0x1ff0, [[SCALE1]]
; GCN: buffer_store_dword v{{[0-9]+}}, [[ADD]], s[0:3], s33 offen{{$}}		; GCN: buffer_store_dword v{{[0-9]+}}, [[ADD]], s[0:3], 0 offen{{$}}
define void @shl_add_ptr_combine_2use_max_private_offset(i16 zeroext %idx.arg) #0 {		define void @shl_add_ptr_combine_2use_max_private_offset(i16 zeroext %idx.arg) #0 {
%idx = zext i16 %idx.arg to i32		%idx = zext i16 %idx.arg to i32
%idx.add = add nuw i32 %idx, 511		%idx.add = add nuw i32 %idx, 511
%shl0 = shl i32 %idx.add, 3		%shl0 = shl i32 %idx.add, 3
%shl1 = shl i32 %idx.add, 4		%shl1 = shl i32 %idx.add, 4
%ptr0 = inttoptr i32 %shl0 to i32 addrspace(5)*		%ptr0 = inttoptr i32 %shl0 to i32 addrspace(5)*
%ptr1 = inttoptr i32 %shl1 to i32 addrspace(5)*		%ptr1 = inttoptr i32 %shl1 to i32 addrspace(5)*
store volatile i32 9, i32 addrspace(5)* %ptr0		store volatile i32 9, i32 addrspace(5)* %ptr0
store volatile i32 10, i32 addrspace(5)* %ptr1		store volatile i32 10, i32 addrspace(5)* %ptr1
ret void		ret void
}		}
; GCN-LABEL: {{^}}shl_add_ptr_combine_2use_both_max_private_offset:		; GCN-LABEL: {{^}}shl_add_ptr_combine_2use_both_max_private_offset:
; GCN: v_add_{{[iu]}}32_e32 [[ADD:v[0-9]+]], vcc, 0x100, v0		; GCN: v_add_{{[iu]}}32_e32 [[ADD:v[0-9]+]], vcc, 0x100, v0
; GCN-DAG: v_lshlrev_b32_e32 [[SCALE0:v[0-9]+]], 4, [[ADD]]		; GCN-DAG: v_lshlrev_b32_e32 [[SCALE0:v[0-9]+]], 4, [[ADD]]
; GCN-DAG: v_lshlrev_b32_e32 [[SCALE1:v[0-9]+]], 5, [[ADD]]		; GCN-DAG: v_lshlrev_b32_e32 [[SCALE1:v[0-9]+]], 5, [[ADD]]
; GCN-DAG: buffer_store_dword v{{[0-9]+}}, [[SCALE0]], s[0:3], s33 offen{{$}}		; GCN-DAG: buffer_store_dword v{{[0-9]+}}, [[SCALE0]], s[0:3], 0 offen{{$}}
; GCN: buffer_store_dword v{{[0-9]+}}, [[SCALE1]], s[0:3], s33 offen{{$}}		; GCN: buffer_store_dword v{{[0-9]+}}, [[SCALE1]], s[0:3], 0 offen{{$}}
define void @shl_add_ptr_combine_2use_both_max_private_offset(i16 zeroext %idx.arg) #0 {		define void @shl_add_ptr_combine_2use_both_max_private_offset(i16 zeroext %idx.arg) #0 {
%idx = zext i16 %idx.arg to i32		%idx = zext i16 %idx.arg to i32
%idx.add = add nuw i32 %idx, 256		%idx.add = add nuw i32 %idx, 256
%shl0 = shl i32 %idx.add, 4		%shl0 = shl i32 %idx.add, 4
%shl1 = shl i32 %idx.add, 5		%shl1 = shl i32 %idx.add, 5
%ptr0 = inttoptr i32 %shl0 to i32 addrspace(5)*		%ptr0 = inttoptr i32 %shl0 to i32 addrspace(5)*
%ptr1 = inttoptr i32 %shl1 to i32 addrspace(5)*		%ptr1 = inttoptr i32 %shl1 to i32 addrspace(5)*
store volatile i32 9, i32 addrspace(5)* %ptr0		store volatile i32 9, i32 addrspace(5)* %ptr0

llvm/test/CodeGen/AMDGPU/si-spill-sgpr-stack.ll

	; RUN: llc -march=amdgcn -mcpu=fiji -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefix=ALL -check-prefix=SGPR %s			; RUN: llc -march=amdgcn -mcpu=fiji -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefix=ALL -check-prefix=SGPR %s

	; Make sure this doesn't crash.			; Make sure this doesn't crash.
	; ALL-LABEL: {{^}}test:			; ALL-LABEL: {{^}}test:
	; ALL: s_mov_b32 s[[LO:[0-9]+]], SCRATCH_RSRC_DWORD0			; ALL: s_mov_b32 s[[LO:[0-9]+]], SCRATCH_RSRC_DWORD0
	; ALL: s_mov_b32 s[[OFF:[0-9]+]], s3
	; ALL: s_mov_b32 s[[HI:[0-9]+]], 0xe80000			; ALL: s_mov_b32 s[[HI:[0-9]+]], 0xe80000

	; Make sure we are handling hazards correctly.			; Make sure we are handling hazards correctly.
	; SGPR: buffer_load_dword [[VHI:v[0-9]+]], off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:16			; SGPR: buffer_load_dword [[VHI:v[0-9]+]], off, s[{{[0-9]+:[0-9]+}}], 0 offset:16
	; SGPR-NEXT: s_waitcnt vmcnt(0)			; SGPR-NEXT: s_waitcnt vmcnt(0)
	; SGPR-NEXT: v_readfirstlane_b32 s[[HI:[0-9]+]], [[VHI]]			; SGPR-NEXT: v_readfirstlane_b32 s[[HI:[0-9]+]], [[VHI]]
	; SGPR-NEXT: s_nop 4			; SGPR-NEXT: s_nop 4
	; SGPR-NEXT: buffer_store_dword v0, off, s[0:[[HI]]{{\]}}, 0			; SGPR-NEXT: buffer_store_dword v0, off, s[0:[[HI]]{{\]}}, 0

	; ALL: s_endpgm			; ALL: s_endpgm
	define amdgpu_kernel void @test(i32 addrspace(1)* %out, i32 %in) {			define amdgpu_kernel void @test(i32 addrspace(1)* %out, i32 %in) {
	call void asm sideeffect "", "~{s[0:7]}" ()			call void asm sideeffect "", "~{s[0:7]}" ()

llvm/test/CodeGen/AMDGPU/sibling-call.ll

	; Tail call disallowed with byval in parent, not callee. The stack			; Tail call disallowed with byval in parent, not callee. The stack
	; usage of incoming arguments must be <= the outgoing stack			; usage of incoming arguments must be <= the outgoing stack
	; arguments.			; arguments.

	; GCN-LABEL: {{^}}sibling_call_i32_fastcc_i32_byval_i32:			; GCN-LABEL: {{^}}sibling_call_i32_fastcc_i32_byval_i32:
	; GCN-NOT: v0			; GCN-NOT: v0
	; GCN-NOT: s32			; GCN-NOT: s32
	; GCN: buffer_load_dword v1, off, s[0:3], s33 offset:16			; GCN: buffer_load_dword v1, off, s[0:3], 0 offset:16
	; GCN: buffer_store_dword v1, off, s[0:3], s32{{$}}			; GCN: buffer_store_dword v1, off, s[0:3], s32{{$}}
	; GCN-NEXT: s_setpc_b64			; GCN-NEXT: s_setpc_b64
	define fastcc i32 @sibling_call_i32_fastcc_i32_byval_i32(i32 %a, [32 x i32] %large) #1 {			define fastcc i32 @sibling_call_i32_fastcc_i32_byval_i32(i32 %a, [32 x i32] %large) #1 {
	entry:			entry:
	%ret = tail call fastcc i32 @i32_fastcc_i32_byval_i32(i32 %a, i32 addrspace(5)* inttoptr (i32 16 to i32 addrspace(5)*))			%ret = tail call fastcc i32 @i32_fastcc_i32_byval_i32(i32 %a, i32 addrspace(5)* inttoptr (i32 16 to i32 addrspace(5)*))
	ret i32 %ret			ret i32 %ret
	}			}


llvm/test/CodeGen/AMDGPU/sp-too-many-input-sgprs.ll

This file was deleted.

	; RUN: llc -mtriple=amdgcn-mesa-mesa3d -verify-machineinstrs < %s | FileCheck -check-prefixes=MESA3D,ALL %s
	; RUN: llc -mtriple=amdgcn-- -verify-machineinstrs < %s | FileCheck -check-prefixes=UNKNOWN,ALL %s

	; Make sure shaders pick a workable SP with > 32 input SGPRs.
	; FIXME: Doesn't seem to be getting initial value from right register?

	; ALL-LABEL: {{^}}too_many_input_sgprs_32:
	; MESA3D-NOT: s34
	; MESA3D: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s34 offset:4

	; Happens to end up in s32 anyway
	; UNKNOWN-NOT: s32
	; UNKNOWN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s32 offset:4
	define amdgpu_ps i32 @too_many_input_sgprs_32(i32 inreg %arg, i32 inreg %arg1, i32 inreg %arg2, i32 inreg %arg3, i32 inreg %arg4, i32 inreg %arg5, i32 inreg %arg6, i32 inreg %arg7,
	i32 inreg %arg8, i32 inreg %arg9, i32 inreg %arg10, i32 inreg %arg11, i32 inreg %arg12, i32 inreg %arg13, i32 inreg %arg14, i32 inreg %arg15,
	i32 inreg %arg16, i32 inreg %arg17, i32 inreg %arg18, i32 inreg %arg19, i32 inreg %arg20, i32 inreg %arg21, i32 inreg %arg22, i32 inreg %arg23,
	i32 inreg %arg24, i32 inreg %arg25, i32 inreg %arg26, i32 inreg %arg27, i32 inreg %arg28, i32 inreg %arg29, i32 inreg %arg30, i32 inreg %arg31) {
	bb:
	%alloca = alloca i32, align 4, addrspace(5)
	store volatile i32 0, i32 addrspace(5)* %alloca
	%tmp = add i32 %arg, %arg1
	%tmp32 = add i32 %tmp, %arg2
	%tmp33 = add i32 %tmp32, %arg3
	%tmp34 = add i32 %tmp33, %arg4
	%tmp35 = add i32 %tmp34, %arg5
	%tmp36 = add i32 %tmp35, %arg6
	%tmp37 = add i32 %tmp36, %arg7
	%tmp38 = add i32 %tmp37, %arg8
	%tmp39 = add i32 %tmp38, %arg9
	%tmp40 = add i32 %tmp39, %arg10
	%tmp41 = add i32 %tmp40, %arg11
	%tmp42 = add i32 %tmp41, %arg12
	%tmp43 = add i32 %tmp42, %arg13
	%tmp44 = add i32 %tmp43, %arg14
	%tmp45 = add i32 %tmp44, %arg15
	%tmp46 = add i32 %tmp45, %arg16
	%tmp47 = add i32 %tmp46, %arg17
	%tmp48 = add i32 %tmp47, %arg18
	%tmp49 = add i32 %tmp48, %arg19
	%tmp50 = add i32 %tmp49, %arg20
	%tmp51 = add i32 %tmp50, %arg21
	%tmp52 = add i32 %tmp51, %arg22
	%tmp53 = add i32 %tmp52, %arg23
	%tmp54 = add i32 %tmp53, %arg24
	%tmp55 = add i32 %tmp54, %arg25
	%tmp56 = add i32 %tmp55, %arg26
	%tmp57 = add i32 %tmp56, %arg27
	%tmp58 = add i32 %tmp57, %arg28
	%tmp59 = add i32 %tmp58, %arg29
	%tmp60 = add i32 %tmp59, %arg30
	%tmp61 = add i32 %tmp60, %arg31
	ret i32 %tmp61
	}

	; ALL-LABEL: {{^}}too_many_input_sgprs_33:
	; MESA3D-NOT: s35
	; MESA3D: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s35 offset:4

	; UNKNOWN-NOT: s33
	; UNKNOWN: buffer_store_dword v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s33 offset:4
	define amdgpu_ps i32 @too_many_input_sgprs_33(i32 inreg %arg, i32 inreg %arg1, i32 inreg %arg2, i32 inreg %arg3, i32 inreg %arg4, i32 inreg %arg5, i32 inreg %arg6, i32 inreg %arg7,
	i32 inreg %arg8, i32 inreg %arg9, i32 inreg %arg10, i32 inreg %arg11, i32 inreg %arg12, i32 inreg %arg13, i32 inreg %arg14, i32 inreg %arg15,
	i32 inreg %arg16, i32 inreg %arg17, i32 inreg %arg18, i32 inreg %arg19, i32 inreg %arg20, i32 inreg %arg21, i32 inreg %arg22, i32 inreg %arg23,
	i32 inreg %arg24, i32 inreg %arg25, i32 inreg %arg26, i32 inreg %arg27, i32 inreg %arg28, i32 inreg %arg29, i32 inreg %arg30, i32 inreg %arg31,
	i32 inreg %arg32) {
	bb:
	%alloca = alloca i32, align 4, addrspace(5)
	store volatile i32 0, i32 addrspace(5)* %alloca
	%tmp = add i32 %arg, %arg1
	%tmp32 = add i32 %tmp, %arg2
	%tmp33 = add i32 %tmp32, %arg3
	%tmp34 = add i32 %tmp33, %arg4
	%tmp35 = add i32 %tmp34, %arg5
	%tmp36 = add i32 %tmp35, %arg6
	%tmp37 = add i32 %tmp36, %arg7
	%tmp38 = add i32 %tmp37, %arg8
	%tmp39 = add i32 %tmp38, %arg9
	%tmp40 = add i32 %tmp39, %arg10
	%tmp41 = add i32 %tmp40, %arg11
	%tmp42 = add i32 %tmp41, %arg12
	%tmp43 = add i32 %tmp42, %arg13
	%tmp44 = add i32 %tmp43, %arg14
	%tmp45 = add i32 %tmp44, %arg15
	%tmp46 = add i32 %tmp45, %arg16
	%tmp47 = add i32 %tmp46, %arg17
	%tmp48 = add i32 %tmp47, %arg18
	%tmp49 = add i32 %tmp48, %arg19
	%tmp50 = add i32 %tmp49, %arg20
	%tmp51 = add i32 %tmp50, %arg21
	%tmp52 = add i32 %tmp51, %arg22
	%tmp53 = add i32 %tmp52, %arg23
	%tmp54 = add i32 %tmp53, %arg24
	%tmp55 = add i32 %tmp54, %arg25
	%tmp56 = add i32 %tmp55, %arg26
	%tmp57 = add i32 %tmp56, %arg27
	%tmp58 = add i32 %tmp57, %arg28
	%tmp59 = add i32 %tmp58, %arg29
	%tmp60 = add i32 %tmp59, %arg30
	%tmp61 = add i32 %tmp60, %arg31
	%tmp62 = add i32 %tmp61, %arg32
	ret i32 %tmp62
	}

llvm/test/CodeGen/AMDGPU/spill-agpr.ll

; RUN: llc -march=amdgcn -mcpu=gfx908 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX908,A2V %s		; RUN: llc -march=amdgcn -mcpu=gfx908 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX908,A2V %s
; RUN: llc -march=amdgcn -mcpu=gfx908 -amdgpu-spill-vgpr-to-agpr=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX908,A2M %s		; RUN: llc -march=amdgcn -mcpu=gfx908 -amdgpu-spill-vgpr-to-agpr=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX908,A2M %s

; GCN-LABEL: {{^}}max_24regs_32a_used:		; GCN-LABEL: {{^}}max_24regs_32a_used:
; A2M-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD0		; A2M-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD0
; A2M-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD1		; A2M-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD1
; A2V-NOT: SCRATCH_RSRC		; A2V-NOT: SCRATCH_RSRC
; GFX908-DAG: v_accvgpr_read_b32 v[[VSPILL:[0-9]+]], a0		; GFX908-DAG: v_accvgpr_read_b32 v[[VSPILL:[0-9]+]], a0
; A2M: buffer_store_dword v[[VSPILL]], off, s[{{[0-9:]+}}], s{{[0-9]+}} offset:[[FI:[0-9]+]] ; 4-byte Folded Spill		; A2M: buffer_store_dword v[[VSPILL]], off, s[{{[0-9:]+}}], 0 offset:[[FI:[0-9]+]] ; 4-byte Folded Spill
; A2M: buffer_load_dword v[[VSPILL:[0-9]+]], off, s[{{[0-9:]+}}], s{{[0-9]+}} offset:[[FI]] ; 4-byte Folded Reload		; A2M: buffer_load_dword v[[VSPILL:[0-9]+]], off, s[{{[0-9:]+}}], 0 offset:[[FI]] ; 4-byte Folded Reload
; GFX908: v_accvgpr_write_b32 a{{[0-9]+}}, v[[VSPILL]]		; GFX908: v_accvgpr_write_b32 a{{[0-9]+}}, v[[VSPILL]]
; A2V: ScratchSize: 0		; A2V: ScratchSize: 0
define amdgpu_kernel void @max_24regs_32a_used(<16 x float> addrspace(1)* %arg, float addrspace(1)* %out) #0 {		define amdgpu_kernel void @max_24regs_32a_used(<16 x float> addrspace(1)* %arg, float addrspace(1)* %out) #0 {
bb:		bb:
%in.1 = load <16 x float>, <16 x float> addrspace(1)* %arg		%in.1 = load <16 x float>, <16 x float> addrspace(1)* %arg
%mai.1 = tail call <16 x float> @llvm.amdgcn.mfma.f32.16x16x1f32(float 1.0, float 1.0, <16 x float> %in.1, i32 0, i32 0, i32 0)		%mai.1 = tail call <16 x float> @llvm.amdgcn.mfma.f32.16x16x1f32(float 1.0, float 1.0, <16 x float> %in.1, i32 0, i32 0, i32 0)
%mai.2 = tail call <16 x float> @llvm.amdgcn.mfma.f32.16x16x1f32(float 1.0, float 1.0, <16 x float> %mai.1, i32 0, i32 0, i32 0)		%mai.2 = tail call <16 x float> @llvm.amdgcn.mfma.f32.16x16x1f32(float 1.0, float 1.0, <16 x float> %mai.1, i32 0, i32 0, i32 0)
%elt1 = extractelement <16 x float> %mai.2, i32 0		%elt1 = extractelement <16 x float> %mai.2, i32 0
Show All 11 Lines	bb:
ret void		ret void
}		}

; GCN-LABEL: {{^}}max_12regs_13a_used:		; GCN-LABEL: {{^}}max_12regs_13a_used:
; A2M-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD0		; A2M-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD0
; A2M-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD1		; A2M-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD1
; A2V-NOT: SCRATCH_RSRC		; A2V-NOT: SCRATCH_RSRC
; GFX908-DAG: v_accvgpr_read_b32 v[[VSPILL:[0-9]+]], a4		; GFX908-DAG: v_accvgpr_read_b32 v[[VSPILL:[0-9]+]], a4
; A2M: buffer_store_dword v[[VSPILL]], off, s[{{[0-9:]+}}], s{{[0-9]+}} offset:[[FI:[0-9]+]] ; 4-byte Folded Spill		; A2M: buffer_store_dword v[[VSPILL]], off, s[{{[0-9:]+}}], 0 offset:[[FI:[0-9]+]] ; 4-byte Folded Spill
; A2M: buffer_load_dword v[[VSPILL:[0-9]+]], off, s[{{[0-9:]+}}], s{{[0-9]+}} offset:[[FI]] ; 4-byte Folded Reload		; A2M: buffer_load_dword v[[VSPILL:[0-9]+]], off, s[{{[0-9:]+}}], 0 offset:[[FI]] ; 4-byte Folded Reload
; A2V: v_accvgpr_write_b32 a{{[0-9]+}}, v[[VSPILL]]		; A2V: v_accvgpr_write_b32 a{{[0-9]+}}, v[[VSPILL]]
; A2V: ScratchSize: 0		; A2V: ScratchSize: 0
define amdgpu_kernel void @max_12regs_13a_used(<4 x float> addrspace(1)* %arg, <4 x float> addrspace(1)* %out) #2 {		define amdgpu_kernel void @max_12regs_13a_used(<4 x float> addrspace(1)* %arg, <4 x float> addrspace(1)* %out) #2 {
bb:		bb:
%in.1 = load <4 x float>, <4 x float> addrspace(1)* %arg		%in.1 = load <4 x float>, <4 x float> addrspace(1)* %arg
%mai.1 = tail call <4 x float> @llvm.amdgcn.mfma.f32.4x4x1f32(float 1.0, float 1.0, <4 x float> %in.1, i32 0, i32 0, i32 0)		%mai.1 = tail call <4 x float> @llvm.amdgcn.mfma.f32.4x4x1f32(float 1.0, float 1.0, <4 x float> %in.1, i32 0, i32 0, i32 0)
%mai.2 = tail call <4 x float> @llvm.amdgcn.mfma.f32.4x4x1f32(float 1.0, float 1.0, <4 x float> %mai.1, i32 0, i32 0, i32 0)		%mai.2 = tail call <4 x float> @llvm.amdgcn.mfma.f32.4x4x1f32(float 1.0, float 1.0, <4 x float> %mai.1, i32 0, i32 0, i32 0)
br label %use		br label %use
Show All 11 Lines	st:
ret void		ret void
}		}

; GCN-LABEL: {{^}}max_10_vgprs_used_9a:		; GCN-LABEL: {{^}}max_10_vgprs_used_9a:
; A2M-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD0		; A2M-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD0
; A2M-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD1		; A2M-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD1
; A2V-NOT: SCRATCH_RSRC		; A2V-NOT: SCRATCH_RSRC
; GFX908-DAG: v_accvgpr_read_b32 v[[VSPILL:[0-9]+]], a0		; GFX908-DAG: v_accvgpr_read_b32 v[[VSPILL:[0-9]+]], a0
; A2M: buffer_store_dword v[[VSPILL]], off, s[{{[0-9:]+}}], s{{[0-9]+}} offset:[[FI:[0-9]+]] ; 4-byte Folded Spill		; A2M: buffer_store_dword v[[VSPILL]], off, s[{{[0-9:]+}}], 0 offset:[[FI:[0-9]+]] ; 4-byte Folded Spill
; A2M: buffer_load_dword v[[VSPILL:[0-9]+]], off, s[{{[0-9:]+}}], s{{[0-9]+}} offset:[[FI]] ; 4-byte Folded Reload		; A2M: buffer_load_dword v[[VSPILL:[0-9]+]], off, s[{{[0-9:]+}}], 0 offset:[[FI]] ; 4-byte Folded Reload
; GFX908: v_accvgpr_write_b32 a{{[0-9]+}}, v[[VSPILL]]		; GFX908: v_accvgpr_write_b32 a{{[0-9]+}}, v[[VSPILL]]
; A2V: ScratchSize: 0		; A2V: ScratchSize: 0
define amdgpu_kernel void @max_10_vgprs_used_9a(i32 addrspace(1)* %p) #1 {		define amdgpu_kernel void @max_10_vgprs_used_9a(i32 addrspace(1)* %p) #1 {
%tid = call i32 @llvm.amdgcn.workitem.id.x()		%tid = call i32 @llvm.amdgcn.workitem.id.x()
call void asm sideeffect "", "a,a,a,a"(i32 1, i32 2, i32 3, i32 4)		call void asm sideeffect "", "a,a,a,a"(i32 1, i32 2, i32 3, i32 4)
call void asm sideeffect "", "a,a,a,a,a"(i32 5, i32 6, i32 7, i32 8, i32 9)		call void asm sideeffect "", "a,a,a,a,a"(i32 5, i32 6, i32 7, i32 8, i32 9)
ret void		ret void
}		}

; GCN-LABEL: {{^}}max_32regs_mfma32:		; GCN-LABEL: {{^}}max_32regs_mfma32:
; A2M-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD0		; A2M-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD0
; A2M-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD1		; A2M-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD1
; A2V-NOT: SCRATCH_RSRC		; A2V-NOT: SCRATCH_RSRC
; GFX908-DAG: v_accvgpr_read_b32 v[[VSPILL:[0-9]+]], a0		; GFX908-DAG: v_accvgpr_read_b32 v[[VSPILL:[0-9]+]], a0
; A2M: buffer_store_dword v[[VSPILL]], off, s[{{[0-9:]+}}], s{{[0-9]+}} offset:[[FI:[0-9]+]] ; 4-byte Folded Spill		; A2M: buffer_store_dword v[[VSPILL]], off, s[{{[0-9:]+}}], 0 offset:[[FI:[0-9]+]] ; 4-byte Folded Spill
; A2M: buffer_load_dword v[[VSPILL:[0-9]+]], off, s[{{[0-9:]+}}], s{{[0-9]+}} offset:[[FI]] ; 4-byte Folded Reload		; A2M: buffer_load_dword v[[VSPILL:[0-9]+]], off, s[{{[0-9:]+}}], 0 offset:[[FI]] ; 4-byte Folded Reload
; GFX908: v_accvgpr_write_b32 a{{[0-9]+}}, v[[VSPILL]]		; GFX908: v_accvgpr_write_b32 a{{[0-9]+}}, v[[VSPILL]]
; A2V: ScratchSize: 0		; A2V: ScratchSize: 0
define amdgpu_kernel void @max_32regs_mfma32(float addrspace(1)* %arg) #3 {		define amdgpu_kernel void @max_32regs_mfma32(float addrspace(1)* %arg) #3 {
bb:		bb:
%v = call i32 asm sideeffect "", "=a"()		%v = call i32 asm sideeffect "", "=a"()
br label %use		br label %use

use:		use:
Show All 16 Lines

llvm/test/CodeGen/AMDGPU/spill-before-exec.mir

# REQUIRES: asserts		# REQUIRES: asserts
# RUN: llc -mtriple=amdgcn--- -verify-machineinstrs -debug-only=regalloc -run-pass=greedy -o /dev/null %s 2>&1 \| FileCheck %s		# RUN: llc -mtriple=amdgcn--- -verify-machineinstrs -debug-only=regalloc -run-pass=greedy -o /dev/null %s 2>&1 \| FileCheck %s

---		---
# Check that physreg candidate is not used since cannot be spilled in a block,		# Check that physreg candidate is not used since cannot be spilled in a block,
# e.g. before exec mask preamble		# e.g. before exec mask preamble
# CHECK: , cannot spill all interferences.		# CHECK: , cannot spill all interferences.

name: foo		name: foo
tracksRegLiveness: true		tracksRegLiveness: true
machineFunctionInfo:		machineFunctionInfo:
scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3		scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
scratchWaveOffsetReg: $sgpr4
stackPtrOffsetReg: $sgpr32		stackPtrOffsetReg: $sgpr32
registers:		registers:
- { id: 0, class: sreg_64 }		- { id: 0, class: sreg_64 }
- { id: 1100, class: sgpr_128 }		- { id: 1100, class: sgpr_128 }
- { id: 1101, class: sgpr_128 }		- { id: 1101, class: sgpr_128 }
- { id: 1102, class: sgpr_128 }		- { id: 1102, class: sgpr_128 }
- { id: 1103, class: sgpr_128 }		- { id: 1103, class: sgpr_128 }
- { id: 1104, class: sgpr_128 }		- { id: 1104, class: sgpr_128 }
Show All 9 Lines	registers:
- { id: 1114, class: sgpr_128 }		- { id: 1114, class: sgpr_128 }
- { id: 1115, class: sgpr_128 }		- { id: 1115, class: sgpr_128 }
- { id: 1116, class: sgpr_128 }		- { id: 1116, class: sgpr_128 }
- { id: 1117, class: sgpr_128 }		- { id: 1117, class: sgpr_128 }
- { id: 1118, class: sgpr_128 }		- { id: 1118, class: sgpr_128 }
- { id: 1119, class: sgpr_128 }		- { id: 1119, class: sgpr_128 }
- { id: 1120, class: sgpr_128 }		- { id: 1120, class: sgpr_128 }
- { id: 1121, class: sgpr_128 }		- { id: 1121, class: sgpr_128 }
		- { id: 1122, class: sgpr_128 }
		- { id: 1123, class: sgpr_128 }
		- { id: 1124, class: sgpr_128 }
		- { id: 1125, class: sgpr_128 }
body: \|		body: \|
bb.0:		bb.0:
successors: %bb.1		successors: %bb.1
liveins: $sgpr96_sgpr97, $sgpr98_sgpr99, $sgpr100_sgpr101, $sgpr102_sgpr103		liveins: $sgpr96_sgpr97, $sgpr98_sgpr99, $sgpr100_sgpr101, $sgpr102_sgpr103
%0:sreg_64 = COPY $sgpr102_sgpr103		%0:sreg_64 = COPY $sgpr102_sgpr103
%1100 = COPY $sgpr100_sgpr101_sgpr102_sgpr103		%1100 = COPY $sgpr100_sgpr101_sgpr102_sgpr103
%1101 = COPY %1100		%1101 = COPY %1100
%1102 = COPY %1100		%1102 = COPY %1100
Show All 11 Lines	bb.0:
%1114 = COPY %1100		%1114 = COPY %1100
%1115 = COPY %1100		%1115 = COPY %1100
%1116 = COPY %1100		%1116 = COPY %1100
%1117 = COPY %1100		%1117 = COPY %1100
%1118 = COPY %1100		%1118 = COPY %1100
%1119 = COPY %1100		%1119 = COPY %1100
%1120 = COPY %1100		%1120 = COPY %1100
%1121 = COPY %1100		%1121 = COPY %1100
		%1122 = COPY %1100
		%1123 = COPY %1100
		%1124 = COPY %1100
		%1125 = COPY %1100
S_BRANCH %bb.1		S_BRANCH %bb.1

bb.1:		bb.1:
liveins: $sgpr96_sgpr97, $sgpr98_sgpr99, $sgpr102_sgpr103		liveins: $sgpr96_sgpr97, $sgpr98_sgpr99, $sgpr102_sgpr103
%0 = S_OR_SAVEEXEC_B64 $sgpr96_sgpr97, implicit-def $exec, implicit-def $scc, implicit $exec		%0 = S_OR_SAVEEXEC_B64 $sgpr96_sgpr97, implicit-def $exec, implicit-def $scc, implicit $exec
$exec = S_XOR_B64_term $exec, %0, implicit-def $scc		$exec = S_XOR_B64_term $exec, %0, implicit-def $scc
SI_MASK_BRANCH %bb.100, implicit $exec		SI_MASK_BRANCH %bb.100, implicit $exec
S_BRANCH %bb.2		S_BRANCH %bb.2
Show All 18 Lines	bb.200:
S_CMP_EQ_U64 %1106.sub0_sub1, %1107.sub2_sub3, implicit-def $scc		S_CMP_EQ_U64 %1106.sub0_sub1, %1107.sub2_sub3, implicit-def $scc
S_CMP_EQ_U64 %1108.sub0_sub1, %1109.sub2_sub3, implicit-def $scc		S_CMP_EQ_U64 %1108.sub0_sub1, %1109.sub2_sub3, implicit-def $scc
S_CMP_EQ_U64 %1110.sub0_sub1, %1111.sub2_sub3, implicit-def $scc		S_CMP_EQ_U64 %1110.sub0_sub1, %1111.sub2_sub3, implicit-def $scc
S_CMP_EQ_U64 %1112.sub0_sub1, %1113.sub2_sub3, implicit-def $scc		S_CMP_EQ_U64 %1112.sub0_sub1, %1113.sub2_sub3, implicit-def $scc
S_CMP_EQ_U64 %1114.sub0_sub1, %1115.sub2_sub3, implicit-def $scc		S_CMP_EQ_U64 %1114.sub0_sub1, %1115.sub2_sub3, implicit-def $scc
S_CMP_EQ_U64 %1116.sub0_sub1, %1117.sub2_sub3, implicit-def $scc		S_CMP_EQ_U64 %1116.sub0_sub1, %1117.sub2_sub3, implicit-def $scc
S_CMP_EQ_U64 %1118.sub0_sub1, %1119.sub2_sub3, implicit-def $scc		S_CMP_EQ_U64 %1118.sub0_sub1, %1119.sub2_sub3, implicit-def $scc
S_CMP_EQ_U64 %1120.sub0_sub1, %1121.sub2_sub3, implicit-def $scc		S_CMP_EQ_U64 %1120.sub0_sub1, %1121.sub2_sub3, implicit-def $scc
		S_CMP_EQ_U64 %1122.sub0_sub1, %1123.sub2_sub3, implicit-def $scc
		S_CMP_EQ_U64 %1124.sub0_sub1, %1125.sub2_sub3, implicit-def $scc

$vgpr0 = V_MOV_B32_e32 0, implicit $exec		$vgpr0 = V_MOV_B32_e32 0, implicit $exec
S_SETPC_B64_return undef $sgpr30_sgpr31, implicit %0, implicit $vgpr0		S_SETPC_B64_return undef $sgpr30_sgpr31, implicit %0, implicit $vgpr0

...		...

llvm/test/CodeGen/AMDGPU/spill-empty-live-interval.mir

	Show All 15 Lines
	# CHECK-NEXT: %8:vreg_64 = SI_SPILL_V64_RESTORE %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (load 8 from %stack.0, align 4, addrspace 5)			# CHECK-NEXT: %8:vreg_64 = SI_SPILL_V64_RESTORE %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (load 8 from %stack.0, align 4, addrspace 5)
	# CHECK-NEXT: S_NOP 0, implicit %8.sub1			# CHECK-NEXT: S_NOP 0, implicit %8.sub1
	# CHECK-NEXT: S_NOP 0, implicit undef %9.sub0			# CHECK-NEXT: S_NOP 0, implicit undef %9.sub0

	name: expecting_non_empty_interval			name: expecting_non_empty_interval
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32
	body: \|			body: \|
	bb.0:			bb.0:
	successors: %bb.1			successors: %bb.1

	undef %0.sub1:vreg_64 = V_MAC_F32_e32 0, undef %1:vgpr_32, undef %0.sub1, implicit $exec			undef %0.sub1:vreg_64 = V_MAC_F32_e32 0, undef %1:vgpr_32, undef %0.sub1, implicit $exec
	undef %2.sub1:vreg_64 = V_MOV_B32_e32 1786773504, implicit $exec			undef %2.sub1:vreg_64 = V_MOV_B32_e32 1786773504, implicit $exec
	dead %3:vgpr_32 = V_MUL_F32_e32 0, %2.sub1, implicit $exec			dead %3:vgpr_32 = V_MUL_F32_e32 0, %2.sub1, implicit $exec
	Show All 17 Lines
	# CHECK-NEXT: S_NOP 0, implicit %1.sub2			# CHECK-NEXT: S_NOP 0, implicit %1.sub2
	# CHECK-NEXT: S_NOP 0, implicit undef %4.sub0			# CHECK-NEXT: S_NOP 0, implicit undef %4.sub0
	# CHECK-NEXT: undef %2.sub2:vreg_128 = V_MOV_B32_e32 0, implicit $exec			# CHECK-NEXT: undef %2.sub2:vreg_128 = V_MOV_B32_e32 0, implicit $exec
	# CHECK-NEXT: S_NOP 0, implicit %2.sub2			# CHECK-NEXT: S_NOP 0, implicit %2.sub2
	name: rematerialize_empty_interval_has_reference			name: rematerialize_empty_interval_has_reference
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32
	body: \|			body: \|
	bb.0:			bb.0:
	successors: %bb.1			successors: %bb.1

	undef %0.sub2:vreg_128 = V_MOV_B32_e32 0, implicit $exec			undef %0.sub2:vreg_128 = V_MOV_B32_e32 0, implicit $exec
	undef %1.sub2:vreg_128 = V_MOV_B32_e32 1786773504, implicit $exec			undef %1.sub2:vreg_128 = V_MOV_B32_e32 1786773504, implicit $exec

	bb.1:			bb.1:
	S_NOP 0, implicit %1.sub2			S_NOP 0, implicit %1.sub2
	S_NOP 0, implicit undef %0.sub0			S_NOP 0, implicit undef %0.sub0
	S_NOP 0, implicit %0.sub2			S_NOP 0, implicit %0.sub2

	...			...

llvm/test/CodeGen/AMDGPU/spill-m0.ll

	; RUN: llc -O0 -amdgpu-spill-sgpr-to-vgpr=1 -march=amdgcn -verify-machineinstrs < %s \| FileCheck -check-prefix=TOVGPR -check-prefix=GCN %s			; RUN: llc -O0 -amdgpu-spill-sgpr-to-vgpr=1 -march=amdgcn -verify-machineinstrs < %s \| FileCheck -check-prefix=TOVGPR -check-prefix=GCN %s
	; RUN: llc -O0 -amdgpu-spill-sgpr-to-vgpr=1 -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s \| FileCheck -check-prefix=TOVGPR -check-prefix=GCN %s			; RUN: llc -O0 -amdgpu-spill-sgpr-to-vgpr=1 -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s \| FileCheck -check-prefix=TOVGPR -check-prefix=GCN %s
	; RUN: llc -O0 -amdgpu-spill-sgpr-to-vgpr=0 -march=amdgcn -verify-machineinstrs < %s \| FileCheck -check-prefix=TOVMEM -check-prefix=GCN %s			; RUN: llc -O0 -amdgpu-spill-sgpr-to-vgpr=0 -march=amdgcn -verify-machineinstrs < %s \| FileCheck -check-prefix=TOVMEM -check-prefix=GCN %s
	; RUN: llc -O0 -amdgpu-spill-sgpr-to-vgpr=0 -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s \| FileCheck -check-prefix=TOVMEM -check-prefix=GCN %s			; RUN: llc -O0 -amdgpu-spill-sgpr-to-vgpr=0 -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s \| FileCheck -check-prefix=TOVMEM -check-prefix=GCN %s

	; XXX - Why does it like to use vcc?			; XXX - Why does it like to use vcc?

	; GCN-LABEL: {{^}}spill_m0:			; GCN-LABEL: {{^}}spill_m0:

	; GCN-DAG: s_cmp_lg_u32			; GCN-DAG: s_cmp_lg_u32

	; TOVGPR-DAG: s_mov_b32 [[M0_COPY:s[0-9]+]], m0			; TOVGPR-DAG: s_mov_b32 [[M0_COPY:s[0-9]+]], m0
	; TOVGPR: v_writelane_b32 [[SPILL_VREG:v[0-9]+]], [[M0_COPY]], 2			; TOVGPR: v_writelane_b32 [[SPILL_VREG:v[0-9]+]], [[M0_COPY]], 2

	; TOVMEM-DAG: s_mov_b32 [[M0_COPY:s[0-9]+]], m0			; TOVMEM-DAG: s_mov_b32 [[M0_COPY:s[0-9]+]], m0
	; TOVMEM-DAG: v_mov_b32_e32 [[SPILL_VREG:v[0-9]+]], [[M0_COPY]]			; TOVMEM-DAG: v_mov_b32_e32 [[SPILL_VREG:v[0-9]+]], [[M0_COPY]]
	; TOVMEM: buffer_store_dword [[SPILL_VREG]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:12 ; 4-byte Folded Spill			; TOVMEM: buffer_store_dword [[SPILL_VREG]], off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:12 ; 4-byte Folded Spill

	; GCN: s_cbranch_scc1 [[ENDIF:BB[0-9]+_[0-9]+]]			; GCN: s_cbranch_scc1 [[ENDIF:BB[0-9]+_[0-9]+]]

	; GCN: [[ENDIF]]:			; GCN: [[ENDIF]]:
	; TOVGPR: v_readlane_b32 [[M0_RESTORE:s[0-9]+]], [[SPILL_VREG]], 2			; TOVGPR: v_readlane_b32 [[M0_RESTORE:s[0-9]+]], [[SPILL_VREG]], 2
	; TOVGPR: s_mov_b32 m0, [[M0_RESTORE]]			; TOVGPR: s_mov_b32 m0, [[M0_RESTORE]]

	; TOVMEM: buffer_load_dword [[RELOAD_VREG:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:12 ; 4-byte Folded Reload			; TOVMEM: buffer_load_dword [[RELOAD_VREG:v[0-9]+]], off, s{{\[[0-9]+:[0-9]+\]}}, 0 offset:12 ; 4-byte Folded Reload
	; TOVMEM: s_waitcnt vmcnt(0)			; TOVMEM: s_waitcnt vmcnt(0)
	; TOVMEM: v_readfirstlane_b32 [[M0_RESTORE:s[0-9]+]], [[RELOAD_VREG]]			; TOVMEM: v_readfirstlane_b32 [[M0_RESTORE:s[0-9]+]], [[RELOAD_VREG]]
	; TOVMEM: s_mov_b32 m0, [[M0_RESTORE]]			; TOVMEM: s_mov_b32 m0, [[M0_RESTORE]]

	; GCN: s_add_i32 s{{[0-9]+}}, m0, 1			; GCN: s_add_i32 s{{[0-9]+}}, m0, 1
	define amdgpu_kernel void @spill_m0(i32 %cond, i32 addrspace(1)* %out) #0 {			define amdgpu_kernel void @spill_m0(i32 %cond, i32 addrspace(1)* %out) #0 {
	entry:			entry:
	%m0 = call i32 asm sideeffect "s_mov_b32 m0, 0", "={m0}"() #0			%m0 = call i32 asm sideeffect "s_mov_b32 m0, 0", "={m0}"() #0
	▲ Show 20 Lines • Show All 54 Lines • Show Last 20 Lines

llvm/test/CodeGen/AMDGPU/spill-offset-calculation.ll

; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -enable-misched=0 -post-RA-scheduler=0 -stress-regalloc=8 < %s \| FileCheck %s		; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -enable-misched=0 -post-RA-scheduler=0 -stress-regalloc=8 < %s \| FileCheck %s

; Test that the VGPR spiller correctly switches to SGPR offsets when the		; Test that the VGPR spiller correctly switches to SGPR offsets when the
; instruction offset field would overflow, and that it accounts for memory		; instruction offset field would overflow, and that it accounts for memory
; swizzling.		; swizzling.

; CHECK-LABEL: test_inst_offset_kernel		; CHECK-LABEL: test_inst_offset_kernel
define amdgpu_kernel void @test_inst_offset_kernel() {		define amdgpu_kernel void @test_inst_offset_kernel() {
entry:		entry:
; Occupy 4092 bytes of scratch, so the offset of the spill of %a just fits in		; Occupy 4092 bytes of scratch, so the offset of the spill of %a just fits in
; the instruction offset field.		; the instruction offset field.
%alloca = alloca i8, i32 4088, align 4, addrspace(5)		%alloca = alloca i8, i32 4088, align 4, addrspace(5)
%buf = bitcast i8 addrspace(5)* %alloca to i32 addrspace(5)*		%buf = bitcast i8 addrspace(5)* %alloca to i32 addrspace(5)*

%aptr = getelementptr i32, i32 addrspace(5)* %buf, i32 1		%aptr = getelementptr i32, i32 addrspace(5)* %buf, i32 1
; CHECK: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:4092 ; 4-byte Folded Spill		; CHECK: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], 0 offset:4092 ; 4-byte Folded Spill
%a = load volatile i32, i32 addrspace(5)* %aptr		%a = load volatile i32, i32 addrspace(5)* %aptr

; Force %a to spill.		; Force %a to spill.
call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()		call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()

%outptr = getelementptr i32, i32 addrspace(5)* %buf, i32 1		%outptr = getelementptr i32, i32 addrspace(5)* %buf, i32 1
store volatile i32 %a, i32 addrspace(5)* %outptr		store volatile i32 %a, i32 addrspace(5)* %outptr

ret void		ret void
}		}

; CHECK-LABEL: test_sgpr_offset_kernel		; CHECK-LABEL: test_sgpr_offset_kernel
define amdgpu_kernel void @test_sgpr_offset_kernel() {		define amdgpu_kernel void @test_sgpr_offset_kernel() {
entry:		entry:
; Occupy 4096 bytes of scratch, so the offset of the spill of %a does not		; Occupy 4096 bytes of scratch, so the offset of the spill of %a does not
; fit in the instruction, and has to live in the SGPR offset.		; fit in the instruction, and has to live in the SGPR offset.
%alloca = alloca i8, i32 4092, align 4, addrspace(5)		%alloca = alloca i8, i32 4092, align 4, addrspace(5)
%buf = bitcast i8 addrspace(5)* %alloca to i32 addrspace(5)*		%buf = bitcast i8 addrspace(5)* %alloca to i32 addrspace(5)*

%aptr = getelementptr i32, i32 addrspace(5)* %buf, i32 1		%aptr = getelementptr i32, i32 addrspace(5)* %buf, i32 1
; 0x40000 / 64 = 4096 (for wave64)		; 0x40000 / 64 = 4096 (for wave64)
; CHECK: s_add_u32 s6, s7, 0x40000		; CHECK: s_mov_b32 s6, 0x40000
; CHECK: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s6 ; 4-byte Folded Spill		; CHECK: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s6 ; 4-byte Folded Spill
%a = load volatile i32, i32 addrspace(5)* %aptr		%a = load volatile i32, i32 addrspace(5)* %aptr

; Force %a to spill		; Force %a to spill
call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()		call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()

%outptr = getelementptr i32, i32 addrspace(5)* %buf, i32 1		%outptr = getelementptr i32, i32 addrspace(5)* %buf, i32 1
store volatile i32 %a, i32 addrspace(5)* %outptr		store volatile i32 %a, i32 addrspace(5)* %outptr

ret void		ret void
}		}

; CHECK-LABEL: test_sgpr_offset_kernel_scavenge_fail		; FIXME: If we fail to scavenge an SGPR in a kernel we don't have a stack
define amdgpu_kernel void @test_sgpr_offset_kernel_scavenge_fail() #1 {		; pointer to temporarily update, so we just crash.
		scott.linder (Author): Is it OK for us to fail here? This is a consequence of not having a frame pointer in entry functions and not being able to, e.g., restart RA after we realize we really need it in this case.
entry:
; Occupy 4096 bytes of scratch, so the offset of the spill of %a does not
; fit in the instruction, and has to live in the SGPR offset.
%alloca = alloca i8, i32 4092, align 4, addrspace(5)
%buf = bitcast i8 addrspace(5)* %alloca to i32 addrspace(5)*

%aptr = getelementptr i32, i32 addrspace(5)* %buf, i32 1

; 0x40000 / 64 = 4096 (for wave64)
%a = load volatile i32, i32 addrspace(5)* %aptr

%asm = call { i32, i32, i32, i32, i32, i32, i32, i32 } asm sideeffect "", "=s,=s,=s,=s,=s,=s,=s,=s"()
%asm0 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm, 0
%asm1 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm, 1
%asm2 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm, 2
%asm3 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm, 3
%asm4 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm, 4
%asm5 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm, 5
%asm6 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm, 6
%asm7 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm, 7

call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}"() #0

; CHECK: s_add_u32 s7, s7, 0x40000
; CHECK: buffer_load_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s7 ; 4-byte Folded Reload
; CHECK: s_sub_u32 s7, s7, 0x40000

; Force %a to spill with no free SGPRs
call void asm sideeffect "", "s,s,s,s,s,s,s,s,v"(i32 %asm0, i32 %asm1, i32 %asm2, i32 %asm3, i32 %asm4, i32 %asm5, i32 %asm6, i32 %asm7, i32 %a)
ret void
}

; CHECK-LABEL: test_sgpr_offset_function_scavenge_fail		; CHECK-LABEL: test_sgpr_offset_function_scavenge_fail
define void @test_sgpr_offset_function_scavenge_fail() #2 {		define void @test_sgpr_offset_function_scavenge_fail() #2 {
entry:		entry:
; Occupy 4096 bytes of scratch, so the offset of the spill of %a does not		; Occupy 4096 bytes of scratch, so the offset of the spill of %a does not
; fit in the instruction, and has to live in the SGPR offset.		; fit in the instruction, and has to live in the SGPR offset.
%alloca = alloca i8, i32 4096, align 4, addrspace(5)		%alloca = alloca i8, i32 4096, align 4, addrspace(5)
%buf = bitcast i8 addrspace(5)* %alloca to i32 addrspace(5)*		%buf = bitcast i8 addrspace(5)* %alloca to i32 addrspace(5)*
▲ Show 20 Lines • Show All 44 Lines • ▼ Show 20 Lines
entry:		entry:
; Occupy 4088 bytes of scratch, so that the spill of the last subreg of %a		; Occupy 4088 bytes of scratch, so that the spill of the last subreg of %a
; still fits below offset 4096 (4088 + 8 - 4 = 4092), and can be placed in		; still fits below offset 4096 (4088 + 8 - 4 = 4092), and can be placed in
; the instruction offset field.		; the instruction offset field.
%alloca = alloca i8, i32 4084, align 4, addrspace(5)		%alloca = alloca i8, i32 4084, align 4, addrspace(5)
%bufv1 = bitcast i8 addrspace(5)* %alloca to i32 addrspace(5)*		%bufv1 = bitcast i8 addrspace(5)* %alloca to i32 addrspace(5)*
%bufv2 = bitcast i8 addrspace(5)* %alloca to <2 x i32> addrspace(5)*		%bufv2 = bitcast i8 addrspace(5)* %alloca to <2 x i32> addrspace(5)*

; CHECK: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:4088 ; 4-byte Folded Spill		; CHECK: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], 0 offset:4088 ; 4-byte Folded Spill
; CHECK: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s{{[0-9]+}} offset:4092 ; 4-byte Folded Spill		; CHECK: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], 0 offset:4092 ; 4-byte Folded Spill
%aptr = getelementptr <2 x i32>, <2 x i32> addrspace(5)* %bufv2, i32 1		%aptr = getelementptr <2 x i32>, <2 x i32> addrspace(5)* %bufv2, i32 1
%a = load volatile <2 x i32>, <2 x i32> addrspace(5)* %aptr		%a = load volatile <2 x i32>, <2 x i32> addrspace(5)* %aptr

; Force %a to spill.		; Force %a to spill.
call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()		call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()

; Ensure the alloca sticks around.		; Ensure the alloca sticks around.
%bptr = getelementptr i32, i32 addrspace(5)* %bufv1, i32 1		%bptr = getelementptr i32, i32 addrspace(5)* %bufv1, i32 1
Show All 11 Lines	entry:
; Occupy 4092 bytes of scratch, so that the spill of the last subreg of %a		; Occupy 4092 bytes of scratch, so that the spill of the last subreg of %a
; does not fit below offset 4096 (4092 + 8 - 4 = 4096), and has to live		; does not fit below offset 4096 (4092 + 8 - 4 = 4096), and has to live
; in the SGPR offset.		; in the SGPR offset.
%alloca = alloca i8, i32 4088, align 4, addrspace(5)		%alloca = alloca i8, i32 4088, align 4, addrspace(5)
%bufv1 = bitcast i8 addrspace(5)* %alloca to i32 addrspace(5)*		%bufv1 = bitcast i8 addrspace(5)* %alloca to i32 addrspace(5)*
%bufv2 = bitcast i8 addrspace(5)* %alloca to <2 x i32> addrspace(5)*		%bufv2 = bitcast i8 addrspace(5)* %alloca to <2 x i32> addrspace(5)*

; 0x3ff00 / 64 = 4092 (for wave64)		; 0x3ff00 / 64 = 4092 (for wave64)
; CHECK: s_add_u32 s6, s7, 0x3ff00		; CHECK: s_mov_b32 s6, 0x3ff00
; CHECK: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s6 ; 4-byte Folded Spill		; CHECK: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s6 ; 4-byte Folded Spill
; CHECK: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s6 offset:4 ; 4-byte Folded Spill		; CHECK: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s6 offset:4 ; 4-byte Folded Spill
%aptr = getelementptr <2 x i32>, <2 x i32> addrspace(5)* %bufv2, i32 1		%aptr = getelementptr <2 x i32>, <2 x i32> addrspace(5)* %bufv2, i32 1
%a = load volatile <2 x i32>, <2 x i32> addrspace(5)* %aptr		%a = load volatile <2 x i32>, <2 x i32> addrspace(5)* %aptr

; Force %a to spill.		; Force %a to spill.
call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()		call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()

▲ Show 20 Lines • Show All 115 Lines • Show Last 20 Lines
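
The offset arithmetic quoted in the test comments above (`0x40000 / 64 = 4096` and `0x3ff00 / 64 = 4092`, both for wave64) can be checked with a small sketch. This is only a model of what the updated CHECK lines expect, not the backend's implementation. The function name `spill_offsets` and the constant `MAX_INST_OFFSET = 4095` (a 12-bit unsigned immediate offset field) are assumptions introduced for illustration.

```python
# Hypothetical model of the offsets the updated CHECK lines expect.
# Assumption: the MUBUF immediate offset field is 12 bits, so it holds 0..4095.
MAX_INST_OFFSET = 4095


def spill_offsets(first_offset, num_dwords=1, wave_size=64):
    """Return (soffset, [inst_offset per dword]) for a VGPR spill.

    first_offset is the per-lane byte offset of the first spilled dword.
    If the last dword still fits in the immediate field, SOffset stays 0.
    Otherwise the per-lane offset is scaled by the wave size (scratch memory
    is swizzled across lanes) and placed in an SGPR, with small immediates
    left over for the individual dwords.
    """
    last_offset = first_offset + 4 * (num_dwords - 1)
    if last_offset <= MAX_INST_OFFSET:
        return 0, [first_offset + 4 * i for i in range(num_dwords)]
    return first_offset * wave_size, [4 * i for i in range(num_dwords)]


if __name__ == "__main__":
    # test_inst_offset_kernel: "0 offset:4092"
    print(spill_offsets(4092))
    # test_sgpr_offset_kernel: "s_mov_b32 s6, 0x40000"
    soffset, imms = spill_offsets(4096)
    print(hex(soffset), imms)
    # two-dword case straddling the limit: "s_mov_b32 s6, 0x3ff00", offsets 0 and 4
    soffset, imms = spill_offsets(4092, num_dwords=2)
    print(hex(soffset), imms)
```

Under these assumptions the model gives the constants on the right-hand side of the diff. The change in those CHECK lines is that the SGPR offset is now materialized with `s_mov_b32` rather than being added to a wave offset register with `s_add_u32`.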

llvm/test/CodeGen/AMDGPU/stack-pointer-offset-relative-frameindex.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs \| FileCheck -check-prefix=GCN %s			; RUN: llc < %s -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs \| FileCheck -check-prefix=GCN %s

				; FIXME: The MUBUF loads in this test output are incorrect, their SOffset
				; should use the frame offset register, not the ABI stack pointer register. We
				; rely on the frame index argument of MUBUF stack accesses to survive until PEI
				; so we can fix up the SOffset to use the correct frame register in
				; eliminateFrameIndex. Some things like LocalStackSlotAllocation can lift the
				; frame index up into something (e.g. `v_add_nc_u32`) that we cannot fold back
				; into the MUBUF instruction, and so we end up emitting an incorrect offset.
				; Fixing this may involve adding stack access pseudos so that we don't have to
				; speculatively refer to the ABI stack pointer register at all.

	; An assert was hit when frame offset register was used to address FrameIndex.			; An assert was hit when frame offset register was used to address FrameIndex.
	define amdgpu_kernel void @kernel_background_evaluate(float addrspace(5)* %kg, <4 x i32> addrspace(1)* %input, <4 x float> addrspace(1)* %output, i32 %i) {			define amdgpu_kernel void @kernel_background_evaluate(float addrspace(5)* %kg, <4 x i32> addrspace(1)* %input, <4 x float> addrspace(1)* %output, i32 %i) {
	; GCN-LABEL: kernel_background_evaluate:			; GCN-LABEL: kernel_background_evaluate:
	; GCN: ; %bb.0: ; %entry			; GCN: ; %bb.0: ; %entry
	; GCN-NEXT: s_load_dword s6, s[0:1], 0x24			; GCN-NEXT: s_load_dword s6, s[0:1], 0x24
	; GCN-NEXT: s_mov_b32 s36, SCRATCH_RSRC_DWORD0			; GCN-NEXT: s_mov_b32 s36, SCRATCH_RSRC_DWORD0
	; GCN-NEXT: s_mov_b32 s37, SCRATCH_RSRC_DWORD1			; GCN-NEXT: s_mov_b32 s37, SCRATCH_RSRC_DWORD1
	; GCN-NEXT: s_mov_b32 s38, -1			; GCN-NEXT: s_mov_b32 s38, -1
	; GCN-NEXT: s_mov_b32 s39, 0x31c16000			; GCN-NEXT: s_mov_b32 s39, 0x31c16000
	; GCN-NEXT: s_mov_b32 s33, s3			; GCN-NEXT: s_add_u32 s36, s36, s3
	; GCN-NEXT: s_mov_b64 s[0:1], s[36:37]			; GCN-NEXT: s_addc_u32 s37, s37, 0
	; GCN-NEXT: v_mov_b32_e32 v1, 0x2000			; GCN-NEXT: v_mov_b32_e32 v1, 0x2000
	; GCN-NEXT: v_mov_b32_e32 v2, 0x4000			; GCN-NEXT: v_mov_b32_e32 v2, 0x4000
	; GCN-NEXT: v_mov_b32_e32 v3, 0			; GCN-NEXT: v_mov_b32_e32 v3, 0
	; GCN-NEXT: s_mov_b64 s[2:3], s[38:39]
	; GCN-NEXT: v_mov_b32_e32 v4, 0x400000			; GCN-NEXT: v_mov_b32_e32 v4, 0x400000
	; GCN-NEXT: s_add_u32 s32, s33, 0xc0000			; GCN-NEXT: s_mov_b64 s[0:1], s[36:37]
				; GCN-NEXT: s_mov_b64 s[2:3], s[38:39]
				; GCN-NEXT: s_mov_b32 s32, 0xc0000
	; GCN-NEXT: v_add_nc_u32_e64 v32, 4, 0x4000			; GCN-NEXT: v_add_nc_u32_e64 v32, 4, 0x4000
	; GCN-NEXT: ; implicit-def: $vcc_hi			; GCN-NEXT: ; implicit-def: $vcc_hi
	; GCN-NEXT: s_getpc_b64 s[4:5]			; GCN-NEXT: s_getpc_b64 s[4:5]
	; GCN-NEXT: s_add_u32 s4, s4, svm_eval_nodes@rel32@lo+4			; GCN-NEXT: s_add_u32 s4, s4, svm_eval_nodes@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s5, s5, svm_eval_nodes@rel32@hi+4			; GCN-NEXT: s_addc_u32 s5, s5, svm_eval_nodes@rel32@hi+4
	; GCN-NEXT: s_waitcnt lgkmcnt(0)			; GCN-NEXT: s_waitcnt lgkmcnt(0)
	; GCN-NEXT: v_mov_b32_e32 v0, s6			; GCN-NEXT: v_mov_b32_e32 v0, s6
	; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GCN-NEXT: v_cmp_ne_u32_e32 vcc_lo, 0, v0			; GCN-NEXT: v_cmp_ne_u32_e32 vcc_lo, 0, v0
	; GCN-NEXT: s_and_saveexec_b32 s0, vcc_lo			; GCN-NEXT: s_and_saveexec_b32 s0, vcc_lo
	; GCN-NEXT: s_cbranch_execz BB0_2			; GCN-NEXT: s_cbranch_execz BB0_2
	; GCN-NEXT: ; %bb.1: ; %if.then4.i			; GCN-NEXT: ; %bb.1: ; %if.then4.i
	; GCN-NEXT: buffer_load_dword v0, v32, s[36:39], s32 offen			; GCN-NEXT: buffer_load_dword v0, v32, s[36:39], s32 offen
	; GCN-NEXT: buffer_load_dword v1, v32, s[36:39], s32 offen offset:4			; GCN-NEXT: buffer_load_dword v1, v32, s[36:39], s32 offen offset:4
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_add_nc_u32_e32 v0, v1, v0			; GCN-NEXT: v_add_nc_u32_e32 v0, v1, v0
	; GCN-NEXT: v_mul_lo_u32 v0, 0x41c64e6d, v0			; GCN-NEXT: v_mul_lo_u32 v0, 0x41c64e6d, v0
	; GCN-NEXT: v_add_nc_u32_e32 v0, 0x3039, v0			; GCN-NEXT: v_add_nc_u32_e32 v0, 0x3039, v0
	; GCN-NEXT: buffer_store_dword v0, v0, s[36:39], s33 offen			; GCN-NEXT: buffer_store_dword v0, v0, s[36:39], 0 offen
	; GCN-NEXT: BB0_2: ; %shader_eval_surface.exit			; GCN-NEXT: BB0_2: ; %shader_eval_surface.exit
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	entry:			entry:
	%sd = alloca < 1339 x i32>, align 16, addrspace(5)			%sd = alloca < 1339 x i32>, align 16, addrspace(5)
	%state = alloca <4 x i32>, align 4, addrspace(5)			%state = alloca <4 x i32>, align 4, addrspace(5)
	%rslt = call i32 @svm_eval_nodes(float addrspace(5)* %kg, <1339 x i32> addrspace(5)* %sd, <4 x i32> addrspace(5)* %state, i32 0, i32 4194304)			%rslt = call i32 @svm_eval_nodes(float addrspace(5)* %kg, <1339 x i32> addrspace(5)* %sd, <4 x i32> addrspace(5)* %state, i32 0, i32 4194304)
	%cmp = icmp eq i32 %rslt, 0			%cmp = icmp eq i32 %rslt, 0
	br i1 %cmp, label %shader_eval_surface.exit, label %if.then4.i			br i1 %cmp, label %shader_eval_surface.exit, label %if.then4.i
	Show All 18 Lines

llvm/test/CodeGen/AMDGPU/stack-realign-kernel.ll

; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=fiji < %s \| FileCheck -check-prefix=VI %s		; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=fiji < %s \| FileCheck -check-prefix=VI %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s \| FileCheck -check-prefix=GFX9 %s		; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s \| FileCheck -check-prefix=GFX9 %s

; Make sure the stack is never realigned for entry functions.		; Make sure the stack is never realigned for entry functions.

define amdgpu_kernel void @max_alignment_128() #0 {		define amdgpu_kernel void @max_alignment_128() #0 {
; VI-LABEL: max_alignment_128:		; VI-LABEL: max_alignment_128:
; VI: ; %bb.0:		; VI: ; %bb.0:
; VI-NEXT: s_add_u32 s4, s4, s7		; VI-NEXT: s_add_u32 s4, s4, s7
		; VI-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8
		; VI-NEXT: s_add_u32 s0, s0, s7
		; VI-NEXT: s_addc_u32 s1, s1, 0
; VI-NEXT: v_mov_b32_e32 v0, 9		; VI-NEXT: v_mov_b32_e32 v0, 9
; VI-NEXT: s_mov_b32 flat_scratch_lo, s5		; VI-NEXT: s_mov_b32 flat_scratch_lo, s5
; VI-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8		; VI-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:128
; VI-NEXT: buffer_store_dword v0, off, s[0:3], s7 offset:128
; VI-NEXT: s_endpgm		; VI-NEXT: s_endpgm
; VI-NEXT: .section .rodata,#alloc		; VI-NEXT: .section .rodata,#alloc
; VI-NEXT: .p2align 6		; VI-NEXT: .p2align 6
; VI-NEXT: .amdhsa_kernel max_alignment_128		; VI-NEXT: .amdhsa_kernel max_alignment_128
; VI-NEXT: .amdhsa_group_segment_fixed_size 0		; VI-NEXT: .amdhsa_group_segment_fixed_size 0
; VI-NEXT: .amdhsa_private_segment_fixed_size 256		; VI-NEXT: .amdhsa_private_segment_fixed_size 256
; VI-NEXT: .amdhsa_user_sgpr_private_segment_buffer 1		; VI-NEXT: .amdhsa_user_sgpr_private_segment_buffer 1
; VI-NEXT: .amdhsa_user_sgpr_dispatch_ptr 0		; VI-NEXT: .amdhsa_user_sgpr_dispatch_ptr 0
Show All 25 Lines
; VI-NEXT: .amdhsa_exception_fp_ieee_inexact 0		; VI-NEXT: .amdhsa_exception_fp_ieee_inexact 0
; VI-NEXT: .amdhsa_exception_int_div_zero 0		; VI-NEXT: .amdhsa_exception_int_div_zero 0
; VI-NEXT: .end_amdhsa_kernel		; VI-NEXT: .end_amdhsa_kernel
; VI-NEXT: .text		; VI-NEXT: .text
;		;
; GFX9-LABEL: max_alignment_128:		; GFX9-LABEL: max_alignment_128:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_add_u32 flat_scratch_lo, s4, s7		; GFX9-NEXT: s_add_u32 flat_scratch_lo, s4, s7
; GFX9-NEXT: v_mov_b32_e32 v0, 9
; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s5, 0		; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s5, 0
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s7 offset:128		; GFX9-NEXT: s_add_u32 s0, s0, s7
		; GFX9-NEXT: s_addc_u32 s1, s1, 0
		; GFX9-NEXT: v_mov_b32_e32 v0, 9
		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:128
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
; GFX9-NEXT: .section .rodata,#alloc		; GFX9-NEXT: .section .rodata,#alloc
; GFX9-NEXT: .p2align 6		; GFX9-NEXT: .p2align 6
; GFX9-NEXT: .amdhsa_kernel max_alignment_128		; GFX9-NEXT: .amdhsa_kernel max_alignment_128
; GFX9-NEXT: .amdhsa_group_segment_fixed_size 0		; GFX9-NEXT: .amdhsa_group_segment_fixed_size 0
; GFX9-NEXT: .amdhsa_private_segment_fixed_size 256		; GFX9-NEXT: .amdhsa_private_segment_fixed_size 256
; GFX9-NEXT: .amdhsa_user_sgpr_private_segment_buffer 1		; GFX9-NEXT: .amdhsa_user_sgpr_private_segment_buffer 1
; GFX9-NEXT: .amdhsa_user_sgpr_dispatch_ptr 0		; GFX9-NEXT: .amdhsa_user_sgpr_dispatch_ptr 0
		; GFX9-NEXT: .text
store volatile i32 9, i32 addrspace(5)* %alloca.align, align 128		store volatile i32 9, i32 addrspace(5)* %alloca.align, align 128
ret void		ret void
}		}

define amdgpu_kernel void @stackrealign_attr() #1 {		define amdgpu_kernel void @stackrealign_attr() #1 {
; VI-LABEL: stackrealign_attr:		; VI-LABEL: stackrealign_attr:
; VI: ; %bb.0:		; VI: ; %bb.0:
; VI-NEXT: s_add_u32 s4, s4, s7		; VI-NEXT: s_add_u32 s4, s4, s7
		; VI-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8
		; VI-NEXT: s_add_u32 s0, s0, s7
		; VI-NEXT: s_addc_u32 s1, s1, 0
; VI-NEXT: v_mov_b32_e32 v0, 9		; VI-NEXT: v_mov_b32_e32 v0, 9
; VI-NEXT: s_mov_b32 flat_scratch_lo, s5		; VI-NEXT: s_mov_b32 flat_scratch_lo, s5
; VI-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8		; VI-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:4
; VI-NEXT: buffer_store_dword v0, off, s[0:3], s7 offset:4
; VI-NEXT: s_endpgm		; VI-NEXT: s_endpgm
; VI-NEXT: .section .rodata,#alloc		; VI-NEXT: .section .rodata,#alloc
; VI-NEXT: .p2align 6		; VI-NEXT: .p2align 6
; VI-NEXT: .amdhsa_kernel stackrealign_attr		; VI-NEXT: .amdhsa_kernel stackrealign_attr
; VI-NEXT: .amdhsa_group_segment_fixed_size 0		; VI-NEXT: .amdhsa_group_segment_fixed_size 0
; VI-NEXT: .amdhsa_private_segment_fixed_size 8		; VI-NEXT: .amdhsa_private_segment_fixed_size 8
; VI-NEXT: .amdhsa_user_sgpr_private_segment_buffer 1		; VI-NEXT: .amdhsa_user_sgpr_private_segment_buffer 1
; VI-NEXT: .amdhsa_user_sgpr_dispatch_ptr 0		; VI-NEXT: .amdhsa_user_sgpr_dispatch_ptr 0
; VI-NEXT: .amdhsa_exception_fp_ieee_inexact 0		; VI-NEXT: .amdhsa_exception_fp_ieee_inexact 0
; VI-NEXT: .amdhsa_exception_int_div_zero 0		; VI-NEXT: .amdhsa_exception_int_div_zero 0
; VI-NEXT: .end_amdhsa_kernel		; VI-NEXT: .end_amdhsa_kernel
; VI-NEXT: .text		; VI-NEXT: .text
;		;
; GFX9-LABEL: stackrealign_attr:		; GFX9-LABEL: stackrealign_attr:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_add_u32 flat_scratch_lo, s4, s7		; GFX9-NEXT: s_add_u32 flat_scratch_lo, s4, s7
; GFX9-NEXT: v_mov_b32_e32 v0, 9
; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s5, 0		; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s5, 0
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s7 offset:4		; GFX9-NEXT: s_add_u32 s0, s0, s7
		; GFX9-NEXT: s_addc_u32 s1, s1, 0
		; GFX9-NEXT: v_mov_b32_e32 v0, 9
		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:4
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
; GFX9-NEXT: .section .rodata,#alloc		; GFX9-NEXT: .section .rodata,#alloc
; GFX9-NEXT: .p2align 6		; GFX9-NEXT: .p2align 6
; GFX9-NEXT: .amdhsa_kernel stackrealign_attr		; GFX9-NEXT: .amdhsa_kernel stackrealign_attr
; GFX9-NEXT: .amdhsa_group_segment_fixed_size 0		; GFX9-NEXT: .amdhsa_group_segment_fixed_size 0
; GFX9-NEXT: .amdhsa_private_segment_fixed_size 8		; GFX9-NEXT: .amdhsa_private_segment_fixed_size 8
; GFX9-NEXT: .amdhsa_user_sgpr_private_segment_buffer 1		; GFX9-NEXT: .amdhsa_user_sgpr_private_segment_buffer 1
; GFX9-NEXT: .amdhsa_user_sgpr_dispatch_ptr 0		; GFX9-NEXT: .amdhsa_user_sgpr_dispatch_ptr 0
		; GFX9-NEXT: .text
store volatile i32 9, i32 addrspace(5)* %alloca.align, align 4		store volatile i32 9, i32 addrspace(5)* %alloca.align, align 4
ret void		ret void
}		}

define amdgpu_kernel void @alignstack_attr() #2 {		define amdgpu_kernel void @alignstack_attr() #2 {
; VI-LABEL: alignstack_attr:		; VI-LABEL: alignstack_attr:
; VI: ; %bb.0:		; VI: ; %bb.0:
; VI-NEXT: s_add_u32 s4, s4, s7		; VI-NEXT: s_add_u32 s4, s4, s7
		; VI-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8
		; VI-NEXT: s_add_u32 s0, s0, s7
		; VI-NEXT: s_addc_u32 s1, s1, 0
; VI-NEXT: v_mov_b32_e32 v0, 9		; VI-NEXT: v_mov_b32_e32 v0, 9
; VI-NEXT: s_mov_b32 flat_scratch_lo, s5		; VI-NEXT: s_mov_b32 flat_scratch_lo, s5
; VI-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8		; VI-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:4
; VI-NEXT: buffer_store_dword v0, off, s[0:3], s7 offset:4
; VI-NEXT: s_endpgm		; VI-NEXT: s_endpgm
; VI-NEXT: .section .rodata,#alloc		; VI-NEXT: .section .rodata,#alloc
; VI-NEXT: .p2align 6		; VI-NEXT: .p2align 6
; VI-NEXT: .amdhsa_kernel alignstack_attr		; VI-NEXT: .amdhsa_kernel alignstack_attr
; VI-NEXT: .amdhsa_group_segment_fixed_size 0		; VI-NEXT: .amdhsa_group_segment_fixed_size 0
; VI-NEXT: .amdhsa_private_segment_fixed_size 128		; VI-NEXT: .amdhsa_private_segment_fixed_size 128
; VI-NEXT: .amdhsa_user_sgpr_private_segment_buffer 1		; VI-NEXT: .amdhsa_user_sgpr_private_segment_buffer 1
; VI-NEXT: .amdhsa_user_sgpr_dispatch_ptr 0		; VI-NEXT: .amdhsa_user_sgpr_dispatch_ptr 0
; VI-NEXT: .amdhsa_exception_fp_ieee_inexact 0		; VI-NEXT: .amdhsa_exception_fp_ieee_inexact 0
; VI-NEXT: .amdhsa_exception_int_div_zero 0		; VI-NEXT: .amdhsa_exception_int_div_zero 0
; VI-NEXT: .end_amdhsa_kernel		; VI-NEXT: .end_amdhsa_kernel
; VI-NEXT: .text		; VI-NEXT: .text
;		;
; GFX9-LABEL: alignstack_attr:		; GFX9-LABEL: alignstack_attr:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_add_u32 flat_scratch_lo, s4, s7		; GFX9-NEXT: s_add_u32 flat_scratch_lo, s4, s7
; GFX9-NEXT: v_mov_b32_e32 v0, 9
; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s5, 0		; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s5, 0
; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s7 offset:4		; GFX9-NEXT: s_add_u32 s0, s0, s7
		; GFX9-NEXT: s_addc_u32 s1, s1, 0
		; GFX9-NEXT: v_mov_b32_e32 v0, 9
		; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:4
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
; GFX9-NEXT: .section .rodata,#alloc		; GFX9-NEXT: .section .rodata,#alloc
; GFX9-NEXT: .p2align 6		; GFX9-NEXT: .p2align 6
; GFX9-NEXT: .amdhsa_kernel alignstack_attr		; GFX9-NEXT: .amdhsa_kernel alignstack_attr
; GFX9-NEXT: .amdhsa_group_segment_fixed_size 0		; GFX9-NEXT: .amdhsa_group_segment_fixed_size 0
; GFX9-NEXT: .amdhsa_private_segment_fixed_size 128		; GFX9-NEXT: .amdhsa_private_segment_fixed_size 128
; GFX9-NEXT: .amdhsa_user_sgpr_private_segment_buffer 1		; GFX9-NEXT: .amdhsa_user_sgpr_private_segment_buffer 1
; GFX9-NEXT: .amdhsa_user_sgpr_dispatch_ptr 0		; GFX9-NEXT: .amdhsa_user_sgpr_dispatch_ptr 0

llvm/test/CodeGen/AMDGPU/stack-realign.ll

	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=fiji -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=fiji -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

	; Check that we properly realign the stack. While 4-byte access is all			; Check that we properly realign the stack. While 4-byte access is all
	; that is ever needed, some transformations rely on the known bits from the alignment of the pointer (e.g.			; that is ever needed, some transformations rely on the known bits from the alignment of the pointer (e.g.


	; 128 byte object			; 128 byte object
	; 4 byte emergency stack slot			; 4 byte emergency stack slot
	; = 144 bytes with padding between them			; = 144 bytes with padding between them

	; GCN-LABEL: {{^}}needs_align16_default_stack_align:			; GCN-LABEL: {{^}}needs_align16_default_stack_align:
	; GCN: s_sub_u32 [[SUB:s[0-9]+]], s32, s33
	; GCN-DAG: v_lshlrev_b32_e32 [[SCALED_IDX:v[0-9]+]], 4, v0			; GCN-DAG: v_lshlrev_b32_e32 [[SCALED_IDX:v[0-9]+]], 4, v0
	; GCN-DAG: v_lshrrev_b32_e64 [[FRAMEDIFF:v[0-9]+]], 6, [[SUB]]			; GCN-DAG: v_lshrrev_b32_e64 [[FRAMEDIFF:v[0-9]+]], 6, s32
	; GCN: v_add_u32_e32 [[FI:v[0-9]+]], vcc, [[FRAMEDIFF]], [[SCALED_IDX]]			; GCN: v_add_u32_e32 [[FI:v[0-9]+]], vcc, [[FRAMEDIFF]], [[SCALED_IDX]]

	; GCN-NOT: s32			; GCN-NOT: s32

	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], s33 offen			; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen
	; GCN: v_or_b32_e32 v{{[0-9]+}}, 12			; GCN: v_or_b32_e32 v{{[0-9]+}}, 12
	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], s33 offen			; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen
	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], s33 offen			; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen
	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], s33 offen			; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen

	; GCN-NOT: s32			; GCN-NOT: s32

	; GCN: ; ScratchSize: 144			; GCN: ; ScratchSize: 144
	define void @needs_align16_default_stack_align(i32 %idx) #0 {			define void @needs_align16_default_stack_align(i32 %idx) #0 {
	%alloca.align16 = alloca [8 x <4 x i32>], align 16, addrspace(5)			%alloca.align16 = alloca [8 x <4 x i32>], align 16, addrspace(5)
	%gep0 = getelementptr inbounds [8 x <4 x i32>], [8 x <4 x i32>] addrspace(5)* %alloca.align16, i32 0, i32 %idx			%gep0 = getelementptr inbounds [8 x <4 x i32>], [8 x <4 x i32>] addrspace(5)* %alloca.align16, i32 0, i32 %idx
	store volatile <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32> addrspace(5)* %gep0, align 16			store volatile <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32> addrspace(5)* %gep0, align 16
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}needs_align16_stack_align4:			; GCN-LABEL: {{^}}needs_align16_stack_align4:
	; GCN: s_add_u32 [[SCRATCH_REG:s[0-9]+]], s32, 0x3c0{{$}}			; GCN: s_add_u32 [[SCRATCH_REG:s[0-9]+]], s32, 0x3c0{{$}}
	; GCN: s_and_b32 s34, [[SCRATCH_REG]], 0xfffffc00			; GCN: s_and_b32 s34, [[SCRATCH_REG]], 0xfffffc00
	; GCN: s_add_u32 s32, s32, 0x2800{{$}}

	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], s33 offen			; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen
	; GCN: v_or_b32_e32 v{{[0-9]+}}, 12			; GCN: v_or_b32_e32 v{{[0-9]+}}, 12
	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], s33 offen			; GCN: s_add_u32 s32, s32, 0x2800{{$}}
	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], s33 offen			; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen
	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], s33 offen			; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen
				; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen

	; GCN: s_sub_u32 s32, s32, 0x2800			; GCN: s_sub_u32 s32, s32, 0x2800

	; GCN: ; ScratchSize: 160			; GCN: ; ScratchSize: 160
	define void @needs_align16_stack_align4(i32 %idx) #2 {			define void @needs_align16_stack_align4(i32 %idx) #2 {
	%alloca.align16 = alloca [8 x <4 x i32>], align 16, addrspace(5)			%alloca.align16 = alloca [8 x <4 x i32>], align 16, addrspace(5)
	%gep0 = getelementptr inbounds [8 x <4 x i32>], [8 x <4 x i32>] addrspace(5)* %alloca.align16, i32 0, i32 %idx			%gep0 = getelementptr inbounds [8 x <4 x i32>], [8 x <4 x i32>] addrspace(5)* %alloca.align16, i32 0, i32 %idx
	store volatile <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32> addrspace(5)* %gep0, align 16			store volatile <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32> addrspace(5)* %gep0, align 16
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}needs_align32:			; GCN-LABEL: {{^}}needs_align32:
	; GCN: s_add_u32 [[SCRATCH_REG:s[0-9]+]], s32, 0x7c0{{$}}			; GCN: s_add_u32 [[SCRATCH_REG:s[0-9]+]], s32, 0x7c0{{$}}
	; GCN: s_and_b32 s34, [[SCRATCH_REG]], 0xfffff800			; GCN: s_and_b32 s34, [[SCRATCH_REG]], 0xfffff800
	; GCN: s_add_u32 s32, s32, 0x3000{{$}}

	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], s33 offen			; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen
	; GCN: v_or_b32_e32 v{{[0-9]+}}, 12			; GCN: v_or_b32_e32 v{{[0-9]+}}, 12
	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], s33 offen			; GCN: s_add_u32 s32, s32, 0x3000{{$}}
	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], s33 offen			; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen
	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], s33 offen			; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen
				; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen

	; GCN: s_sub_u32 s32, s32, 0x3000			; GCN: s_sub_u32 s32, s32, 0x3000

	; GCN: ; ScratchSize: 192			; GCN: ; ScratchSize: 192
	define void @needs_align32(i32 %idx) #0 {			define void @needs_align32(i32 %idx) #0 {
	%alloca.align16 = alloca [8 x <4 x i32>], align 32, addrspace(5)			%alloca.align16 = alloca [8 x <4 x i32>], align 32, addrspace(5)
	%gep0 = getelementptr inbounds [8 x <4 x i32>], [8 x <4 x i32>] addrspace(5)* %alloca.align16, i32 0, i32 %idx			%gep0 = getelementptr inbounds [8 x <4 x i32>], [8 x <4 x i32>] addrspace(5)* %alloca.align16, i32 0, i32 %idx
	store volatile <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32> addrspace(5)* %gep0, align 32			store volatile <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32> addrspace(5)* %gep0, align 32
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}force_realign4:			; GCN-LABEL: {{^}}force_realign4:
	; GCN: s_add_u32 [[SCRATCH_REG:s[0-9]+]], s32, 0xc0{{$}}			; GCN: s_add_u32 [[SCRATCH_REG:s[0-9]+]], s32, 0xc0{{$}}
	; GCN: s_and_b32 s34, [[SCRATCH_REG]], 0xffffff00			; GCN: s_and_b32 s34, [[SCRATCH_REG]], 0xffffff00
	; GCN: s_add_u32 s32, s32, 0xd00{{$}}			; GCN: s_add_u32 s32, s32, 0xd00{{$}}

	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], s33 offen			; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen
	; GCN: s_sub_u32 s32, s32, 0xd00			; GCN: s_sub_u32 s32, s32, 0xd00

	; GCN: ; ScratchSize: 52			; GCN: ; ScratchSize: 52
	define void @force_realign4(i32 %idx) #1 {			define void @force_realign4(i32 %idx) #1 {
	%alloca.align16 = alloca [8 x i32], align 4, addrspace(5)			%alloca.align16 = alloca [8 x i32], align 4, addrspace(5)
	%gep0 = getelementptr inbounds [8 x i32], [8 x i32] addrspace(5)* %alloca.align16, i32 0, i32 %idx			%gep0 = getelementptr inbounds [8 x i32], [8 x i32] addrspace(5)* %alloca.align16, i32 0, i32 %idx
	store volatile i32 3, i32 addrspace(5)* %gep0, align 4			store volatile i32 3, i32 addrspace(5)* %gep0, align 4
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}kernel_call_align16_from_8:			; GCN-LABEL: {{^}}kernel_call_align16_from_8:
	; GCN: s_mov_b32 s33, s7{{$}}			; GCN: s_movk_i32 s32, 0x400{{$}}
	; GCN-NEXT: s_add_u32 s32, s33, 0x400{{$}}
	; GCN-NOT: s32			; GCN-NOT: s32
	; GCN: s_swappc_b64			; GCN: s_swappc_b64
	define amdgpu_kernel void @kernel_call_align16_from_8() #0 {			define amdgpu_kernel void @kernel_call_align16_from_8() #0 {
	%alloca = alloca i32, align 4, addrspace(5)			%alloca = alloca i32, align 4, addrspace(5)
	store volatile i32 2, i32 addrspace(5)* %alloca			store volatile i32 2, i32 addrspace(5)* %alloca
	call void @needs_align16_default_stack_align(i32 1)			call void @needs_align16_default_stack_align(i32 1)
	ret void			ret void
	}			}

	; The call sequence should keep the stack on call aligned to 4			; The call sequence should keep the stack on call aligned to 4
	; GCN-LABEL: {{^}}kernel_call_align16_from_5:			; GCN-LABEL: {{^}}kernel_call_align16_from_5:
	; GCN: s_mov_b32 s33, s7{{$}}			; GCN: s_movk_i32 s32, 0x400
	; GCN-NEXT: s_add_u32 s32, s33, 0x400
	; GCN: s_swappc_b64			; GCN: s_swappc_b64
	define amdgpu_kernel void @kernel_call_align16_from_5() {			define amdgpu_kernel void @kernel_call_align16_from_5() {
	%alloca0 = alloca i8, align 1, addrspace(5)			%alloca0 = alloca i8, align 1, addrspace(5)
	store volatile i8 2, i8 addrspace(5)* %alloca0			store volatile i8 2, i8 addrspace(5)* %alloca0

	call void @needs_align16_default_stack_align(i32 1)			call void @needs_align16_default_stack_align(i32 1)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}kernel_call_align4_from_5:			; GCN-LABEL: {{^}}kernel_call_align4_from_5:
	; GCN: s_mov_b32 s33, s7{{$}}			; GCN: s_movk_i32 s32, 0x400
	; GCN: s_add_u32 s32, s33, 0x400
	; GCN: s_swappc_b64			; GCN: s_swappc_b64
	define amdgpu_kernel void @kernel_call_align4_from_5() {			define amdgpu_kernel void @kernel_call_align4_from_5() {
	%alloca0 = alloca i8, align 1, addrspace(5)			%alloca0 = alloca i8, align 1, addrspace(5)
	store volatile i8 2, i8 addrspace(5)* %alloca0			store volatile i8 2, i8 addrspace(5)* %alloca0

	call void @needs_align16_stack_align4(i32 1)			call void @needs_align16_stack_align4(i32 1)
	ret void			ret void
	}			}
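The immediates in the `stack-realign.ll` checks above are consistent with the stack pointer (`s32`) being kept in wave-scaled units, that is, per-lane bytes multiplied by the wavefront size. The following is a minimal Python sketch of that arithmetic, assuming a wavefront size of 64 for `-mcpu=fiji`. The helper names `realign_constants` and `scaled_frame_size` are hypothetical and are not part of LLVM.

```python
WAVE_SIZE = 64  # assumption: wave64 on the fiji target used by this test


def realign_constants(align_bytes, wave_size=WAVE_SIZE):
    """Return (add_imm, and_mask) that round a wave-scaled SP up to align_bytes.

    Hypothetical helper: models the s_add_u32 / s_and_b32 pair in the checks.
    """
    scaled = align_bytes * wave_size
    add_imm = scaled - wave_size            # (align_bytes - 1) * wave_size
    and_mask = ~(scaled - 1) & 0xFFFFFFFF   # clear the low bits, 32-bit wide
    return add_imm, and_mask


def scaled_frame_size(scratch_bytes, wave_size=WAVE_SIZE):
    """Return the SP increment for a frame of scratch_bytes per lane."""
    return scratch_bytes * wave_size


if __name__ == "__main__":
    # (function name, requested alignment in bytes, ScratchSize from the checks)
    cases = [
        ("needs_align16_stack_align4", 16, 160),
        ("needs_align32", 32, 192),
        ("force_realign4", 4, 52),
    ]
    for name, align, scratch in cases:
        add_imm, and_mask = realign_constants(align)
        frame = scaled_frame_size(scratch)
        print(f"{name}: add {add_imm:#x}, mask {and_mask:#x}, frame {frame:#x}")
```

Under that assumption the sketch reproduces the constants in the checks: `0x3c0`/`0xfffffc00`/`0x2800` for `needs_align16_stack_align4`, `0x7c0`/`0xfffff800`/`0x3000` for `needs_align32`, and `0xc0`/`0xffffff00`/`0xd00` for `force_realign4`.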

llvm/test/CodeGen/AMDGPU/stack-slot-color-sgpr-vgpr-spills.mir

	# RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs -stress-regalloc=1 -start-before=greedy -stop-after=stack-slot-coloring -o - %s | FileCheck %s			# RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs -stress-regalloc=1 -start-before=greedy -stop-after=stack-slot-coloring -o - %s | FileCheck %s
	---			---

	# CHECK-LABEL: name: no_merge_sgpr_vgpr_spill_slot{{$}}			# CHECK-LABEL: name: no_merge_sgpr_vgpr_spill_slot{{$}}
	# CHECK: stack:			# CHECK: stack:
	# CHECK: - { id: 0, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,			# CHECK: - { id: 0, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,
	# CHECK-NEXT: stack-id: default,			# CHECK-NEXT: stack-id: default,

	# CHECK: - { id: 1, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,			# CHECK: - { id: 1, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,
	# CHECK-NEXT: stack-id: sgpr-spill,			# CHECK-NEXT: stack-id: sgpr-spill,

	# CHECK: SI_SPILL_V32_SAVE killed $vgpr0, %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (store 4 into %stack.0, addrspace 5)			# CHECK: SI_SPILL_V32_SAVE killed $vgpr0, %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (store 4 into %stack.0, addrspace 5)
	# CHECK: $vgpr0 = SI_SPILL_V32_RESTORE %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (load 4 from %stack.0, addrspace 5)			# CHECK: $vgpr0 = SI_SPILL_V32_RESTORE %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (load 4 from %stack.0, addrspace 5)

	# CHECK: SI_SPILL_S32_SAVE killed renamable $sgpr6, %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 4 into %stack.1, addrspace 5)			# CHECK: SI_SPILL_S32_SAVE killed renamable $sgpr5, %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 4 into %stack.1, addrspace 5)
	# CHECK: $sgpr6 = SI_SPILL_S32_RESTORE %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 4 from %stack.1, addrspace 5)			# CHECK: $sgpr5 = SI_SPILL_S32_RESTORE %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 4 from %stack.1, addrspace 5)

	name: no_merge_sgpr_vgpr_spill_slot			name: no_merge_sgpr_vgpr_spill_slot
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4			frameOffsetReg: $sgpr4
	frameOffsetReg: $sgpr5
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32
	body: |			body: |
	bb.0:			bb.0:
	%0:vgpr_32 = FLAT_LOAD_DWORD undef $vgpr0_vgpr1, 0, 0, 0, 0, implicit $flat_scr, implicit $exec			%0:vgpr_32 = FLAT_LOAD_DWORD undef $vgpr0_vgpr1, 0, 0, 0, 0, implicit $flat_scr, implicit $exec
	%2:vgpr_32 = FLAT_LOAD_DWORD undef $vgpr0_vgpr1, 0, 0, 0, 0, implicit $flat_scr, implicit $exec			%2:vgpr_32 = FLAT_LOAD_DWORD undef $vgpr0_vgpr1, 0, 0, 0, 0, implicit $flat_scr, implicit $exec
	S_NOP 0, implicit %0			S_NOP 0, implicit %0
	%1:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM undef $sgpr0_sgpr1, 0, 0, 0			%1:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM undef $sgpr0_sgpr1, 0, 0, 0
	%3:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM undef $sgpr0_sgpr1, 0, 0, 0			%3:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM undef $sgpr0_sgpr1, 0, 0, 0
	S_NOP 0, implicit %1			S_NOP 0, implicit %1
	...			...

llvm/test/CodeGen/AMDGPU/store-hi16.ll

		entry:
%gep = getelementptr inbounds i8, i8* %out, i64 -4095		%gep = getelementptr inbounds i8, i8* %out, i64 -4095
store i8 %trunc, i8* %gep		store i8 %trunc, i8* %gep
ret void		ret void
}		}

; GCN-LABEL: {{^}}store_private_hi_v2i16:		; GCN-LABEL: {{^}}store_private_hi_v2i16:
; GCN: s_waitcnt		; GCN: s_waitcnt

; GFX900-NEXT: buffer_store_short_d16_hi v1, v0, s[0:3], s33 offen{{$}}		; GFX900-NEXT: buffer_store_short_d16_hi v1, v0, s[0:3], 0 offen{{$}}

; NO-D16-HI: v_lshrrev_b32_e32 v1, 16, v1		; NO-D16-HI: v_lshrrev_b32_e32 v1, 16, v1
; NO-D16-HI: buffer_store_short v1, v0, s[0:3], s33 offen{{$}}		; NO-D16-HI: buffer_store_short v1, v0, s[0:3], 0 offen{{$}}

; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @store_private_hi_v2i16(i16 addrspace(5)* %out, i32 %arg) #0 {		define void @store_private_hi_v2i16(i16 addrspace(5)* %out, i32 %arg) #0 {
entry:		entry:
; FIXME: ABI for pre-gfx9		; FIXME: ABI for pre-gfx9
%value = bitcast i32 %arg to <2 x i16>		%value = bitcast i32 %arg to <2 x i16>
%hi = extractelement <2 x i16> %value, i32 1		%hi = extractelement <2 x i16> %value, i32 1
store i16 %hi, i16 addrspace(5)* %out		store i16 %hi, i16 addrspace(5)* %out
ret void		ret void
}		}

; GCN-LABEL: {{^}}store_private_hi_v2f16:		; GCN-LABEL: {{^}}store_private_hi_v2f16:
; GCN: s_waitcnt		; GCN: s_waitcnt

; GFX900-NEXT: buffer_store_short_d16_hi v1, v0, s[0:3], s33 offen{{$}}		; GFX900-NEXT: buffer_store_short_d16_hi v1, v0, s[0:3], 0 offen{{$}}

; NO-D16-HI: v_lshrrev_b32_e32 v1, 16, v1		; NO-D16-HI: v_lshrrev_b32_e32 v1, 16, v1
; NO-D16-HI: buffer_store_short v1, v0, s[0:3], s33 offen{{$}}		; NO-D16-HI: buffer_store_short v1, v0, s[0:3], 0 offen{{$}}

; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @store_private_hi_v2f16(half addrspace(5)* %out, i32 %arg) #0 {		define void @store_private_hi_v2f16(half addrspace(5)* %out, i32 %arg) #0 {
entry:		entry:
; FIXME: ABI for pre-gfx9		; FIXME: ABI for pre-gfx9
%value = bitcast i32 %arg to <2 x half>		%value = bitcast i32 %arg to <2 x half>
%hi = extractelement <2 x half> %value, i32 1		%hi = extractelement <2 x half> %value, i32 1
store half %hi, half addrspace(5)* %out		store half %hi, half addrspace(5)* %out
ret void		ret void
}		}

; GCN-LABEL: {{^}}store_private_hi_i32_shift:		; GCN-LABEL: {{^}}store_private_hi_i32_shift:
; GCN: s_waitcnt		; GCN: s_waitcnt

; GFX900-NEXT: buffer_store_short_d16_hi v1, v0, s[0:3], s33 offen{{$}}		; GFX900-NEXT: buffer_store_short_d16_hi v1, v0, s[0:3], 0 offen{{$}}

; NO-D16-HI-NEXT: v_lshrrev_b32_e32 v1, 16, v1		; NO-D16-HI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
; NO-D16-HI-NEXT: buffer_store_short v1, v0, s[0:3], s33 offen{{$}}		; NO-D16-HI-NEXT: buffer_store_short v1, v0, s[0:3], 0 offen{{$}}

; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @store_private_hi_i32_shift(i16 addrspace(5)* %out, i32 %value) #0 {		define void @store_private_hi_i32_shift(i16 addrspace(5)* %out, i32 %value) #0 {
entry:		entry:
%hi32 = lshr i32 %value, 16		%hi32 = lshr i32 %value, 16
%hi = trunc i32 %hi32 to i16		%hi = trunc i32 %hi32 to i16
store i16 %hi, i16 addrspace(5)* %out		store i16 %hi, i16 addrspace(5)* %out
ret void		ret void
}		}

; GCN-LABEL: {{^}}store_private_hi_v2i16_i8:		; GCN-LABEL: {{^}}store_private_hi_v2i16_i8:
; GCN: s_waitcnt		; GCN: s_waitcnt

; GFX900-NEXT: buffer_store_byte_d16_hi v1, v0, s[0:3], s33 offen{{$}}		; GFX900-NEXT: buffer_store_byte_d16_hi v1, v0, s[0:3], 0 offen{{$}}

; NO-D16-HI-NEXT: v_lshrrev_b32_e32 v1, 16, v1		; NO-D16-HI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
; NO-D16-HI-NEXT: buffer_store_byte v1, v0, s[0:3], s33 offen{{$}}		; NO-D16-HI-NEXT: buffer_store_byte v1, v0, s[0:3], 0 offen{{$}}

; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @store_private_hi_v2i16_i8(i8 addrspace(5)* %out, i32 %arg) #0 {		define void @store_private_hi_v2i16_i8(i8 addrspace(5)* %out, i32 %arg) #0 {
entry:		entry:
%value = bitcast i32 %arg to <2 x i16>		%value = bitcast i32 %arg to <2 x i16>
%hi = extractelement <2 x i16> %value, i32 1		%hi = extractelement <2 x i16> %value, i32 1
%trunc = trunc i16 %hi to i8		%trunc = trunc i16 %hi to i8
store i8 %trunc, i8 addrspace(5)* %out		store i8 %trunc, i8 addrspace(5)* %out
ret void		ret void
}		}

; GCN-LABEL: {{^}}store_private_hi_i8_shift:		; GCN-LABEL: {{^}}store_private_hi_i8_shift:
; GCN: s_waitcnt		; GCN: s_waitcnt

; GFX900-NEXT: buffer_store_byte_d16_hi v1, v0, s[0:3], s33 offen{{$}}		; GFX900-NEXT: buffer_store_byte_d16_hi v1, v0, s[0:3], 0 offen{{$}}

; NO-D16-HI-NEXT: v_lshrrev_b32_e32 v1, 16, v1		; NO-D16-HI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
; NO-D16-HI-NEXT: buffer_store_byte v1, v0, s[0:3], s33 offen{{$}}		; NO-D16-HI-NEXT: buffer_store_byte v1, v0, s[0:3], 0 offen{{$}}

; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @store_private_hi_i8_shift(i8 addrspace(5)* %out, i32 %value) #0 {		define void @store_private_hi_i8_shift(i8 addrspace(5)* %out, i32 %value) #0 {
entry:		entry:
%hi32 = lshr i32 %value, 16		%hi32 = lshr i32 %value, 16
%hi = trunc i32 %hi32 to i8		%hi = trunc i32 %hi32 to i8
store i8 %hi, i8 addrspace(5)* %out		store i8 %hi, i8 addrspace(5)* %out
		entry:
ret void		ret void
}		}



; GCN-LABEL: {{^}}store_private_hi_v2i16_nooff:		; GCN-LABEL: {{^}}store_private_hi_v2i16_nooff:
; GCN: s_waitcnt		; GCN: s_waitcnt

; GFX900-NEXT: buffer_store_short_d16_hi v0, off, s[0:3], s33{{$}}		; GFX900-NEXT: buffer_store_short_d16_hi v0, off, s[0:3], 0{{$}}

; NO-D16-HI-NEXT: v_lshrrev_b32_e32 v0, 16, v0		; NO-D16-HI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
; NO-D16-HI-NEXT: buffer_store_short v0, off, s[0:3], s33{{$}}		; NO-D16-HI-NEXT: buffer_store_short v0, off, s[0:3], 0{{$}}

; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @store_private_hi_v2i16_nooff(i32 %arg) #0 {		define void @store_private_hi_v2i16_nooff(i32 %arg) #0 {
entry:		entry:
; FIXME: ABI for pre-gfx9		; FIXME: ABI for pre-gfx9
%value = bitcast i32 %arg to <2 x i16>		%value = bitcast i32 %arg to <2 x i16>
%hi = extractelement <2 x i16> %value, i32 1		%hi = extractelement <2 x i16> %value, i32 1
store volatile i16 %hi, i16 addrspace(5)* null		store volatile i16 %hi, i16 addrspace(5)* null
ret void		ret void
}		}


; GCN-LABEL: {{^}}store_private_hi_v2i16_i8_nooff:		; GCN-LABEL: {{^}}store_private_hi_v2i16_i8_nooff:
; GCN: s_waitcnt		; GCN: s_waitcnt

; GFX900-NEXT: buffer_store_byte_d16_hi v0, off, s[0:3], s33{{$}}		; GFX900-NEXT: buffer_store_byte_d16_hi v0, off, s[0:3], 0{{$}}

; NO-D16-HI: v_lshrrev_b32_e32 v0, 16, v0		; NO-D16-HI: v_lshrrev_b32_e32 v0, 16, v0
; NO-D16-HI: buffer_store_byte v0, off, s[0:3], s33{{$}}		; NO-D16-HI: buffer_store_byte v0, off, s[0:3], 0{{$}}

; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @store_private_hi_v2i16_i8_nooff(i32 %arg) #0 {		define void @store_private_hi_v2i16_i8_nooff(i32 %arg) #0 {
entry:		entry:
%value = bitcast i32 %arg to <2 x i16>		%value = bitcast i32 %arg to <2 x i16>
%hi = extractelement <2 x i16> %value, i32 1		%hi = extractelement <2 x i16> %value, i32 1
%trunc = trunc i16 %hi to i8		%trunc = trunc i16 %hi to i8

llvm/test/CodeGen/AMDGPU/subreg-split-live-in-error.mir

	#			#
	# This test exposes this scenario which previously caused an assert			# This test exposes this scenario which previously caused an assert

	---			---
	name: _amdgpu_ps_main			name: _amdgpu_ps_main
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32
	liveins:			liveins:
	- { reg: '$vgpr2', virtual-reg: '%0' }			- { reg: '$vgpr2', virtual-reg: '%0' }
	- { reg: '$vgpr3', virtual-reg: '%1' }			- { reg: '$vgpr3', virtual-reg: '%1' }
	- { reg: '$vgpr4', virtual-reg: '%2' }			- { reg: '$vgpr4', virtual-reg: '%2' }
	body: |			body: |
	bb.0:			bb.0:
	successors: %bb.1(0x40000000), %bb.2(0x40000000)			successors: %bb.1(0x40000000), %bb.2(0x40000000)

llvm/test/CodeGen/AMDGPU/subvector-test.mir

	# RUN: llc -march=amdgcn -mcpu=gfx1010 -start-before=greedy -verify-machineinstrs -o - %s | FileCheck -check-prefix=GCN %s			# RUN: llc -march=amdgcn -mcpu=gfx1010 -start-before=greedy -verify-machineinstrs -o - %s | FileCheck -check-prefix=GCN %s
	...			...
	# GCN-LABEL: {{^}}"subvector-basic-bb"			# GCN-LABEL: {{^}}"subvector-basic-bb"
	# GCN: s_subvector_loop_begin [[RS:s[0-9]]], BB0_2			# GCN: s_subvector_loop_begin [[RS:s[0-9]]], BB0_2
	# GCN: s_subvector_loop_end [[RS]], BB0_1			# GCN: s_subvector_loop_end [[RS]], BB0_1
	name: subvector-basic-bb			name: subvector-basic-bb
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	scratchWaveOffsetReg: $sgpr4
	frameOffsetReg: $sgpr5			frameOffsetReg: $sgpr5
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32
	body: |			body: |
	bb.0:			bb.0:
	liveins: $sgpr0_sgpr1			liveins: $sgpr0_sgpr1
	successors: %bb.1, %bb.2			successors: %bb.1, %bb.2

	%1:sgpr_64 = COPY $sgpr0_sgpr1			%1:sgpr_64 = COPY $sgpr0_sgpr1

llvm/test/CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot.ll

	; GCN-DAG: s_mov_b32 s[[DESC0:[0-9]+]], SCRATCH_RSRC_DWORD0			; GCN-DAG: s_mov_b32 s[[DESC0:[0-9]+]], SCRATCH_RSRC_DWORD0
	; GCN-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD1			; GCN-DAG: s_mov_b32 s{{[0-9]+}}, SCRATCH_RSRC_DWORD1
	; GCN-DAG: s_mov_b32 s{{[0-9]+}}, -1			; GCN-DAG: s_mov_b32 s{{[0-9]+}}, -1
	; SI-DAG: s_mov_b32 s[[DESC3:[0-9]+]], 0xe8f000			; SI-DAG: s_mov_b32 s[[DESC3:[0-9]+]], 0xe8f000
	; VI-DAG: s_mov_b32 s[[DESC3:[0-9]+]], 0xe80000			; VI-DAG: s_mov_b32 s[[DESC3:[0-9]+]], 0xe80000
	; GFX9-DAG: s_mov_b32 s[[DESC3:[0-9]+]], 0xe00000			; GFX9-DAG: s_mov_b32 s[[DESC3:[0-9]+]], 0xe00000

	; OFFREG is offset system SGPR			; OFFREG is offset system SGPR
	; GCN: buffer_store_dword {{v[0-9]+}}, off, s{{\[}}[[DESC0]]:[[DESC3]]], s12 offset:{{[0-9]+}} ; 4-byte Folded Spill			; GCN: buffer_store_dword {{v[0-9]+}}, off, s{{\[}}[[DESC0]]:[[DESC3]]], 0 offset:{{[0-9]+}} ; 4-byte Folded Spill
	; GCN: buffer_load_dword v{{[0-9]+}}, off, s{{\[}}[[DESC0]]:[[DESC3]]], s12 offset:{{[0-9]+}} ; 4-byte Folded Reload			; GCN: buffer_load_dword v{{[0-9]+}}, off, s{{\[}}[[DESC0]]:[[DESC3]]], 0 offset:{{[0-9]+}} ; 4-byte Folded Reload
	; GCN: NumVgprs: 256			; GCN: NumVgprs: 256
	; GCN: ScratchSize: 1536			; GCN: ScratchSize: 1536

	define amdgpu_vs void @main([9 x <4 x i32>] addrspace(4)* inreg %arg, [17 x <4 x i32>] addrspace(4)* inreg %arg1, [17 x <4 x i32>] addrspace(4)* inreg %arg2, [34 x <8 x i32>] addrspace(4)* inreg %arg3, [16 x <4 x i32>] addrspace(4)* inreg %arg4, i32 inreg %arg5, i32 inreg %arg6, i32 %arg7, i32 %arg8, i32 %arg9, i32 %arg10) #0 {			define amdgpu_vs void @main([9 x <4 x i32>] addrspace(4)* inreg %arg, [17 x <4 x i32>] addrspace(4)* inreg %arg1, [17 x <4 x i32>] addrspace(4)* inreg %arg2, [34 x <8 x i32>] addrspace(4)* inreg %arg3, [16 x <4 x i32>] addrspace(4)* inreg %arg4, i32 inreg %arg5, i32 inreg %arg6, i32 %arg7, i32 %arg8, i32 %arg9, i32 %arg10) #0 {
	bb:			bb:
	%tmp = getelementptr [17 x <4 x i32>], [17 x <4 x i32>] addrspace(4)* %arg1, i64 0, i64 0			%tmp = getelementptr [17 x <4 x i32>], [17 x <4 x i32>] addrspace(4)* %arg1, i64 0, i64 0
	%tmp11 = load <4 x i32>, <4 x i32> addrspace(4)* %tmp, align 16, !tbaa !0			%tmp11 = load <4 x i32>, <4 x i32> addrspace(4)* %tmp, align 16, !tbaa !0
	%tmp12 = call float @llvm.amdgcn.s.buffer.load.f32(<4 x i32> %tmp11, i32 0, i32 0)			%tmp12 = call float @llvm.amdgcn.s.buffer.load.f32(<4 x i32> %tmp11, i32 0, i32 0)

llvm/test/CodeGen/AMDGPU/virtregrewrite-undef-identity-copy.mir

	name: undef_identity_copy			name: undef_identity_copy
	tracksRegLiveness: true			tracksRegLiveness: true
	frameInfo:			frameInfo:
	maxAlignment: 4			maxAlignment: 4
	hasCalls: true			hasCalls: true
	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: true			isEntryFunction: true
	scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'			scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
	scratchWaveOffsetReg: '$sgpr95'
	frameOffsetReg: '$sgpr95'			frameOffsetReg: '$sgpr95'
	stackPtrOffsetReg: '$sgpr32'			stackPtrOffsetReg: '$sgpr32'
	body: |			body: |
	bb.0:			bb.0:
	; CHECK-LABEL: name: undef_identity_copy			; CHECK-LABEL: name: undef_identity_copy
	; CHECK: renamable $vgpr32_vgpr33_vgpr34_vgpr35 = FLAT_LOAD_DWORDX4 undef renamable $vgpr0_vgpr1, 0, 0, 0, 0, implicit $exec, implicit $flat_scr :: (load 16, addrspace 1)			; CHECK: renamable $vgpr32_vgpr33_vgpr34_vgpr35 = FLAT_LOAD_DWORDX4 undef renamable $vgpr0_vgpr1, 0, 0, 0, 0, implicit $exec, implicit $flat_scr :: (load 16, addrspace 1)
	; CHECK: renamable $sgpr6_sgpr7 = SI_PC_ADD_REL_OFFSET target-flags(amdgpu-rel32-lo) @foo + 4, target-flags(amdgpu-rel32-hi) @foo + 4, implicit-def dead $scc			; CHECK: renamable $sgpr6_sgpr7 = SI_PC_ADD_REL_OFFSET target-flags(amdgpu-rel32-lo) @foo + 4, target-flags(amdgpu-rel32-hi) @foo + 4, implicit-def dead $scc
	; CHECK: ADJCALLSTACKUP 0, 0, implicit-def $sgpr32, implicit $sgpr32, implicit $sgpr95			; CHECK: ADJCALLSTACKUP 0, 0, implicit-def $sgpr32, implicit $sgpr32, implicit $sgpr95

llvm/test/CodeGen/AMDGPU/wqm.ll

	;			;
	; CHECK-LABEL: {{^}}test_alloca:			; CHECK-LABEL: {{^}}test_alloca:
	; CHECK: s_mov_b64 [[LIVE:s\[[0-9]+:[0-9]+\]]], exec			; CHECK: s_mov_b64 [[LIVE:s\[[0-9]+:[0-9]+\]]], exec
	; CHECK: s_wqm_b64 exec, exec			; CHECK: s_wqm_b64 exec, exec

	; CHECK: s_and_b64 exec, exec, [[LIVE]]			; CHECK: s_and_b64 exec, exec, [[LIVE]]
	; CHECK: buffer_store_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, 0			; CHECK: buffer_store_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, 0
	; CHECK: s_wqm_b64 exec, exec			; CHECK: s_wqm_b64 exec, exec
	; CHECK: buffer_store_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, {{s[0-9]+}} offset:4{{$}}			; CHECK: buffer_store_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, 0 offset:4{{$}}
	; CHECK: s_and_b64 exec, exec, [[LIVE]]			; CHECK: s_and_b64 exec, exec, [[LIVE]]
	; CHECK: buffer_store_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 idxen			; CHECK: buffer_store_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 idxen
	; CHECK: s_wqm_b64 exec, exec			; CHECK: s_wqm_b64 exec, exec
	; CHECK: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, {{s[0-9]+}} offen			; CHECK: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen

	; CHECK: s_and_b64 exec, exec, [[LIVE]]			; CHECK: s_and_b64 exec, exec, [[LIVE]]
	; CHECK: image_sample			; CHECK: image_sample
	; CHECK: buffer_store_dwordx4			; CHECK: buffer_store_dwordx4
	define amdgpu_ps void @test_alloca(float %data, i32 %a, i32 %idx) nounwind {			define amdgpu_ps void @test_alloca(float %data, i32 %a, i32 %idx) nounwind {
	entry:			entry:
	%array = alloca [32 x i32], align 4, addrspace(5)			%array = alloca [32 x i32], align 4, addrspace(5)


llvm/test/CodeGen/AMDGPU/wwm-reserved.ll

entry:
%tmp100 = call <2 x float> @llvm.amdgcn.raw.buffer.load.v2f32(<4 x i32> %tmp14, i32 0, i32 0, i32 0)		%tmp100 = call <2 x float> @llvm.amdgcn.raw.buffer.load.v2f32(<4 x i32> %tmp14, i32 0, i32 0, i32 0)
%tmp101 = bitcast <2 x float> %tmp100 to <2 x i32>		%tmp101 = bitcast <2 x float> %tmp100 to <2 x i32>
%tmp102 = extractelement <2 x i32> %tmp101, i32 0		%tmp102 = extractelement <2 x i32> %tmp101, i32 0
%tmp105 = tail call i32 @llvm.amdgcn.set.inactive.i32(i32 %tmp102, i32 0)		%tmp105 = tail call i32 @llvm.amdgcn.set.inactive.i32(i32 %tmp102, i32 0)

; GFX9: v_mov_b32_dpp v[[FIRST_MOV:[0-9]+]], v{{[0-9]+}} row_bcast:31 row_mask:0xc bank_mask:0xf		; GFX9: v_mov_b32_dpp v[[FIRST_MOV:[0-9]+]], v{{[0-9]+}} row_bcast:31 row_mask:0xc bank_mask:0xf
; GFX9: v_add_u32_e32 v[[FIRST_ADD:[0-9]+]], v{{[0-9]+}}, v[[FIRST_MOV]]		; GFX9: v_add_u32_e32 v[[FIRST_ADD:[0-9]+]], v{{[0-9]+}}, v[[FIRST_MOV]]
; GFX9: v_mov_b32_e32 v[[FIRST:[0-9]+]], v[[FIRST_ADD]]		; GFX9: v_mov_b32_e32 v[[FIRST:[0-9]+]], v[[FIRST_ADD]]
; GFX9-O0: buffer_store_dword v[[FIRST]], off, s{{\[}}{{[0-9]+}}:{{[0-9]+}}{{\]}}, s[[FIRST_SGPR_OFFSET:[0-9]+]] offset:[[FIRST_IMM_OFFSET:[0-9]+]]		; GFX9-O0: buffer_store_dword v[[FIRST]], off, s{{\[}}{{[0-9]+}}:{{[0-9]+}}{{\]}}, 0 offset:[[FIRST_IMM_OFFSET:[0-9]+]]
%tmp120 = tail call i32 @llvm.amdgcn.update.dpp.i32(i32 0, i32 %tmp105, i32 323, i32 12, i32 15, i1 false)		%tmp120 = tail call i32 @llvm.amdgcn.update.dpp.i32(i32 0, i32 %tmp105, i32 323, i32 12, i32 15, i1 false)
%tmp121 = add i32 %tmp105, %tmp120		%tmp121 = add i32 %tmp105, %tmp120
%tmp122 = tail call i32 @llvm.amdgcn.wwm.i32(i32 %tmp121)		%tmp122 = tail call i32 @llvm.amdgcn.wwm.i32(i32 %tmp121)

%cond = icmp eq i32 %arg, 0		%cond = icmp eq i32 %arg, 0
br i1 %cond, label %if, label %merge		br i1 %cond, label %if, label %merge
if:		if:
%tmp103 = extractelement <2 x i32> %tmp101, i32 1		%tmp103 = extractelement <2 x i32> %tmp101, i32 1
%tmp107 = tail call i32 @llvm.amdgcn.set.inactive.i32(i32 %tmp103, i32 0)		%tmp107 = tail call i32 @llvm.amdgcn.set.inactive.i32(i32 %tmp103, i32 0)

; GFX9: v_mov_b32_dpp v[[SECOND_MOV:[0-9]+]], v{{[0-9]+}} row_bcast:31 row_mask:0xc bank_mask:0xf		; GFX9: v_mov_b32_dpp v[[SECOND_MOV:[0-9]+]], v{{[0-9]+}} row_bcast:31 row_mask:0xc bank_mask:0xf
; GFX9: v_add_u32_e32 v[[SECOND_ADD:[0-9]+]], v{{[0-9]+}}, v[[SECOND_MOV]]		; GFX9: v_add_u32_e32 v[[SECOND_ADD:[0-9]+]], v{{[0-9]+}}, v[[SECOND_MOV]]
; GFX9: v_mov_b32_e32 v[[SECOND:[0-9]+]], v[[SECOND_ADD]]		; GFX9: v_mov_b32_e32 v[[SECOND:[0-9]+]], v[[SECOND_ADD]]
; GFX9-O0: buffer_store_dword v[[SECOND]], off, s{{\[}}{{[0-9]+}}:{{[0-9]+}}{{\]}}, s[[SECOND_SGPR_OFFSET:[0-9]+]] offset:[[SECOND_IMM_OFFSET:[0-9]+]]		; GFX9-O0: buffer_store_dword v[[SECOND]], off, s{{\[}}{{[0-9]+}}:{{[0-9]+}}{{\]}}, 0 offset:[[SECOND_IMM_OFFSET:[0-9]+]]
%tmp135 = tail call i32 @llvm.amdgcn.update.dpp.i32(i32 0, i32 %tmp107, i32 323, i32 12, i32 15, i1 false)		%tmp135 = tail call i32 @llvm.amdgcn.update.dpp.i32(i32 0, i32 %tmp107, i32 323, i32 12, i32 15, i1 false)
%tmp136 = add i32 %tmp107, %tmp135		%tmp136 = add i32 %tmp107, %tmp135
%tmp137 = tail call i32 @llvm.amdgcn.wwm.i32(i32 %tmp136)		%tmp137 = tail call i32 @llvm.amdgcn.wwm.i32(i32 %tmp136)
br label %merge		br label %merge

merge:		merge:
%merge_value = phi i32 [ 0, %entry ], [%tmp137, %if ]		%merge_value = phi i32 [ 0, %entry ], [%tmp137, %if ]
; GFX9-O3: v_cmp_eq_u32_e32 vcc, v[[FIRST]], v[[SECOND]]		; GFX9-O3: v_cmp_eq_u32_e32 vcc, v[[FIRST]], v[[SECOND]]
; GFX9-O0: buffer_load_dword v[[SECOND:[0-9]+]], off, s{{\[}}{{[0-9]+}}:{{[0-9]+}}{{\]}}, s[[SECOND_SGPR_OFFSET]] offset:[[SECOND_IMM_OFFSET]]		; GFX9-O0: buffer_load_dword v[[SECOND:[0-9]+]], off, s{{\[}}{{[0-9]+}}:{{[0-9]+}}{{\]}}, 0 offset:[[SECOND_IMM_OFFSET]]
; GFX9-O0: buffer_load_dword v[[FIRST:[0-9]+]], off, s{{\[}}{{[0-9]+}}:{{[0-9]+}}{{\]}}, s[[FIRST_SGPR_OFFSET]] offset:[[FIRST_IMM_OFFSET]]		; GFX9-O0: buffer_load_dword v[[FIRST:[0-9]+]], off, s{{\[}}{{[0-9]+}}:{{[0-9]+}}{{\]}}, 0 offset:[[FIRST_IMM_OFFSET]]
; GFX9-O0: v_cmp_eq_u32_e64 s{{\[}}{{[0-9]+}}:{{[0-9]+}}{{\]}}, v[[FIRST]], v[[SECOND]]		; GFX9-O0: v_cmp_eq_u32_e64 s{{\[}}{{[0-9]+}}:{{[0-9]+}}{{\]}}, v[[FIRST]], v[[SECOND]]
%tmp138 = icmp eq i32 %tmp122, %merge_value		%tmp138 = icmp eq i32 %tmp122, %merge_value
%tmp139 = sext i1 %tmp138 to i32		%tmp139 = sext i1 %tmp138 to i32
%tmp140 = shl nsw i32 %tmp139, 1		%tmp140 = shl nsw i32 %tmp139, 1
%tmp141 = and i32 %tmp140, 2		%tmp141 = and i32 %tmp140, 2
%tmp145 = bitcast i32 %tmp141 to float		%tmp145 = bitcast i32 %tmp141 to float
call void @llvm.amdgcn.raw.buffer.store.f32(float %tmp145, <4 x i32> %tmp14, i32 4, i32 0, i32 0)		call void @llvm.amdgcn.raw.buffer.store.f32(float %tmp145, <4 x i32> %tmp14, i32 4, i32 0, i32 0)
ret void		ret void

llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-no-ir.mir

	# RUN: llc -mtriple=amdgcn-amd-amdhsa -run-pass=none -verify-machineinstrs %s -o - | FileCheck -check-prefixes=FULL,ALL %s			# RUN: llc -mtriple=amdgcn-amd-amdhsa -run-pass=none -verify-machineinstrs %s -o - | FileCheck -check-prefixes=FULL,ALL %s
	# RUN: llc -mtriple=amdgcn-amd-amdhsa -run-pass=none -simplify-mir -verify-machineinstrs %s -o - | FileCheck -check-prefixes=SIMPLE,ALL %s			# RUN: llc -mtriple=amdgcn-amd-amdhsa -run-pass=none -simplify-mir -verify-machineinstrs %s -o - | FileCheck -check-prefixes=SIMPLE,ALL %s


	---			---
	# ALL-LABEL: name: kernel0			# ALL-LABEL: name: kernel0
	# FULL: machineFunctionInfo:			# FULL: machineFunctionInfo:
	# FULL-NEXT: explicitKernArgSize: 128			# FULL-NEXT: explicitKernArgSize: 128
	# FULL-NEXT: maxKernArgAlign: 64			# FULL-NEXT: maxKernArgAlign: 64
	# FULL-NEXT: ldsSize: 2048			# FULL-NEXT: ldsSize: 2048
	# FULL-NEXT: isEntryFunction: true			# FULL-NEXT: isEntryFunction: true
	# FULL-NEXT: noSignedZerosFPMath: false			# FULL-NEXT: noSignedZerosFPMath: false
	# FULL-NEXT: memoryBound: true			# FULL-NEXT: memoryBound: true
	# FULL-NEXT: waveLimiter: true			# FULL-NEXT: waveLimiter: true
	# FULL-NEXT: scratchRSrcReg: '$sgpr8_sgpr9_sgpr10_sgpr11'			# FULL-NEXT: scratchRSrcReg: '$sgpr8_sgpr9_sgpr10_sgpr11'
	# FULL-NEXT: scratchWaveOffsetReg: '$sgpr12'
	# FULL-NEXT: frameOffsetReg: '$sgpr12'			# FULL-NEXT: frameOffsetReg: '$sgpr12'
	# FULL-NEXT: stackPtrOffsetReg: '$sgpr13'			# FULL-NEXT: stackPtrOffsetReg: '$sgpr13'
	# FULL-NEXT: argumentInfo:			# FULL-NEXT: argumentInfo:
	# FULL-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }			# FULL-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	# FULL-NEXT: kernargSegmentPtr: { reg: '$sgpr4_sgpr5' }			# FULL-NEXT: kernargSegmentPtr: { reg: '$sgpr4_sgpr5' }
	# FULL-NEXT: workGroupIDX: { reg: '$sgpr6' }			# FULL-NEXT: workGroupIDX: { reg: '$sgpr6' }
	# FULL-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr7' }			# FULL-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr7' }
	# FULL-NEXT: workItemIDX: { reg: '$vgpr0' }			# FULL-NEXT: workItemIDX: { reg: '$vgpr0' }
	# SIMPLE: machineFunctionInfo:			# SIMPLE: machineFunctionInfo:
	# SIMPLE-NEXT: explicitKernArgSize: 128			# SIMPLE-NEXT: explicitKernArgSize: 128
	# SIMPLE-NEXT: maxKernArgAlign: 64			# SIMPLE-NEXT: maxKernArgAlign: 64
	# SIMPLE-NEXT: ldsSize: 2048			# SIMPLE-NEXT: ldsSize: 2048
	# SIMPLE-NEXT: isEntryFunction: true			# SIMPLE-NEXT: isEntryFunction: true
	# SIMPLE-NEXT: memoryBound: true			# SIMPLE-NEXT: memoryBound: true
	# SIMPLE-NEXT: waveLimiter: true			# SIMPLE-NEXT: waveLimiter: true
	# SIMPLE-NEXT: scratchRSrcReg: '$sgpr8_sgpr9_sgpr10_sgpr11'			# SIMPLE-NEXT: scratchRSrcReg: '$sgpr8_sgpr9_sgpr10_sgpr11'
	# SIMPLE-NEXT: scratchWaveOffsetReg: '$sgpr12'
	# SIMPLE-NEXT: frameOffsetReg: '$sgpr12'			# SIMPLE-NEXT: frameOffsetReg: '$sgpr12'
	# SIMPLE-NEXT: stackPtrOffsetReg: '$sgpr13'			# SIMPLE-NEXT: stackPtrOffsetReg: '$sgpr13'
	# SIMPLE-NEXT: argumentInfo:			# SIMPLE-NEXT: argumentInfo:
	# SIMPLE-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }			# SIMPLE-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	# SIMPLE-NEXT: kernargSegmentPtr: { reg: '$sgpr4_sgpr5' }			# SIMPLE-NEXT: kernargSegmentPtr: { reg: '$sgpr4_sgpr5' }
	# SIMPLE-NEXT: workGroupIDX: { reg: '$sgpr6' }			# SIMPLE-NEXT: workGroupIDX: { reg: '$sgpr6' }
	# SIMPLE-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr7' }			# SIMPLE-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr7' }
	# SIMPLE-NEXT: workItemIDX: { reg: '$vgpr0' }			# SIMPLE-NEXT: workItemIDX: { reg: '$vgpr0' }
	# SIMPLE-NEXT: body:			# SIMPLE-NEXT: body:
	name: kernel0			name: kernel0
	machineFunctionInfo:			machineFunctionInfo:
	explicitKernArgSize: 128			explicitKernArgSize: 128
	maxKernArgAlign: 64			maxKernArgAlign: 64
	ldsSize: 2048			ldsSize: 2048
	isEntryFunction: true			isEntryFunction: true
	noSignedZerosFPMath: false			noSignedZerosFPMath: false
	memoryBound: true			memoryBound: true
	waveLimiter: true			waveLimiter: true
	scratchRSrcReg: '$sgpr8_sgpr9_sgpr10_sgpr11'			scratchRSrcReg: '$sgpr8_sgpr9_sgpr10_sgpr11'
	scratchWaveOffsetReg: '$sgpr12'
	frameOffsetReg: '$sgpr12'			frameOffsetReg: '$sgpr12'
	stackPtrOffsetReg: '$sgpr13'			stackPtrOffsetReg: '$sgpr13'
	argumentInfo:			argumentInfo:
	privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }			privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	kernargSegmentPtr: { reg: '$sgpr4_sgpr5' }			kernargSegmentPtr: { reg: '$sgpr4_sgpr5' }
	workGroupIDX: { reg: '$sgpr6' }			workGroupIDX: { reg: '$sgpr6' }
	privateSegmentWaveByteOffset: { reg: '$sgpr7' }			privateSegmentWaveByteOffset: { reg: '$sgpr7' }
	workItemIDX: { reg: '$vgpr0' }			workItemIDX: { reg: '$vgpr0' }
	# FULL-NEXT: explicitKernArgSize: 0			# FULL-NEXT: explicitKernArgSize: 0
	# FULL-NEXT: maxKernArgAlign: 1			# FULL-NEXT: maxKernArgAlign: 1
	# FULL-NEXT: ldsSize: 0			# FULL-NEXT: ldsSize: 0
	# FULL-NEXT: isEntryFunction: false			# FULL-NEXT: isEntryFunction: false
	# FULL-NEXT: noSignedZerosFPMath: false			# FULL-NEXT: noSignedZerosFPMath: false
	# FULL-NEXT: memoryBound: false			# FULL-NEXT: memoryBound: false
	# FULL-NEXT: waveLimiter: false			# FULL-NEXT: waveLimiter: false
	# FULL-NEXT: scratchRSrcReg: '$private_rsrc_reg'			# FULL-NEXT: scratchRSrcReg: '$private_rsrc_reg'
	# FULL-NEXT: scratchWaveOffsetReg: '$scratch_wave_offset_reg'
	# FULL-NEXT: frameOffsetReg: '$fp_reg'			# FULL-NEXT: frameOffsetReg: '$fp_reg'
	# FULL-NEXT: stackPtrOffsetReg: '$sp_reg'			# FULL-NEXT: stackPtrOffsetReg: '$sp_reg'
	# FULL-NEXT: argumentInfo:			# FULL-NEXT: argumentInfo:
	# FULL-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }			# FULL-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	# FULL-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr33' }
	# FULL-NEXT: mode:			# FULL-NEXT: mode:
	# FULL-NEXT: ieee: true			# FULL-NEXT: ieee: true
	# FULL-NEXT: dx10-clamp: true			# FULL-NEXT: dx10-clamp: true
	# FULL-NEXT: fp32-input-denormals: true			# FULL-NEXT: fp32-input-denormals: true
	# FULL-NEXT: fp32-output-denormals: true			# FULL-NEXT: fp32-output-denormals: true
	# FULL-NEXT: fp64-fp16-input-denormals: true			# FULL-NEXT: fp64-fp16-input-denormals: true
	# FULL-NEXT: fp64-fp16-output-denormals: true			# FULL-NEXT: fp64-fp16-output-denormals: true
	# FULL-NEXT: highBitsOf32BitAddress: 0			# FULL-NEXT: highBitsOf32BitAddress: 0
	# FULL-NEXT: body:			# FULL-NEXT: body:

	# SIMPLE: machineFunctionInfo:			# SIMPLE: machineFunctionInfo:
	# SIMPLE-NEXT: maxKernArgAlign: 1			# SIMPLE-NEXT: maxKernArgAlign: 1
	# SIMPLE-NEXT: argumentInfo:			# SIMPLE-NEXT: argumentInfo:
	# SIMPLE-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }			# SIMPLE-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	# SIMPLE-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr33' }
	# SIMPLE-NEXT: body:			# SIMPLE-NEXT: body:

	name: no_mfi			name: no_mfi
	body: |			body: |
	bb.0:			bb.0:
	S_ENDPGM 0			S_ENDPGM 0

	...			...

	---			---
	# ALL-LABEL: name: empty_mfi			# ALL-LABEL: name: empty_mfi
	# FULL: machineFunctionInfo:			# FULL: machineFunctionInfo:
	# FULL-NEXT: explicitKernArgSize: 0			# FULL-NEXT: explicitKernArgSize: 0
	# FULL-NEXT: maxKernArgAlign: 1			# FULL-NEXT: maxKernArgAlign: 1
	# FULL-NEXT: ldsSize: 0			# FULL-NEXT: ldsSize: 0
	# FULL-NEXT: isEntryFunction: false			# FULL-NEXT: isEntryFunction: false
	# FULL-NEXT: noSignedZerosFPMath: false			# FULL-NEXT: noSignedZerosFPMath: false
	# FULL-NEXT: memoryBound: false			# FULL-NEXT: memoryBound: false
	# FULL-NEXT: waveLimiter: false			# FULL-NEXT: waveLimiter: false
	# FULL-NEXT: scratchRSrcReg: '$private_rsrc_reg'			# FULL-NEXT: scratchRSrcReg: '$private_rsrc_reg'
	# FULL-NEXT: scratchWaveOffsetReg: '$scratch_wave_offset_reg'
	# FULL-NEXT: frameOffsetReg: '$fp_reg'			# FULL-NEXT: frameOffsetReg: '$fp_reg'
	# FULL-NEXT: stackPtrOffsetReg: '$sp_reg'			# FULL-NEXT: stackPtrOffsetReg: '$sp_reg'
	# FULL-NEXT: argumentInfo:			# FULL-NEXT: argumentInfo:
	# FULL-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }			# FULL-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	# FULL-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr33' }
	# FULL-NEXT: mode:			# FULL-NEXT: mode:
	# FULL-NEXT: ieee: true			# FULL-NEXT: ieee: true
	# FULL-NEXT: dx10-clamp: true			# FULL-NEXT: dx10-clamp: true
	# FULL-NEXT: fp32-input-denormals: true			# FULL-NEXT: fp32-input-denormals: true
	# FULL-NEXT: fp32-output-denormals: true			# FULL-NEXT: fp32-output-denormals: true
	# FULL-NEXT: fp64-fp16-input-denormals: true			# FULL-NEXT: fp64-fp16-input-denormals: true
	# FULL-NEXT: fp64-fp16-output-denormals: true			# FULL-NEXT: fp64-fp16-output-denormals: true
	# FULL-NEXT: highBitsOf32BitAddress: 0			# FULL-NEXT: highBitsOf32BitAddress: 0
	# FULL-NEXT: body:			# FULL-NEXT: body:

	# SIMPLE: machineFunctionInfo:			# SIMPLE: machineFunctionInfo:
	# SIMPLE-NEXT: maxKernArgAlign: 1			# SIMPLE-NEXT: maxKernArgAlign: 1
	# SIMPLE-NEXT: argumentInfo:			# SIMPLE-NEXT: argumentInfo:
	# SIMPLE-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }			# SIMPLE-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	# SIMPLE-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr33' }
	# SIMPLE-NEXT: body:			# SIMPLE-NEXT: body:

	name: empty_mfi			name: empty_mfi
	machineFunctionInfo:			machineFunctionInfo:
	body: |			body: |
	bb.0:			bb.0:
	S_ENDPGM 0			S_ENDPGM 0

	...			...

	---			---
	# ALL-LABEL: name: empty_mfi_entry_func			# ALL-LABEL: name: empty_mfi_entry_func
	# FULL: machineFunctionInfo:			# FULL: machineFunctionInfo:
	# FULL-NEXT: explicitKernArgSize: 0			# FULL-NEXT: explicitKernArgSize: 0
	# FULL-NEXT: maxKernArgAlign: 1			# FULL-NEXT: maxKernArgAlign: 1
	# FULL-NEXT: ldsSize: 0			# FULL-NEXT: ldsSize: 0
	# FULL-NEXT: isEntryFunction: true			# FULL-NEXT: isEntryFunction: true
	# FULL-NEXT: noSignedZerosFPMath: false			# FULL-NEXT: noSignedZerosFPMath: false
	# FULL-NEXT: memoryBound: false			# FULL-NEXT: memoryBound: false
	# FULL-NEXT: waveLimiter: false			# FULL-NEXT: waveLimiter: false
	# FULL-NEXT: scratchRSrcReg: '$private_rsrc_reg'			# FULL-NEXT: scratchRSrcReg: '$private_rsrc_reg'
	# FULL-NEXT: scratchWaveOffsetReg: '$scratch_wave_offset_reg'
	# FULL-NEXT: frameOffsetReg: '$fp_reg'			# FULL-NEXT: frameOffsetReg: '$fp_reg'
	# FULL-NEXT: stackPtrOffsetReg: '$sp_reg'			# FULL-NEXT: stackPtrOffsetReg: '$sp_reg'
	# FULL-NEXT: argumentInfo:			# FULL-NEXT: argumentInfo:
	# FULL-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }			# FULL-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	# FULL-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr33' }
	# FULL-NEXT: mode:			# FULL-NEXT: mode:
	# FULL-NEXT: ieee: true			# FULL-NEXT: ieee: true
	# FULL-NEXT: dx10-clamp: true			# FULL-NEXT: dx10-clamp: true
	# FULL-NEXT: fp32-input-denormals: true			# FULL-NEXT: fp32-input-denormals: true
	# FULL-NEXT: fp32-output-denormals: true			# FULL-NEXT: fp32-output-denormals: true
	# FULL-NEXT: fp64-fp16-input-denormals: true			# FULL-NEXT: fp64-fp16-input-denormals: true
	# FULL-NEXT: fp64-fp16-output-denormals: true			# FULL-NEXT: fp64-fp16-output-denormals: true
	# FULL-NEXT: highBitsOf32BitAddress: 0			# FULL-NEXT: highBitsOf32BitAddress: 0
	# FULL-NEXT: body:			# FULL-NEXT: body:

	# SIMPLE: machineFunctionInfo:			# SIMPLE: machineFunctionInfo:
	# SIMPLE-NEXT: maxKernArgAlign: 1			# SIMPLE-NEXT: maxKernArgAlign: 1
	# SIMPLE-NEXT: isEntryFunction: true			# SIMPLE-NEXT: isEntryFunction: true
	# SIMPLE-NEXT: argumentInfo:			# SIMPLE-NEXT: argumentInfo:
	# SIMPLE-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }			# SIMPLE-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	# SIMPLE-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr33' }
	# SIMPLE-NEXT: body:			# SIMPLE-NEXT: body:

	name: empty_mfi_entry_func			name: empty_mfi_entry_func
	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: true			isEntryFunction: true
	body: |			body: |
	bb.0:			bb.0:
	S_ENDPGM 0			S_ENDPGM 0

	...			...

	---			---
	# ALL-LABEL: name: default_regs_mfi			# ALL-LABEL: name: default_regs_mfi

	# FULL: scratchRSrcReg: '$private_rsrc_reg'			# FULL: scratchRSrcReg: '$private_rsrc_reg'
	# FULL-NEXT: scratchWaveOffsetReg: '$scratch_wave_offset_reg'
	# FULL-NEXT: frameOffsetReg: '$fp_reg'			# FULL-NEXT: frameOffsetReg: '$fp_reg'
	# FULL-NEXT: stackPtrOffsetReg: '$sp_reg'			# FULL-NEXT: stackPtrOffsetReg: '$sp_reg'

	# SIMPLE-NOT: scratchRSrcReg			# SIMPLE-NOT: scratchRSrcReg
	# SIMPLE-NOT: scratchWaveOffsetReg
	# SIMPLE-NOT:: stackPtrOffsetReg			# SIMPLE-NOT:: stackPtrOffsetReg
	name: default_regs_mfi			name: default_regs_mfi
	machineFunctionInfo:			machineFunctionInfo:
	scratchRSrcReg: '$private_rsrc_reg'			scratchRSrcReg: '$private_rsrc_reg'

	body: |			body: |
	bb.0:			bb.0:
	S_ENDPGM 0			S_ENDPGM 0

	...			...

	---			---
	# ALL-LABEL: name: fake_stack_arginfo			# ALL-LABEL: name: fake_stack_arginfo

	# FULL: argumentInfo:			# FULL: argumentInfo:
	# FULL-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }			# FULL-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	# FULL-NEXT: flatScratchInit: { offset: 4 }			# FULL-NEXT: flatScratchInit: { offset: 4 }
	# FULL-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr33' }
	# FULL-NEXT: workItemIDY: { reg: '$vgpr0', mask: 65280 }			# FULL-NEXT: workItemIDY: { reg: '$vgpr0', mask: 65280 }

	# SIMPLE: argumentInfo:			# SIMPLE: argumentInfo:
	# SIMPLE-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }			# SIMPLE-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	# SIMPLE-NEXT: flatScratchInit: { offset: 4 }			# SIMPLE-NEXT: flatScratchInit: { offset: 4 }
	# SIMPLE-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr33' }
	# SIMPLE-NEXT: workItemIDY: { reg: '$vgpr0', mask: 65280 }			# SIMPLE-NEXT: workItemIDY: { reg: '$vgpr0', mask: 65280 }
	name: fake_stack_arginfo			name: fake_stack_arginfo
	machineFunctionInfo:			machineFunctionInfo:
	argumentInfo:			argumentInfo:
	flatScratchInit: { offset: 4 }			flatScratchInit: { offset: 4 }
	workItemIDY: { reg: '$vgpr0' , mask: 0xff00 }			workItemIDY: { reg: '$vgpr0' , mask: 0xff00 }

	body: |			body: |

llvm/test/CodeGen/MIR/AMDGPU/machine-function-info.ll

	; CHECK-NEXT: explicitKernArgSize: 128			; CHECK-NEXT: explicitKernArgSize: 128
	; CHECK-NEXT: maxKernArgAlign: 64			; CHECK-NEXT: maxKernArgAlign: 64
	; CHECK-NEXT: ldsSize: 0			; CHECK-NEXT: ldsSize: 0
	; CHECK-NEXT: isEntryFunction: true			; CHECK-NEXT: isEntryFunction: true
	; CHECK-NEXT: noSignedZerosFPMath: false			; CHECK-NEXT: noSignedZerosFPMath: false
	; CHECK-NEXT: memoryBound: false			; CHECK-NEXT: memoryBound: false
	; CHECK-NEXT: waveLimiter: false			; CHECK-NEXT: waveLimiter: false
	; CHECK-NEXT: scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'			; CHECK-NEXT: scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'
	; CHECK-NEXT: scratchWaveOffsetReg: '$sgpr101'			; CHECK-NEXT: frameOffsetReg: '$fp_reg'
	; CHECK-NEXT: frameOffsetReg: '$sgpr101'			; CHECK-NEXT: stackPtrOffsetReg: '$sgpr32'
	; CHECK-NEXT: stackPtrOffsetReg: '$sgpr101'
	; CHECK-NEXT: argumentInfo:			; CHECK-NEXT: argumentInfo:
	; CHECK-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }			; CHECK-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	; CHECK-NEXT: kernargSegmentPtr: { reg: '$sgpr4_sgpr5' }			; CHECK-NEXT: kernargSegmentPtr: { reg: '$sgpr4_sgpr5' }
	; CHECK-NEXT: workGroupIDX: { reg: '$sgpr6' }			; CHECK-NEXT: workGroupIDX: { reg: '$sgpr6' }
	; CHECK-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr7' }			; CHECK-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr7' }
	; CHECK-NEXT: workItemIDX: { reg: '$vgpr0' }			; CHECK-NEXT: workItemIDX: { reg: '$vgpr0' }
	; CHECK-NEXT: mode:			; CHECK-NEXT: mode:
	; CHECK-NEXT: ieee: true			; CHECK-NEXT: ieee: true
	; CHECK-NEXT: explicitKernArgSize: 0			; CHECK-NEXT: explicitKernArgSize: 0
	; CHECK-NEXT: maxKernArgAlign: 1			; CHECK-NEXT: maxKernArgAlign: 1
	; CHECK-NEXT: ldsSize: 0			; CHECK-NEXT: ldsSize: 0
	; CHECK-NEXT: isEntryFunction: true			; CHECK-NEXT: isEntryFunction: true
	; CHECK-NEXT: noSignedZerosFPMath: false			; CHECK-NEXT: noSignedZerosFPMath: false
	; CHECK-NEXT: memoryBound: false			; CHECK-NEXT: memoryBound: false
	; CHECK-NEXT: waveLimiter: false			; CHECK-NEXT: waveLimiter: false
	; CHECK-NEXT: scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'			; CHECK-NEXT: scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'
	; CHECK-NEXT: scratchWaveOffsetReg: '$sgpr101'			; CHECK-NEXT: frameOffsetReg: '$fp_reg'
	; CHECK-NEXT: frameOffsetReg: '$sgpr101'			; CHECK-NEXT: stackPtrOffsetReg: '$sgpr32'
	; CHECK-NEXT: stackPtrOffsetReg: '$sgpr101'
	; CHECK-NEXT: argumentInfo:			; CHECK-NEXT: argumentInfo:
	; CHECK-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr3' }			; CHECK-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr3' }
	; CHECK-NEXT: implicitBufferPtr: { reg: '$sgpr0_sgpr1' }			; CHECK-NEXT: implicitBufferPtr: { reg: '$sgpr0_sgpr1' }
	; CHECK-NEXT: mode:			; CHECK-NEXT: mode:
	; CHECK-NEXT: ieee: false			; CHECK-NEXT: ieee: false
	; CHECK-NEXT: dx10-clamp: true			; CHECK-NEXT: dx10-clamp: true
	; CHECK-NEXT: fp32-input-denormals: false			; CHECK-NEXT: fp32-input-denormals: false
	; CHECK-NEXT: fp32-output-denormals: false			; CHECK-NEXT: fp32-output-denormals: false
	; CHECK-NEXT: explicitKernArgSize: 0			; CHECK-NEXT: explicitKernArgSize: 0
	; CHECK-NEXT: maxKernArgAlign: 1			; CHECK-NEXT: maxKernArgAlign: 1
	; CHECK-NEXT: ldsSize: 0			; CHECK-NEXT: ldsSize: 0
	; CHECK-NEXT: isEntryFunction: false			; CHECK-NEXT: isEntryFunction: false
	; CHECK-NEXT: noSignedZerosFPMath: false			; CHECK-NEXT: noSignedZerosFPMath: false
	; CHECK-NEXT: memoryBound: false			; CHECK-NEXT: memoryBound: false
	; CHECK-NEXT: waveLimiter: false			; CHECK-NEXT: waveLimiter: false
	; CHECK-NEXT: scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'			; CHECK-NEXT: scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
	; CHECK-NEXT: scratchWaveOffsetReg: '$sgpr33'
	; CHECK-NEXT: frameOffsetReg: '$sgpr34'			; CHECK-NEXT: frameOffsetReg: '$sgpr34'
	; CHECK-NEXT: stackPtrOffsetReg: '$sgpr32'			; CHECK-NEXT: stackPtrOffsetReg: '$sgpr32'
	; CHECK-NEXT: argumentInfo:			; CHECK-NEXT: argumentInfo:
	; CHECK-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }			; CHECK-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	; CHECK-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr33' }
	; CHECK-NEXT: mode:			; CHECK-NEXT: mode:
	; CHECK-NEXT: ieee: true			; CHECK-NEXT: ieee: true
	; CHECK-NEXT: dx10-clamp: true			; CHECK-NEXT: dx10-clamp: true
	; CHECK-NEXT: fp32-input-denormals: false			; CHECK-NEXT: fp32-input-denormals: false
	; CHECK-NEXT: fp32-output-denormals: false			; CHECK-NEXT: fp32-output-denormals: false
	; CHECK-NEXT: fp64-fp16-input-denormals: true			; CHECK-NEXT: fp64-fp16-input-denormals: true
	; CHECK-NEXT: fp64-fp16-output-denormals: true			; CHECK-NEXT: fp64-fp16-output-denormals: true
	; CHECK-NEXT: highBitsOf32BitAddress: 0			; CHECK-NEXT: highBitsOf32BitAddress: 0
	; CHECK-NEXT: body:			; CHECK-NEXT: body:
	define void @function() {			define void @function() {
	ret void			ret void
	}			}

	; CHECK-LABEL: {{^}}name: function_nsz			; CHECK-LABEL: {{^}}name: function_nsz
	; CHECK: machineFunctionInfo:			; CHECK: machineFunctionInfo:
	; CHECK-NEXT: explicitKernArgSize: 0			; CHECK-NEXT: explicitKernArgSize: 0
	; CHECK-NEXT: maxKernArgAlign: 1			; CHECK-NEXT: maxKernArgAlign: 1
	; CHECK-NEXT: ldsSize: 0			; CHECK-NEXT: ldsSize: 0
	; CHECK-NEXT: isEntryFunction: false			; CHECK-NEXT: isEntryFunction: false
	; CHECK-NEXT: noSignedZerosFPMath: true			; CHECK-NEXT: noSignedZerosFPMath: true
	; CHECK-NEXT: memoryBound: false			; CHECK-NEXT: memoryBound: false
	; CHECK-NEXT: waveLimiter: false			; CHECK-NEXT: waveLimiter: false
	; CHECK-NEXT: scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'			; CHECK-NEXT: scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
	; CHECK-NEXT: scratchWaveOffsetReg: '$sgpr33'
	; CHECK-NEXT: frameOffsetReg: '$sgpr34'			; CHECK-NEXT: frameOffsetReg: '$sgpr34'
	; CHECK-NEXT: stackPtrOffsetReg: '$sgpr32'			; CHECK-NEXT: stackPtrOffsetReg: '$sgpr32'
	; CHECK-NEXT: argumentInfo:			; CHECK-NEXT: argumentInfo:
	; CHECK-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }			; CHECK-NEXT: privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
	; CHECK-NEXT: privateSegmentWaveByteOffset: { reg: '$sgpr33' }
	; CHECK-NEXT: mode:			; CHECK-NEXT: mode:
	; CHECK-NEXT: ieee: true			; CHECK-NEXT: ieee: true
	; CHECK-NEXT: dx10-clamp: true			; CHECK-NEXT: dx10-clamp: true
	; CHECK-NEXT: fp32-input-denormals: false			; CHECK-NEXT: fp32-input-denormals: false
	; CHECK-NEXT: fp32-output-denormals: false			; CHECK-NEXT: fp32-output-denormals: false
	; CHECK-NEXT: fp64-fp16-input-denormals: true			; CHECK-NEXT: fp64-fp16-input-denormals: true
	; CHECK-NEXT: fp64-fp16-output-denormals: true			; CHECK-NEXT: fp64-fp16-output-denormals: true
	; CHECK-NEXT: highBitsOf32BitAddress: 0			; CHECK-NEXT: highBitsOf32BitAddress: 0

llvm/test/CodeGen/MIR/AMDGPU/mfi-parse-error-scratch-wave-offset-reg.mir

This file was deleted.

	# RUN: not llc -march=amdgcn -run-pass none -o /dev/null %s 2>&1 | FileCheck %s
	# CHECK: :7:27: expected a named register
	# CHECK: scratchWaveOffsetReg: ''
	---
	name: empty_scratch_wave_offset_reg
	machineFunctionInfo:
	scratchWaveOffsetReg: ''
	body: |
	bb.0:

	S_ENDPGM
	...

llvm/test/CodeGen/MIR/AMDGPU/mfi-scratch-wave-offset-reg-class.mir

This file was deleted.

	# RUN: not llc -march=amdgcn -run-pass none -o /dev/null %s 2>&1 | FileCheck %s
	# CHECK: :8:33: incorrect register class for field
	# CHECK: scratchWaveOffsetReg: '$vgpr0'

	---
	name: wrong_reg_class_scratch_wave_offset_reg
	machineFunctionInfo:
	scratchWaveOffsetReg: '$vgpr0'
	body: |
	bb.0:

	S_ENDPGM
	...

llvm/test/CodeGen/MIR/AMDGPU/parse-order-reserved-regs.mir

	# RUN: llc -march=amdgcn -run-pass=none -verify-machineinstrs -o - %s | FileCheck %s			# RUN: llc -march=amdgcn -run-pass=none -verify-machineinstrs -o - %s | FileCheck %s
	# RUN: llc -march=amdgcn -run-pass mir-canonicalizer -verify-machineinstrs -o - %s			# RUN: llc -march=amdgcn -run-pass mir-canonicalizer -verify-machineinstrs -o - %s

	# Previously getReservedRegs was called before parsing			# Previously getReservedRegs was called before parsing
	# machineFunctionInfo, but the AMDGPU implementation depends on			# machineFunctionInfo, but the AMDGPU implementation depends on
	# setting register fields to reserve there. $sgpr50 would then not be			# setting register fields to reserve there. $sgpr50 would then not be
	# reserved, resulting in a verifier error from an undefined register.			# reserved, resulting in a verifier error from an undefined register.

	---			---
	# CHECK: machineFunctionInfo:			# CHECK: machineFunctionInfo:
	# CHECK: isEntryFunction: true			# CHECK: isEntryFunction: true
	# CHECK: scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'			# CHECK: scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
	# CHECK: scratchWaveOffsetReg: '$sgpr50'
	# CHECK: frameOffsetReg: '$sgpr50'			# CHECK: frameOffsetReg: '$sgpr50'
	# CHECK: renamable $vgpr0 = BUFFER_LOAD_DWORD_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr50, 4, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)			# CHECK: renamable $vgpr0 = BUFFER_LOAD_DWORD_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr50, 4, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)
	name: reserve_correct_register			name: reserve_correct_register
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: true			isEntryFunction: true
	scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'			scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
	scratchWaveOffsetReg: '$sgpr50'
	frameOffsetReg: '$sgpr50'			frameOffsetReg: '$sgpr50'
	stack:			stack:
	- { id: 0, type: default, offset: 0, size: 4, alignment: 4 }			- { id: 0, type: default, offset: 0, size: 4, alignment: 4 }

	body: |			body: |
	bb.0:			bb.0:
	renamable $vgpr0 = BUFFER_LOAD_DWORD_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr50, 4, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)			renamable $vgpr0 = BUFFER_LOAD_DWORD_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr50, 4, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)
	S_ENDPGM 0			S_ENDPGM 0
	...			...
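
The comment at the top of parse-order-reserved-regs.mir describes an ordering problem: the reserved-register set was computed before `machineFunctionInfo` had been parsed, so `$sgpr50` was not yet reserved and its use was flagged as undefined. The following is a minimal toy model of that ordering issue in Python, not the LLVM implementation. Every name in it (`FunctionInfo`, `reserved_regs`, `load_reserving_early`, `load_reserving_late`, the `<unset>` placeholder) is hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class FunctionInfo:
    # Hypothetical stand-in for the register fields under machineFunctionInfo.
    # "<unset>" is a placeholder, not a real LLVM default.
    scratch_rsrc_reg: str = "<unset>"
    frame_offset_reg: str = "<unset>"


def reserved_regs(info):
    """Reserved set derived from whatever the function info holds right now."""
    return frozenset({info.scratch_rsrc_reg, info.frame_offset_reg})


def undefined_uses(uses, reserved):
    """Toy verifier: a used register must be reserved (nothing is defined here)."""
    return sorted(reg for reg in uses if reg not in reserved)


def load_reserving_early(parsed_fields, uses):
    """Models the old order: reserve first, parse the fields afterwards."""
    info = FunctionInfo()
    reserved = reserved_regs(info)  # too early: fields still hold placeholders
    for name, value in parsed_fields.items():
        setattr(info, name, value)
    return undefined_uses(uses, reserved)


def load_reserving_late(parsed_fields, uses):
    """Models the fixed order: parse the fields, then reserve."""
    info = FunctionInfo()
    for name, value in parsed_fields.items():
        setattr(info, name, value)
    reserved = reserved_regs(info)
    return undefined_uses(uses, reserved)


# Values taken from the test above.
FIELDS = {
    "scratch_rsrc_reg": "$sgpr0_sgpr1_sgpr2_sgpr3",
    "frame_offset_reg": "$sgpr50",
}
USES = ["$sgpr0_sgpr1_sgpr2_sgpr3", "$sgpr50"]
```

In this model, `load_reserving_early(FIELDS, USES)` reports both registers as undefined, while `load_reserving_late(FIELDS, USES)` reports none, which mirrors why the test runs with `-verify-machineinstrs`.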

llvm/test/DebugInfo/AMDGPU/variable-locations.ll

	; CHECK-NEXT: DW_AT_type			; CHECK-NEXT: DW_AT_type
	; CHECK-NEXT: DW_AT_external			; CHECK-NEXT: DW_AT_external
	; CHECK-NEXT: DW_AT_decl_file			; CHECK-NEXT: DW_AT_decl_file
	; CHECK-NEXT: DW_AT_decl_line			; CHECK-NEXT: DW_AT_decl_line
	; CHECK-NEXT: DW_AT_location [DW_FORM_block1] (DW_OP_addr 0x0)			; CHECK-NEXT: DW_AT_location [DW_FORM_block1] (DW_OP_addr 0x0)
	@GlobB = common addrspace(1) global i32 0, align 4, !dbg !6			@GlobB = common addrspace(1) global i32 0, align 4, !dbg !6

	; CHECK: {{.*}}DW_TAG_subprogram			; CHECK: {{.*}}DW_TAG_subprogram
	; CHECK: DW_AT_frame_base [DW_FORM_block1] (DW_OP_reg{{.*}} SGPR9)			; CHECK-NOT: DW_AT_frame_base

	define amdgpu_kernel void @kernel1(			define amdgpu_kernel void @kernel1(
	; CHECK: {{.*}}DW_TAG_formal_parameter			; CHECK: {{.*}}DW_TAG_formal_parameter
	; CHECK-NEXT: DW_AT_location [DW_FORM_block1] (DW_OP_fbreg +4, DW_OP_lit1, DW_OP_swap, DW_OP_xderef)			; CHECK-NEXT: DW_AT_location [DW_FORM_block1] (DW_OP_fbreg +4, DW_OP_lit1, DW_OP_swap, DW_OP_xderef)
	; CHECK-NEXT: DW_AT_name {{.*}}"ArgN"			; CHECK-NEXT: DW_AT_name {{.*}}"ArgN"
	i32 %ArgN,			i32 %ArgN,
	; CHECK: {{.*}}DW_TAG_formal_parameter			; CHECK: {{.*}}DW_TAG_formal_parameter
	; CHECK-NEXT: DW_AT_location [DW_FORM_block1] (DW_OP_fbreg +8, DW_OP_lit1, DW_OP_swap, DW_OP_xderef)			; CHECK-NEXT: DW_AT_location [DW_FORM_block1] (DW_OP_fbreg +8, DW_OP_lit1, DW_OP_swap, DW_OP_xderef)


[WIP][AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions
Closed, Public


Diff 251459

llvm/docs/AMDGPUUsage.rst

llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp

llvm/lib/Target/AMDGPU/SIFrameLowering.h

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp

llvm/lib/Target/AMDGPU/SIISelLowering.cpp

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp

llvm/lib/Target/AMDGPU/SIRegisterInfo.h

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp

llvm/lib/Target/AMDGPU/SIRegisterInfo.td

llvm/test/CodeGen/AMDGPU/GlobalISel/divergent-control-flow.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-load-local.mir

llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-load-private.mir

llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-store-local.mir

llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-store-private.mir

llvm/test/CodeGen/AMDGPU/GlobalISel/mul.ll

llvm/test/CodeGen/AMDGPU/addrspacecast.ll

llvm/test/CodeGen/AMDGPU/amdgpu.private-memory.ll

llvm/test/CodeGen/AMDGPU/amdhsa-trap-num-sgprs.ll

llvm/test/CodeGen/AMDGPU/array-ptr-calc-i32.ll

llvm/test/CodeGen/AMDGPU/attr-amdgpu-num-sgpr.ll

llvm/test/CodeGen/AMDGPU/byval-frame-setup.ll

llvm/test/CodeGen/AMDGPU/call-argument-types.ll

llvm/test/CodeGen/AMDGPU/call-constant.ll

llvm/test/CodeGen/AMDGPU/call-preserved-registers.ll

llvm/test/CodeGen/AMDGPU/call-waitcnt.ll

llvm/test/CodeGen/AMDGPU/callee-special-input-sgprs-fixed-abi.ll

llvm/test/CodeGen/AMDGPU/callee-special-input-sgprs.ll

llvm/test/CodeGen/AMDGPU/callee-special-input-vgprs.ll

llvm/test/CodeGen/AMDGPU/captured-frame-index.ll

llvm/test/CodeGen/AMDGPU/cc-update.ll

llvm/test/CodeGen/AMDGPU/cgp-addressing-modes.ll

llvm/test/CodeGen/AMDGPU/chain-hi-to-lo.ll

llvm/test/CodeGen/AMDGPU/collapse-endcf.ll

llvm/test/CodeGen/AMDGPU/control-flow-fastregalloc.ll

llvm/test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll

llvm/test/CodeGen/AMDGPU/extload-private.ll

llvm/test/CodeGen/AMDGPU/fast-unaligned-load-store.private.ll

llvm/test/CodeGen/AMDGPU/fold-fi-mubuf.mir

llvm/test/CodeGen/AMDGPU/frame-index-elimination.ll

llvm/test/CodeGen/AMDGPU/frame-lowering-entry-all-sgpr-used.mir

llvm/test/CodeGen/AMDGPU/frame-lowering-fp-adjusted.mir

llvm/test/CodeGen/AMDGPU/function-returns.ll

llvm/test/CodeGen/AMDGPU/hsa-metadata-kernel-code-props-v3.ll

llvm/test/CodeGen/AMDGPU/hsa-metadata-kernel-code-props.ll

llvm/test/CodeGen/AMDGPU/idot8s.ll

llvm/test/CodeGen/AMDGPU/idot8u.ll

llvm/test/CodeGen/AMDGPU/indirect-addressing-term.ll

llvm/test/CodeGen/AMDGPU/indirect-call.ll

llvm/test/CodeGen/AMDGPU/insert_vector_elt.ll

llvm/test/CodeGen/AMDGPU/ipra.ll

llvm/test/CodeGen/AMDGPU/large-alloca-compute.ll

llvm/test/CodeGen/AMDGPU/large-alloca-graphics.ll

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.implicit.buffer.ptr.ll

llvm/test/CodeGen/AMDGPU/load-hi16.ll

llvm/test/CodeGen/AMDGPU/load-lo16.ll

llvm/test/CodeGen/AMDGPU/memory-legalizer-load.ll

llvm/test/CodeGen/AMDGPU/memory-legalizer-store.ll

llvm/test/CodeGen/AMDGPU/memory_clause.ll

llvm/test/CodeGen/AMDGPU/mesa3d.ll

llvm/test/CodeGen/AMDGPU/mir-print-dead-csr-fi.mir

llvm/test/CodeGen/AMDGPU/misched-killflags.mir

llvm/test/CodeGen/AMDGPU/mubuf-offset-private.ll

llvm/test/CodeGen/AMDGPU/optimize-exec-masking-pre-ra.mir

llvm/test/CodeGen/AMDGPU/partial-sgpr-to-vgpr-spills.ll

llvm/test/CodeGen/AMDGPU/pei-reg-scavenger-position.mir

llvm/test/CodeGen/AMDGPU/pei-scavenge-sgpr-carry-out.mir
