This is an archive of the discontinued LLVM Phabricator instance.

Differential D123394

[CodeGen] Late cleanup of redundant address/immediate definitions.
ClosedPublic

Authored by jonpa on Apr 8 2022, 8:21 AM.


Details

Summary

On SystemZ, eliminateFrameIndex() may use a scratch register to load part of an offset that is out of range for the using instruction (this is likely also done on other targets - I believe at least AArch64 also does this). Since this is done per instruction (FI operand), several identical addressing "anchor points" may result in the MBB where actually only the first one is needed (provided the GPR is the same). I was expecting this to be cleaned up somehow afterwards, but to my surprise it is not.

I have therefore now made an initial experiment consisting of making PrologEpilogInserter try to clean up redundant definitions after all frame indices have been eliminated, and it would be great to get feedback from other targets to see if this is generally useful. It seems beneficial on SystemZ at least (see below). The patch isn't quite ready yet but I hope it can be the start of a discussion at least. It actually now seems that even though there are "load address" instructions removed, there are in addition many more redundant immediate loads removed (see below). This makes me think that maybe this would better belong in a separate pass run after PEI (or perhaps even later)?

I should also say that I looked into MachineCSE and DeadMachineInstructionElim, but they are at least not readily available: MachineCSE requires SSA, and DeadMachineInstructionElim works bottom-up which does not quite seem to work (the redundant definitions are not "dead").

The patch currently looks for simple "candidate" instructions which may be removed if an identical preceding one exists without any intervening clobbering of the value. The CFG is traversed breadth-first so that live-in definitions can be reused.
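The block-local part of that idea can be sketched on a toy instruction model. This is only an illustration, not the code in the patch: `Instr`, `readsReg`, `sameDef` and `cleanupBlock` are hypothetical names, each instruction defines at most one register, and calls, implicit defs and kill flags are ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction (hypothetical, not LLVM's MachineInstr).
struct Instr {
  std::string Opcode;    // e.g. "lay", "lghi", "std"
  int Def;               // register defined, or -1 if the instruction defines none
  std::vector<int> Uses; // registers read
  int64_t Imm;           // immediate or offset operand
  bool IsCandidate;      // simple immediate load / address computation
};

static bool readsReg(const Instr &I, int Reg) {
  return std::find(I.Uses.begin(), I.Uses.end(), Reg) != I.Uses.end();
}

// Two candidates are identical if they compute the same value into the same register.
static bool sameDef(const Instr &A, const Instr &B) {
  return A.Opcode == B.Opcode && A.Def == B.Def && A.Uses == B.Uses &&
         A.Imm == B.Imm;
}

// Returns a copy of Block with redundant candidate definitions dropped.
std::vector<Instr> cleanupBlock(const std::vector<Instr> &Block) {
  std::map<int, Instr> Avail; // register -> candidate whose value it still holds
  std::vector<Instr> Out;
  for (const Instr &I : Block) {
    assert(!I.IsCandidate || I.Def >= 0); // a candidate must define a register
    if (I.IsCandidate) {
      auto It = Avail.find(I.Def);
      if (It != Avail.end() && sameDef(It->second, I))
        continue; // identical value is already in the register: drop I
    }
    if (I.Def >= 0) {
      // I clobbers its own register and invalidates every recorded
      // definition that was computed from that register.
      for (auto It = Avail.begin(); It != Avail.end();) {
        if (It->first == I.Def || readsReg(It->second, I.Def))
          It = Avail.erase(It);
        else
          ++It;
      }
      // Only remember definitions that do not depend on their own result.
      if (I.IsCandidate && !readsReg(I, I.Def))
        Avail.emplace(I.Def, I);
    }
    Out.push_back(I);
  }
  return Out;
}
```

Fed a sequence shaped like the LAY example further down (one `lay` into a register followed by stores through it, repeated), the sketch keeps only the first `lay`.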

Given that the global analysis adds complexity, I also tried doing this just locally (by passing -pei-localonly). I did find, however, that the reuse from predecessors is what gives the bulk of the improvement, so I would say it is worth that extra effort. On SystemZ/SPEC17:

main <> "local only":
lghi           :               445967               443571    -2396  // Immediate load
lay            :                55090                54783     -307  // Load address
vgbm           :                10974                10848     -126  // Immediate load
...
OPCDIFFS: -2969

main <> patch:
lghi           :               445967               432678   -13289  // Immediate load
lhi            :               219663               216594    -3069  // Immediate load
la             :               533531               531952    -1579  // Load address
lay            :                55090                54774     -316  // Load address
...
OPCDIFFS: -19862

I would say that this is a number of instructions removed (~20k) that makes this look interesting to me... (*) Some examples:

LAYs removed in ./f507.cactuBSSN_r/build/ML_BSSN_Advect.s:

        lay     %r1, 4096(%r15)
        std     %f4, 1024(%r1)
-       lay     %r1, 4096(%r15)
        std     %f1, 1000(%r1)
-       lay     %r1, 4096(%r15)
        std     %f5, 1016(%r1)
-       lay     %r1, 4096(%r15)

LGHI removed in ./f507.cactuBSSN_r/build/RadiationBoundary.s:

        lghi    %r3, 0
        clgijl  %r5, 12, .LBB2_23     // Conditional branch
        risbgn  %r4, %r4, 1, 189, 0
        lghi    %r5, 0
-       lghi    %r3, 0

VGBMs (vector immediate loads) removed in ./f510.parest_r/build/derivative_approximation.s

...
        vst     %v0, 344(%r15), 3
-       vgbm    %v0, 0
        vst     %v0, 360(%r15), 3
-       vgbm    %v0, 0
        vst     %v0, 312(%r15), 3
-       vgbm    %v0, 0
        vst     %v0, 328(%r15), 3
-       vgbm    %v0, 0
        vst     %v0, 280(%r15), 3
-       vgbm    %v0, 0
        vst     %v0, 296(%r15), 3

(*) I also see a few hundred more lmg/br instructions in total, which I am guessing come from some missed CFG optimization, but I am not sure if this is important. For example:

-       j       .LBB50_33
+       lochie  %r13, 1
+       llgfr   %r2, %r13
+       lmg     %r10, %r15, 240(%r15)
+       br      %r14
 .LBB50_32:
        lghi    %r0, -1
        sllg    %r0, %r0, 0(%r12)
        xihf    %r0, 4294967295
        xilf    %r0, 4294967295
        cg      %r0, 24(%r11)
-.LBB50_33:
-       lhi     %r13, 0
        lochie  %r13, 1
        llgfr   %r2, %r13
        lmg     %r10, %r15, 240(%r15)
        br      %r14

Current status is that benchmarks build with -verify-machineinstrs, but there are many regression tests failing (hopefully due to removed redundant defs :-). I am hoping that people working with those targets can help me understand those...

Event Timeline

There are a very large number of changes, so older changes are hidden.

If we're not generating enough code for the giant integers, increasing the bitwidth of the giant integers is the first thing to try, I think.

Thanks! I increased it to i8500, and it now passes and seems to have the same block structure.

Harbormaster completed remote builds in B161428: Diff 425256. Apr 26 2022, 12:08 PM

arsenm added inline comments. Apr 26 2022, 5:14 PM

llvm/lib/CodeGen/PrologEpilogInserter.cpp
135–136 ↗	(On Diff #425193)	DenseMap?
138–139 ↗	(On Diff #425193)	SmallSetVector?
1492–1495 ↗	(On Diff #425256)	Can't you use one of the existing block visiting iterators?
1514 ↗	(On Diff #425256)	If PEI's loop was reversed and scavenge register backwards was used, we wouldn't need kill flags at all
1629 ↗	(On Diff #425256)	*MI works, don't need dump
1643 ↗	(On Diff #425256)	Ditto

jonpa marked an inline comment as done. Apr 27 2022, 2:58 AM

jonpa added inline comments.

llvm/lib/CodeGen/PrologEpilogInserter.cpp
1514 ↗	(On Diff #425256)	Thanks for the review. That is a good point, but I have discovered that it is not only address anchors from frame index lowering that can be found redundant and cleaned up, but also many (more) immediate loads. So changing PEI would not handle all cases, unfortunately. I have asked before here whether this should be part of PEI or become a separate pass, but so far we have not agreed on anything. Given the above, it now seems more reasonable to me to have this as a separate pass run before MachineCopyPropagation. Does this make sense to you?

arsenm added inline comments. Apr 27 2022, 8:45 AM

llvm/lib/CodeGen/PrologEpilogInserter.cpp
1514 ↗	(On Diff #425256)	I think PEI should be split into a number of separate passes as it is. It might make sense to keep it in PEI if you were to do something ahead of or as part of the frame finalization (which might be better than looking for references to getFrameRegister)

Thank you for review.

I have made some progress but there is yet more to do - any suggestions are welcome.

I think PEI should be split into a number of separate passes as it is. It might make sense to keep it in PEI if you were to do something ahead of or as part of the frame finalization (which might be better than looking for references to getFrameRegister)

I have experimented with putting this into a new pass at different positions in TargetPassConfig::addMachineLateOptimization(). It is possible to maintain NFC by putting this first in that method (which I ended up doing), but I thought it would be interesting to see if it would make any difference putting it after Branchfolder / TailDup. I saw some minor changes only, and these are the "-stats" summarized over SPEC17/SystemZ:

Before Branchfolder <> After Branchfolder:

159682           160143    branch-folder - Number of block tails merged         461  0.3%
561638           560901    branch-folder - Number of branches optimized        -737  0.1%
139108           139050  tailduplication - Number of tail duplicated blocks     -58
107964           108330  tailduplication - Number of tails duplicated           366

After Branchfolder <> Before MachineCopyPropagation (After TailDuplication)

160143           160126    branch-folder - Number of block tails merged         -17
560901           561491    branch-folder - Number of branches optimized         590
139050           138961  tailduplication - Number of tail duplicated blocks     -89
108330           107959  tailduplication - Number of tails duplicated          -371

Before Branchfolder <> Before MachineCopyPropagation (After TailDuplication)

159682           160126    branch-folder - Number of block tails merged         444
561638           561491    branch-folder - Number of branches optimized        -147
139108           138961  tailduplication - Number of tail duplicated blocks    -147
107964           107959  tailduplication - Number of tails duplicated            -5

This doesn't look like it makes much difference, so I picked the original place for now (with a bias towards more tail dups / less tail merging).

Herald added subscribers: mattd, gchakrabarti, pmatos and 8 others. Apr 30 2022, 8:19 AM

jonpa added inline comments. Apr 30 2022, 8:21 AM

llvm/lib/CodeGen/PrologEpilogInserter.cpp
1492–1495 ↗	(On Diff #425256)	Using breadth_first() seems to work well (very nearly NFC on SPEC17).

Harbormaster completed remote builds in B162109: Diff 426232. Apr 30 2022, 9:35 AM

The handling of the kill flags has been improved, but they are still updated one at a time. Not sure if there is a better way, or if that would be needed.

Simplified the handling of values from predecessors, but then got into trouble with DenseMap. It seems I must somehow have been updating it while iterating in some cases, which is not allowed, so I was scared back to std::map for now (this was not due to the operator[], at least).

I experimented with something like a "register substitution", where a previous def was reused even if the current register is a different one, after changing the register in the users. After sorting out all illegal cases this however seemed to give only a very marginal improvement, so it's not included in the patch. Also tried to not merge predecessor entries, but just handle the case of 1 predecessor, but that did have an impact, so I kept it as it was. So I think at the moment at least this looks to be close to a final version, as I have no more ideas to try.
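One plausible reading of "merging predecessor entries" is an intersection: a register's definition is reusable on entry to a block only if every predecessor ends with the same definition in that register. The sketch below shows that reading on plain standard containers. `Reg2Def`, `DefKey` and `mergePredecessors` are hypothetical names, the defining instruction is reduced to a string, and loops (predecessors not yet visited) are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Reg = int;
using DefKey = std::string;            // hypothetical: textual form of the def
using Reg2Def = std::map<Reg, DefKey>; // register -> definition live at block end

// Returns the definitions that are available on entry to a block, given the
// definitions available at the end of each of its predecessors.
Reg2Def mergePredecessors(const std::vector<Reg2Def> &PredOut) {
  if (PredOut.empty())
    return {};
  Reg2Def In = PredOut.front();
  for (std::size_t P = 1; P < PredOut.size(); ++P) {
    const Reg2Def &Other = PredOut[P];
    for (auto It = In.begin(); It != In.end();) {
      auto Found = Other.find(It->first);
      // Drop the entry unless this predecessor agrees on the definition.
      if (Found == Other.end() || Found->second != It->second)
        It = In.erase(It);
      else
        ++It;
    }
  }
  return In;
}
```

With a single predecessor the function simply returns that predecessor's map, which corresponds to the "just handle the case of 1 predecessor" variant mentioned above.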

All tests are updated. It would be nice to get review (it's less than 300 lines) and an agreement on what could be the next step - perhaps at some point enable it for one or more specific targets?

Harbormaster completed remote builds in B162512: Diff 426771. May 3 2022, 12:47 PM

Also handle cases of invariant loads (GOT, CP) and more constant operand types, such as Global, CPI, and some minor improvements.

Number of removed redundant instructions on SystemZ / SPEC17 is now 21200 (was 19500).

It would be nice to get some numbers for other targets as well!

Harbormaster completed remote builds in B163326: Diff 427871. May 7 2022, 9:56 AM

This is fine from a PPC perspective. What would make more of an impact would be the ability to common partial immediate materializations such as happen in prologue/epilogue when a stack frame is very large. Perhaps this can be extended to do that in the future. It would eliminate things like:

lis 11, 16
ori 11, 11, 256 
stxvx 53, 31, 11                        # 16-byte Folded Spill
lis 11, 16
ori 11, 11, 272 
stxvx 54, 31, 11                        # 16-byte Folded Spill
lis 11, 16
ori 11, 11, 288 
stxvx 55, 31, 11                        # 16-byte Folded Spill
lis 11, 16
ori 11, 11, 304 
stxvx 56, 31, 11                        # 16-byte Folded Spill

Of course, that would involve changing the subsequent ori instructions to addi and would likely be very PPC-specific. I just bring it up here in case SystemZ has the same issue.
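The arithmetic behind that suggestion can be checked with a small sketch. It assumes the register holding the offset is not clobbered between the stores, that `lis` followed by `ori` leaves `(Hi << 16) | Lo` in the register (true for `Hi` below 0x8000, since `lis` sign-extends), and that `addi` takes a signed 16-bit immediate. The function names are hypothetical and only count instructions; they do not emit code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Value left in the register by "lis rD, Hi" followed by "ori rD, rD, Lo".
int64_t lisOri(uint16_t Hi, uint16_t Lo) {
  assert(Hi < 0x8000 && "sketch ignores the sign extension done by lis");
  return (int64_t(Hi) << 16) | int64_t(Lo);
}

// Instructions needed when every offset is materialized from scratch.
std::size_t naiveCount(const std::vector<int64_t> &Offsets) {
  return 2 * Offsets.size();
}

// Instructions needed when the register is kept live and each later offset is
// reached from the previous one with a single addi where the delta fits.
std::size_t commonedCount(const std::vector<int64_t> &Offsets) {
  std::size_t N = 0;
  for (std::size_t I = 0; I < Offsets.size(); ++I) {
    if (I == 0) {
      N += 2; // first offset still needs lis + ori
      continue;
    }
    int64_t Delta = Offsets[I] - Offsets[I - 1];
    if (Delta == 0)
      continue; // register already holds the value
    N += (Delta >= -32768 && Delta <= 32767) ? 1 : 2;
  }
  return N;
}
```

For the four spill offsets in the example above the deltas are all 16, so the eight materializing instructions would shrink to five under these assumptions.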

In D123394#3499422, @nemanjai wrote:
This is fine from a PPC perspective. What would make more of an impact would be the ability to common partial immediate materializations such as happen in prologue/epilogue when a stack frame is very large. Perhaps this can be extended to do that in the future. It would eliminate things like:
lis 11, 16
ori 11, 11, 256 
stxvx 53, 31, 11                        # 16-byte Folded Spill
lis 11, 16
ori 11, 11, 272 
stxvx 54, 31, 11                        # 16-byte Folded Spill
lis 11, 16
ori 11, 11, 288 
stxvx 55, 31, 11                        # 16-byte Folded Spill
lis 11, 16
ori 11, 11, 304 
stxvx 56, 31, 11                        # 16-byte Folded Spill
Of course, that would involve changing the subsequent ori instructions to addi and would likely be very PPC-specific. I just bring it up here in case SystemZ has the same issue.

That's a good point - my idea so far about that is to try to use a postRA pseudo for e.g. an immediate load requiring two target instructions so that they can be handled just the same way during this pass. Alternatively, one could as you say probably extend this current algorithm to have some kind of "look-ahead" or something so that it could recognize e.g. pairs of instructions as well. That is a bit more complex, though... I will try pseudo on SystemZ and see how it goes with 32-bit immediate loads.

So you did not get any significant results on SPEC with this patch yet?

I checked the compile time and realized that I have to work on that a bit as I see a bit of slowdowns on some files.

The RISC-V changes for this look good to me.

Herald added a subscriber: kosarev. May 12 2022, 1:25 AM

NFC updates.

Compile time improved: The regression I saw was remedied with a better iteration strategy over the MBBs. I tried to find further improvements, like tracking kill flags in a map instead of doing the search backwards over instructions, but that was slower. I also tried to use DenseMap for the mapping of Register -> MachineInstr*. Even the best init() alloc with a value of 8 was not better than std::map. IndexedMap did not seem like a good choice given that the typical number of mapped registers is small. It now looks like the wall time for this pass on SPEC/SystemZ is on average 0.6%, much like Machine Copy Propagation Pass #2. There are no big compile time blowups that I am aware of.

I can't find much more to improve, at least not at the moment. This pass is "better than nothing" but it is not impossible that there could be more elegant/powerful solutions like emitting frame address anchors in an intelligent way, cleaning up rematerialized immediate loads somehow, and maybe other things. Personally, I think this looks pretty good to have as long as there is no better way...

However, if we were to actually go ahead and use this, I think there should be some kind of broader agreement on this. There is no strong SystemZ benefit from this - it was just something that seemed missing (looking at those depressing multiple identical load address instructions). The most benefit is probably on in-order targets. I think there should at least be two or three targets that "vote for" this. It's good to know that the test changes on e.g. RISC-V look correct, and now the next step would be for those targets to evaluate if it is worth adding to the pass pipeline.

@nemanjai My previous idea to try post-RA pseudos for immediates on SystemZ is actually not very useful as it is only 64-bit immediates that require two instructions (very rare). But it still looks like this could be worth trying on e.g. PowerPC per your example above. Maybe you could give that a try (like in FrameLowering emitting a pseudo instruction instead of two actual instructions which is expanded in ExpandPostRAPseudos)? If you think that looks good, it would be great to know that...

Harbormaster completed remote builds in B164724: Diff 429800. May 16 2022, 1:23 PM

ping!

I haven't heard about any other target testing this on benchmarks yet. On SystemZ we see one or two slight improvements on SPEC, but not enough to motivate having this pass in the SystemZ backend. We would however run it if there were other targets as well wanting to do it. So please, take the chance and see if this is anything beneficial for your target!

Herald added a subscriber: jsji. Jun 8 2022, 7:57 AM

Hi @jonpa ,

FWIW, I've made a quick test downstream for our OOT target and can see that this triggers occasionally.

Found one benchmark so far that was improved by almost 2%.
Except for that I've only seen some benchmarks where materializations of zero constants are eliminated. And for our VLIW target such loads of zero often only have a cost in code size as we can bundle them with other instructions, so I did not really see any improvements in cycles when disregarding potential I-cache effects.

Anyway, it doesn't seem to be totally useless for us. I'll see if I can run some more tests later.

@jonpa: From your perspective, what's the threshold on whether this should be added to the pass pipeline or not? A pass like this that always results in better code (as opposed to sometimes producing better, sometimes worse code) is attractive even if it triggers rarely. But obviously being too eager to add such code will hit compile time.

In D123394#3595772, @asb wrote:

@jonpa: From your perspective, what's the threshold on whether this should be added to the pass pipeline or not? A pass like this that always results in better code (as opposed to sometimes producing better, sometimes worse code) is attractive even if it triggers rarely. But obviously being too eager to add such code will hit compile time.

I think it should be enabled on targets where there is some kind of real performance improvement and compile time is not top-priority. If there are only static code-generation improvements that do not really make a difference in execution time, it seems better to leave it out. On SystemZ, there are just one or two slight performance improvements (~1%), enough to want to use it, I think.

Generally, loads of immediates and address anchors are easily done ahead of time on an OOO machine, so just because it is a nice cleanup doesn't mean it would be worth adding. I am hoping that the other targets would care to try it on benchmarks, and if there are then some more interest we could enable it for those targets.

Ping!

Patch rebased and waiting for evaluation on different targets.

As I have spent a certain amount of time developing this new target independent pass, I would be very happy to get some feedback.

Short summary of what this patch does:

Cleaning up of rematerialized immediates: After ISel, an immediate is typically copied to all its users from the original def, while regalloc then rematerializes it, which causes a lot of identical immediate loads that are currently never cleaned away even if they are close together in the same MBB.

Cleaning up of big address offsets forced into registers: SystemZRegisterInfo::eliminateFrameIndex() loads the bigger part of an out of range offset into a register for each FI access, with the idea that these anchors should be reused between accesses. There is however no late "machine cleanup" of identical instructions currently, so there is no reuse at all. This surprised me and was the motivation for this patch, and I think this should be relevant to other targets as well.

It's ~300 LOC, with a small compile time approximately similar to MachineCopyPropagation.

On SystemZ I see ~22k instructions cleaned on SPEC and some slight benchmark improvements which is nice, but perhaps not enough to add a new pass to the backend. I expected this to be done in common code in the first place, and I think this is where it belongs as it is not really target specific.

The question for me now is whether this cleanup should be purposefully omitted (e.g. "doesn't matter enough on OOO CPUs"), or if there is a performance improvement available here. If there is, this patch could be used, or perhaps some other way of achieving this (i.e. earlier in the compilation - any ideas anyone?). If there *isn't*, then that would also be good to know (and perhaps make the comment somewhere in CodeGen)!

I hope you agree that it would be worth at least trying once to see if this cleanup matters or not. The number of test files improved per target is: RISCV: 21, AMDGPU: 15, X86: 13, AArch64/ARM/Thumb: 12, PowerPC: 4, Mips: 2, BPF: 1. It would be very nice to learn how much of a cleanup targets achieve in terms of number of instructions eliminated and benchmark improvements. If there is no static improvement on benchmarks (no eliminated instructions) for a target, it would also be interesting to see why this was not even needed.

I am not really strongly arguing for this pass to be added, but I would very much like to come to some consensus on this one way or the other, thanks.

Harbormaster completed remote builds in B185813: Diff 459019. Sep 9 2022, 6:36 AM

foad added a subscriber: foad. Sep 12 2022, 6:31 AM

foad added inline comments. Sep 12 2022, 9:11 AM

llvm/test/CodeGen/AMDGPU/GlobalISel/saddsat.ll
4710 ↗	(On Diff #459019)	The redundant `s_mov_b32` instructions in saddsat.ll and ssubsat.ll should be fixed by D133702.

My natural bias would be towards accepting a pass that wasn't excessively complicated and demonstrably improved static code quality (especially like this one does, in a way where it's never a regression, even if the benefit may be small or not measurable).

There's a cost in terms of complicating the compiler and adding a new pass, but there's also cost in having generated code that's known to be substandard. Every single time someone invests time in analysing compiler output, it will show up when looking for redundant code and people are likely to waste time figuring out if it's measurable, or perhaps if it's a symptom of some wider codegen problem.

What do others think?

Do you have a sense of what (in practice) is causing the dead operations to get emitted? Can they practically be handled upstream? Such a thing would be a win for all the upstream cost models and would reduce compile time.

I wonder if a version of your pass working as an "assertion" could shake out some obvious missed optimizations or other bugs that cause extraneous instructions to get emitted.

If there are well known things that are impractical to handle (for one reason or another) then something like this would make sense to me.

lkail added a subscriber: lkail. Sep 14 2022, 6:36 PM

I can understand the temptation to commit this patch. I also agree with Chris that it's useful as a diagnostic tool to find deficiencies that should probably be fixed elsewhere. That's what I did with D133702.

Do you have a sense of what (in practice) is causing the dead operations to get emitted?

Speaking only for AMDGPU, I think there are quite a few different reasons. So far I've seen:

Missed optimizations in globalisel (D133702).
PrologEpilogInserter calls TargetRegisterInfo::eliminateFrameIndex which materializes a frame offset into a register, too late for the resulting register-immediate moves to be commoned up.
RegisterCoalescer(?) not doing a good job with register tuples? The problem is that a 64-bit constant like 0x1 is materialized in a register pair like s[0:1], and then 32-bit 0x0 is materialized in s1 -- but s1 was already 0 because it was the high part of the 64-bit constant.
Some problems related to CFG structurization: I suspect that the highly-AMDGPU-specific parts of this like the SIOptimizeExecMasking pass are not doing a good enough job of cleaning up redundant code.
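The register-tuple case in the list above comes down to simple arithmetic on the two halves of the pair. The sketch below assumes, as the description does, that the second register of the pair holds the high 32 bits. `Halves`, `splitPair` and `highHalfAlreadyHolds` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

struct Halves {
  uint32_t Lo; // value seen in the first register of the pair (e.g. s0)
  uint32_t Hi; // value seen in the second register of the pair (e.g. s1)
};

// Splits a 64-bit constant held in a register pair into its 32-bit halves.
Halves splitPair(uint64_t V) {
  return {static_cast<uint32_t>(V), static_cast<uint32_t>(V >> 32)};
}

// True if a later 32-bit move of Wanted into the high register is redundant.
bool highHalfAlreadyHolds(uint64_t PairValue, uint32_t Wanted) {
  return splitPair(PairValue).Hi == Wanted;
}
```

For the 64-bit constant 0x1 the high half is 0, so a following 32-bit move of 0 into the high register changes nothing.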

In D123394#3786525, @asb wrote:

There's a cost in terms of complicating the compiler and adding a new pass, but there's also cost in having generated code that's known to be substandard. Every single time someone invests time in analysing compiler output, it will show up when looking for redundant code and people are likely to waste time figuring out if it's measurable, or perhaps if it's a symptom of some wider codegen problem.

Thanks for mentioning this important aspect, which I very much agree with but failed to put into words here.

In D123394#3790449, @lattner wrote:

Do you have a sense of what (in practice) is causing the dead operations to get emitted? Can they practically be handled upstream? Such a thing would be a win for all the upstream cost models and would reduce compile time.

I wonder if a version of your pass working as an "assertion" could shake out some obvious missed optimizations or other bugs that cause extraneous instructions to get emitted.

If there are well known things that are impractical to handle (for one reason or another) then something like this would make sense to me.

I did another round today of trying to understand the causes here while building SPEC on SystemZ. The patch applied as it is removes ~22k instructions. I could see that rematerialization was the cause in some cases so I rebuilt the comparison without it. In the next comparison I in addition also disabled the register coalescer. These disablings were done both on main and the patch when comparing in each step:

main <> patch: ~22k instructions less

main <> patch, while disabling all rematerialization in the backend (returning false in isTriviallyReMaterializable()): ~15k instructions less

main <> patch, while disabling all rematerialization in the backend, and also disabling register coalescing (-join-intervals=false): ~8k instructions less

The register coalescer seems to play a role here as it can take two unrelated but identical immediate loads and make them define the same virtual register. This causes them to end up in the same physreg, and in some cases then the second one can be removed. This looks kind of similar to the results of rematerialization (having two smaller live-ranges instead of one).

For the remaining 8k instructions it seems that many of these are instructions present after isel already. Prior to isel it is an operand of an instruction, e.g. a call operand, and then (in some contexts) the target decides to load it into a register. Then there are also on SystemZ the already mentioned duplicated address offset loads into registers, emitted post-RA, which is different.

It seems to me that these redundant instructions are the effect of the overall heuristic in the compiler of reducing register pressure at the cost of more immediate loads, in all of the above. It's the way it should be, but it wouldn't hurt to clean things up somewhere in the end as much as possible. I don't think that can be done until the final allocation of registers is done (how would one know if there is a clobbering of the reg in between two identical immediate loads?). That's what it looks like to me on SystemZ, at least.

rebase (tests)

Harbormaster completed remote builds in B186873: Diff 460424. Sep 15 2022, 9:52 AM

The X86 tests LGTM (including masked_load.ll and oddshuffles.ll)

RKSimon mentioned this in D132978: [CodeGen] Using ZExt for extractelement indices. Sep 27 2022, 6:08 AM

PING!

Patch updated after rebase (just two AMDGPU tests).

Speaking only for AMDGPU, I think there are quite a few different reasons. So far I've seen:
...

PrologEpilogInserter calls TargetRegisterInfo::eliminateFrameIndex which materializes a frame offset into a register, too late for the resulting register-immediate moves to be commoned up.

...

Yes, this is exactly the same issue as on SystemZ. So far I cannot see any other solution for this - can you?

Harbormaster completed remote builds in B191437: Diff 466712. Oct 11 2022, 1:12 AM

PING! Further comments would be very nice.

Another dozen or so tests updated (RISCV, X86) after rebase (improved).

I know there has been at least some interest in this patch, but so far I am still waiting for a consensus from everybody. Am I the only person who would actually like to see this go in?

X86 LGTM (and I'd like to see this committed as well!)

llvm/include/llvm/InitializePasses.h
265	This looks like it should be sorted?
llvm/lib/CodeGen/CMakeLists.txt
196	sorting

jonpa added inline comments. Oct 27 2022, 6:55 AM

llvm/include/llvm/InitializePasses.h
265	yes... Should the pass maybe better be named something with Machine... to make it clear that it is a late CodeGen pass? Maybe MachineImmLoadsCleanupPass or something?

RKSimon added inline comments. Oct 27 2022, 7:03 AM

llvm/include/llvm/InitializePasses.h
265	SGTM - no strong preference tbh

Harbormaster completed remote builds in B194634: Diff 471134. Oct 27 2022, 7:08 AM

Patch updated per review (sorted order in two files), and also a name change into something hopefully better (it seems good to me that it begins with Machine...).

Thanks for taking a look - I guess then we could enable this for at least SystemZ and X86 to begin with if there are no objections. Other targets that seem to benefit like AMDGPU and RISCV could enable it at some later point if they would rather wait for some reason.

I'm happy to have this enabled for AMDGPU.

Harbormaster completed remote builds in B194737: Diff 471265. Oct 27 2022, 3:00 PM

+1 to fast track AMDGPU/SystemZ/X86 - I don't know how you want to handle helping other targets in the longer term (raise bugs? TODO comments?)

In D123394#3895939, @RKSimon wrote:

+1 to fast track AMDGPU/SystemZ/X86 - I don't know how you want to handle helping other targets in the longer term (raise bugs? TODO comments?)

So far I have only thought that targets should decide for themselves if they want to run it or not. I hope they will give it a try eventually and either enable it or make a comment for the reasons not to. Perhaps a TODO comment next to the disablePass in the target file saying that this needs to be evaluated before enabling would be nice?

craig.topper added inline comments. Nov 1 2022, 9:45 AM

llvm/lib/CodeGen/MachineLateInstrsCleanup.cpp
48	Can these be DenseMaps?
53	Every MachineBasicBlock has a number assigned to it. This could possibly be a BitVector using the basic block numbering. See `MachineBasicBlock::getNumber()` and `MachineFunction::getNumBlockIDs`. You might also be able to use the numbering instead of a map for MBB2RegDefsMap. But that probably depends on how sparse that map is.
139	No underscores in variable names. Use VisitedPreds
156	!pred_empty
207	!MBB->pred_empty()?
223	Could this use llvm::all_of with a lambda?
282	Would std::queue be better here?
284	!Worklist.empty()?
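
For illustration, the block-numbering idea from the comment on line 53 might look roughly like the sketch below. It is a standalone model and not the pass's code: `Block` and `VisitedSet` are hypothetical names, `std::vector<bool>` stands in for `llvm::BitVector`, and a plain `Number` field stands in for `MachineBasicBlock::getNumber()`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a MachineBasicBlock: only its dense number matters.
struct Block {
  unsigned Number;
};

// Visited set indexed by block number. std::vector<bool> stands in for
// llvm::BitVector, sized by the number of block IDs in the function.
class VisitedSet {
  std::vector<bool> Bits;

public:
  explicit VisitedSet(std::size_t NumBlockIDs) : Bits(NumBlockIDs, false) {}

  // Marks the block as visited; returns true if it was not visited before.
  bool insert(const Block &B) {
    assert(B.Number < Bits.size() && "block number out of range");
    bool WasNew = !Bits[B.Number];
    Bits[B.Number] = true;
    return WasNew;
  }

  // Returns true if the block has been marked as visited.
  bool contains(const Block &B) const {
    return B.Number < Bits.size() && Bits[B.Number];
  }
};
```

Membership tests become a single indexed bit read instead of a tree lookup keyed on a pointer.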

LGTM to enable this on RISC-V too, thank you!

This revision is now accepted and ready to land.Nov 1 2022, 9:56 AM

craig.topper added inline comments.Nov 1 2022, 3:38 PM

llvm/lib/CodeGen/MachineLateInstrsCleanup.cpp
197	Use DefedReg.isValid(). Someday we should remove the implicit casts to unsigned integers from the Register class.

Thanks for review! With these improvements, the pass's share of compile time has decreased by 0.1% on average. I did all the changes except DenseMap (see inline comments below).

Tests updated after rebase:

AMDGPU/si-annotate-cf.ll
RISCV/rvv/fixed-vectors-rint-vp.ll
RISCV/rvv/rint-vp.ll

jonpa added inline comments.Nov 2 2022, 8:09 AM

llvm/lib/CodeGen/MachineLateInstrsCleanup.cpp
48	I did make some experiments with DenseMap and other alternatives earlier (see May 16 above) and never got any improvements in compile time by doing so. In addition, it currently seems unwise to do that since processBlock ("Clear any entries in map that MI clobbers") is looping over the entries while erasing some of them. I could not do this now even after rewriting the loop like:

   // Data structures to map regs to their definitions per MBB.
-  using Reg2DefMap = std::map<Register, MachineInstr *>;
-  using MBB2RegDefsMap = std::map<MachineBasicBlock *, Reg2DefMap>;
+  using Reg2DefMap = DenseMap<Register, MachineInstr *>;
+  using MBB2RegDefsMap = DenseMap<MachineBasicBlock *, Reg2DefMap>;
   MBB2RegDefsMap RegDefs;

   // Set of visited MBBs.
@@ -257,11 +258,10 @@ bool MachineLateInstrsCleanup::processBlock(MachineBasicBlock *MBB) {
     // Clear any entries in map that MI clobbers.
     for (auto DefI = MBBDefs.begin(); DefI != MBBDefs.end();) {
-      Register Reg = DefI->first;
+      Reg2DefMap::iterator CurrI = DefI++;
+      Register Reg = CurrI->first;
       if (MI->modifiesRegister(Reg, TRI))
-        DefI = MBBDefs.erase(DefI);
-      else
-        ++DefI;
+        MBBDefs.erase(CurrI);
     }

I then got "Assertion `isHandleInSync() && "invalid iterator access!"' failed.", so it seems to me that this isn't supported.
53	Using BitVector for Visited and Visited_preds seems to lower the average compile time (on the files where it shows up) by about 0.04%, a very slight but noticeable improvement. For the MBB2RegDefsMap I saw a similar slight further improvement on average, so with both of these changes an average improvement of 0.1% seems to result. It may be that with this change (MBB2RegDefsMap) some of the slower cases get slower, but the average is improved. I have only tested this on SystemZ, but I guess a BitMap really should be faster than a map, so using it makes sense to me.
223	yeah
282	It's 0.01% faster on average, so why not :-)
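
For illustration, the erase-while-iterating loop discussed in the comment on line 48 depends on `std::map::erase` returning the next iterator and leaving all other iterators valid. A minimal standalone sketch with plain `std::map`; `eraseClobbered` and the `Clobbered` set are hypothetical names, and unsigned/int values stand in for registers and defining instructions.

```cpp
#include <map>
#include <set>

// Maps a register number to an ID of the instruction that defines it.
using Reg2DefMap = std::map<unsigned, int>;

// Removes every entry whose register is in Clobbered. std::map::erase returns
// the iterator following the erased element and invalidates no other
// iterators, which is what makes erasing inside the loop safe here.
inline void eraseClobbered(Reg2DefMap &Defs,
                           const std::set<unsigned> &Clobbered) {
  for (auto DefI = Defs.begin(); DefI != Defs.end();) {
    if (Clobbered.count(DefI->first))
      DefI = Defs.erase(DefI);
    else
      ++DefI;
  }
}
```

With a container whose iterators do not survive erasure, a common alternative is to collect the keys during the loop and erase them afterwards.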

Harbormaster completed remote builds in B195706: Diff 472617.Nov 2 2022, 9:07 AM

@craig.topper : LGTY?

danielkiss accepted this revision.Nov 7 2022, 2:30 AM

LGTM to enable this on Arm/AArch64.

iiiyours added a subscriber: iiiyours.Nov 15 2022, 11:36 PM

Ping!

Now that many targets are willing to try this, it would be nice to have it properly reviewed as well before I commit it.

From what I understand nearly all targets with updated tests except BPF, Mips and XCore have said they would like to have it enabled. Not sure if I should disable those targets...

@sdardis Any thoughts on the MIPS changes?

craig.topper added inline comments.Nov 17 2022, 10:06 AM

llvm/lib/CodeGen/MachineLateInstrsCleanup.cpp
112	Can we use RPOT (reverse post-order traversal)? That will guarantee this property except for loops.
165	I think you can use `getMF()` instead of getParent()->getParent()

rnk resigned from this revision.Nov 17 2022, 11:35 AM

Updated per review. Using RPOT was a nice improvement - see inline comment. A few more improvements in tests also resulted from this.

llvm/lib/CodeGen/MachineLateInstrsCleanup.cpp
112	Yes - awesome! 55 lines of iteration logic removed, about the same amount of success (even very slightly better on SPEC), and improved compile time (average 0.57% -> 0.50%).
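
For illustration, the property mentioned above (a block is visited only after its predecessors, except for predecessors reached through loop back edges) can be seen in a small standalone sketch. `reversePostOrder` is a hypothetical helper over an adjacency-list CFG, not LLVM's `ReversePostOrderTraversal`.

```cpp
#include <algorithm>
#include <vector>

// CFG as adjacency lists: Succs[B] holds the successors of block B.
using CFG = std::vector<std::vector<unsigned>>;

namespace detail {
// Depth-first walk that appends a block once all of its not yet visited
// successors have been appended.
inline void postOrder(const CFG &Succs, unsigned B, std::vector<bool> &Seen,
                      std::vector<unsigned> &Order) {
  Seen[B] = true;
  for (unsigned S : Succs[B])
    if (!Seen[S])
      postOrder(Succs, S, Seen, Order);
  Order.push_back(B);
}
} // namespace detail

// Returns the blocks reachable from Entry in reverse post-order.
inline std::vector<unsigned> reversePostOrder(const CFG &Succs,
                                              unsigned Entry) {
  std::vector<bool> Seen(Succs.size(), false);
  std::vector<unsigned> Order;
  detail::postOrder(Succs, Entry, Seen, Order);
  std::reverse(Order.begin(), Order.end());
  return Order;
}
```

On a diamond-shaped CFG the join block comes last, so the definitions of both of its predecessors are already known when it is processed.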

Updated tests were:

llvm/test/CodeGen/AMDGPU/exec-mask-opt-cannot-create-empty-or-backward-segment.ll |  2 --
llvm/test/CodeGen/RISCV/rvv/bitreverse-sdnode.ll                                  |  2 +-
llvm/test/CodeGen/RISCV/rvv/bswap-sdnode.ll                                       |  1 -
llvm/test/CodeGen/RISCV/rvv/bswap-vp.ll                                           |  6 ------
llvm/test/CodeGen/RISCV/rvv/fixed-vectors-bswap-vp.ll                             |  4 ----
llvm/test/CodeGen/SystemZ/frame-28.mir                                            |  5 +++--

Harbormaster completed remote builds in B198303: Diff 476249.Nov 17 2022, 4:55 PM

In Hexagon we try to find a pre-existing register that we can reuse:
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/Hexagon/HexagonRegisterInfo.cpp#L286-L333

In D123394#3938078, @kparzysz wrote:

In Hexagon we try to find a pre-existing register that we can reuse:
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/Hexagon/HexagonRegisterInfo.cpp#L286-L333

That's done after regalloc in eliminateFrameIndex - interesting. My thoughts on this have been that it is more efficient to scan the entire function once rather than for each single frame index elimination. That should be possible if the used register is the same for consecutive FI operands, which I think it should typically be. It would be interesting to see what would happen if you removed that code and instead enabled this pass.

The other thing is that the bulk of the instructions being cleaned up seems to be loads of immediates which result from rematerialization rather than FIs.

Rebase.

This time the test CodeGen/AMDGPU/flat-scratch.ll failed with the verifier as a kill flag of a super register was not removed. To remedy this, I changed clearKillsForDef() to check for any overlapping reg instead of just the identical one.
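
For illustration, the difference between matching the identical register and matching any overlapping register can be shown with a toy model. This is only a sketch under stated assumptions and not the code of clearKillsForDef(): registers are encoded as bit masks of the register units they cover, and `RegMask`, `Operand`, `Instr` and `clearKillsBefore` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy model: a register is the set of register units it covers,
// encoded as a bit mask. Two registers overlap if they share a unit.
using RegMask = std::uint32_t;

struct Operand {
  RegMask Reg;
  bool IsDef;
  bool IsKill; // Only meaningful on uses.
};

struct Instr {
  std::vector<Operand> Ops;
};

inline bool regsOverlap(RegMask A, RegMask B) { return (A & B) != 0; }

// Walks backwards from index From (exclusive) and clears kill flags on uses
// of any register overlapping Reg. The walk stops after the first instruction
// that defines an overlapping register. Returns the number of flags cleared.
inline unsigned clearKillsBefore(std::vector<Instr> &Block, std::size_t From,
                                 RegMask Reg) {
  unsigned Cleared = 0;
  for (std::size_t I = From; I-- > 0;) {
    bool SawDef = false;
    for (Operand &Op : Block[I].Ops) {
      if (!regsOverlap(Op.Reg, Reg))
        continue;
      if (Op.IsDef) {
        SawDef = true;
      } else if (Op.IsKill) {
        Op.IsKill = false;
        ++Cleared;
      }
    }
    if (SawDef)
      break;
  }
  return Cleared;
}
```

A check for the identical register alone would leave the kill flag on a sub-register use in place, which is the kind of stale flag the verifier reports.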

Updated tests:

AMDGPU/GlobalISel/call-outgoing-stack-args.ll
AMDGPU/GlobalISel/flat-scratch.ll
AMDGPU/chain-hi-to-lo.ll
AMDGPU/flat-scratch.ll
AMDGPU/si-annotate-cf.ll
AMDGPU/spill-offset-calculation.ll
AMDGPU/spill-scavenge-offset.ll

Harbormaster completed remote builds in B198854: Diff 476992.Nov 21 2022, 10:03 PM

foad added inline comments.Nov 21 2022, 11:16 PM

llvm/lib/CodeGen/MachineLateInstrsCleanup.cpp
183	Could use structured bindings: `for (auto [Reg, DefMI] : ...)`
186	Could use `all_of(drop_begin(MBB->predecessors()), ...)`?
194–195	Use `<< printMBBReference(*MBB) <<` instead of rolling your own MBB#number syntax, here and elsewhere.
204–205	This looks like `for (auto &MI : make_early_inc_range(MBB))`
207	Comment should explain why rather than what.
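
For illustration, the `all_of(drop_begin(...))` shape suggested for line 186 amounts to taking the first predecessor's definitions and keeping those that every remaining predecessor agrees on. A standalone sketch using only the standard library; `commonDefs` is a hypothetical helper, `std::next(begin)` plays the role of `drop_begin`, and strings stand in for defining instructions.

```cpp
#include <algorithm>
#include <iterator>
#include <map>
#include <string>
#include <vector>

// Register number -> textual form of the instruction that last defined it.
using Reg2DefMap = std::map<unsigned, std::string>;

// Returns the definitions that are identical at the end of every predecessor.
// PredDefs holds one map per predecessor block.
inline Reg2DefMap commonDefs(const std::vector<Reg2DefMap> &PredDefs) {
  Reg2DefMap Common;
  if (PredDefs.empty())
    return Common;
  for (const auto &Entry : PredDefs.front()) {
    // Skip the first predecessor and require the same entry in all others.
    bool InAllOthers = std::all_of(
        std::next(PredDefs.begin()), PredDefs.end(),
        [&Entry](const Reg2DefMap &Other) {
          auto It = Other.find(Entry.first);
          return It != Other.end() && It->second == Entry.second;
        });
    if (InAllOthers)
      Common.insert(Entry);
  }
  return Common;
}
```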

Thanks (again) for nice review suggestions. Patch updated (NFC, no new test updates).

Harbormaster completed remote builds in B198982: Diff 477189.Nov 22 2022, 8:30 AM

Ping!

Patch rebased, with one test updated: AMDGPU/GlobalISel/call-outgoing-stack-args.ll.

Still waiting for a final approval...

LGTM - @foad @craig.topper any additional feedback?

In D123394#3957323, @RKSimon wrote:

LGTM - @foad @craig.topper any additional feedback?

Still seems fine to me - but I still think we should also try to clean up some of these redundant instructions earlier, or avoid generating them in the first place, if possible.

Harbormaster completed remote builds in B200019: Diff 478579.Nov 29 2022, 8:56 AM

LGTM - cheers!

This revision is now accepted and ready to land.Nov 30 2022, 4:34 AM

This revision was landed with ongoing or failed builds.Dec 1 2022, 10:22 AM

Closed by commit rG6d12599fd413: [CodeGen] Add new pass for late cleanup of redundant definitions. (authored by jonpa).

This revision was automatically updated to reflect the committed changes.

jonpa added a commit: rG6d12599fd413: [CodeGen] Add new pass for late cleanup of redundant definitions..

jonpa added a reverting change: rG8ef463268122: Revert "[CodeGen] Add new pass for late cleanup of redundant definitions.".Dec 1 2022, 10:30 AM

This looks very interesting. I've been thinking about ways to improve stack addressing sequences on RISCV, and this nicely tackles many of the cases I was just thinking about.

Unfortunately, I think the patch as landed is wrong. See the inline comment on an example where we can't forward due to a clobber which doesn't appear to be respected.

llvm/test/CodeGen/RISCV/rvv/rv64-spill-zvlsseg.ll
44 ↗	(On Diff #479348)	This line looks to be wrong. a0 is clobbered on line 40 which is after the previous definition being forwarded from line 37.

Patch reverted because of build failure, relating to capturing structured bindings, which builds fine with gcc but not yet with clang.

I added init captures in processBlock(), which should remedy the build problems and also allow easy removal once it is fully accepted by clang.

for (auto [Reg, DefMI] : RegDefs[FirstPred->getNumber()])
  if (llvm::all_of(
       drop_begin(MBB->predecessors()),
       [&, &Reg = Reg, &DefMI = DefMI](const MachineBasicBlock *Pred) {
       ...
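
For illustration, the init-capture workaround can be reduced to a standalone example. This is a sketch with hypothetical names (`countAbove`, `Limits`, `Values`), not the code of the pass: the structured binding is re-declared as a by-reference init-capture so that the lambda does not capture the binding itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// For each named limit, counts how many values exceed it.
inline std::map<std::string, std::ptrdiff_t>
countAbove(const std::map<std::string, int> &Limits,
           const std::vector<int> &Values) {
  std::map<std::string, std::ptrdiff_t> Result;
  for (auto [Name, Limit] : Limits) {
    // C++17 does not allow a lambda to capture a structured binding, so the
    // binding is re-declared as a by-reference init-capture.
    Result[Name] =
        std::count_if(Values.begin(), Values.end(),
                      [&, &Limit = Limit](int V) { return V > Limit; });
  }
  return Result;
}
```

Once direct capture of structured bindings is accepted by all supported compilers, the init-capture can be dropped without touching the lambda body.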

@reames: Thanks for catching that problem that was previously missed. I looked into it and found:

$x10 = ADDI $x2, 16
$x11 = PseudoReadVLENB
PseudoVSPILL2_M1 killed renamable $v8_v9, killed $x10, killed $x11 :: (store unknown-size into %stack.0, align 8)
INLINEASM &"" [sideeffect] [attdialect], $0:[clobber] ...
$x10 = ADDI $x2, 16
$x11 = PseudoReadVLENB
renamable $v7_v8 = PseudoVRELOAD2_M1 killed $x10, killed $x11 :: (load unknown-size from %stack.0, align 8)

After Machine Late Instructions Cleanup Pass (machine-latecleanup)

$x10 = ADDI $x2, 16
$x11 = PseudoReadVLENB
PseudoVSPILL2_M1 killed renamable $v8_v9, $x10, $x11 :: (store unknown-size into %stack.0, align 8)
INLINEASM &"" [sideeffect] [attdialect], $0:[clobber], ...
renamable $v7_v8 = PseudoVRELOAD2_M1 killed $x10, killed $x11 :: (load unknown-size from %stack.0, align 8)

After RISCV pseudo instruction expansion pass (riscv-expand-pseudo)

$x10 = ADDI $x2, 16
$x11 = PseudoReadVLENB
VS1R_V $v8, $x10, implicit $v8_v9 :: (store unknown-size into %stack.0, align 8)
$x10 = ADD $x10, $x11
VS1R_V $v9, $x10, implicit $v8_v9 :: (store unknown-size into %stack.0, align 8)
INLINEASM &"" [sideeffect] [attdialect], $0:[clobber], ...
$v7 = VL1RE8_V $x10 :: (load unknown-size from %stack.0, align 8)
$x10 = ADD $x10, $x11
$v8 = VL1RE8_V $x10 :: (load unknown-size from %stack.0, align 8)

The problem here is that $x10 is only marked as a use operand in the pseudo instruction (PseudoVSPILL2_M1) when the cleanup pass sees it, but it is in fact later getting expanded to a sequence that actually clobbers the register. I think the fix should be to make sure any pseudo instruction is properly modeled to reflect the final expansion, in this case with a def operand of the base register.
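
For illustration, the effect of an unmodeled clobber can be reproduced in a toy model. This is only a sketch under stated assumptions, not the pass itself: `Instr` and `removeRedundantDefs` are hypothetical, each instruction defines at most one tracked register, and identical definitions are recognised by comparing a string.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical toy instruction: optionally defines one register with a value
// described by Expr, and may overwrite further registers as a side effect.
struct Instr {
  int DefReg = -1;           // -1 means "no definition tracked".
  std::string Expr;          // Compared for identity, e.g. "ADDI x2, 16".
  std::vector<int> Clobbers; // Registers overwritten besides DefReg.
};

// Drops a definition when the register already holds the same Expr and no
// instruction in between clobbered it. Returns the surviving instructions.
inline std::vector<Instr> removeRedundantDefs(const std::vector<Instr> &Block) {
  std::map<int, std::string> Known; // Register -> expression it holds.
  std::vector<Instr> Out;
  for (const Instr &I : Block) {
    if (I.DefReg >= 0) {
      auto It = Known.find(I.DefReg);
      if (It != Known.end() && It->second == I.Expr)
        continue; // Same value already in the register: drop the instruction.
    }
    for (int R : I.Clobbers)
      Known.erase(R);
    if (I.DefReg >= 0)
      Known[I.DefReg] = I.Expr;
    Out.push_back(I);
  }
  return Out;
}
```

If the spill pseudo lists no clobbers, the second identical definition of register 10 is dropped; once the pseudo declares that it overwrites register 10, the definition is kept.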

Could I disable the pass for now for RISCV and recommit?

jonpa reopened this revision.Dec 1 2022, 1:06 PM

This revision is now accepted and ready to land.Dec 1 2022, 1:06 PM

jonpa requested review of this revision.Dec 1 2022, 1:07 PM

jonpa added a reviewer: reames.

In D123394#3964717, @jonpa wrote:

@reames: Thanks for catching that problem that was previously missed. I looked into it and found:
...
Could I disable the pass for now for RISCV and recommit?

If you're sure this is a RISCV specific problem, sure. Your description does sound like one to me, so it basically comes down to how sure of your analysis you are.

In D123394#3964960, @reames wrote:

In D123394#3964717, @jonpa wrote:

@reames: Thanks for catching that problem that was previously missed. I looked into it and found:
...
Could I disable the pass for now for RISCV and recommit?

If you're sure this is a RISCV specific problem, sure. Your description does sound like one to me, so it basically comes down to how sure of your analysis you are.

Looks like I tried to fix the RISCV issue once. https://reviews.llvm.org/D109405 but it caused other problems and got reverted. Then I guess I lost track of it.

OK - I will recommit this then as it does seem like a problem in the RISCV backend. I'll wait a day or so to let other targets check that they don't have this potential problem...

Harbormaster completed remote builds in B200601: Diff 479392.Dec 1 2022, 4:44 PM

craig.topper mentioned this in D139169: [RISCV][WIP] Move VSPILL/VRELOAD expansion for vector tuples to eliminateFrameIndex..Dec 1 2022, 9:33 PM

Recommitted as 17db0de, with RISCV disabling it for now.

Posting the RISCV part of the patch here for future use when you want to enable it.

Herald added a subscriber: arichardson.Dec 3 2022, 12:47 PM

This revision is now accepted and ready to land.Dec 3 2022, 12:47 PM

Harbormaster completed remote builds in B200933: Diff 479862.Dec 3 2022, 1:51 PM

MaskRay mentioned this in rGf55880e830e1: [test] Fix CodeGen/M68k/pipeline.ll after D123394 MachineLateInstrsCleanupPass.Dec 4 2022, 11:08 AM

Reverted again with 122efef.

Hi, it's useful to include Differential Revision: https://reviews.llvm.org/D123394 for all reland commits so that others can easily tell the state by visiting this page.
For relands, please include the original description (don't let readers trace through the original commit to get the description).

MaskRay added inline comments.Dec 4 2022, 4:47 PM

llvm/lib/CodeGen/MachineLateInstrsCleanup.cpp
87	This can be moved closer to `Changed \|= processBlock(MBB);`
113	`--I` https://llvm.org/docs/CodingStandards.html#prefer-preincrement
177	Don't add unneeded blank lines between variable definitions or a variable definition. If the next statement has a comment, it is fine to keep a blank line before the comment. Applies to elsewhere in the patch.
203	This condition is somewhat strange.

mgorny mentioned this in rGe99edb92356b: Revert "[test] Fix CodeGen/M68k/pipeline.ll after D123394….Dec 5 2022, 7:47 AM

@MaskRay, I've reverted your M68k test fix since this change was reverted too, to get M68k tests to pass again.

@jonpa, can you please include the change from f55880e830e150d98e5340cdc3c4c41867a5514d in your patch?

This revision was landed with ongoing or failed builds.Dec 5 2022, 10:55 AM

Closed by commit rG5ecd36329508: Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions." (authored by jonpa).

This revision was automatically updated to reflect the committed changes.

jonpa marked 4 inline comments as done.

jonpa added a commit: rG5ecd36329508: Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions.".

Latest patch as committed (did not update correctly automatically).

Harbormaster completed remote builds in B201161: Diff 480166.Dec 5 2022, 11:04 AM

uabelho added a subscriber: uabelho.Dec 6 2022, 1:59 AM

craig.topper mentioned this in rG8d30b9e64f7e: [RISCV] Move VSPILL/VRELOAD expansion for vector tuples to eliminateFrameIndex..Dec 6 2022, 3:42 PM

uabelho added inline comments.Dec 13 2022, 2:02 AM

llvm/lib/CodeGen/MachineLateInstrsCleanup.cpp
106–110	Hi @jonpa It looks like clearKillsForDef doesn't really handle bundled input. E.g. for our OOT target I've seen cases where "kill" flags were cleared in the BUNDLE instruction, but not in the individual bundled instructions, which then led to verifier complaints. Do you know if there are other parts of the pass that have problems with bundled input?

uabelho added inline comments.Dec 14 2022, 12:13 AM

llvm/lib/CodeGen/MachineLateInstrsCleanup.cpp
114–123	I wonder if this won't abort too early? Or are we really guaranteed to find an implicit kill of the super-reg within this instruction if we find a kill of a sub-reg? At least for my target I've found cases like

DEF superreg
USE killed subreg1
USE killed subreg2
DEF superreg
USE killed subreg1

and with the current implementation we get

DEF superreg
USE killed subreg1
USE subreg2
USE killed subreg1

and then the verifier complains on the second use of subreg1.

Hi @uabelho , I'm glad you may find some use for this pass as well :)

It looks like clearKillsForDef doesn't really handle bundled input. E.g. for our OOT target I've seen cases where "kill" flags were cleared in the BUNDLE instruction, but not in the individual bundled instructions which then lead to verifier complaints.

Do you know if there are other parts of the pass that have problems with bundled input?

TBH, it was a while since I thought about BUNDLEs, but IIRC it should be possible to use an instr_iterator with NFC in the *absence* of BUNDLEs, so perhaps that would be a somewhat trivial change..?
Maybe @kparzysz would be interested in this as well, as I believe Hexagon is also emitting BUNDLEs..?

Not quite sure how BUNDLEs need to be handled, for instance when a bundled instruction is deleted, I guess the BUNDLE operands must also be updated.

I wonder if this won't abort too early? Or are we really guaranteed to find an implicit kill of the super-reg within this instruction if we find a kill of a sub-reg?

Are you returning true from enableSubRegLiveness()? I suspect that could cause a difference that I haven't yet run into myself.

As I have mentioned before, I still think the most valuable thing here would be to consider dropping kill flags entirely since this is run so late. I don't think e.g. the buildSchedGraph() uses them. Perhaps they could be regenerated quickly when they are actually useful. The benefits here would be to not have to worry about this tedious updating, which also costs potentially in compile time. Alternatively, it seems that this has to be fixed here for the case of enabled subreg liveness. I have at least so far found it sufficient with the current search for kill flags (without subreg liveness enabled).

In D123394#3996342, @jonpa wrote:

I wonder if this won't abort too early? Or are we really guaranteed to find an implicit kill of the super-reg within this instruction if we find a kill of a sub-reg?

Are you returning true from enableSubRegLiveness()? I suspect that could cause a difference that I haven't yet run into myself.

As I have mentioned before, I still think the most valuable thing here would be to consider dropping kill flags entirely since this is run so late. I don't think e.g. the buildSchedGraph() uses them. Perhaps they could be regenerated quickly when they are actually useful. The benefits here would be to not have to worry about this tedious updating, which also costs potentially in compile time. Alternatively, it seems that this has to be fixed here for the case of enabled subreg liveness. I have at least so far found it sufficient with the current search for kill flags (without subreg liveness enabled).

We haven't enabled subreg liveness. What we do have though is a downstream hack in VirtRegMap.cpp that throws away implicit uses of superregisters since we haven't understood what they are good for and they just limit scheduling later.
But ok, that explains why those implicit killed uses are missing for us :) I'm still not sure if they are always guaranteed to be there though. Do you know if this is documented somewhere?

Right now, downstream I just made clearKillsForDef continue searching until it finds a def, and not do the early abort you can do due to the implicit superreg kill. (no idea what compiletime impact this has.)

We haven't enabled subreg liveness. What we do have though is a downstream hack in VirtRegMap.cpp that throws away implicit uses of superregisters since we haven't understood what they are good for and they just limit scheduling later.

Ah - yeah, I have some vague notion of that :-)

But ok, that explains why those implicit killed uses are missing for us :) I'm still not sure if they are always guaranteed to be there though. Do you know if this is documented somewhere?

Not really - the idea was basically to stop the search backwards also when a use of the reg is found as there is no point in continuing. I then encountered a problem in a case of the extra super-reg operand, and added a handling for that. I guess I was then still assuming that there can't be an earlier kill of the register. Are you seeing a problem here also without the removal of the superregisters-hack you are using? Something like a def of a superregister, and then a kill of a subreg, in which case the search should continue for the other subreg kill?

Right now, downstream I just made clearKillsForDef continue searching until it finds a def, and not do the early abort you can do due to the implicit superreg kill. (no idea what compiletime impact this has.)

Makes sense to me - I tried also earlier to store the kills in a map but I found (at least on SystemZ) that this backwards search was actually quicker than that, so I don't think it should be an issue in the normal case...

Makes sense to me - I tried also earlier to store the kills in a map but I found (at least on SystemZ) that this backwards search was actually quicker than that, so I don't think it should be an issue in the normal case...

Hello Jonas, just FYI, I got a case:

Total Execution Time: 670.4078 seconds (670.7149 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  289.2998 ( 43.7%)   0.1439 (  1.8%)  289.4437 ( 43.2%)  289.5775 ( 43.2%)  Structurize control flow
  124.9852 ( 18.9%)   0.0000 (  0.0%)  124.9852 ( 18.6%)  125.0289 ( 18.6%)  Machine Late Instructions Cleanup Pass
  45.2408 (  6.8%)   0.0000 (  0.0%)  45.2408 (  6.7%)  45.2740 (  6.8%)  SI Form memory clauses
  38.0358 (  5.7%)   0.0240 (  0.3%)  38.0598 (  5.7%)  38.0767 (  5.7%)  Simple Register Coalescing

Most of the time is spent in clearKillsForDef.

Herald added a subscriber: luke.Mar 11 2023, 3:02 AM

In D123394#4186531, @vpykhtin wrote:
Makes sense to me - I tried also earlier to store the kills in a map but I found (at least on SystemZ) that this backwards search was actually quicker than that, so I don't think it should be an issue in the normal case...

Hello Jonas, just FYI I got case:
Total Execution Time: 670.4078 seconds (670.7149 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  289.2998 ( 43.7%)   0.1439 (  1.8%)  289.4437 ( 43.2%)  289.5775 ( 43.2%)  Structurize control flow
  124.9852 ( 18.9%)   0.0000 (  0.0%)  124.9852 ( 18.6%)  125.0289 ( 18.6%)  Machine Late Instructions Cleanup Pass
  45.2408 (  6.8%)   0.0000 (  0.0%)  45.2408 (  6.7%)  45.2740 (  6.8%)  SI Form memory clauses
  38.0358 (  5.7%)   0.0240 (  0.3%)  38.0598 (  5.7%)  38.0767 (  5.7%)  Simple Register Coalescing
Most of the time is spent in clearKillsForDef.

Hi Valery, thanks for the report! It would be nice if you posted your test case on github issues and assigned me (and/or yourself :)... I would think this could be fixed.

Hi Valery, thanks for the report! It would be nice if you posted your test case on github issues and assigned me (and/or yourself :)... I would think this could be fixed.

Sorry for delay, done https://github.com/llvm/llvm-project/issues/61397. Maybe if you have the version with kill flag maps we can check if it works better on the testcase.

In D123394#4191917, @vpykhtin wrote:

Hi Valery, thanks for the report! It would be nice if you posted your test case on github issues and assigned me (and/or yourself :)... I would think this could be fixed.

Sorry for delay, done https://github.com/llvm/llvm-project/issues/61397. Maybe if you have the version with kill flag maps we can check if it works better on the testcase.

Indeed, the kill flags map does seem to remedy this huge test case. My results and patch posted on github issues.
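
For illustration, one way a kill-flags map can avoid the repeated backward search is to record, during the forward scan, where each register's most recent kill flag sits. This is only a sketch under stated assumptions and not the patch posted on the GitHub issue: `Use`, `Instr` and `KillTracker` are hypothetical names, and sub-register overlap is ignored.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

struct Use {
  unsigned Reg;
  bool IsKill;
};

struct Instr {
  std::vector<Use> Uses;
};

// Remembers, per register, where its most recent kill flag sits, so that the
// flag can be cleared without walking backwards through the block.
class KillTracker {
  struct Location {
    std::size_t InstrIdx;
    std::size_t UseIdx;
  };
  std::unordered_map<unsigned, Location> LastKill;

public:
  // Called once per instruction during the forward scan.
  void record(const std::vector<Instr> &Block, std::size_t InstrIdx) {
    const Instr &I = Block[InstrIdx];
    for (std::size_t U = 0; U < I.Uses.size(); ++U)
      if (I.Uses[U].IsKill)
        LastKill[I.Uses[U].Reg] = {InstrIdx, U};
  }

  // Clears the recorded kill flag of Reg, if any. Returns true if a flag was
  // cleared.
  bool clearKill(std::vector<Instr> &Block, unsigned Reg) {
    auto It = LastKill.find(Reg);
    if (It == LastKill.end())
      return false;
    Block[It->second.InstrIdx].Uses[It->second.UseIdx].IsKill = false;
    LastKill.erase(It);
    return true;
  }
};
```

Each removed definition then costs a single lookup instead of a scan whose length grows with the size of the function.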

Anyone interested in compile time with this pass with huge functions, please feel free to contribute your opinion here.

nickdesaulniers mentioned this in D149191: [CodeGen][MachineLastInstrsCleanup] fix INLINEASM_BR hazard.Apr 25 2023, 2:09 PM

nickdesaulniers mentioned this in rG012ea747ed02: [CodeGen][MachineLastInstrsCleanup] fix INLINEASM_BR hazard.Apr 27 2023, 1:46 PM

Revision Contents

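Path	Size
llvm/include/llvm/CodeGen/CodeGenPassBuilder.h	3 lines
llvm/include/llvm/CodeGen/MachinePassRegistry.def	1 line
llvm/include/llvm/CodeGen/Passes.h	4 lines
llvm/include/llvm/InitializePasses.h	1 line
llvm/lib/CodeGen/CMakeLists.txt	1 line
llvm/lib/CodeGen/CodeGen.cpp	1 line
llvm/lib/CodeGen/MachineLateInstrsCleanup.cpp	239 lines
llvm/lib/CodeGen/TargetPassConfig.cpp	3 lines
llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp	1 line
llvm/lib/Target/RISCV/RISCVTargetMachine.cpp	4 lines
llvm/lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp	1 line
llvm/test/CodeGen/AArch64/O3-pipeline.ll	1 line
llvm/test/CodeGen/AArch64/stack-guard-remat-bitcast.ll	7 lines
llvm/test/CodeGen/AArch64/sve-calling-convention-mixed.ll	13 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/call-outgoing-stack-args.ll	8 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.div.fmas.ll	1 line
llvm/test/CodeGen/AMDGPU/cc-update.ll	5 lines
llvm/test/CodeGen/AMDGPU/exec-mask-opt-cannot-create-empty-or-backward-segment.ll	2 lines
llvm/test/CodeGen/AMDGPU/flat-scratch.ll	1398 lines
llvm/test/CodeGen/AMDGPU/llc-pipeline.ll	4 lines
llvm/test/CodeGen/AMDGPU/multilevel-break.ll	1 line
llvm/test/CodeGen/AMDGPU/si-annotate-cf.ll	6 lines
llvm/test/CodeGen/AMDGPU/si-unify-exit-multiple-unreachables.ll	2 lines
llvm/test/CodeGen/AMDGPU/si-unify-exit-return-unreachable.ll	2 lines
llvm/test/CodeGen/AMDGPU/spill-offset-calculation.ll	7 lines
llvm/test/CodeGen/AMDGPU/spill-scavenge-offset.ll	4 lines
llvm/test/CodeGen/ARM/O3-pipeline.ll	1 line
llvm/test/CodeGen/ARM/arm-shrink-wrapping.ll	2 lines
llvm/test/CodeGen/ARM/fpclamptosat.ll	2 lines
llvm/test/CodeGen/ARM/ifcvt-branch-weight-bug.ll	2 lines
llvm/test/CodeGen/ARM/jump-table-islands.ll	2 lines
llvm/test/CodeGen/ARM/reg_sequence.ll	1 line
llvm/test/CodeGen/BPF/objdump_cond_op_2.ll	9 lines
llvm/test/CodeGen/M68k/pipeline.ll	1 line
llvm/test/CodeGen/Mips/llvm-ir/lshr.ll	1 line
llvm/test/CodeGen/Mips/llvm-ir/shl.ll	1 line
llvm/test/CodeGen/PowerPC/O3-pipeline.ll	1 line
llvm/test/CodeGen/PowerPC/cgp-select.ll	1 line
llvm/test/CodeGen/PowerPC/fast-isel-branch.ll	2 lines
llvm/test/CodeGen/PowerPC/fp-strict-conv-f128.ll	2 lines
llvm/test/CodeGen/PowerPC/ppcf128-constrained-fp-intrinsics.ll	2 lines
llvm/test/CodeGen/SystemZ/frame-28.mir	327 lines
llvm/test/CodeGen/Thumb/frame-access.ll	25 lines
llvm/test/CodeGen/Thumb2/mve-fpclamptosat_vec.ll	3 lines
llvm/test/CodeGen/X86/2008-04-09-BranchFolding.ll	1 line
llvm/test/CodeGen/X86/2008-04-16-ReMatBug.ll	1 line
llvm/test/CodeGen/X86/AMX/amx-across-func.ll	1 line
llvm/test/CodeGen/X86/AMX/amx-spill-merge.ll	2 lines
llvm/test/CodeGen/X86/fast-isel-stackcheck.ll	1 line
llvm/test/CodeGen/X86/fshl.ll	20 lines
llvm/test/CodeGen/X86/masked_load.ll	5 lines
llvm/test/CodeGen/X86/oddshuffles.ll	1 line
llvm/test/CodeGen/X86/opt-pipeline.ll	1 line
llvm/test/CodeGen/X86/sdiv_fix_sat.ll	1 line
llvm/test/CodeGen/X86/shift-i128.ll	2 lines
llvm/test/CodeGen/X86/ushl_sat_vec.ll	1 line
llvm/test/CodeGen/X86/vec_extract.ll	2 lines
llvm/test/CodeGen/X86/vec_shift5.ll	4 lines
llvm/test/CodeGen/XCore/scavenging.ll	3 lines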

Diff 480166

llvm/include/llvm/CodeGen/CodeGenPassBuilder.h

[1,125 lines collapsed] void CodeGenPassBuilder<Derived>::addMachineLateOptimization(

   // Tail duplication.
   // Note that duplicating tail just increases code size and degrades
   // performance for targets that require Structured Control Flow.
   // In addition it can also make CFG irreducible. Thus we disable it.
   if (!TM.requiresStructuredCFG())
     addPass(TailDuplicatePass());

+  // Cleanup of redundant (identical) address/immediate loads.
+  addPass(MachineLateInstrsCleanupPass());
+
   // Copy propagation.
   addPass(MachineCopyPropagationPass());
 }

 /// Add standard basic block placement passes.
 template <typename Derived>
 void CodeGenPassBuilder<Derived>::addBlockPlacement(
     AddMachinePass &addPass) const {
[9 lines collapsed]

llvm/include/llvm/CodeGen/MachinePassRegistry.def

[145 lines collapsed]
 DUMMY_MACHINE_FUNCTION_PASS("localstackalloc", LocalStackSlotPass, ())
 DUMMY_MACHINE_FUNCTION_PASS("shrink-wrap", ShrinkWrapPass, ())
 DUMMY_MACHINE_FUNCTION_PASS("prologepilog", PrologEpilogInserterPass, ())
 DUMMY_MACHINE_FUNCTION_PASS("postrapseudos", ExpandPostRAPseudosPass, ())
 DUMMY_MACHINE_FUNCTION_PASS("implicit-null-checks", ImplicitNullChecksPass, ())
 DUMMY_MACHINE_FUNCTION_PASS("postmisched", PostMachineSchedulerPass, ())
 DUMMY_MACHINE_FUNCTION_PASS("machine-scheduler", MachineSchedulerPass, ())
 DUMMY_MACHINE_FUNCTION_PASS("machine-cp", MachineCopyPropagationPass, ())
+DUMMY_MACHINE_FUNCTION_PASS("machine-latecleanup", MachineLateInstrsCleanupPass, ())
 DUMMY_MACHINE_FUNCTION_PASS("post-RA-sched", PostRASchedulerPass, ())
 DUMMY_MACHINE_FUNCTION_PASS("fentry-insert", FEntryInserterPass, ())
 DUMMY_MACHINE_FUNCTION_PASS("xray-instrumentation", XRayInstrumentationPass, ())
 DUMMY_MACHINE_FUNCTION_PASS("patchable-function", PatchableFunctionPass, ())
 DUMMY_MACHINE_FUNCTION_PASS("reg-usage-propagation", RegUsageInfoPropagationPass, ())
 DUMMY_MACHINE_FUNCTION_PASS("reg-usage-collector", RegUsageInfoCollectorPass, ())
 DUMMY_MACHINE_FUNCTION_PASS("funclet-layout", FuncletLayoutPass, ())
 DUMMY_MACHINE_FUNCTION_PASS("stackmap-liveness", StackMapLivenessPass, ())
[46 lines collapsed]

llvm/include/llvm/CodeGen/Passes.h

[328 lines collapsed] namespace llvm {
   extern char &MachineSinkingID;

   /// MachineCopyPropagation - This pass performs copy propagation on
   /// machine instructions.
   extern char &MachineCopyPropagationID;

   MachineFunctionPass *createMachineCopyPropagationPass(bool UseCopyInstr);

+  /// MachineLateInstrsCleanup - This pass removes redundant identical
+  /// instructions after register allocation and rematerialization.
+  extern char &MachineLateInstrsCleanupID;
+
   /// PeepholeOptimizer - This pass performs peephole optimizations -
   /// like extension and comparison eliminations.
   extern char &PeepholeOptimizerID;

   /// OptimizePHIs - This pass optimizes machine instruction PHIs
   /// to take advantage of opportunities created during DAG legalization.
   extern char &OptimizePHIsID;

[240 lines collapsed]

llvm/include/llvm/InitializePasses.h

	Show First 20 Lines • Show All 256 Lines • ▼ Show 20 Lines
	void initializeMachineBlockFrequencyInfoPass(PassRegistry&);			void initializeMachineBlockFrequencyInfoPass(PassRegistry&);
	void initializeMachineBlockPlacementPass(PassRegistry&);			void initializeMachineBlockPlacementPass(PassRegistry&);
	void initializeMachineBlockPlacementStatsPass(PassRegistry&);			void initializeMachineBlockPlacementStatsPass(PassRegistry&);
	void initializeMachineBranchProbabilityInfoPass(PassRegistry&);			void initializeMachineBranchProbabilityInfoPass(PassRegistry&);
	void initializeMachineCFGPrinterPass(PassRegistry &);			void initializeMachineCFGPrinterPass(PassRegistry &);
	void initializeMachineCSEPass(PassRegistry&);			void initializeMachineCSEPass(PassRegistry&);
	void initializeMachineCombinerPass(PassRegistry&);			void initializeMachineCombinerPass(PassRegistry&);
	void initializeMachineCopyPropagationPass(PassRegistry&);			void initializeMachineCopyPropagationPass(PassRegistry&);
	void initializeMachineCycleInfoPrinterPassPass(PassRegistry &);			void initializeMachineCycleInfoPrinterPassPass(PassRegistry &);
				RKSimon: This looks like it should be sorted?
				jonpa (Author): Yes. Should the pass maybe be named something with Machine... to make it clear that it is a late CodeGen pass? Maybe MachineImmLoadsCleanupPass or something?
				RKSimon: SGTM - no strong preference tbh
	void initializeMachineCycleInfoWrapperPassPass(PassRegistry &);			void initializeMachineCycleInfoWrapperPassPass(PassRegistry &);
	void initializeMachineDominanceFrontierPass(PassRegistry&);			void initializeMachineDominanceFrontierPass(PassRegistry&);
	void initializeMachineDominatorTreePass(PassRegistry&);			void initializeMachineDominatorTreePass(PassRegistry&);
	void initializeMachineFunctionPrinterPassPass(PassRegistry&);			void initializeMachineFunctionPrinterPassPass(PassRegistry&);
	void initializeMachineFunctionSplitterPass(PassRegistry &);			void initializeMachineFunctionSplitterPass(PassRegistry &);
				void initializeMachineLateInstrsCleanupPass(PassRegistry&);
	void initializeMachineLICMPass(PassRegistry&);			void initializeMachineLICMPass(PassRegistry&);
	void initializeMachineLoopInfoPass(PassRegistry&);			void initializeMachineLoopInfoPass(PassRegistry&);
	void initializeMachineModuleInfoWrapperPassPass(PassRegistry &);			void initializeMachineModuleInfoWrapperPassPass(PassRegistry &);
	void initializeMachineOptimizationRemarkEmitterPassPass(PassRegistry&);			void initializeMachineOptimizationRemarkEmitterPassPass(PassRegistry&);
	void initializeMachineOutlinerPass(PassRegistry&);			void initializeMachineOutlinerPass(PassRegistry&);
	void initializeMachinePipelinerPass(PassRegistry&);			void initializeMachinePipelinerPass(PassRegistry&);
	void initializeMachinePostDominatorTreePass(PassRegistry&);			void initializeMachinePostDominatorTreePass(PassRegistry&);
	void initializeMachineRegionInfoPassPass(PassRegistry&);			void initializeMachineRegionInfoPassPass(PassRegistry&);

llvm/lib/CodeGen/CMakeLists.txt

add_llvm_component_library(LLVMCodeGen
MachineDominators.cpp		MachineDominators.cpp
MachineFrameInfo.cpp		MachineFrameInfo.cpp
MachineFunction.cpp		MachineFunction.cpp
MachineFunctionPass.cpp		MachineFunctionPass.cpp
MachineFunctionPrinterPass.cpp		MachineFunctionPrinterPass.cpp
MachineFunctionSplitter.cpp		MachineFunctionSplitter.cpp
MachineInstrBundle.cpp		MachineInstrBundle.cpp
MachineInstr.cpp		MachineInstr.cpp
		MachineLateInstrsCleanup.cpp
MachineLICM.cpp		MachineLICM.cpp
MachineLoopInfo.cpp		MachineLoopInfo.cpp
MachineLoopUtils.cpp		MachineLoopUtils.cpp
MachineModuleInfo.cpp		MachineModuleInfo.cpp
MachineModuleInfoImpls.cpp		MachineModuleInfoImpls.cpp
MachineModuleSlotTracker.cpp		MachineModuleSlotTracker.cpp
MachineOperand.cpp		MachineOperand.cpp
MachineOptimizationRemarkEmitter.cpp		MachineOptimizationRemarkEmitter.cpp
add_llvm_component_library(LLVMCodeGen
MachineStableHash.cpp		MachineStableHash.cpp
MIRVRegNamerUtils.cpp		MIRVRegNamerUtils.cpp
MIRNamerPass.cpp		MIRNamerPass.cpp
MIRCanonicalizerPass.cpp		MIRCanonicalizerPass.cpp
RegisterUsageInfo.cpp		RegisterUsageInfo.cpp
RegUsageInfoCollector.cpp		RegUsageInfoCollector.cpp
RegUsageInfoPropagate.cpp		RegUsageInfoPropagate.cpp
ReplaceWithVeclib.cpp		ReplaceWithVeclib.cpp
ResetMachineFunctionPass.cpp		ResetMachineFunctionPass.cpp
		RKSimon: sorting
RegisterBank.cpp		RegisterBank.cpp
RegisterBankInfo.cpp		RegisterBankInfo.cpp
SafeStack.cpp		SafeStack.cpp
SafeStackLayout.cpp		SafeStackLayout.cpp
SanitizerBinaryMetadata.cpp		SanitizerBinaryMetadata.cpp
ScheduleDAG.cpp		ScheduleDAG.cpp
ScheduleDAGInstrs.cpp		ScheduleDAGInstrs.cpp
ScheduleDAGPrinter.cpp		ScheduleDAGPrinter.cpp

llvm/lib/CodeGen/CodeGen.cpp

void llvm::initializeCodeGen(PassRegistry &Registry) {
initializeMachineCFGPrinterPass(Registry);		initializeMachineCFGPrinterPass(Registry);
initializeMachineCSEPass(Registry);		initializeMachineCSEPass(Registry);
initializeMachineCombinerPass(Registry);		initializeMachineCombinerPass(Registry);
initializeMachineCopyPropagationPass(Registry);		initializeMachineCopyPropagationPass(Registry);
initializeMachineCycleInfoPrinterPassPass(Registry);		initializeMachineCycleInfoPrinterPassPass(Registry);
initializeMachineCycleInfoWrapperPassPass(Registry);		initializeMachineCycleInfoWrapperPassPass(Registry);
initializeMachineDominatorTreePass(Registry);		initializeMachineDominatorTreePass(Registry);
initializeMachineFunctionPrinterPassPass(Registry);		initializeMachineFunctionPrinterPassPass(Registry);
		initializeMachineLateInstrsCleanupPass(Registry);
initializeMachineLICMPass(Registry);		initializeMachineLICMPass(Registry);
initializeMachineLoopInfoPass(Registry);		initializeMachineLoopInfoPass(Registry);
initializeMachineModuleInfoWrapperPassPass(Registry);		initializeMachineModuleInfoWrapperPassPass(Registry);
initializeMachineOptimizationRemarkEmitterPassPass(Registry);		initializeMachineOptimizationRemarkEmitterPassPass(Registry);
initializeMachineOutlinerPass(Registry);		initializeMachineOutlinerPass(Registry);
initializeMachinePipelinerPass(Registry);		initializeMachinePipelinerPass(Registry);
initializeMachineSanitizerBinaryMetadataPass(Registry);		initializeMachineSanitizerBinaryMetadataPass(Registry);
initializeModuloScheduleTestPass(Registry);		initializeModuloScheduleTestPass(Registry);

llvm/lib/CodeGen/MachineLateInstrsCleanup.cpp

This file was added.

				//==--- MachineLateInstrsCleanup.cpp - Late Instructions Cleanup Pass -----===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This simple pass removes any identical and redundant immediate or address
				// loads to the same register. The immediate loads removed can originally be
				// the result of rematerialization, while the addresses are redundant frame
				// addressing anchor points created during Frame Indices elimination.
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/ADT/BitVector.h"
				#include "llvm/ADT/PostOrderIterator.h"
				#include "llvm/ADT/SmallPtrSet.h"
				#include "llvm/ADT/Statistic.h"
				#include "llvm/CodeGen/MachineBasicBlock.h"
				#include "llvm/CodeGen/MachineFunction.h"
				#include "llvm/CodeGen/MachineFunctionPass.h"
				#include "llvm/CodeGen/MachineInstr.h"
				#include "llvm/CodeGen/MachineOperand.h"
				#include "llvm/CodeGen/MachineRegisterInfo.h"
				#include "llvm/CodeGen/TargetInstrInfo.h"
				#include "llvm/CodeGen/TargetRegisterInfo.h"
				#include "llvm/CodeGen/TargetSubtargetInfo.h"
				#include "llvm/InitializePasses.h"
				#include "llvm/Pass.h"
				#include "llvm/Support/Debug.h"

				using namespace llvm;

				#define DEBUG_TYPE "machine-latecleanup"

				STATISTIC(NumRemoved, "Number of redundant instructions removed.");

				namespace {

				class MachineLateInstrsCleanup : public MachineFunctionPass {
				const TargetRegisterInfo *TRI;
				const TargetInstrInfo *TII;

				// Data structures to map regs to their definitions per MBB.
				using Reg2DefMap = std::map<Register, MachineInstr*>;
				std::vector<Reg2DefMap> RegDefs;

				craig.topper: Can these be DenseMaps?
				jonpa (Author): I did make some experiments with DenseMap and other alternatives earlier (see May 16 above) and never got any improvements in compile time by doing so. In addition, it currently seems unwise to do that, since processBlock ("Clear any entries in map that MI clobbers") loops over the entries while erasing some of them. I could not do this now even after rewriting the loop like:
				   // Data structures to map regs to their definitions per MBB.
				-  using Reg2DefMap = std::map<Register, MachineInstr*>;
				-  using MBB2RegDefsMap = std::map<MachineBasicBlock *, Reg2DefMap>;
				+  using Reg2DefMap = DenseMap<Register, MachineInstr*>;
				+  using MBB2RegDefsMap = DenseMap<MachineBasicBlock *, Reg2DefMap>;
				   MBB2RegDefsMap RegDefs;
				   // Set of visited MBBs.
				@@ -257,11 +258,10 @@ bool MachineLateInstrsCleanup::processBlock(MachineBasicBlock *MBB) {
				   // Clear any entries in map that MI clobbers.
				   for (auto DefI = MBBDefs.begin(); DefI != MBBDefs.end();) {
				-    Register Reg = DefI->first;
				+    Reg2DefMap::iterator CurrI = DefI++;
				+    Register Reg = CurrI->first;
				     if (MI->modifiesRegister(Reg, TRI))
				-      DefI = MBBDefs.erase(DefI);
				-    else
				-      ++DefI;
				+      MBBDefs.erase(CurrI);
				   }
				I then got "Assertion `isHandleInSync() && "invalid iterator access!"' failed.", so it seems to me that this isn't supported.
				// Walk through the instructions in MBB and remove any redundant
				// instructions.
				bool processBlock(MachineBasicBlock *MBB);

				public:
				craig.topper: Every MachineBasicBlock has a number assigned to it. This could possibly be a BitVector using the basic block numbering. See `MachineBasicBlock::getNumber()` and `MachineFunction::getNumBlockIDs`. You might also be able to use the numbering instead of a map for MBB2RegDefsMap, but that probably depends on how sparse that map is.
				jonpa (Author): Using BitVector for Visited and Visited_preds seems to lower the average compile time (on the files where it shows up) by about 0.04%, so it looks like a very slight but noticeable improvement. For the MBB2RegDefsMap I saw a similar slight further improvement on average, so with both of these changes an average improvement of 0.1% seems to result. It may be that with this change (MBB2RegDefsMap) some of the slower cases get slower, but the average is improved. I have only tested this on SystemZ, but I guess a BitMap really should be faster than a map, so it makes sense to me to use it.
				static char ID; // Pass identification, replacement for typeid

				MachineLateInstrsCleanup() : MachineFunctionPass(ID) {
				initializeMachineLateInstrsCleanupPass(*PassRegistry::getPassRegistry());
				}

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.setPreservesCFG();
				MachineFunctionPass::getAnalysisUsage(AU);
				}

				bool runOnMachineFunction(MachineFunction &MF) override;

				MachineFunctionProperties getRequiredProperties() const override {
				return MachineFunctionProperties().set(
				MachineFunctionProperties::Property::NoVRegs);
				}
				};

				} // end anonymous namespace

				char MachineLateInstrsCleanup::ID = 0;

				char &llvm::MachineLateInstrsCleanupID = MachineLateInstrsCleanup::ID;

				INITIALIZE_PASS(MachineLateInstrsCleanup, DEBUG_TYPE,
				"Machine Late Instructions Cleanup Pass", false, false)

				bool MachineLateInstrsCleanup::runOnMachineFunction(MachineFunction &MF) {
				if (skipFunction(MF.getFunction()))
				return false;

				TRI = MF.getSubtarget().getRegisterInfo();
				TII = MF.getSubtarget().getInstrInfo();
				MaskRay: This can be moved closer to `Changed |= processBlock(MBB);`

				RegDefs.clear();
				RegDefs.resize(MF.getNumBlockIDs());

				// Visit all MBBs in an order that maximises the reuse from predecessors.
				bool Changed = false;
				ReversePostOrderTraversal<MachineFunction *> RPOT(&MF);
				for (MachineBasicBlock *MBB : RPOT)
				Changed |= processBlock(MBB);

				return Changed;
				}
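
Editorial sketch (not part of the patch): the reverse post-order visit used in runOnMachineFunction means that a block is reached only after its predecessors, except along loop back edges. The LLVM-free model below illustrates that ordering on a toy CFG. `Cfg`, `postOrder` and `reversePostOrder` are hypothetical names chosen for this illustration and are not LLVM's `ReversePostOrderTraversal`; block 0 is assumed to be the entry block.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy CFG: Succs[B] lists the successor block numbers of block B.
// Block 0 is assumed to be the entry block.
using Cfg = std::vector<std::vector<int>>;

static void postOrder(const Cfg &Succs, int B, std::vector<bool> &Visited,
                      std::vector<int> &Order) {
  Visited[B] = true;
  for (int S : Succs[B])
    if (!Visited[S])
      postOrder(Succs, S, Visited, Order);
  Order.push_back(B); // Emitted only after all reachable successors.
}

// Returns the blocks reachable from the entry in reverse post-order.
std::vector<int> reversePostOrder(const Cfg &Succs) {
  std::vector<bool> Visited(Succs.size(), false);
  std::vector<int> Order;
  if (!Succs.empty())
    postOrder(Succs, 0, Visited, Order);
  std::reverse(Order.begin(), Order.end());
  return Order;
}
```

For a diamond-shaped CFG (0 branches to 1 and 2, both of which jump to 3), the join block 3 comes last, so the definitions recorded for both predecessors are available when it is processed.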

				// Clear any previous kill flag on Reg found before I in MBB. Walk backwards
				// in MBB and if needed continue in predecessors until a use/def of Reg is
				// encountered. This seems to be faster in practice than tracking kill flags
				// in a map.
				static void clearKillsForDef(Register Reg, MachineBasicBlock *MBB,
				MachineBasicBlock::iterator I,
				BitVector &VisitedPreds,
				const TargetRegisterInfo *TRI) {
				VisitedPreds.set(MBB->getNumber());
				while (I != MBB->begin()) {
				uabelho: Hi @jonpa. It looks like clearKillsForDef doesn't really handle bundled input. E.g. for our OOT target I've seen cases where "kill" flags were cleared in the BUNDLE instruction, but not in the individual bundled instructions, which then led to verifier complaints. Do you know if there are other parts of the pass that have problems with bundled input?
				--I;
				bool Found = false;
				craig.topper: Can we use RPOT (reverse post order traversal)? That will guarantee this property except for loops.
				jonpa (Author): Yes - awesome! 55 lines of iteration logic removed, about the same amount of success (even very slightly better on SPEC), and improved compile time (ave 0.57% -> 0.50%).
				for (auto &MO : I->operands())
				MaskRay: `--I` https://llvm.org/docs/CodingStandards.html#prefer-preincrement
				if (MO.isReg() && TRI->regsOverlap(MO.getReg(), Reg)) {
				if (MO.isDef())
				return;
				if (MO.readsReg()) {
				MO.setIsKill(false);
				Found = true; // Keep going for an implicit kill of the super-reg.
				}
				}
				if (Found)
				return;
				uabelho: I wonder if this won't abort too early? Or are we really guaranteed to find an implicit kill of the super-reg within this instruction if we find a kill of a sub-reg? At least for my target I've found cases like
				  DEF superreg
				  USE killed subreg1
				  USE killed subreg2
				  DEF superreg
				  USE killed subreg1
				and with the current implementation we get
				  DEF superreg
				  USE killed subreg1
				  USE subreg2
				  USE killed subreg1
				and then the verifier complains on the second use of subreg1.
				}

				// If an earlier def is not in MBB, continue in predecessors.
				if (!MBB->isLiveIn(Reg))
				MBB->addLiveIn(Reg);
				assert(!MBB->pred_empty() && "Predecessor def not found!");
				for (MachineBasicBlock *Pred : MBB->predecessors())
				if (!VisitedPreds.test(Pred->getNumber()))
				clearKillsForDef(Reg, Pred, Pred->end(), VisitedPreds, TRI);
				}

				static void removeRedundantDef(MachineInstr *MI,
				const TargetRegisterInfo *TRI) {
				Register Reg = MI->getOperand(0).getReg();
				BitVector VisitedPreds(MI->getMF()->getNumBlockIDs());
				clearKillsForDef(Reg, MI->getParent(), MI->getIterator(), VisitedPreds, TRI);
				craig.topper: No underscores in variable names. Use VisitedPreds
				MI->eraseFromParent();
				++NumRemoved;
				}
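
Editorial sketch (not part of the patch): clearKillsForDef searches backwards through predecessors and uses one bit per block number to avoid visiting a block twice, as suggested in the review. The model below shows only that traversal pattern, under simplifying assumptions: `ToyBlock`, `walkBack` and `findStopBlocks` are hypothetical names, `std::vector<bool>` stands in for `llvm::BitVector`, and a whole block either touches the register or does not.

```cpp
#include <cassert>
#include <vector>

// Toy basic block (hypothetical type, not llvm::MachineBasicBlock).
struct ToyBlock {
  std::vector<int> Preds; // Predecessor block numbers.
  bool TouchesReg;        // True if some instruction uses or defines the register.
};

// Walks backwards from block B. The walk stops in a block that touches the
// register and otherwise continues into predecessors not yet visited.
static void walkBack(const std::vector<ToyBlock> &Blocks, int B,
                     std::vector<bool> &Visited, std::vector<int> &Stops) {
  Visited[B] = true;
  if (Blocks[B].TouchesReg) {
    Stops.push_back(B);
    return;
  }
  for (int P : Blocks[B].Preds)
    if (!Visited[P])
      walkBack(Blocks, P, Visited, Stops);
}

// Returns the blocks in which the backward search from the predecessors of
// Start ends. Start itself is marked visited so that a loop back to it is not
// searched again.
std::vector<int> findStopBlocks(const std::vector<ToyBlock> &Blocks,
                                int Start) {
  std::vector<bool> Visited(Blocks.size(), false); // One bit per block number.
  std::vector<int> Stops;
  Visited[Start] = true;
  for (int P : Blocks[Start].Preds)
    if (!Visited[P])
      walkBack(Blocks, P, Visited, Stops);
  return Stops;
}
```

Indexing by block number keeps the visited set dense and makes the membership test a single bit lookup, which is the property the review discussion relies on.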

				// Return true if MI is a potential candidate for reuse/removal and if so
				// also the register it defines in DefedReg. A candidate is a simple
				// instruction that does not touch memory, has only one register definition
				// and the only reg it may use is FrameReg. Typically this is an immediate
				// load or a load-address instruction.
				static bool isCandidate(const MachineInstr *MI, Register &DefedReg,
				Register FrameReg) {
				DefedReg = MCRegister::NoRegister;
				bool SawStore = true;
				if (!MI->isSafeToMove(nullptr, SawStore) || MI->isImplicitDef() ||
				MI->isInlineAsm())
				return false;
				for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) {
				craig.topper: !pred_empty
				const MachineOperand &MO = MI->getOperand(i);
				if (MO.isReg()) {
				if (MO.isDef()) {
				if (i == 0 && !MO.isImplicit() && !MO.isDead())
				DefedReg = MO.getReg();
				else
				return false;
				} else if (MO.getReg() && MO.getReg() != FrameReg)
				return false;
				craig.topper: I think you can use `getMF()` instead of getParent()->getParent()
				} else if (!(MO.isImm() || MO.isCImm() || MO.isFPImm() || MO.isCPI() ||
				MO.isGlobal() || MO.isSymbol()))
				return false;
				}
				return DefedReg.isValid();
				}

				bool MachineLateInstrsCleanup::processBlock(MachineBasicBlock *MBB) {
				bool Changed = false;
				Reg2DefMap &MBBDefs = RegDefs[MBB->getNumber()];

				// Find reusable definitions in the predecessor(s).
				MaskRay: Don't add unneeded blank lines between variable definitions or a variable definition. If the next statement has a comment, it is fine to keep a blank line before the comment. Applies to elsewhere in the patch.
				if (!MBB->pred_empty() && !MBB->isEHPad()) {
				MachineBasicBlock *FirstPred = *MBB->pred_begin();
				for (auto [Reg, DefMI] : RegDefs[FirstPred->getNumber()])
				if (llvm::all_of(
				drop_begin(MBB->predecessors()),
				[&, &Reg = Reg, &DefMI = DefMI](const MachineBasicBlock *Pred) {
				foad: Could use structured bindings: `for (auto [Reg, DefMI] : ...)`
				auto PredDefI = RegDefs[Pred->getNumber()].find(Reg);
				return PredDefI != RegDefs[Pred->getNumber()].end() &&
				DefMI->isIdenticalTo(*PredDefI->second);
				foad: Could use `all_of(drop_begin(MBB->predecessors()), ...)`?
				})) {
				MBBDefs[Reg] = DefMI;
				LLVM_DEBUG(dbgs() << "Reusable instruction from pred(s): in "
				<< printMBBReference(*MBB) << ": " << *DefMI;);
				}
				}

				// Process MBB.
				MachineFunction *MF = MBB->getParent();
				foad: Use `<< printMBBReference(*MBB) <<` instead of rolling your own MBB#number syntax, here and elsewhere.
				const TargetRegisterInfo *TRI = MF->getSubtarget().getRegisterInfo();
				Register FrameReg = TRI->getFrameRegister(*MF);
				craig.topper: Use DefedReg.isValid(). Someday we should remove the implicit casts to unsigned integers from the Register class.
				for (MachineInstr &MI : llvm::make_early_inc_range(*MBB)) {
				// If FrameReg is modified, no previous load-address instructions (using
				// it) are valid.
				if (MI.modifiesRegister(FrameReg, TRI)) {
				MBBDefs.clear();
				continue;
				MaskRay: This condition is somewhat strange.
				}

				foad: This looks like `for (auto &MI : make_early_inc_range(*MBB))`
				Register DefedReg;
				bool IsCandidate = isCandidate(&MI, DefedReg, FrameReg);
				craig.topper: !MBB->pred_empty()?
				foad: Comment should explain why rather than what.

				// Check for an earlier identical and reusable instruction.
				if (IsCandidate) {
				auto DefI = MBBDefs.find(DefedReg);
				if (DefI != MBBDefs.end() && MI.isIdenticalTo(*DefI->second)) {
				LLVM_DEBUG(dbgs() << "Removing redundant instruction in "
				<< printMBBReference(*MBB) << ": " << MI;);
				removeRedundantDef(&MI, TRI);
				Changed = true;
				continue;
				}
				}

				// Clear any entries in map that MI clobbers.
				for (auto DefI = MBBDefs.begin(); DefI != MBBDefs.end();) {
				Register Reg = DefI->first;
				craig.topper: Could this use llvm::all_of with a lambda?
				jonpa (Author): yeah
				if (MI.modifiesRegister(Reg, TRI))
				DefI = MBBDefs.erase(DefI);
				else
				++DefI;
				}

				// Record this MI for potential later reuse.
				if (IsCandidate) {
				LLVM_DEBUG(dbgs() << "Found interesting instruction in "
				<< printMBBReference(*MBB) << ": " << MI;);
				MBBDefs[DefedReg] = &MI;
				}
				}

				return Changed;
				}
				craig.topper: !Worklist.empty()?
				craig.topper: Would std::queue be better here?
				jonpa (Author): It's 0.01% faster on average, so why not :-)
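
Editorial sketch (not part of the patch): the per-block logic of processBlock can be modelled without LLVM as follows. A candidate is dropped if an identical definition of the same register is still recorded; otherwise every recorded definition that the instruction clobbers is erased, and a candidate is then recorded for later reuse. `ToyInstr` and `cleanupBlock` are hypothetical names, equal text stands in for `isIdenticalTo()`, register overlap (sub/super-registers) is ignored, and the assembly strings in the usage note are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy instruction (hypothetical type, not llvm::MachineInstr).
struct ToyInstr {
  std::string Text;          // Equal text stands in for isIdenticalTo().
  int Def;                   // Register defined, meaningful if IsCandidate.
  bool IsCandidate;          // True for an immediate or address load.
  std::vector<int> Clobbers; // All registers the instruction modifies.
};

// Returns the text of the instructions kept after dropping each candidate
// whose identical definition of the same register is still available.
std::vector<std::string> cleanupBlock(const std::vector<ToyInstr> &Block) {
  std::map<int, std::string> Defs; // Register -> text of its recorded def.
  std::vector<std::string> Kept;
  for (const ToyInstr &I : Block) {
    if (I.IsCandidate) {
      auto Found = Defs.find(I.Def);
      if (Found != Defs.end() && Found->second == I.Text)
        continue; // Redundant: the register already holds this value.
    }
    // Drop every recorded def that this instruction clobbers. std::map::erase
    // returns the iterator following the erased element.
    for (auto It = Defs.begin(); It != Defs.end();) {
      bool Clobbered = std::find(I.Clobbers.begin(), I.Clobbers.end(),
                                 It->first) != I.Clobbers.end();
      if (Clobbered)
        It = Defs.erase(It);
      else
        ++It;
    }
    if (I.IsCandidate)
      Defs[I.Def] = I.Text; // Record for potential later reuse.
    Kept.push_back(I.Text);
  }
  return Kept;
}
```

With the sequence `la r1, 4096(fp)`, `st r2, 0(r1)`, `la r1, 4096(fp)`, `call f` (clobbering r1), `la r1, 4096(fp)`, only the second load-address is dropped; the third must stay because the call invalidated the recorded definition. The erase loop uses the iterator returned by `std::map::erase`, which is the same idiom as in the patch.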

llvm/lib/CodeGen/TargetPassConfig.cpp

	}			}

	//===---------------------------------------------------------------------===//			//===---------------------------------------------------------------------===//
	/// Post RegAlloc Pass Configuration			/// Post RegAlloc Pass Configuration
	//===---------------------------------------------------------------------===//			//===---------------------------------------------------------------------===//

	/// Add passes that optimize machine instructions after register allocation.			/// Add passes that optimize machine instructions after register allocation.
	void TargetPassConfig::addMachineLateOptimization() {			void TargetPassConfig::addMachineLateOptimization() {
				// Cleanup of redundant immediate/address loads.
				addPass(&MachineLateInstrsCleanupID);

	// Branch folding must be run after regalloc and prolog/epilog insertion.			// Branch folding must be run after regalloc and prolog/epilog insertion.
	addPass(&BranchFolderPassID);			addPass(&BranchFolderPassID);

	// Tail duplication.			// Tail duplication.
	// Note that duplicating tail just increases code size and degrades			// Note that duplicating tail just increases code size and degrades
	// performance for targets that require Structured Control Flow.			// performance for targets that require Structured Control Flow.
	// In addition it can also make CFG irreducible. Thus we disable it.			// In addition it can also make CFG irreducible. Thus we disable it.
	if (!TM->requiresStructuredCFG())			if (!TM->requiresStructuredCFG())

llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp


	void NVPTXPassConfig::addIRPasses() {			void NVPTXPassConfig::addIRPasses() {
	// The following passes are known to not play well with virtual regs hanging			// The following passes are known to not play well with virtual regs hanging
	// around after register allocation (which in our case, is all registers).			// around after register allocation (which in our case, is all registers).
	// We explicitly disable them here. We do, however, need some functionality			// We explicitly disable them here. We do, however, need some functionality
	// of the PrologEpilogCodeInserter pass, so we emulate that behavior in the			// of the PrologEpilogCodeInserter pass, so we emulate that behavior in the
	// NVPTXPrologEpilog pass (see NVPTXPrologEpilogPass.cpp).			// NVPTXPrologEpilog pass (see NVPTXPrologEpilogPass.cpp).
	disablePass(&PrologEpilogCodeInserterID);			disablePass(&PrologEpilogCodeInserterID);
				disablePass(&MachineLateInstrsCleanupID);
	disablePass(&MachineCopyPropagationID);			disablePass(&MachineCopyPropagationID);
	disablePass(&TailDuplicateID);			disablePass(&TailDuplicateID);
	disablePass(&StackMapLivenessID);			disablePass(&StackMapLivenessID);
	disablePass(&LiveDebugValuesID);			disablePass(&LiveDebugValuesID);
	disablePass(&PostRAMachineSinkingID);			disablePass(&PostRAMachineSinkingID);
	disablePass(&PostRASchedulerID);			disablePass(&PostRASchedulerID);
	disablePass(&FuncletLayoutID);			disablePass(&FuncletLayoutID);
	disablePass(&PatchableFunctionID);			disablePass(&PatchableFunctionID);

llvm/lib/Target/RISCV/RISCVTargetMachine.cpp

void RISCVPassConfig::addPreRegAlloc() {
if (TM->getOptLevel() != CodeGenOpt::None)		if (TM->getOptLevel() != CodeGenOpt::None)
addPass(createRISCVMergeBaseOffsetOptPass());		addPass(createRISCVMergeBaseOffsetOptPass());
addPass(createRISCVInsertVSETVLIPass());		addPass(createRISCVInsertVSETVLIPass());
}		}

void RISCVPassConfig::addPostRegAlloc() {		void RISCVPassConfig::addPostRegAlloc() {
if (TM->getOptLevel() != CodeGenOpt::None && EnableRedundantCopyElimination)		if (TM->getOptLevel() != CodeGenOpt::None && EnableRedundantCopyElimination)
addPass(createRISCVRedundantCopyEliminationPass());		addPass(createRISCVRedundantCopyEliminationPass());

		// Temporarily disabled until post-RA pseudo expansion problem is fixed,
		// see D123394 and D139169.
		disablePass(&MachineLateInstrsCleanupID);
}		}

yaml::MachineFunctionInfo *		yaml::MachineFunctionInfo *
RISCVTargetMachine::createDefaultFuncInfoYAML() const {		RISCVTargetMachine::createDefaultFuncInfoYAML() const {
return new yaml::RISCVMachineFunctionInfo();		return new yaml::RISCVMachineFunctionInfo();
}		}

yaml::MachineFunctionInfo *		yaml::MachineFunctionInfo *

llvm/lib/Target/WebAssembly/WebAssemblyTargetMachine.cpp

	}			}

	void WebAssemblyPassConfig::addPostRegAlloc() {			void WebAssemblyPassConfig::addPostRegAlloc() {
	// TODO: The following CodeGen passes don't currently support code containing			// TODO: The following CodeGen passes don't currently support code containing
	// virtual registers. Consider removing their restrictions and re-enabling			// virtual registers. Consider removing their restrictions and re-enabling
	// them.			// them.

	// These functions all require the NoVRegs property.			// These functions all require the NoVRegs property.
				disablePass(&MachineLateInstrsCleanupID);
	disablePass(&MachineCopyPropagationID);			disablePass(&MachineCopyPropagationID);
	disablePass(&PostRAMachineSinkingID);			disablePass(&PostRAMachineSinkingID);
	disablePass(&PostRASchedulerID);			disablePass(&PostRASchedulerID);
	disablePass(&FuncletLayoutID);			disablePass(&FuncletLayoutID);
	disablePass(&StackMapLivenessID);			disablePass(&StackMapLivenessID);
	disablePass(&LiveDebugValuesID);			disablePass(&LiveDebugValuesID);
	disablePass(&PatchableFunctionID);			disablePass(&PatchableFunctionID);
	disablePass(&ShrinkWrapID);			disablePass(&ShrinkWrapID);

llvm/test/CodeGen/AArch64/O3-pipeline.ll

	; CHECK-NEXT: MachineDominator Tree Construction			; CHECK-NEXT: MachineDominator Tree Construction
	; CHECK-NEXT: Machine Natural Loop Construction			; CHECK-NEXT: Machine Natural Loop Construction
	; CHECK-NEXT: Machine Block Frequency Analysis			; CHECK-NEXT: Machine Block Frequency Analysis
	; CHECK-NEXT: MachinePostDominator Tree Construction			; CHECK-NEXT: MachinePostDominator Tree Construction
	; CHECK-NEXT: Lazy Machine Block Frequency Analysis			; CHECK-NEXT: Lazy Machine Block Frequency Analysis
	; CHECK-NEXT: Machine Optimization Remark Emitter			; CHECK-NEXT: Machine Optimization Remark Emitter
	; CHECK-NEXT: Shrink Wrapping analysis			; CHECK-NEXT: Shrink Wrapping analysis
	; CHECK-NEXT: Prologue/Epilogue Insertion & Frame Finalization			; CHECK-NEXT: Prologue/Epilogue Insertion & Frame Finalization
				; CHECK-NEXT: Machine Late Instructions Cleanup Pass
	; CHECK-NEXT: Control Flow Optimizer			; CHECK-NEXT: Control Flow Optimizer
	; CHECK-NEXT: Lazy Machine Block Frequency Analysis			; CHECK-NEXT: Lazy Machine Block Frequency Analysis
	; CHECK-NEXT: Tail Duplication			; CHECK-NEXT: Tail Duplication
	; CHECK-NEXT: Machine Copy Propagation Pass			; CHECK-NEXT: Machine Copy Propagation Pass
	; CHECK-NEXT: Post-RA pseudo instruction expansion pass			; CHECK-NEXT: Post-RA pseudo instruction expansion pass
	; CHECK-NEXT: AArch64 pseudo instruction expansion pass			; CHECK-NEXT: AArch64 pseudo instruction expansion pass
	; CHECK-NEXT: AArch64 load / store optimization pass			; CHECK-NEXT: AArch64 load / store optimization pass
	; CHECK-NEXT: Insert KCFI indirect call checks			; CHECK-NEXT: Insert KCFI indirect call checks

llvm/test/CodeGen/AArch64/stack-guard-remat-bitcast.ll

	; CHECK-NEXT: ldr x8, [x8, ___stack_chk_guard@GOTPAGEOFF]			; CHECK-NEXT: ldr x8, [x8, ___stack_chk_guard@GOTPAGEOFF]
	; CHECK-NEXT: Lloh3:			; CHECK-NEXT: Lloh3:
	; CHECK-NEXT: ldr x9, [x9, ___stack_chk_guard@GOTPAGEOFF]			; CHECK-NEXT: ldr x9, [x9, ___stack_chk_guard@GOTPAGEOFF]
	; CHECK-NEXT: Lloh4:			; CHECK-NEXT: Lloh4:
	; CHECK-NEXT: ldr x8, [x8]			; CHECK-NEXT: ldr x8, [x8]
	; CHECK-NEXT: Lloh5:			; CHECK-NEXT: Lloh5:
	; CHECK-NEXT: ldr x9, [x9]			; CHECK-NEXT: ldr x9, [x9]
	; CHECK-NEXT: str x8, [sp]			; CHECK-NEXT: str x8, [sp]
	; CHECK-NEXT: Lloh6:
	; CHECK-NEXT: adrp x8, ___stack_chk_guard@GOTPAGE
	; CHECK-NEXT: stur x9, [x29, #-8]			; CHECK-NEXT: stur x9, [x29, #-8]
	; CHECK-NEXT: Lloh7:
	; CHECK-NEXT: ldr x8, [x8, ___stack_chk_guard@GOTPAGEOFF]
	; CHECK-NEXT: ldur x9, [x29, #-8]			; CHECK-NEXT: ldur x9, [x29, #-8]
	; CHECK-NEXT: Lloh8:
	; CHECK-NEXT: ldr x8, [x8]
	; CHECK-NEXT: cmp x8, x9			; CHECK-NEXT: cmp x8, x9
	; CHECK-NEXT: b.ne LBB0_2			; CHECK-NEXT: b.ne LBB0_2
	; CHECK-NEXT: ; %bb.1: ; %entry			; CHECK-NEXT: ; %bb.1: ; %entry
	; CHECK-NEXT: ldp x29, x30, [sp, #48] ; 16-byte Folded Reload			; CHECK-NEXT: ldp x29, x30, [sp, #48] ; 16-byte Folded Reload
	; CHECK-NEXT: mov w0, #-1			; CHECK-NEXT: mov w0, #-1
	; CHECK-NEXT: add sp, sp, #64			; CHECK-NEXT: add sp, sp, #64
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	; CHECK-NEXT: LBB0_2: ; %entry			; CHECK-NEXT: LBB0_2: ; %entry
	; CHECK-NEXT: bl ___stack_chk_fail			; CHECK-NEXT: bl ___stack_chk_fail
	; CHECK-NEXT: .loh AdrpLdrGotLdr Lloh6, Lloh7, Lloh8
	; CHECK-NEXT: .loh AdrpLdrGotLdr Lloh1, Lloh3, Lloh5			; CHECK-NEXT: .loh AdrpLdrGotLdr Lloh1, Lloh3, Lloh5
	; CHECK-NEXT: .loh AdrpLdrGotLdr Lloh0, Lloh2, Lloh4			; CHECK-NEXT: .loh AdrpLdrGotLdr Lloh0, Lloh2, Lloh4
	entry:			entry:
	%StackGuardSlot = alloca i8*			%StackGuardSlot = alloca i8*
	%StackGuard = load i8*, i8** bitcast (i64** @__stack_chk_guard to i8**)			%StackGuard = load i8*, i8** bitcast (i64** @__stack_chk_guard to i8**)
	call void @llvm.stackprotector(i8* %StackGuard, i8** %StackGuardSlot)			call void @llvm.stackprotector(i8* %StackGuard, i8** %StackGuardSlot)
	%container = alloca [32 x i8], align 1			%container = alloca [32 x i8], align 1
	call void @llvm.stackprotectorcheck(i8** bitcast (i64** @__stack_chk_guard to i8**))			call void @llvm.stackprotectorcheck(i8** bitcast (i64** @__stack_chk_guard to i8**))
	ret i32 -1			ret i32 -1
	}			}

	declare void @llvm.stackprotector(i8*, i8**) ssp			declare void @llvm.stackprotector(i8*, i8**) ssp
	declare void @llvm.stackprotectorcheck(i8**) ssp			declare void @llvm.stackprotectorcheck(i8**) ssp

llvm/test/CodeGen/AArch64/sve-calling-convention-mixed.ll


	define float @foo2(double* %x0, double* %x1) nounwind {			define float @foo2(double* %x0, double* %x1) nounwind {
	; CHECK-LABEL: foo2:			; CHECK-LABEL: foo2:
	; CHECK: // %bb.0: // %entry			; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: stp x29, x30, [sp, #-16]! // 16-byte Folded Spill			; CHECK-NEXT: stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
	; CHECK-NEXT: addvl sp, sp, #-4			; CHECK-NEXT: addvl sp, sp, #-4
	; CHECK-NEXT: sub sp, sp, #16			; CHECK-NEXT: sub sp, sp, #16
	; CHECK-NEXT: ptrue p0.b			; CHECK-NEXT: ptrue p0.b
	; CHECK-NEXT: add x9, sp, #16			; CHECK-NEXT: add x8, sp, #16
	; CHECK-NEXT: ld4d { z1.d - z4.d }, p0/z, [x0]			; CHECK-NEXT: ld4d { z1.d - z4.d }, p0/z, [x0]
	; CHECK-NEXT: ld4d { z16.d - z19.d }, p0/z, [x1]			; CHECK-NEXT: ld4d { z16.d - z19.d }, p0/z, [x1]
	; CHECK-NEXT: ptrue p0.d
	; CHECK-NEXT: add x8, sp, #16
	; CHECK-NEXT: fmov s0, #1.00000000			; CHECK-NEXT: fmov s0, #1.00000000
	; CHECK-NEXT: mov w0, wzr			; CHECK-NEXT: mov w0, wzr
	; CHECK-NEXT: mov w1, #1			; CHECK-NEXT: mov w1, #1
	; CHECK-NEXT: mov w2, #2			; CHECK-NEXT: mov w2, #2
	; CHECK-NEXT: st1d { z16.d }, p0, [x9]
	; CHECK-NEXT: add x9, sp, #16
	; CHECK-NEXT: mov w3, #3			; CHECK-NEXT: mov w3, #3
	; CHECK-NEXT: mov w4, #4			; CHECK-NEXT: mov w4, #4
	; CHECK-NEXT: mov w5, #5			; CHECK-NEXT: mov w5, #5
	; CHECK-NEXT: mov w6, #6			; CHECK-NEXT: mov w6, #6
	; CHECK-NEXT: st1d { z17.d }, p0, [x9, #1, mul vl]
	; CHECK-NEXT: add x9, sp, #16
	; CHECK-NEXT: mov w7, #7			; CHECK-NEXT: mov w7, #7
	; CHECK-NEXT: st1d { z18.d }, p0, [x9, #2, mul vl]
	; CHECK-NEXT: add x9, sp, #16			; CHECK-NEXT: add x9, sp, #16
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: st1d { z16.d }, p0, [x9]
				; CHECK-NEXT: st1d { z17.d }, p0, [x9, #1, mul vl]
				; CHECK-NEXT: st1d { z18.d }, p0, [x9, #2, mul vl]
	; CHECK-NEXT: st1d { z19.d }, p0, [x9, #3, mul vl]			; CHECK-NEXT: st1d { z19.d }, p0, [x9, #3, mul vl]
	; CHECK-NEXT: str x8, [sp]			; CHECK-NEXT: str x8, [sp]
	; CHECK-NEXT: bl callee2			; CHECK-NEXT: bl callee2
	; CHECK-NEXT: addvl sp, sp, #4			; CHECK-NEXT: addvl sp, sp, #4
	; CHECK-NEXT: add sp, sp, #16			; CHECK-NEXT: add sp, sp, #16
	; CHECK-NEXT: ldp x29, x30, [sp], #16 // 16-byte Folded Reload			; CHECK-NEXT: ldp x29, x30, [sp], #16 // 16-byte Folded Reload
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:

llvm/test/CodeGen/AMDGPU/GlobalISel/call-outgoing-stack-args.ll

	; FLATSCR-LABEL: kernel_caller_byval:			; FLATSCR-LABEL: kernel_caller_byval:
	; FLATSCR: ; %bb.0:			; FLATSCR: ; %bb.0:
	; FLATSCR-NEXT: s_add_u32 flat_scratch_lo, s0, s3			; FLATSCR-NEXT: s_add_u32 flat_scratch_lo, s0, s3
	; FLATSCR-NEXT: v_mov_b32_e32 v0, 0			; FLATSCR-NEXT: v_mov_b32_e32 v0, 0
	; FLATSCR-NEXT: s_addc_u32 flat_scratch_hi, s1, 0			; FLATSCR-NEXT: s_addc_u32 flat_scratch_hi, s1, 0
	; FLATSCR-NEXT: v_mov_b32_e32 v1, 0			; FLATSCR-NEXT: v_mov_b32_e32 v1, 0
	; FLATSCR-NEXT: s_mov_b32 vcc_lo, 0			; FLATSCR-NEXT: s_mov_b32 vcc_lo, 0
	; FLATSCR-NEXT: s_mov_b32 vcc_hi, 0			; FLATSCR-NEXT: s_mov_b32 vcc_hi, 0
	; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], vcc_lo offset:8
	; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], vcc_hi offset:16
	; FLATSCR-NEXT: s_mov_b32 s11, 0			; FLATSCR-NEXT: s_mov_b32 s11, 0
	; FLATSCR-NEXT: s_mov_b32 s10, 0			; FLATSCR-NEXT: s_mov_b32 s10, 0
	; FLATSCR-NEXT: s_mov_b32 s9, 0			; FLATSCR-NEXT: s_mov_b32 s9, 0
	; FLATSCR-NEXT: s_mov_b32 s8, 0			; FLATSCR-NEXT: s_mov_b32 s8, 0
	; FLATSCR-NEXT: s_mov_b32 s7, 0			; FLATSCR-NEXT: s_mov_b32 s7, 0
	; FLATSCR-NEXT: s_mov_b32 s6, 0			; FLATSCR-NEXT: s_mov_b32 s6, 0
	; FLATSCR-NEXT: s_mov_b32 s5, 0			; FLATSCR-NEXT: s_mov_b32 s5, 0
	; FLATSCR-NEXT: s_mov_b32 s1, 0			; FLATSCR-NEXT: s_mov_b32 s1, 0
	; FLATSCR-NEXT: s_mov_b32 s0, 0			; FLATSCR-NEXT: s_mov_b32 s0, 0
	; FLATSCR-NEXT: s_mov_b32 s4, 0			; FLATSCR-NEXT: s_mov_b32 s4, 0
	; FLATSCR-NEXT: s_mov_b32 s3, 0			; FLATSCR-NEXT: s_mov_b32 s3, 0
	; FLATSCR-NEXT: s_mov_b32 s2, 0			; FLATSCR-NEXT: s_mov_b32 s2, 0
	; FLATSCR-NEXT: s_mov_b32 vcc_lo, 0			; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], vcc_lo offset:8
	; FLATSCR-NEXT: s_mov_b32 vcc_hi, 0			; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], vcc_hi offset:16
	; FLATSCR-NEXT: s_mov_b32 s40, 0
	; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], s11 offset:24			; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], s11 offset:24
	; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], s10 offset:32			; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], s10 offset:32
	; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], s9 offset:40			; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], s9 offset:40
	; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], s8 offset:48			; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], s8 offset:48
	; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], s7 offset:56			; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], s7 offset:56
	; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], s6 offset:64			; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], s6 offset:64
	; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], s5 offset:72			; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], s5 offset:72
	; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], s1 offset:80			; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], s1 offset:80
	; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], s0 offset:88			; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], s0 offset:88
	; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], s4 offset:96			; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], s4 offset:96
	; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], s3 offset:104			; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], s3 offset:104
	; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], s2 offset:112			; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], s2 offset:112
	; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], vcc_lo offset:120			; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], vcc_lo offset:120
	; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], vcc_hi offset:128			; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], vcc_hi offset:128
				; FLATSCR-NEXT: s_mov_b32 s40, 0
	; FLATSCR-NEXT: scratch_load_dwordx2 v[0:1], off, s40 offset:8			; FLATSCR-NEXT: scratch_load_dwordx2 v[0:1], off, s40 offset:8
	; FLATSCR-NEXT: s_mov_b32 s39, 0			; FLATSCR-NEXT: s_mov_b32 s39, 0
	; FLATSCR-NEXT: scratch_load_dwordx2 v[2:3], off, s39 offset:16			; FLATSCR-NEXT: scratch_load_dwordx2 v[2:3], off, s39 offset:16
	; FLATSCR-NEXT: s_mov_b32 s38, 0			; FLATSCR-NEXT: s_mov_b32 s38, 0
	; FLATSCR-NEXT: scratch_load_dwordx2 v[4:5], off, s38 offset:24			; FLATSCR-NEXT: scratch_load_dwordx2 v[4:5], off, s38 offset:24
	; FLATSCR-NEXT: s_mov_b32 s37, 0			; FLATSCR-NEXT: s_mov_b32 s37, 0
	; FLATSCR-NEXT: scratch_load_dwordx2 v[6:7], off, s37 offset:32			; FLATSCR-NEXT: scratch_load_dwordx2 v[6:7], off, s37 offset:32
	; FLATSCR-NEXT: s_mov_b32 s36, 0			; FLATSCR-NEXT: s_mov_b32 s36, 0

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.div.fmas.ll

	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xa			; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xa
	; GFX7-NEXT: v_lshlrev_b32_e32 v1, 2, v0			; GFX7-NEXT: v_lshlrev_b32_e32 v1, 2, v0
	; GFX7-NEXT: v_mov_b32_e32 v2, 0			; GFX7-NEXT: v_mov_b32_e32 v2, 0
	; GFX7-NEXT: s_mov_b32 s6, 0			; GFX7-NEXT: s_mov_b32 s6, 0
	; GFX7-NEXT: s_mov_b32 s7, 0xf000			; GFX7-NEXT: s_mov_b32 s7, 0xf000
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: buffer_load_dwordx3 v[1:3], v[1:2], s[4:7], 0 addr64			; GFX7-NEXT: buffer_load_dwordx3 v[1:3], v[1:2], s[4:7], 0 addr64
	; GFX7-NEXT: s_mov_b32 s6, 0
	; GFX7-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX7-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX7-NEXT: s_and_saveexec_b64 s[2:3], vcc			; GFX7-NEXT: s_and_saveexec_b64 s[2:3], vcc
	; GFX7-NEXT: s_cbranch_execz .LBB13_2			; GFX7-NEXT: s_cbranch_execz .LBB13_2
	; GFX7-NEXT: ; %bb.1: ; %bb			; GFX7-NEXT: ; %bb.1: ; %bb
	; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x14			; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x14
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0			; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)

llvm/test/CodeGen/AMDGPU/cc-update.ll

	; GFX803-NEXT: s_add_u32 s0, s0, s7			; GFX803-NEXT: s_add_u32 s0, s0, s7
	; GFX803-NEXT: s_addc_u32 s1, s1, 0			; GFX803-NEXT: s_addc_u32 s1, s1, 0
	; GFX803-NEXT: buffer_load_dword v0, off, s[0:3], 0 offset:8 glc			; GFX803-NEXT: buffer_load_dword v0, off, s[0:3], 0 offset:8 glc
	; GFX803-NEXT: s_waitcnt vmcnt(0)			; GFX803-NEXT: s_waitcnt vmcnt(0)
	; GFX803-NEXT: s_mov_b32 s4, 0x40000			; GFX803-NEXT: s_mov_b32 s4, 0x40000
	; GFX803-NEXT: buffer_store_dword v0, off, s[0:3], s4 ; 4-byte Folded Spill			; GFX803-NEXT: buffer_store_dword v0, off, s[0:3], s4 ; 4-byte Folded Spill
	; GFX803-NEXT: ;;#ASMSTART			; GFX803-NEXT: ;;#ASMSTART
	; GFX803-NEXT: ;;#ASMEND			; GFX803-NEXT: ;;#ASMEND
	; GFX803-NEXT: s_mov_b32 s4, 0x40000
	; GFX803-NEXT: buffer_load_dword v0, off, s[0:3], s4 ; 4-byte Folded Reload			; GFX803-NEXT: buffer_load_dword v0, off, s[0:3], s4 ; 4-byte Folded Reload
	; GFX803-NEXT: s_waitcnt vmcnt(0)			; GFX803-NEXT: s_waitcnt vmcnt(0)
	; GFX803-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:8			; GFX803-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:8
	; GFX803-NEXT: s_waitcnt vmcnt(0)			; GFX803-NEXT: s_waitcnt vmcnt(0)
	; GFX803-NEXT: s_endpgm			; GFX803-NEXT: s_endpgm
	;			;
	; GFX900-LABEL: test_sgpr_offset_kernel:			; GFX900-LABEL: test_sgpr_offset_kernel:
	; GFX900: ; %bb.0: ; %entry			; GFX900: ; %bb.0: ; %entry
	; GFX900-NEXT: s_add_u32 s0, s0, s7			; GFX900-NEXT: s_add_u32 s0, s0, s7
	; GFX900-NEXT: s_addc_u32 s1, s1, 0			; GFX900-NEXT: s_addc_u32 s1, s1, 0
	; GFX900-NEXT: buffer_load_dword v0, off, s[0:3], 0 offset:8 glc			; GFX900-NEXT: buffer_load_dword v0, off, s[0:3], 0 offset:8 glc
	; GFX900-NEXT: s_waitcnt vmcnt(0)			; GFX900-NEXT: s_waitcnt vmcnt(0)
	; GFX900-NEXT: s_mov_b32 s4, 0x40000			; GFX900-NEXT: s_mov_b32 s4, 0x40000
	; GFX900-NEXT: buffer_store_dword v0, off, s[0:3], s4 ; 4-byte Folded Spill			; GFX900-NEXT: buffer_store_dword v0, off, s[0:3], s4 ; 4-byte Folded Spill
	; GFX900-NEXT: ;;#ASMSTART			; GFX900-NEXT: ;;#ASMSTART
	; GFX900-NEXT: ;;#ASMEND			; GFX900-NEXT: ;;#ASMEND
	; GFX900-NEXT: s_mov_b32 s4, 0x40000
	; GFX900-NEXT: buffer_load_dword v0, off, s[0:3], s4 ; 4-byte Folded Reload			; GFX900-NEXT: buffer_load_dword v0, off, s[0:3], s4 ; 4-byte Folded Reload
	; GFX900-NEXT: s_waitcnt vmcnt(0)			; GFX900-NEXT: s_waitcnt vmcnt(0)
	; GFX900-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:8			; GFX900-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:8
	; GFX900-NEXT: s_waitcnt vmcnt(0)			; GFX900-NEXT: s_waitcnt vmcnt(0)
	; GFX900-NEXT: s_endpgm			; GFX900-NEXT: s_endpgm
	;			;
	; GFX1010-LABEL: test_sgpr_offset_kernel:			; GFX1010-LABEL: test_sgpr_offset_kernel:
	; GFX1010: ; %bb.0: ; %entry			; GFX1010: ; %bb.0: ; %entry
	; GFX1010-NEXT: s_add_u32 s0, s0, s7			; GFX1010-NEXT: s_add_u32 s0, s0, s7
	; GFX1010-NEXT: s_addc_u32 s1, s1, 0			; GFX1010-NEXT: s_addc_u32 s1, s1, 0
	; GFX1010-NEXT: s_mov_b32 s4, 0x20000			; GFX1010-NEXT: s_mov_b32 s4, 0x20000
	; GFX1010-NEXT: buffer_load_dword v0, off, s[0:3], 0 offset:8 glc dlc			; GFX1010-NEXT: buffer_load_dword v0, off, s[0:3], 0 offset:8 glc dlc
	; GFX1010-NEXT: s_waitcnt vmcnt(0)			; GFX1010-NEXT: s_waitcnt vmcnt(0)
	; GFX1010-NEXT: buffer_store_dword v0, off, s[0:3], s4 ; 4-byte Folded Spill			; GFX1010-NEXT: buffer_store_dword v0, off, s[0:3], s4 ; 4-byte Folded Spill
	; GFX1010-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1010-NEXT: s_mov_b32 s4, 0x20000
	; GFX1010-NEXT: ;;#ASMSTART			; GFX1010-NEXT: ;;#ASMSTART
	; GFX1010-NEXT: ;;#ASMEND			; GFX1010-NEXT: ;;#ASMEND
	; GFX1010-NEXT: buffer_load_dword v0, off, s[0:3], s4 ; 4-byte Folded Reload			; GFX1010-NEXT: buffer_load_dword v0, off, s[0:3], s4 ; 4-byte Folded Reload
	; GFX1010-NEXT: s_waitcnt vmcnt(0)			; GFX1010-NEXT: s_waitcnt vmcnt(0)
	; GFX1010-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:8			; GFX1010-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:8
	; GFX1010-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1010-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1010-NEXT: s_endpgm			; GFX1010-NEXT: s_endpgm
	;			;
	; GFX1100-LABEL: test_sgpr_offset_kernel:			; GFX1100-LABEL: test_sgpr_offset_kernel:
	; GFX1100: ; %bb.0: ; %entry			; GFX1100: ; %bb.0: ; %entry
	; GFX1100-NEXT: scratch_load_b32 v0, off, off offset:8 glc dlc			; GFX1100-NEXT: scratch_load_b32 v0, off, off offset:8 glc dlc
	; GFX1100-NEXT: s_waitcnt vmcnt(0)			; GFX1100-NEXT: s_waitcnt vmcnt(0)
	; GFX1100-NEXT: s_movk_i32 s0, 0x1000			; GFX1100-NEXT: s_movk_i32 s0, 0x1000
	; GFX1100-NEXT: scratch_store_b32 off, v0, s0 ; 4-byte Folded Spill			; GFX1100-NEXT: scratch_store_b32 off, v0, s0 ; 4-byte Folded Spill
	; GFX1100-NEXT: s_movk_i32 s0, 0x1000
	; GFX1100-NEXT: ;;#ASMSTART			; GFX1100-NEXT: ;;#ASMSTART
	; GFX1100-NEXT: ;;#ASMEND			; GFX1100-NEXT: ;;#ASMEND
	; GFX1100-NEXT: scratch_load_b32 v0, off, s0 ; 4-byte Folded Reload			; GFX1100-NEXT: scratch_load_b32 v0, off, s0 ; 4-byte Folded Reload
	; GFX1100-NEXT: s_waitcnt vmcnt(0)			; GFX1100-NEXT: s_waitcnt vmcnt(0)
	; GFX1100-NEXT: scratch_store_b32 off, v0, off offset:8 dlc			; GFX1100-NEXT: scratch_store_b32 off, v0, off offset:8 dlc
	; GFX1100-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1100-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1100-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX1100-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX1100-NEXT: s_endpgm			; GFX1100-NEXT: s_endpgm

llvm/test/CodeGen/AMDGPU/exec-mask-opt-cannot-create-empty-or-backward-segment.ll

	; CHECK-NEXT: s_cbranch_vccnz .LBB0_2			; CHECK-NEXT: s_cbranch_vccnz .LBB0_2
	; CHECK-NEXT: ; %bb.9: ; %bb13			; CHECK-NEXT: ; %bb.9: ; %bb13
	; CHECK-NEXT: ; in Loop: Header=BB0_3 Depth=1			; CHECK-NEXT: ; in Loop: Header=BB0_3 Depth=1
	; CHECK-NEXT: s_mov_b64 vcc, s[4:5]			; CHECK-NEXT: s_mov_b64 vcc, s[4:5]
	; CHECK-NEXT: s_cbranch_vccz .LBB0_11			; CHECK-NEXT: s_cbranch_vccz .LBB0_11
	; CHECK-NEXT: ; %bb.10: ; %bb16			; CHECK-NEXT: ; %bb.10: ; %bb16
	; CHECK-NEXT: ; in Loop: Header=BB0_3 Depth=1			; CHECK-NEXT: ; in Loop: Header=BB0_3 Depth=1
	; CHECK-NEXT: s_mov_b64 s[16:17], 0			; CHECK-NEXT: s_mov_b64 s[16:17], 0
	; CHECK-NEXT: s_mov_b64 s[20:21], -1
	; CHECK-NEXT: s_mov_b64 s[22:23], s[10:11]			; CHECK-NEXT: s_mov_b64 s[22:23], s[10:11]
	; CHECK-NEXT: s_mov_b64 s[18:19], s[16:17]			; CHECK-NEXT: s_mov_b64 s[18:19], s[16:17]
	; CHECK-NEXT: s_branch .LBB0_2			; CHECK-NEXT: s_branch .LBB0_2
	; CHECK-NEXT: .LBB0_11: ; in Loop: Header=BB0_3 Depth=1			; CHECK-NEXT: .LBB0_11: ; in Loop: Header=BB0_3 Depth=1
	; CHECK-NEXT: s_mov_b64 s[22:23], -1
	; CHECK-NEXT: s_mov_b64 s[20:21], 0			; CHECK-NEXT: s_mov_b64 s[20:21], 0
	; CHECK-NEXT: ; implicit-def: $sgpr16_sgpr17			; CHECK-NEXT: ; implicit-def: $sgpr16_sgpr17
	; CHECK-NEXT: s_mov_b64 s[18:19], s[16:17]			; CHECK-NEXT: s_mov_b64 s[18:19], s[16:17]
	; CHECK-NEXT: s_branch .LBB0_2			; CHECK-NEXT: s_branch .LBB0_2
	; CHECK-NEXT: .LBB0_12: ; %loop.exit.guard6			; CHECK-NEXT: .LBB0_12: ; %loop.exit.guard6
	; CHECK-NEXT: ; in Loop: Header=BB0_3 Depth=1			; CHECK-NEXT: ; in Loop: Header=BB0_3 Depth=1
	; CHECK-NEXT: s_xor_b64 s[14:15], s[20:21], -1			; CHECK-NEXT: s_xor_b64 s[14:15], s[20:21], -1
	; CHECK-NEXT: s_mov_b64 s[20:21], -1			; CHECK-NEXT: s_mov_b64 s[20:21], -1

llvm/test/CodeGen/AMDGPU/flat-scratch.ll

	; GFX9-NEXT: s_mov_b32 s1, s0			; GFX9-NEXT: s_mov_b32 s1, s0
	; GFX9-NEXT: s_mov_b32 s2, s0			; GFX9-NEXT: s_mov_b32 s2, s0
	; GFX9-NEXT: s_mov_b32 s3, s0			; GFX9-NEXT: s_mov_b32 s3, s0
	; GFX9-NEXT: v_mov_b32_e32 v0, s0			; GFX9-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-NEXT: v_mov_b32_e32 v1, s1			; GFX9-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-NEXT: v_mov_b32_e32 v2, s2			; GFX9-NEXT: v_mov_b32_e32 v2, s2
	; GFX9-NEXT: v_mov_b32_e32 v3, s3			; GFX9-NEXT: v_mov_b32_e32 v3, s3
	; GFX9-NEXT: s_mov_b32 s1, 0			; GFX9-NEXT: s_mov_b32 s1, 0
	; GFX9-NEXT: s_mov_b32 s0, 0
	; GFX9-NEXT: s_mov_b32 vcc_lo, 0			; GFX9-NEXT: s_mov_b32 vcc_lo, 0
	; GFX9-NEXT: s_mov_b32 vcc_hi, 0			; GFX9-NEXT: s_mov_b32 vcc_hi, 0
	; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], s1 offset:52			; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], s1 offset:52
	; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], s0 offset:36			; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], s0 offset:36
	; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:20			; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:20
	; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_hi offset:4			; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_hi offset:4
	; GFX9-NEXT: s_nop 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 4
	; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v0
	; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: zero_init_kernel:			; GFX10-LABEL: zero_init_kernel:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_add_u32 s0, s0, s3			; GFX10-NEXT: s_add_u32 s0, s0, s3
	; GFX10-NEXT: s_addc_u32 s1, s1, 0			; GFX10-NEXT: s_addc_u32 s1, s1, 0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1
	; GFX10-NEXT: s_mov_b32 s0, 0			; GFX10-NEXT: s_mov_b32 s0, 0
	; GFX10-NEXT: v_mov_b32_e32 v4, 4
	; GFX10-NEXT: s_mov_b32 s1, s0			; GFX10-NEXT: s_mov_b32 s1, s0
	; GFX10-NEXT: s_mov_b32 s2, s0			; GFX10-NEXT: s_mov_b32 s2, s0
	; GFX10-NEXT: s_mov_b32 s3, s0			; GFX10-NEXT: s_mov_b32 s3, s0
	; GFX10-NEXT: v_mov_b32_e32 v0, s0			; GFX10-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-NEXT: v_mov_b32_e32 v1, s1			; GFX10-NEXT: v_mov_b32_e32 v1, s1
	; GFX10-NEXT: v_mov_b32_e32 v2, s2			; GFX10-NEXT: v_mov_b32_e32 v2, s2
	; GFX10-NEXT: v_mov_b32_e32 v3, s3			; GFX10-NEXT: v_mov_b32_e32 v3, s3
	; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:52			; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:52
	; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:36			; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:36
	; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:20			; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:20
	; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:4			; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:4
	; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v4
	; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: zero_init_kernel:			; GFX11-LABEL: zero_init_kernel:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_mov_b32 s0, 0			; GFX11-NEXT: s_mov_b32 s0, 0
	; GFX11-NEXT: v_mov_b32_e32 v4, 4			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_mov_b32 s1, s0			; GFX11-NEXT: s_mov_b32 s1, s0
	; GFX11-NEXT: s_mov_b32 s2, s0			; GFX11-NEXT: s_mov_b32 s2, s0
	; GFX11-NEXT: s_mov_b32 s3, s0			; GFX11-NEXT: s_mov_b32 s3, s0
	; GFX11-NEXT: v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, s1			; GFX11-NEXT: v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, s1
	; GFX11-NEXT: v_dual_mov_b32 v2, s2 :: v_dual_mov_b32 v3, s3			; GFX11-NEXT: v_dual_mov_b32 v2, s2 :: v_dual_mov_b32 v3, s3
	; GFX11-NEXT: s_clause 0x3			; GFX11-NEXT: s_clause 0x3
	; GFX11-NEXT: scratch_store_b128 off, v[0:3], off offset:52			; GFX11-NEXT: scratch_store_b128 off, v[0:3], off offset:52
	; GFX11-NEXT: scratch_store_b128 off, v[0:3], off offset:36			; GFX11-NEXT: scratch_store_b128 off, v[0:3], off offset:36
	; GFX11-NEXT: scratch_store_b128 off, v[0:3], off offset:20			; GFX11-NEXT: scratch_store_b128 off, v[0:3], off offset:20
	; GFX11-NEXT: scratch_store_b128 off, v[0:3], off offset:4			; GFX11-NEXT: scratch_store_b128 off, v[0:3], off offset:4
	; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v4
	; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	;			;
	; GFX9-PAL-LABEL: zero_init_kernel:			; GFX9-PAL-LABEL: zero_init_kernel:
	; GFX9-PAL: ; %bb.0:			; GFX9-PAL: ; %bb.0:
	; GFX9-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX9-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX9-PAL-NEXT: s_mov_b32 s2, s0			; GFX9-PAL-NEXT: s_mov_b32 s2, s0
	; GFX9-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX9-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX9-PAL-NEXT: s_mov_b32 s0, 0			; GFX9-PAL-NEXT: s_mov_b32 s0, 0
	; GFX9-PAL-NEXT: s_mov_b32 vcc_lo, 0			; GFX9-PAL-NEXT: s_mov_b32 vcc_lo, 0
	; GFX9-PAL-NEXT: s_mov_b32 vcc_hi, 0			; GFX9-PAL-NEXT: s_mov_b32 vcc_hi, 0
	; GFX9-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-PAL-NEXT: s_and_b32 s3, s3, 0xffff			; GFX9-PAL-NEXT: s_and_b32 s3, s3, 0xffff
	; GFX9-PAL-NEXT: s_add_u32 flat_scratch_lo, s2, s1			; GFX9-PAL-NEXT: s_add_u32 flat_scratch_lo, s2, s1
	; GFX9-PAL-NEXT: s_addc_u32 flat_scratch_hi, s3, 0			; GFX9-PAL-NEXT: s_addc_u32 flat_scratch_hi, s3, 0
	; GFX9-PAL-NEXT: s_mov_b32 s1, s0			; GFX9-PAL-NEXT: s_mov_b32 s1, s0
	; GFX9-PAL-NEXT: s_mov_b32 s2, s0			; GFX9-PAL-NEXT: s_mov_b32 s2, s0
	; GFX9-PAL-NEXT: s_mov_b32 s3, s0			; GFX9-PAL-NEXT: s_mov_b32 s3, s0
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-PAL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-PAL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-PAL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-PAL-NEXT: v_mov_b32_e32 v2, s2			; GFX9-PAL-NEXT: v_mov_b32_e32 v2, s2
	; GFX9-PAL-NEXT: v_mov_b32_e32 v3, s3			; GFX9-PAL-NEXT: v_mov_b32_e32 v3, s3
	; GFX9-PAL-NEXT: s_mov_b32 s1, 0			; GFX9-PAL-NEXT: s_mov_b32 s1, 0
	; GFX9-PAL-NEXT: s_mov_b32 s0, 0
	; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s1 offset:52			; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s1 offset:52
	; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s0 offset:36			; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s0 offset:36
	; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:20			; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:20
	; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_hi offset:4			; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_hi offset:4
	; GFX9-PAL-NEXT: s_nop 0
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX9-PAL-NEXT: ;;#ASMSTART
	; GFX9-PAL-NEXT: ; use v0
	; GFX9-PAL-NEXT: ;;#ASMEND
	; GFX9-PAL-NEXT: s_endpgm			; GFX9-PAL-NEXT: s_endpgm
	;			;
	; GFX940-LABEL: zero_init_kernel:			; GFX940-LABEL: zero_init_kernel:
	; GFX940: ; %bb.0:			; GFX940: ; %bb.0:
	; GFX940-NEXT: s_mov_b32 s0, 0			; GFX940-NEXT: s_mov_b32 s0, 0
	; GFX940-NEXT: s_mov_b32 s1, s0			; GFX940-NEXT: s_mov_b32 s1, s0
	; GFX940-NEXT: s_mov_b32 s2, s0			; GFX940-NEXT: s_mov_b32 s2, s0
	; GFX940-NEXT: s_mov_b32 s3, s0			; GFX940-NEXT: s_mov_b32 s3, s0
	; GFX940-NEXT: v_mov_b64_e32 v[0:1], s[0:1]			; GFX940-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
	; GFX940-NEXT: v_mov_b64_e32 v[2:3], s[2:3]			; GFX940-NEXT: v_mov_b64_e32 v[2:3], s[2:3]
	; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:52			; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:52
	; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:36			; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:36
	; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:20			; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:20
	; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:4			; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:4
	; GFX940-NEXT: s_nop 1
	; GFX940-NEXT: v_mov_b32_e32 v0, 4
	; GFX940-NEXT: ;;#ASMSTART
	; GFX940-NEXT: ; use v0
	; GFX940-NEXT: ;;#ASMEND
	; GFX940-NEXT: s_endpgm			; GFX940-NEXT: s_endpgm
	;			;
	; GFX1010-PAL-LABEL: zero_init_kernel:			; GFX1010-PAL-LABEL: zero_init_kernel:
	; GFX1010-PAL: ; %bb.0:			; GFX1010-PAL: ; %bb.0:
	; GFX1010-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX1010-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX1010-PAL-NEXT: s_mov_b32 s2, s0			; GFX1010-PAL-NEXT: s_mov_b32 s2, s0
	; GFX1010-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX1010-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1010-PAL-NEXT: s_and_b32 s3, s3, 0xffff			; GFX1010-PAL-NEXT: s_and_b32 s3, s3, 0xffff
	; GFX1010-PAL-NEXT: s_add_u32 s2, s2, s1			; GFX1010-PAL-NEXT: s_add_u32 s2, s2, s1
	; GFX1010-PAL-NEXT: s_addc_u32 s3, s3, 0			; GFX1010-PAL-NEXT: s_addc_u32 s3, s3, 0
	; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2			; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
	; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3			; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
	; GFX1010-PAL-NEXT: s_mov_b32 s0, 0			; GFX1010-PAL-NEXT: s_mov_b32 s0, 0
	; GFX1010-PAL-NEXT: s_mov_b32 vcc_lo, 0			; GFX1010-PAL-NEXT: s_mov_b32 vcc_lo, 0
	; GFX1010-PAL-NEXT: s_mov_b32 s1, s0			; GFX1010-PAL-NEXT: s_mov_b32 s1, s0
	; GFX1010-PAL-NEXT: s_mov_b32 s2, s0			; GFX1010-PAL-NEXT: s_mov_b32 s2, s0
	; GFX1010-PAL-NEXT: s_mov_b32 s3, s0			; GFX1010-PAL-NEXT: s_mov_b32 s3, s0
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v0, s0			; GFX1010-PAL-NEXT: v_mov_b32_e32 v0, s0
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v1, s1			; GFX1010-PAL-NEXT: v_mov_b32_e32 v1, s1
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v2, s2			; GFX1010-PAL-NEXT: v_mov_b32_e32 v2, s2
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v3, s3			; GFX1010-PAL-NEXT: v_mov_b32_e32 v3, s3
	; GFX1010-PAL-NEXT: s_mov_b32 s2, 0			; GFX1010-PAL-NEXT: s_mov_b32 s2, 0
	; GFX1010-PAL-NEXT: s_mov_b32 s1, 0			; GFX1010-PAL-NEXT: s_mov_b32 s1, 0
	; GFX1010-PAL-NEXT: s_mov_b32 s0, 0
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v4, 4
	; GFX1010-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s2 offset:52			; GFX1010-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s2 offset:52
	; GFX1010-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s1 offset:36			; GFX1010-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s1 offset:36
	; GFX1010-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s0 offset:20			; GFX1010-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s0 offset:20
	; GFX1010-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:4			; GFX1010-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:4
	; GFX1010-PAL-NEXT: ;;#ASMSTART
	; GFX1010-PAL-NEXT: ; use v4
	; GFX1010-PAL-NEXT: ;;#ASMEND
	; GFX1010-PAL-NEXT: s_endpgm			; GFX1010-PAL-NEXT: s_endpgm
	;			;
	; GFX1030-PAL-LABEL: zero_init_kernel:			; GFX1030-PAL-LABEL: zero_init_kernel:
	; GFX1030-PAL: ; %bb.0:			; GFX1030-PAL: ; %bb.0:
	; GFX1030-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX1030-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX1030-PAL-NEXT: s_mov_b32 s2, s0			; GFX1030-PAL-NEXT: s_mov_b32 s2, s0
	; GFX1030-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX1030-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1030-PAL-NEXT: s_and_b32 s3, s3, 0xffff			; GFX1030-PAL-NEXT: s_and_b32 s3, s3, 0xffff
	; GFX1030-PAL-NEXT: s_add_u32 s2, s2, s1			; GFX1030-PAL-NEXT: s_add_u32 s2, s2, s1
	; GFX1030-PAL-NEXT: s_addc_u32 s3, s3, 0			; GFX1030-PAL-NEXT: s_addc_u32 s3, s3, 0
	; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2			; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
	; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3			; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
	; GFX1030-PAL-NEXT: s_mov_b32 s0, 0			; GFX1030-PAL-NEXT: s_mov_b32 s0, 0
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v4, 4
	; GFX1030-PAL-NEXT: s_mov_b32 s1, s0			; GFX1030-PAL-NEXT: s_mov_b32 s1, s0
	; GFX1030-PAL-NEXT: s_mov_b32 s2, s0			; GFX1030-PAL-NEXT: s_mov_b32 s2, s0
	; GFX1030-PAL-NEXT: s_mov_b32 s3, s0			; GFX1030-PAL-NEXT: s_mov_b32 s3, s0
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v0, s0			; GFX1030-PAL-NEXT: v_mov_b32_e32 v0, s0
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v1, s1			; GFX1030-PAL-NEXT: v_mov_b32_e32 v1, s1
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v2, s2			; GFX1030-PAL-NEXT: v_mov_b32_e32 v2, s2
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v3, s3			; GFX1030-PAL-NEXT: v_mov_b32_e32 v3, s3
	; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:52			; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:52
	; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:36			; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:36
	; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:20			; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:20
	; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:4			; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:4
	; GFX1030-PAL-NEXT: ;;#ASMSTART
	; GFX1030-PAL-NEXT: ; use v4
	; GFX1030-PAL-NEXT: ;;#ASMEND
	; GFX1030-PAL-NEXT: s_endpgm			; GFX1030-PAL-NEXT: s_endpgm
	;			;
	; GFX11-PAL-LABEL: zero_init_kernel:			; GFX11-PAL-LABEL: zero_init_kernel:
	; GFX11-PAL: ; %bb.0:			; GFX11-PAL: ; %bb.0:
	; GFX11-PAL-NEXT: s_mov_b32 s0, 0			; GFX11-PAL-NEXT: s_mov_b32 s0, 0
	; GFX11-PAL-NEXT: v_mov_b32_e32 v4, 4			; GFX11-PAL-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-PAL-NEXT: s_mov_b32 s1, s0			; GFX11-PAL-NEXT: s_mov_b32 s1, s0
	; GFX11-PAL-NEXT: s_mov_b32 s2, s0			; GFX11-PAL-NEXT: s_mov_b32 s2, s0
	; GFX11-PAL-NEXT: s_mov_b32 s3, s0			; GFX11-PAL-NEXT: s_mov_b32 s3, s0
	; GFX11-PAL-NEXT: v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, s1			; GFX11-PAL-NEXT: v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, s1
	; GFX11-PAL-NEXT: v_dual_mov_b32 v2, s2 :: v_dual_mov_b32 v3, s3			; GFX11-PAL-NEXT: v_dual_mov_b32 v2, s2 :: v_dual_mov_b32 v3, s3
	; GFX11-PAL-NEXT: s_clause 0x3			; GFX11-PAL-NEXT: s_clause 0x3
	; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], off offset:52			; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], off offset:52
	; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], off offset:36			; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], off offset:36
	; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], off offset:20			; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], off offset:20
	; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], off offset:4			; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], off offset:4
	; GFX11-PAL-NEXT: ;;#ASMSTART
	; GFX11-PAL-NEXT: ; use v4
	; GFX11-PAL-NEXT: ;;#ASMEND
	; GFX11-PAL-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX11-PAL-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-PAL-NEXT: s_endpgm			; GFX11-PAL-NEXT: s_endpgm
	%alloca = alloca [32 x i16], align 2, addrspace(5)			%alloca = alloca [32 x i16], align 2, addrspace(5)
	%cast = bitcast [32 x i16] addrspace(5)* %alloca to i8 addrspace(5)*			%cast = bitcast [32 x i16] addrspace(5)* %alloca to i8 addrspace(5)*
	call void @llvm.memset.p5i8.i64(i8 addrspace(5)* align 2 dereferenceable(64) %cast, i8 0, i64 64, i1 false)			call void @llvm.memset.p5i8.i64(i8 addrspace(5)* align 2 dereferenceable(64) %cast, i8 0, i64 64, i1 false)
	call void asm sideeffect "; use $0", "s"([32 x i16] addrspace(5)* %alloca) #0
	ret void			ret void
	}			}

	define void @zero_init_foo() {			define void @zero_init_foo() {
	; GFX9-LABEL: zero_init_foo:			; GFX9-LABEL: zero_init_foo:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_mov_b32 s0, 0			; GFX9-NEXT: s_mov_b32 s0, 0
	; GFX9-NEXT: s_mov_b32 s1, s0			; GFX9-NEXT: s_mov_b32 s1, s0
	; GFX9-NEXT: s_mov_b32 s2, s0			; GFX9-NEXT: s_mov_b32 s2, s0
	; GFX9-NEXT: s_mov_b32 s3, s0			; GFX9-NEXT: s_mov_b32 s3, s0
	; GFX9-NEXT: v_mov_b32_e32 v0, s0			; GFX9-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-NEXT: v_mov_b32_e32 v1, s1			; GFX9-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-NEXT: v_mov_b32_e32 v2, s2			; GFX9-NEXT: v_mov_b32_e32 v2, s2
	; GFX9-NEXT: v_mov_b32_e32 v3, s3			; GFX9-NEXT: v_mov_b32_e32 v3, s3
	; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:48			; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:48
	; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:32			; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:32
	; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:16			; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:16
	; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], s32			; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], s32
	; GFX9-NEXT: s_nop 0
	; GFX9-NEXT: v_mov_b32_e32 v0, s32
	; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v0
	; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: zero_init_foo:			; GFX10-LABEL: zero_init_foo:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_mov_b32 s0, 0			; GFX10-NEXT: s_mov_b32 s0, 0
	; GFX10-NEXT: v_mov_b32_e32 v4, s32
	; GFX10-NEXT: s_mov_b32 s1, s0			; GFX10-NEXT: s_mov_b32 s1, s0
	; GFX10-NEXT: s_mov_b32 s2, s0			; GFX10-NEXT: s_mov_b32 s2, s0
	; GFX10-NEXT: s_mov_b32 s3, s0			; GFX10-NEXT: s_mov_b32 s3, s0
	; GFX10-NEXT: v_mov_b32_e32 v0, s0			; GFX10-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-NEXT: v_mov_b32_e32 v1, s1			; GFX10-NEXT: v_mov_b32_e32 v1, s1
	; GFX10-NEXT: v_mov_b32_e32 v2, s2			; GFX10-NEXT: v_mov_b32_e32 v2, s2
	; GFX10-NEXT: v_mov_b32_e32 v3, s3			; GFX10-NEXT: v_mov_b32_e32 v3, s3
	; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:48			; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:48
	; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:32			; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:32
	; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:16			; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:16
	; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], s32			; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], s32
	; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v4
	; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: zero_init_foo:			; GFX11-LABEL: zero_init_foo:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_mov_b32 s0, 0			; GFX11-NEXT: s_mov_b32 s0, 0
	; GFX11-NEXT: v_mov_b32_e32 v4, s32			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_mov_b32 s1, s0			; GFX11-NEXT: s_mov_b32 s1, s0
	; GFX11-NEXT: s_mov_b32 s2, s0			; GFX11-NEXT: s_mov_b32 s2, s0
	; GFX11-NEXT: s_mov_b32 s3, s0			; GFX11-NEXT: s_mov_b32 s3, s0
	; GFX11-NEXT: v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, s1			; GFX11-NEXT: v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, s1
	; GFX11-NEXT: v_dual_mov_b32 v2, s2 :: v_dual_mov_b32 v3, s3			; GFX11-NEXT: v_dual_mov_b32 v2, s2 :: v_dual_mov_b32 v3, s3
	; GFX11-NEXT: s_clause 0x3			; GFX11-NEXT: s_clause 0x3
	; GFX11-NEXT: scratch_store_b128 off, v[0:3], s32 offset:48			; GFX11-NEXT: scratch_store_b128 off, v[0:3], s32 offset:48
	; GFX11-NEXT: scratch_store_b128 off, v[0:3], s32 offset:32			; GFX11-NEXT: scratch_store_b128 off, v[0:3], s32 offset:32
	; GFX11-NEXT: scratch_store_b128 off, v[0:3], s32 offset:16			; GFX11-NEXT: scratch_store_b128 off, v[0:3], s32 offset:16
	; GFX11-NEXT: scratch_store_b128 off, v[0:3], s32			; GFX11-NEXT: scratch_store_b128 off, v[0:3], s32
	; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v4
	; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-PAL-LABEL: zero_init_foo:			; GFX9-PAL-LABEL: zero_init_foo:
	; GFX9-PAL: ; %bb.0:			; GFX9-PAL: ; %bb.0:
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-PAL-NEXT: s_mov_b32 s0, 0			; GFX9-PAL-NEXT: s_mov_b32 s0, 0
	; GFX9-PAL-NEXT: s_mov_b32 s1, s0			; GFX9-PAL-NEXT: s_mov_b32 s1, s0
	; GFX9-PAL-NEXT: s_mov_b32 s2, s0			; GFX9-PAL-NEXT: s_mov_b32 s2, s0
	; GFX9-PAL-NEXT: s_mov_b32 s3, s0			; GFX9-PAL-NEXT: s_mov_b32 s3, s0
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-PAL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-PAL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-PAL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-PAL-NEXT: v_mov_b32_e32 v2, s2			; GFX9-PAL-NEXT: v_mov_b32_e32 v2, s2
	; GFX9-PAL-NEXT: v_mov_b32_e32 v3, s3			; GFX9-PAL-NEXT: v_mov_b32_e32 v3, s3
	; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:48			; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:48
	; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:32			; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:32
	; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:16			; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:16
	; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s32			; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s32
	; GFX9-PAL-NEXT: s_nop 0
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, s32
	; GFX9-PAL-NEXT: ;;#ASMSTART
	; GFX9-PAL-NEXT: ; use v0
	; GFX9-PAL-NEXT: ;;#ASMEND
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX9-PAL-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX940-LABEL: zero_init_foo:			; GFX940-LABEL: zero_init_foo:
	; GFX940: ; %bb.0:			; GFX940: ; %bb.0:
	; GFX940-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX940-NEXT: s_mov_b32 s0, 0			; GFX940-NEXT: s_mov_b32 s0, 0
	; GFX940-NEXT: s_mov_b32 s1, s0			; GFX940-NEXT: s_mov_b32 s1, s0
	; GFX940-NEXT: s_mov_b32 s2, s0			; GFX940-NEXT: s_mov_b32 s2, s0
	; GFX940-NEXT: s_mov_b32 s3, s0			; GFX940-NEXT: s_mov_b32 s3, s0
	; GFX940-NEXT: v_mov_b64_e32 v[0:1], s[0:1]			; GFX940-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
	; GFX940-NEXT: v_mov_b64_e32 v[2:3], s[2:3]			; GFX940-NEXT: v_mov_b64_e32 v[2:3], s[2:3]
	; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:48			; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:48
	; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:32			; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:32
	; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:16			; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:16
	; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], s32			; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], s32
	; GFX940-NEXT: s_nop 1
	; GFX940-NEXT: v_mov_b32_e32 v0, s32
	; GFX940-NEXT: ;;#ASMSTART
	; GFX940-NEXT: ; use v0
	; GFX940-NEXT: ;;#ASMEND
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: s_setpc_b64 s[30:31]			; GFX940-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-PAL-LABEL: zero_init_foo:			; GFX10-PAL-LABEL: zero_init_foo:
	; GFX10-PAL: ; %bb.0:			; GFX10-PAL: ; %bb.0:
	; GFX10-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-PAL-NEXT: s_mov_b32 s0, 0			; GFX10-PAL-NEXT: s_mov_b32 s0, 0
	; GFX10-PAL-NEXT: v_mov_b32_e32 v4, s32
	; GFX10-PAL-NEXT: s_mov_b32 s1, s0			; GFX10-PAL-NEXT: s_mov_b32 s1, s0
	; GFX10-PAL-NEXT: s_mov_b32 s2, s0			; GFX10-PAL-NEXT: s_mov_b32 s2, s0
	; GFX10-PAL-NEXT: s_mov_b32 s3, s0			; GFX10-PAL-NEXT: s_mov_b32 s3, s0
	; GFX10-PAL-NEXT: v_mov_b32_e32 v0, s0			; GFX10-PAL-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-PAL-NEXT: v_mov_b32_e32 v1, s1			; GFX10-PAL-NEXT: v_mov_b32_e32 v1, s1
	; GFX10-PAL-NEXT: v_mov_b32_e32 v2, s2			; GFX10-PAL-NEXT: v_mov_b32_e32 v2, s2
	; GFX10-PAL-NEXT: v_mov_b32_e32 v3, s3			; GFX10-PAL-NEXT: v_mov_b32_e32 v3, s3
	; GFX10-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:48			; GFX10-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:48
	; GFX10-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:32			; GFX10-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:32
	; GFX10-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:16			; GFX10-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:16
	; GFX10-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s32			; GFX10-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s32
	; GFX10-PAL-NEXT: ;;#ASMSTART
	; GFX10-PAL-NEXT: ; use v4
	; GFX10-PAL-NEXT: ;;#ASMEND
	; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX10-PAL-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-PAL-LABEL: zero_init_foo:			; GFX11-PAL-LABEL: zero_init_foo:
	; GFX11-PAL: ; %bb.0:			; GFX11-PAL: ; %bb.0:
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: s_mov_b32 s0, 0			; GFX11-PAL-NEXT: s_mov_b32 s0, 0
	; GFX11-PAL-NEXT: v_mov_b32_e32 v4, s32			; GFX11-PAL-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-PAL-NEXT: s_mov_b32 s1, s0			; GFX11-PAL-NEXT: s_mov_b32 s1, s0
	; GFX11-PAL-NEXT: s_mov_b32 s2, s0			; GFX11-PAL-NEXT: s_mov_b32 s2, s0
	; GFX11-PAL-NEXT: s_mov_b32 s3, s0			; GFX11-PAL-NEXT: s_mov_b32 s3, s0
	; GFX11-PAL-NEXT: v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, s1			; GFX11-PAL-NEXT: v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, s1
	; GFX11-PAL-NEXT: v_dual_mov_b32 v2, s2 :: v_dual_mov_b32 v3, s3			; GFX11-PAL-NEXT: v_dual_mov_b32 v2, s2 :: v_dual_mov_b32 v3, s3
	; GFX11-PAL-NEXT: s_clause 0x3			; GFX11-PAL-NEXT: s_clause 0x3
	; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], s32 offset:48			; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], s32 offset:48
	; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], s32 offset:32			; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], s32 offset:32
	; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], s32 offset:16			; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], s32 offset:16
	; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], s32			; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], s32
	; GFX11-PAL-NEXT: ;;#ASMSTART
	; GFX11-PAL-NEXT: ; use v4
	; GFX11-PAL-NEXT: ;;#ASMEND
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX11-PAL-NEXT: s_setpc_b64 s[30:31]
				; GCN-LABEL: zero_init_foo:
				; GCN: ; %bb.0:
				; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GCN-NEXT: s_mov_b32 s0, 0
				; GCN-NEXT: s_mov_b32 s1, s0
				; GCN-NEXT: s_mov_b32 s2, s0
				; GCN-NEXT: s_mov_b32 s3, s0
				; GCN-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
				; GCN-NEXT: v_mov_b64_e32 v[2:3], s[2:3]
				; GCN-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:48
				; GCN-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:32
				; GCN-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:16
				; GCN-NEXT: scratch_store_dwordx4 off, v[0:3], s32
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: s_setpc_b64 s[30:31]
	%alloca = alloca [32 x i16], align 2, addrspace(5)			%alloca = alloca [32 x i16], align 2, addrspace(5)
	%cast = bitcast [32 x i16] addrspace(5)* %alloca to i8 addrspace(5)*			%cast = bitcast [32 x i16] addrspace(5)* %alloca to i8 addrspace(5)*
	call void @llvm.memset.p5i8.i64(i8 addrspace(5)* align 2 dereferenceable(64) %cast, i8 0, i64 64, i1 false)			call void @llvm.memset.p5i8.i64(i8 addrspace(5)* align 2 dereferenceable(64) %cast, i8 0, i64 64, i1 false)
	call void asm sideeffect "; use $0", "s"([32 x i16] addrspace(5)* %alloca) #0
	ret void			ret void
	}			}

	define amdgpu_kernel void @store_load_sindex_kernel(i32 %idx) {			define amdgpu_kernel void @store_load_sindex_kernel(i32 %idx) {
	; GFX9-LABEL: store_load_sindex_kernel:			; GFX9-LABEL: store_load_sindex_kernel:
	; GFX9: ; %bb.0: ; %bb			; GFX9: ; %bb.0: ; %bb
	; GFX9-NEXT: s_load_dword s0, s[0:1], 0x24			; GFX9-NEXT: s_load_dword s0, s[0:1], 0x24
	; GFX9-NEXT: s_add_u32 flat_scratch_lo, s2, s5			; GFX9-NEXT: s_add_u32 flat_scratch_lo, s2, s5
	; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s3, 0			; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s3, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 15			; GFX9-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_lshl_b32 s1, s0, 2			; GFX9-NEXT: s_lshl_b32 s1, s0, 2
	; GFX9-NEXT: s_and_b32 s0, s0, 15			; GFX9-NEXT: s_and_b32 s0, s0, 15
	; GFX9-NEXT: s_add_i32 s1, s1, 4			; GFX9-NEXT: s_add_i32 s1, s1, 4
	; GFX9-NEXT: s_lshl_b32 s0, s0, 2			; GFX9-NEXT: s_lshl_b32 s0, s0, 2
	; GFX9-NEXT: scratch_store_dword off, v0, s1			; GFX9-NEXT: scratch_store_dword off, v0, s1
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_add_i32 s0, s0, 4			; GFX9-NEXT: s_add_i32 s0, s0, 4
	; GFX9-NEXT: scratch_load_dword v0, off, s0 glc			; GFX9-NEXT: scratch_load_dword v0, off, s0 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_mov_b32_e32 v0, 4
	; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v0
	; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: store_load_sindex_kernel:			; GFX10-LABEL: store_load_sindex_kernel:
	; GFX10: ; %bb.0: ; %bb			; GFX10: ; %bb.0: ; %bb
	; GFX10-NEXT: s_add_u32 s2, s2, s5			; GFX10-NEXT: s_add_u32 s2, s2, s5
	; GFX10-NEXT: s_addc_u32 s3, s3, 0			; GFX10-NEXT: s_addc_u32 s3, s3, 0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
	; GFX10-NEXT: s_load_dword s0, s[0:1], 0x24			; GFX10-NEXT: s_load_dword s0, s[0:1], 0x24
	; GFX10-NEXT: v_mov_b32_e32 v0, 15			; GFX10-NEXT: v_mov_b32_e32 v0, 15
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_and_b32 s1, s0, 15			; GFX10-NEXT: s_and_b32 s1, s0, 15
	; GFX10-NEXT: s_lshl_b32 s0, s0, 2			; GFX10-NEXT: s_lshl_b32 s0, s0, 2
	; GFX10-NEXT: s_lshl_b32 s1, s1, 2			; GFX10-NEXT: s_lshl_b32 s1, s1, 2
	; GFX10-NEXT: s_add_i32 s0, s0, 4			; GFX10-NEXT: s_add_i32 s0, s0, 4
	; GFX10-NEXT: s_add_i32 s1, s1, 4			; GFX10-NEXT: s_add_i32 s1, s1, 4
	; GFX10-NEXT: scratch_store_dword off, v0, s0			; GFX10-NEXT: scratch_store_dword off, v0, s0
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_load_dword v0, off, s1 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, s1 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_mov_b32_e32 v0, 4
	; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v0
	; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: store_load_sindex_kernel:			; GFX11-LABEL: store_load_sindex_kernel:
	; GFX11: ; %bb.0: ; %bb			; GFX11: ; %bb.0: ; %bb
	; GFX11-NEXT: s_load_b32 s0, s[0:1], 0x24			; GFX11-NEXT: s_load_b32 s0, s[0:1], 0x24
	; GFX11-NEXT: v_mov_b32_e32 v0, 15			; GFX11-NEXT: v_mov_b32_e32 v0, 15
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_and_b32 s1, s0, 15			; GFX11-NEXT: s_and_b32 s1, s0, 15
	; GFX11-NEXT: s_lshl_b32 s0, s0, 2			; GFX11-NEXT: s_lshl_b32 s0, s0, 2
	; GFX11-NEXT: s_lshl_b32 s1, s1, 2			; GFX11-NEXT: s_lshl_b32 s1, s1, 2
	; GFX11-NEXT: s_add_i32 s0, s0, 4			; GFX11-NEXT: s_add_i32 s0, s0, 4
	; GFX11-NEXT: s_add_i32 s1, s1, 4			; GFX11-NEXT: s_add_i32 s1, s1, 4
	; GFX11-NEXT: scratch_store_b32 off, v0, s0 dlc			; GFX11-NEXT: scratch_store_b32 off, v0, s0 dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: scratch_load_b32 v0, off, s1 glc dlc			; GFX11-NEXT: scratch_load_b32 v0, off, s1 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_mov_b32_e32 v0, 4
	; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v0
	; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	;			;
	; GFX9-PAL-LABEL: store_load_sindex_kernel:			; GFX9-PAL-LABEL: store_load_sindex_kernel:
	; GFX9-PAL: ; %bb.0: ; %bb			; GFX9-PAL: ; %bb.0: ; %bb
	; GFX9-PAL-NEXT: s_getpc_b64 s[4:5]			; GFX9-PAL-NEXT: s_getpc_b64 s[4:5]
	; GFX9-PAL-NEXT: s_mov_b32 s4, s0			; GFX9-PAL-NEXT: s_mov_b32 s4, s0
	; GFX9-PAL-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX9-PAL-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 15			; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-PAL-NEXT: s_load_dword s0, s[0:1], 0x0			; GFX9-PAL-NEXT: s_load_dword s0, s[0:1], 0x0
	; GFX9-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-PAL-NEXT: s_and_b32 s5, s5, 0xffff			; GFX9-PAL-NEXT: s_and_b32 s5, s5, 0xffff
	; GFX9-PAL-NEXT: s_add_u32 flat_scratch_lo, s4, s3			; GFX9-PAL-NEXT: s_add_u32 flat_scratch_lo, s4, s3
	; GFX9-PAL-NEXT: s_addc_u32 flat_scratch_hi, s5, 0			; GFX9-PAL-NEXT: s_addc_u32 flat_scratch_hi, s5, 0
	; GFX9-PAL-NEXT: s_lshl_b32 s1, s0, 2			; GFX9-PAL-NEXT: s_lshl_b32 s1, s0, 2
	; GFX9-PAL-NEXT: s_and_b32 s0, s0, 15			; GFX9-PAL-NEXT: s_and_b32 s0, s0, 15
	; GFX9-PAL-NEXT: s_add_i32 s1, s1, 4			; GFX9-PAL-NEXT: s_add_i32 s1, s1, 4
	; GFX9-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX9-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX9-PAL-NEXT: scratch_store_dword off, v0, s1			; GFX9-PAL-NEXT: scratch_store_dword off, v0, s1
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_add_i32 s0, s0, 4			; GFX9-PAL-NEXT: s_add_i32 s0, s0, 4
	; GFX9-PAL-NEXT: scratch_load_dword v0, off, s0 glc			; GFX9-PAL-NEXT: scratch_load_dword v0, off, s0 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX9-PAL-NEXT: ;;#ASMSTART
	; GFX9-PAL-NEXT: ; use v0
	; GFX9-PAL-NEXT: ;;#ASMEND
	; GFX9-PAL-NEXT: s_endpgm			; GFX9-PAL-NEXT: s_endpgm
	;			;
	; GFX940-LABEL: store_load_sindex_kernel:			; GFX940-LABEL: store_load_sindex_kernel:
	; GFX940: ; %bb.0: ; %bb			; GFX940: ; %bb.0: ; %bb
	; GFX940-NEXT: s_load_dword s0, s[0:1], 0x24			; GFX940-NEXT: s_load_dword s0, s[0:1], 0x24
	; GFX940-NEXT: v_mov_b32_e32 v0, 15			; GFX940-NEXT: v_mov_b32_e32 v0, 15
	; GFX940-NEXT: s_waitcnt lgkmcnt(0)			; GFX940-NEXT: s_waitcnt lgkmcnt(0)
	; GFX940-NEXT: s_lshl_b32 s1, s0, 2			; GFX940-NEXT: s_lshl_b32 s1, s0, 2
	; GFX940-NEXT: s_and_b32 s0, s0, 15			; GFX940-NEXT: s_and_b32 s0, s0, 15
	; GFX940-NEXT: s_add_i32 s1, s1, 4			; GFX940-NEXT: s_add_i32 s1, s1, 4
	; GFX940-NEXT: s_lshl_b32 s0, s0, 2			; GFX940-NEXT: s_lshl_b32 s0, s0, 2
	; GFX940-NEXT: scratch_store_dword off, v0, s1 sc0 sc1			; GFX940-NEXT: scratch_store_dword off, v0, s1 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: s_add_i32 s0, s0, 4			; GFX940-NEXT: s_add_i32 s0, s0, 4
	; GFX940-NEXT: scratch_load_dword v0, off, s0 sc0 sc1			; GFX940-NEXT: scratch_load_dword v0, off, s0 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: v_mov_b32_e32 v0, 4
	; GFX940-NEXT: ;;#ASMSTART
	; GFX940-NEXT: ; use v0
	; GFX940-NEXT: ;;#ASMEND
	; GFX940-NEXT: s_endpgm			; GFX940-NEXT: s_endpgm
	;			;
	; GFX10-PAL-LABEL: store_load_sindex_kernel:			; GFX10-PAL-LABEL: store_load_sindex_kernel:
	; GFX10-PAL: ; %bb.0: ; %bb			; GFX10-PAL: ; %bb.0: ; %bb
	; GFX10-PAL-NEXT: s_getpc_b64 s[4:5]			; GFX10-PAL-NEXT: s_getpc_b64 s[4:5]
	; GFX10-PAL-NEXT: s_mov_b32 s4, s0			; GFX10-PAL-NEXT: s_mov_b32 s4, s0
	; GFX10-PAL-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX10-PAL-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX10-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX10-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX10-PAL-NEXT: s_lshl_b32 s1, s1, 2			; GFX10-PAL-NEXT: s_lshl_b32 s1, s1, 2
	; GFX10-PAL-NEXT: s_add_i32 s0, s0, 4			; GFX10-PAL-NEXT: s_add_i32 s0, s0, 4
	; GFX10-PAL-NEXT: s_add_i32 s1, s1, 4			; GFX10-PAL-NEXT: s_add_i32 s1, s1, 4
	; GFX10-PAL-NEXT: scratch_store_dword off, v0, s0			; GFX10-PAL-NEXT: scratch_store_dword off, v0, s0
	; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc			; GFX10-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc
	; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX10-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX10-PAL-NEXT: ;;#ASMSTART
	; GFX10-PAL-NEXT: ; use v0
	; GFX10-PAL-NEXT: ;;#ASMEND
	; GFX10-PAL-NEXT: s_endpgm			; GFX10-PAL-NEXT: s_endpgm
	;			;
	; GFX11-PAL-LABEL: store_load_sindex_kernel:			; GFX11-PAL-LABEL: store_load_sindex_kernel:
	; GFX11-PAL: ; %bb.0: ; %bb			; GFX11-PAL: ; %bb.0: ; %bb
	; GFX11-PAL-NEXT: s_load_b32 s0, s[0:1], 0x0			; GFX11-PAL-NEXT: s_load_b32 s0, s[0:1], 0x0
	; GFX11-PAL-NEXT: v_mov_b32_e32 v0, 15			; GFX11-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX11-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-PAL-NEXT: s_and_b32 s1, s0, 15			; GFX11-PAL-NEXT: s_and_b32 s1, s0, 15
	; GFX11-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX11-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX11-PAL-NEXT: s_lshl_b32 s1, s1, 2			; GFX11-PAL-NEXT: s_lshl_b32 s1, s1, 2
	; GFX11-PAL-NEXT: s_add_i32 s0, s0, 4			; GFX11-PAL-NEXT: s_add_i32 s0, s0, 4
	; GFX11-PAL-NEXT: s_add_i32 s1, s1, 4			; GFX11-PAL-NEXT: s_add_i32 s1, s1, 4
	; GFX11-PAL-NEXT: scratch_store_b32 off, v0, s0 dlc			; GFX11-PAL-NEXT: scratch_store_b32 off, v0, s0 dlc
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: scratch_load_b32 v0, off, s1 glc dlc			; GFX11-PAL-NEXT: scratch_load_b32 v0, off, s1 glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX11-PAL-NEXT: ;;#ASMSTART
	; GFX11-PAL-NEXT: ; use v0
	; GFX11-PAL-NEXT: ;;#ASMEND
	; GFX11-PAL-NEXT: s_endpgm			; GFX11-PAL-NEXT: s_endpgm
				; GCN-LABEL: store_load_sindex_kernel:
				; GCN: ; %bb.0: ; %bb
				; GCN-NEXT: s_load_dword s0, s[0:1], 0x24
				; GCN-NEXT: v_mov_b32_e32 v0, 15
				; GCN-NEXT: s_waitcnt lgkmcnt(0)
				; GCN-NEXT: s_lshl_b32 s1, s0, 2
				; GCN-NEXT: s_and_b32 s0, s0, 15
				; GCN-NEXT: s_lshl_b32 s0, s0, 2
				; GCN-NEXT: s_add_u32 s1, 4, s1
				; GCN-NEXT: scratch_store_dword off, v0, s1 sc0 sc1
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: s_add_u32 s0, 4, s0
				; GCN-NEXT: scratch_load_dword v0, off, s0 sc0 sc1
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: s_endpgm
	bb:			bb:
	%i = alloca [32 x float], align 4, addrspace(5)			%i = alloca [32 x float], align 4, addrspace(5)
	%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*			%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*
	%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %idx			%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %idx
	%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*			%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*
	store volatile i32 15, i32 addrspace(5)* %i8, align 4			store volatile i32 15, i32 addrspace(5)* %i8, align 4
	%i9 = and i32 %idx, 15			%i9 = and i32 %idx, 15
	%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9			%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9
	%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*			%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*
	%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4			%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4
	call void asm sideeffect "; use $0", "s"([32 x float] addrspace(5)* %i) #0
	ret void			ret void
	}			}

	define amdgpu_ps void @store_load_sindex_foo(i32 inreg %idx) {			define amdgpu_ps void @store_load_sindex_foo(i32 inreg %idx) {
	; GFX9-LABEL: store_load_sindex_foo:			; GFX9-LABEL: store_load_sindex_foo:
	; GFX9: ; %bb.0: ; %bb			; GFX9: ; %bb.0: ; %bb
	; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3			; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3
	; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0			; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0
	; GFX9-NEXT: s_lshl_b32 s0, s2, 2			; GFX9-NEXT: s_lshl_b32 s0, s2, 2
	; GFX9-NEXT: s_add_i32 s0, s0, 4			; GFX9-NEXT: s_add_i32 s0, s0, 4
	; GFX9-NEXT: v_mov_b32_e32 v0, 15			; GFX9-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-NEXT: scratch_store_dword off, v0, s0			; GFX9-NEXT: scratch_store_dword off, v0, s0
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_and_b32 s0, s2, 15			; GFX9-NEXT: s_and_b32 s0, s2, 15
	; GFX9-NEXT: s_lshl_b32 s0, s0, 2			; GFX9-NEXT: s_lshl_b32 s0, s0, 2
	; GFX9-NEXT: s_add_i32 s0, s0, 4			; GFX9-NEXT: s_add_i32 s0, s0, 4
	; GFX9-NEXT: scratch_load_dword v0, off, s0 glc			; GFX9-NEXT: scratch_load_dword v0, off, s0 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_mov_b32_e32 v0, 4
	; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v0
	; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: store_load_sindex_foo:			; GFX10-LABEL: store_load_sindex_foo:
	; GFX10: ; %bb.0: ; %bb			; GFX10: ; %bb.0: ; %bb
	; GFX10-NEXT: s_add_u32 s0, s0, s3			; GFX10-NEXT: s_add_u32 s0, s0, s3
	; GFX10-NEXT: s_addc_u32 s1, s1, 0			; GFX10-NEXT: s_addc_u32 s1, s1, 0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1
	; GFX10-NEXT: v_mov_b32_e32 v0, 15			; GFX10-NEXT: v_mov_b32_e32 v0, 15
	; GFX10-NEXT: s_and_b32 s0, s2, 15			; GFX10-NEXT: s_and_b32 s0, s2, 15
	; GFX10-NEXT: s_lshl_b32 s1, s2, 2			; GFX10-NEXT: s_lshl_b32 s1, s2, 2
	; GFX10-NEXT: s_lshl_b32 s0, s0, 2			; GFX10-NEXT: s_lshl_b32 s0, s0, 2
	; GFX10-NEXT: s_add_i32 s1, s1, 4			; GFX10-NEXT: s_add_i32 s1, s1, 4
	; GFX10-NEXT: s_add_i32 s0, s0, 4			; GFX10-NEXT: s_add_i32 s0, s0, 4
	; GFX10-NEXT: scratch_store_dword off, v0, s1			; GFX10-NEXT: scratch_store_dword off, v0, s1
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_load_dword v0, off, s0 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, s0 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_mov_b32_e32 v0, 4
	; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v0
	; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: store_load_sindex_foo:			; GFX11-LABEL: store_load_sindex_foo:
	; GFX11: ; %bb.0: ; %bb			; GFX11: ; %bb.0: ; %bb
	; GFX11-NEXT: v_mov_b32_e32 v0, 15			; GFX11-NEXT: v_mov_b32_e32 v0, 15
	; GFX11-NEXT: s_and_b32 s1, s0, 15			; GFX11-NEXT: s_and_b32 s1, s0, 15
	; GFX11-NEXT: s_lshl_b32 s0, s0, 2			; GFX11-NEXT: s_lshl_b32 s0, s0, 2
	; GFX11-NEXT: s_lshl_b32 s1, s1, 2			; GFX11-NEXT: s_lshl_b32 s1, s1, 2
	; GFX11-NEXT: s_add_i32 s0, s0, 4			; GFX11-NEXT: s_add_i32 s0, s0, 4
	; GFX11-NEXT: s_add_i32 s1, s1, 4			; GFX11-NEXT: s_add_i32 s1, s1, 4
	; GFX11-NEXT: scratch_store_b32 off, v0, s0 dlc			; GFX11-NEXT: scratch_store_b32 off, v0, s0 dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: scratch_load_b32 v0, off, s1 glc dlc			; GFX11-NEXT: scratch_load_b32 v0, off, s1 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_mov_b32_e32 v0, 4
	; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v0
	; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	;			;
	; GFX9-PAL-LABEL: store_load_sindex_foo:			; GFX9-PAL-LABEL: store_load_sindex_foo:
	; GFX9-PAL: ; %bb.0: ; %bb			; GFX9-PAL: ; %bb.0: ; %bb
	; GFX9-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX9-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX9-PAL-NEXT: s_mov_b32 s2, s0			; GFX9-PAL-NEXT: s_mov_b32 s2, s0
	; GFX9-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX9-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 15			; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-PAL-NEXT: s_and_b32 s3, s3, 0xffff			; GFX9-PAL-NEXT: s_and_b32 s3, s3, 0xffff
	; GFX9-PAL-NEXT: s_add_u32 flat_scratch_lo, s2, s1			; GFX9-PAL-NEXT: s_add_u32 flat_scratch_lo, s2, s1
	; GFX9-PAL-NEXT: s_addc_u32 flat_scratch_hi, s3, 0			; GFX9-PAL-NEXT: s_addc_u32 flat_scratch_hi, s3, 0
	; GFX9-PAL-NEXT: s_lshl_b32 s1, s0, 2			; GFX9-PAL-NEXT: s_lshl_b32 s1, s0, 2
	; GFX9-PAL-NEXT: s_and_b32 s0, s0, 15			; GFX9-PAL-NEXT: s_and_b32 s0, s0, 15
	; GFX9-PAL-NEXT: s_add_i32 s1, s1, 4			; GFX9-PAL-NEXT: s_add_i32 s1, s1, 4
	; GFX9-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX9-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX9-PAL-NEXT: scratch_store_dword off, v0, s1			; GFX9-PAL-NEXT: scratch_store_dword off, v0, s1
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_add_i32 s0, s0, 4			; GFX9-PAL-NEXT: s_add_i32 s0, s0, 4
	; GFX9-PAL-NEXT: scratch_load_dword v0, off, s0 glc			; GFX9-PAL-NEXT: scratch_load_dword v0, off, s0 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX9-PAL-NEXT: ;;#ASMSTART
	; GFX9-PAL-NEXT: ; use v0
	; GFX9-PAL-NEXT: ;;#ASMEND
	; GFX9-PAL-NEXT: s_endpgm			; GFX9-PAL-NEXT: s_endpgm
	;			;
	; GFX940-LABEL: store_load_sindex_foo:			; GFX940-LABEL: store_load_sindex_foo:
	; GFX940: ; %bb.0: ; %bb			; GFX940: ; %bb.0: ; %bb
	; GFX940-NEXT: s_lshl_b32 s1, s0, 2			; GFX940-NEXT: s_lshl_b32 s1, s0, 2
	; GFX940-NEXT: s_and_b32 s0, s0, 15			; GFX940-NEXT: s_and_b32 s0, s0, 15
	; GFX940-NEXT: s_add_i32 s1, s1, 4			; GFX940-NEXT: s_add_i32 s1, s1, 4
	; GFX940-NEXT: v_mov_b32_e32 v0, 15			; GFX940-NEXT: v_mov_b32_e32 v0, 15
	; GFX940-NEXT: s_lshl_b32 s0, s0, 2			; GFX940-NEXT: s_lshl_b32 s0, s0, 2
	; GFX940-NEXT: scratch_store_dword off, v0, s1 sc0 sc1			; GFX940-NEXT: scratch_store_dword off, v0, s1 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: s_add_i32 s0, s0, 4			; GFX940-NEXT: s_add_i32 s0, s0, 4
	; GFX940-NEXT: scratch_load_dword v0, off, s0 sc0 sc1			; GFX940-NEXT: scratch_load_dword v0, off, s0 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: v_mov_b32_e32 v0, 4
	; GFX940-NEXT: ;;#ASMSTART
	; GFX940-NEXT: ; use v0
	; GFX940-NEXT: ;;#ASMEND
	; GFX940-NEXT: s_endpgm			; GFX940-NEXT: s_endpgm
	;			;
	; GFX10-PAL-LABEL: store_load_sindex_foo:			; GFX10-PAL-LABEL: store_load_sindex_foo:
	; GFX10-PAL: ; %bb.0: ; %bb			; GFX10-PAL: ; %bb.0: ; %bb
	; GFX10-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX10-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX10-PAL-NEXT: s_mov_b32 s2, s0			; GFX10-PAL-NEXT: s_mov_b32 s2, s0
	; GFX10-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX10-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX10-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-PAL-NEXT: s_and_b32 s3, s3, 0xffff			; GFX10-PAL-NEXT: s_and_b32 s3, s3, 0xffff
	; GFX10-PAL-NEXT: s_add_u32 s2, s2, s1			; GFX10-PAL-NEXT: s_add_u32 s2, s2, s1
	; GFX10-PAL-NEXT: s_addc_u32 s3, s3, 0			; GFX10-PAL-NEXT: s_addc_u32 s3, s3, 0
	; GFX10-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2			; GFX10-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
	; GFX10-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3			; GFX10-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
	; GFX10-PAL-NEXT: v_mov_b32_e32 v0, 15			; GFX10-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX10-PAL-NEXT: s_and_b32 s1, s0, 15			; GFX10-PAL-NEXT: s_and_b32 s1, s0, 15
	; GFX10-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX10-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX10-PAL-NEXT: s_lshl_b32 s1, s1, 2			; GFX10-PAL-NEXT: s_lshl_b32 s1, s1, 2
	; GFX10-PAL-NEXT: s_add_i32 s0, s0, 4			; GFX10-PAL-NEXT: s_add_i32 s0, s0, 4
	; GFX10-PAL-NEXT: s_add_i32 s1, s1, 4			; GFX10-PAL-NEXT: s_add_i32 s1, s1, 4
	; GFX10-PAL-NEXT: scratch_store_dword off, v0, s0			; GFX10-PAL-NEXT: scratch_store_dword off, v0, s0
	; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc			; GFX10-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc
	; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX10-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX10-PAL-NEXT: ;;#ASMSTART
	; GFX10-PAL-NEXT: ; use v0
	; GFX10-PAL-NEXT: ;;#ASMEND
	; GFX10-PAL-NEXT: s_endpgm			; GFX10-PAL-NEXT: s_endpgm
	;			;
	; GFX11-PAL-LABEL: store_load_sindex_foo:			; GFX11-PAL-LABEL: store_load_sindex_foo:
	; GFX11-PAL: ; %bb.0: ; %bb			; GFX11-PAL: ; %bb.0: ; %bb
	; GFX11-PAL-NEXT: v_mov_b32_e32 v0, 15			; GFX11-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX11-PAL-NEXT: s_and_b32 s1, s0, 15			; GFX11-PAL-NEXT: s_and_b32 s1, s0, 15
	; GFX11-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX11-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX11-PAL-NEXT: s_lshl_b32 s1, s1, 2			; GFX11-PAL-NEXT: s_lshl_b32 s1, s1, 2
	; GFX11-PAL-NEXT: s_add_i32 s0, s0, 4			; GFX11-PAL-NEXT: s_add_i32 s0, s0, 4
	; GFX11-PAL-NEXT: s_add_i32 s1, s1, 4			; GFX11-PAL-NEXT: s_add_i32 s1, s1, 4
	; GFX11-PAL-NEXT: scratch_store_b32 off, v0, s0 dlc			; GFX11-PAL-NEXT: scratch_store_b32 off, v0, s0 dlc
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: scratch_load_b32 v0, off, s1 glc dlc			; GFX11-PAL-NEXT: scratch_load_b32 v0, off, s1 glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX11-PAL-NEXT: ;;#ASMSTART
	; GFX11-PAL-NEXT: ; use v0
	; GFX11-PAL-NEXT: ;;#ASMEND
	; GFX11-PAL-NEXT: s_endpgm			; GFX11-PAL-NEXT: s_endpgm
				; GCN-LABEL: store_load_sindex_foo:
				; GCN: ; %bb.0: ; %bb
				; GCN-NEXT: s_lshl_b32 s1, s0, 2
				; GCN-NEXT: s_and_b32 s0, s0, 15
				; GCN-NEXT: s_lshl_b32 s0, s0, 2
				; GCN-NEXT: s_add_u32 s1, 4, s1
				; GCN-NEXT: v_mov_b32_e32 v0, 15
				; GCN-NEXT: scratch_store_dword off, v0, s1 sc0 sc1
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: s_add_u32 s0, 4, s0
				; GCN-NEXT: scratch_load_dword v0, off, s0 sc0 sc1
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: s_endpgm
	bb:			bb:
	%i = alloca [32 x float], align 4, addrspace(5)			%i = alloca [32 x float], align 4, addrspace(5)
	%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*			%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*
	%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %idx			%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %idx
	%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*			%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*
	store volatile i32 15, i32 addrspace(5)* %i8, align 4			store volatile i32 15, i32 addrspace(5)* %i8, align 4
	%i9 = and i32 %idx, 15			%i9 = and i32 %idx, 15
	%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9			%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9
	%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*			%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*
	%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4			%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4
	call void asm sideeffect "; use $0", "s"([32 x float] addrspace(5)* %i) #0
	ret void			ret void
	}			}

	define amdgpu_kernel void @store_load_vindex_kernel() {			define amdgpu_kernel void @store_load_vindex_kernel() {
	; GFX9-LABEL: store_load_vindex_kernel:			; GFX9-LABEL: store_load_vindex_kernel:
	; GFX9: ; %bb.0: ; %bb			; GFX9: ; %bb.0: ; %bb
	; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3			; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3
	; GFX9-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX9-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0			; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0
	; GFX9-NEXT: v_add_u32_e32 v1, 4, v0			; GFX9-NEXT: v_add_u32_e32 v1, 4, v0
	; GFX9-NEXT: v_mov_b32_e32 v2, 15			; GFX9-NEXT: v_mov_b32_e32 v2, 15
	; GFX9-NEXT: scratch_store_dword v1, v2, off			; GFX9-NEXT: scratch_store_dword v1, v2, off
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_sub_u32_e32 v0, 4, v0			; GFX9-NEXT: v_sub_u32_e32 v0, 4, v0
	; GFX9-NEXT: scratch_load_dword v0, v0, off offset:124 glc			; GFX9-NEXT: scratch_load_dword v0, v0, off offset:124 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_mov_b32_e32 v0, 4
	; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v0
	; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: store_load_vindex_kernel:			; GFX10-LABEL: store_load_vindex_kernel:
	; GFX10: ; %bb.0: ; %bb			; GFX10: ; %bb.0: ; %bb
	; GFX10-NEXT: s_add_u32 s0, s0, s3			; GFX10-NEXT: s_add_u32 s0, s0, s3
	; GFX10-NEXT: s_addc_u32 s1, s1, 0			; GFX10-NEXT: s_addc_u32 s1, s1, 0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1
	; GFX10-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX10-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX10-NEXT: v_mov_b32_e32 v2, 15			; GFX10-NEXT: v_mov_b32_e32 v2, 15
	; GFX10-NEXT: v_add_nc_u32_e32 v1, 4, v0			; GFX10-NEXT: v_add_nc_u32_e32 v1, 4, v0
	; GFX10-NEXT: v_sub_nc_u32_e32 v0, 4, v0			; GFX10-NEXT: v_sub_nc_u32_e32 v0, 4, v0
	; GFX10-NEXT: scratch_store_dword v1, v2, off			; GFX10-NEXT: scratch_store_dword v1, v2, off
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_load_dword v0, v0, off offset:124 glc dlc			; GFX10-NEXT: scratch_load_dword v0, v0, off offset:124 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_mov_b32_e32 v0, 4
	; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v0
	; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: store_load_vindex_kernel:			; GFX11-LABEL: store_load_vindex_kernel:
	; GFX11: ; %bb.0: ; %bb			; GFX11: ; %bb.0: ; %bb
	; GFX11-NEXT: v_dual_mov_b32 v1, 15 :: v_dual_lshlrev_b32 v0, 2, v0			; GFX11-NEXT: v_dual_mov_b32 v1, 15 :: v_dual_lshlrev_b32 v0, 2, v0
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_sub_nc_u32_e32 v2, 4, v0			; GFX11-NEXT: v_sub_nc_u32_e32 v2, 4, v0
	; GFX11-NEXT: scratch_store_b32 v0, v1, off offset:4 dlc			; GFX11-NEXT: scratch_store_b32 v0, v1, off offset:4 dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: scratch_load_b32 v0, v2, off offset:124 glc dlc			; GFX11-NEXT: scratch_load_b32 v0, v2, off offset:124 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_mov_b32_e32 v0, 4
	; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v0
	; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	;			;
	; GFX9-PAL-LABEL: store_load_vindex_kernel:			; GFX9-PAL-LABEL: store_load_vindex_kernel:
	; GFX9-PAL: ; %bb.0: ; %bb			; GFX9-PAL: ; %bb.0: ; %bb
	; GFX9-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX9-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX9-PAL-NEXT: s_mov_b32 s2, s0			; GFX9-PAL-NEXT: s_mov_b32 s2, s0
	; GFX9-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX9-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX9-PAL-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX9-PAL-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX9-PAL-NEXT: v_add_u32_e32 v1, 4, v0			; GFX9-PAL-NEXT: v_add_u32_e32 v1, 4, v0
	; GFX9-PAL-NEXT: v_mov_b32_e32 v2, 15			; GFX9-PAL-NEXT: v_mov_b32_e32 v2, 15
	; GFX9-PAL-NEXT: v_sub_u32_e32 v0, 4, v0			; GFX9-PAL-NEXT: v_sub_u32_e32 v0, 4, v0
	; GFX9-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-PAL-NEXT: s_and_b32 s3, s3, 0xffff			; GFX9-PAL-NEXT: s_and_b32 s3, s3, 0xffff
	; GFX9-PAL-NEXT: s_add_u32 flat_scratch_lo, s2, s1			; GFX9-PAL-NEXT: s_add_u32 flat_scratch_lo, s2, s1
	; GFX9-PAL-NEXT: s_addc_u32 flat_scratch_hi, s3, 0			; GFX9-PAL-NEXT: s_addc_u32 flat_scratch_hi, s3, 0
	; GFX9-PAL-NEXT: scratch_store_dword v1, v2, off			; GFX9-PAL-NEXT: scratch_store_dword v1, v2, off
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: scratch_load_dword v0, v0, off offset:124 glc			; GFX9-PAL-NEXT: scratch_load_dword v0, v0, off offset:124 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX9-PAL-NEXT: ;;#ASMSTART
	; GFX9-PAL-NEXT: ; use v0
	; GFX9-PAL-NEXT: ;;#ASMEND
	; GFX9-PAL-NEXT: s_endpgm			; GFX9-PAL-NEXT: s_endpgm
	;			;
	; GFX940-LABEL: store_load_vindex_kernel:			; GFX940-LABEL: store_load_vindex_kernel:
	; GFX940: ; %bb.0: ; %bb			; GFX940: ; %bb.0: ; %bb
	; GFX940-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX940-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX940-NEXT: v_mov_b32_e32 v1, 15			; GFX940-NEXT: v_mov_b32_e32 v1, 15
	; GFX940-NEXT: scratch_store_dword v0, v1, off offset:4 sc0 sc1			; GFX940-NEXT: scratch_store_dword v0, v1, off offset:4 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: v_sub_u32_e32 v0, 4, v0			; GFX940-NEXT: v_sub_u32_e32 v0, 4, v0
	; GFX940-NEXT: scratch_load_dword v0, v0, off offset:124 sc0 sc1			; GFX940-NEXT: scratch_load_dword v0, v0, off offset:124 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: v_mov_b32_e32 v0, 4
	; GFX940-NEXT: ;;#ASMSTART
	; GFX940-NEXT: ; use v0
	; GFX940-NEXT: ;;#ASMEND
	; GFX940-NEXT: s_endpgm			; GFX940-NEXT: s_endpgm
	;			;
	; GFX10-PAL-LABEL: store_load_vindex_kernel:			; GFX10-PAL-LABEL: store_load_vindex_kernel:
	; GFX10-PAL: ; %bb.0: ; %bb			; GFX10-PAL: ; %bb.0: ; %bb
	; GFX10-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX10-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX10-PAL-NEXT: s_mov_b32 s2, s0			; GFX10-PAL-NEXT: s_mov_b32 s2, s0
	; GFX10-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX10-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX10-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-PAL-NEXT: s_and_b32 s3, s3, 0xffff			; GFX10-PAL-NEXT: s_and_b32 s3, s3, 0xffff
	; GFX10-PAL-NEXT: s_add_u32 s2, s2, s1			; GFX10-PAL-NEXT: s_add_u32 s2, s2, s1
	; GFX10-PAL-NEXT: s_addc_u32 s3, s3, 0			; GFX10-PAL-NEXT: s_addc_u32 s3, s3, 0
	; GFX10-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2			; GFX10-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
	; GFX10-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3			; GFX10-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
	; GFX10-PAL-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX10-PAL-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX10-PAL-NEXT: v_mov_b32_e32 v2, 15			; GFX10-PAL-NEXT: v_mov_b32_e32 v2, 15
	; GFX10-PAL-NEXT: v_add_nc_u32_e32 v1, 4, v0			; GFX10-PAL-NEXT: v_add_nc_u32_e32 v1, 4, v0
	; GFX10-PAL-NEXT: v_sub_nc_u32_e32 v0, 4, v0			; GFX10-PAL-NEXT: v_sub_nc_u32_e32 v0, 4, v0
	; GFX10-PAL-NEXT: scratch_store_dword v1, v2, off			; GFX10-PAL-NEXT: scratch_store_dword v1, v2, off
	; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-PAL-NEXT: scratch_load_dword v0, v0, off offset:124 glc dlc			; GFX10-PAL-NEXT: scratch_load_dword v0, v0, off offset:124 glc dlc
	; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX10-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX10-PAL-NEXT: ;;#ASMSTART
	; GFX10-PAL-NEXT: ; use v0
	; GFX10-PAL-NEXT: ;;#ASMEND
	; GFX10-PAL-NEXT: s_endpgm			; GFX10-PAL-NEXT: s_endpgm
	;			;
	; GFX11-PAL-LABEL: store_load_vindex_kernel:			; GFX11-PAL-LABEL: store_load_vindex_kernel:
	; GFX11-PAL: ; %bb.0: ; %bb			; GFX11-PAL: ; %bb.0: ; %bb
	; GFX11-PAL-NEXT: v_dual_mov_b32 v1, 15 :: v_dual_lshlrev_b32 v0, 2, v0			; GFX11-PAL-NEXT: v_dual_mov_b32 v1, 15 :: v_dual_lshlrev_b32 v0, 2, v0
	; GFX11-PAL-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-PAL-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-PAL-NEXT: v_sub_nc_u32_e32 v2, 4, v0			; GFX11-PAL-NEXT: v_sub_nc_u32_e32 v2, 4, v0
	; GFX11-PAL-NEXT: scratch_store_b32 v0, v1, off offset:4 dlc			; GFX11-PAL-NEXT: scratch_store_b32 v0, v1, off offset:4 dlc
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: scratch_load_b32 v0, v2, off offset:124 glc dlc			; GFX11-PAL-NEXT: scratch_load_b32 v0, v2, off offset:124 glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX11-PAL-NEXT: ;;#ASMSTART
	; GFX11-PAL-NEXT: ; use v0
	; GFX11-PAL-NEXT: ;;#ASMEND
	; GFX11-PAL-NEXT: s_endpgm			; GFX11-PAL-NEXT: s_endpgm
				; GCN-LABEL: store_load_vindex_kernel:
				; GCN: ; %bb.0: ; %bb
				; GCN-NEXT: v_lshlrev_b32_e32 v0, 2, v0
				; GCN-NEXT: v_mov_b32_e32 v1, 15
				; GCN-NEXT: scratch_store_dword v0, v1, off offset:4 sc0 sc1
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: v_sub_u32_e32 v0, 4, v0
				; GCN-NEXT: scratch_load_dword v0, v0, off offset:124 sc0 sc1
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: s_endpgm
	bb:			bb:
	%i = alloca [32 x float], align 4, addrspace(5)			%i = alloca [32 x float], align 4, addrspace(5)
	%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*			%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*
	%i2 = tail call i32 @llvm.amdgcn.workitem.id.x()			%i2 = tail call i32 @llvm.amdgcn.workitem.id.x()
	%i3 = zext i32 %i2 to i64			%i3 = zext i32 %i2 to i64
	%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i2			%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i2
	%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*			%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*
	store volatile i32 15, i32 addrspace(5)* %i8, align 4			store volatile i32 15, i32 addrspace(5)* %i8, align 4
	%i9 = sub nsw i32 31, %i2			%i9 = sub nsw i32 31, %i2
	%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9			%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9
	%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*			%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*
	%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4			%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4
	call void asm sideeffect "; use $0", "s"([32 x float] addrspace(5)* %i) #0
	ret void			ret void
	}			}

	define void @store_load_vindex_foo(i32 %idx) {			define void @store_load_vindex_foo(i32 %idx) {
	; GFX9-LABEL: store_load_vindex_foo:			; GFX9-LABEL: store_load_vindex_foo:
	; GFX9: ; %bb.0: ; %bb			; GFX9: ; %bb.0: ; %bb
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: v_mov_b32_e32 v1, s32			; GFX9-NEXT: v_mov_b32_e32 v1, s32
	; GFX9-NEXT: v_lshl_add_u32 v2, v0, 2, v1			; GFX9-NEXT: v_lshl_add_u32 v2, v0, 2, v1
	; GFX9-NEXT: v_mov_b32_e32 v3, 15			; GFX9-NEXT: v_mov_b32_e32 v3, 15
	; GFX9-NEXT: v_and_b32_e32 v0, 15, v0			; GFX9-NEXT: v_and_b32_e32 v0, 15, v0
	; GFX9-NEXT: scratch_store_dword v2, v3, off			; GFX9-NEXT: scratch_store_dword v2, v3, off
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_lshl_add_u32 v0, v0, 2, v1			; GFX9-NEXT: v_lshl_add_u32 v0, v0, 2, v1
	; GFX9-NEXT: scratch_load_dword v0, v0, off glc			; GFX9-NEXT: scratch_load_dword v0, v0, off glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v1
	; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: store_load_vindex_foo:			; GFX10-LABEL: store_load_vindex_foo:
	; GFX10: ; %bb.0: ; %bb			; GFX10: ; %bb.0: ; %bb
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: v_and_b32_e32 v1, 15, v0			; GFX10-NEXT: v_and_b32_e32 v1, 15, v0
	; GFX10-NEXT: v_lshl_add_u32 v0, v0, 2, s32			; GFX10-NEXT: v_lshl_add_u32 v0, v0, 2, s32
	; GFX10-NEXT: v_mov_b32_e32 v2, 15			; GFX10-NEXT: v_mov_b32_e32 v2, 15
	; GFX10-NEXT: v_lshl_add_u32 v1, v1, 2, s32			; GFX10-NEXT: v_lshl_add_u32 v1, v1, 2, s32
	; GFX10-NEXT: scratch_store_dword v0, v2, off			; GFX10-NEXT: scratch_store_dword v0, v2, off
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_load_dword v0, v1, off glc dlc			; GFX10-NEXT: scratch_load_dword v0, v1, off glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_mov_b32_e32 v0, s32
	; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v0
	; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: store_load_vindex_foo:			; GFX11-LABEL: store_load_vindex_foo:
	; GFX11: ; %bb.0: ; %bb			; GFX11: ; %bb.0: ; %bb
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: v_dual_mov_b32 v2, 15 :: v_dual_and_b32 v1, 15, v0			; GFX11-NEXT: v_dual_mov_b32 v2, 15 :: v_dual_and_b32 v1, 15, v0
	; GFX11-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX11-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2)
	; GFX11-NEXT: v_lshlrev_b32_e32 v1, 2, v1			; GFX11-NEXT: v_lshlrev_b32_e32 v1, 2, v1
	; GFX11-NEXT: scratch_store_b32 v0, v2, s32 dlc			; GFX11-NEXT: scratch_store_b32 v0, v2, s32 dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: scratch_load_b32 v0, v1, s32 glc dlc			; GFX11-NEXT: scratch_load_b32 v0, v1, s32 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_mov_b32_e32 v0, s32
	; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v0
	; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-PAL-LABEL: store_load_vindex_foo:			; GFX9-PAL-LABEL: store_load_vindex_foo:
	; GFX9-PAL: ; %bb.0: ; %bb			; GFX9-PAL: ; %bb.0: ; %bb
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-PAL-NEXT: v_mov_b32_e32 v1, s32			; GFX9-PAL-NEXT: v_mov_b32_e32 v1, s32
	; GFX9-PAL-NEXT: v_lshl_add_u32 v2, v0, 2, v1			; GFX9-PAL-NEXT: v_lshl_add_u32 v2, v0, 2, v1
	; GFX9-PAL-NEXT: v_mov_b32_e32 v3, 15			; GFX9-PAL-NEXT: v_mov_b32_e32 v3, 15
	; GFX9-PAL-NEXT: v_and_b32_e32 v0, 15, v0			; GFX9-PAL-NEXT: v_and_b32_e32 v0, 15, v0
	; GFX9-PAL-NEXT: scratch_store_dword v2, v3, off			; GFX9-PAL-NEXT: scratch_store_dword v2, v3, off
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: v_lshl_add_u32 v0, v0, 2, v1			; GFX9-PAL-NEXT: v_lshl_add_u32 v0, v0, 2, v1
	; GFX9-PAL-NEXT: scratch_load_dword v0, v0, off glc			; GFX9-PAL-NEXT: scratch_load_dword v0, v0, off glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: ;;#ASMSTART
	; GFX9-PAL-NEXT: ; use v1
	; GFX9-PAL-NEXT: ;;#ASMEND
	; GFX9-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX9-PAL-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX940-LABEL: store_load_vindex_foo:			; GFX940-LABEL: store_load_vindex_foo:
	; GFX940: ; %bb.0: ; %bb			; GFX940: ; %bb.0: ; %bb
	; GFX940-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX940-NEXT: v_lshlrev_b32_e32 v1, 2, v0			; GFX940-NEXT: v_lshlrev_b32_e32 v1, 2, v0
	; GFX940-NEXT: v_mov_b32_e32 v2, 15			; GFX940-NEXT: v_mov_b32_e32 v2, 15
	; GFX940-NEXT: v_and_b32_e32 v0, 15, v0			; GFX940-NEXT: v_and_b32_e32 v0, 15, v0
	; GFX940-NEXT: scratch_store_dword v1, v2, s32 sc0 sc1			; GFX940-NEXT: scratch_store_dword v1, v2, s32 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX940-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX940-NEXT: scratch_load_dword v0, v0, s32 sc0 sc1			; GFX940-NEXT: scratch_load_dword v0, v0, s32 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: v_mov_b32_e32 v0, s32
	; GFX940-NEXT: ;;#ASMSTART
	; GFX940-NEXT: ; use v0
	; GFX940-NEXT: ;;#ASMEND
	; GFX940-NEXT: s_setpc_b64 s[30:31]			; GFX940-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-PAL-LABEL: store_load_vindex_foo:			; GFX10-PAL-LABEL: store_load_vindex_foo:
	; GFX10-PAL: ; %bb.0: ; %bb			; GFX10-PAL: ; %bb.0: ; %bb
	; GFX10-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-PAL-NEXT: v_and_b32_e32 v1, 15, v0			; GFX10-PAL-NEXT: v_and_b32_e32 v1, 15, v0
	; GFX10-PAL-NEXT: v_lshl_add_u32 v0, v0, 2, s32			; GFX10-PAL-NEXT: v_lshl_add_u32 v0, v0, 2, s32
	; GFX10-PAL-NEXT: v_mov_b32_e32 v2, 15			; GFX10-PAL-NEXT: v_mov_b32_e32 v2, 15
	; GFX10-PAL-NEXT: v_lshl_add_u32 v1, v1, 2, s32			; GFX10-PAL-NEXT: v_lshl_add_u32 v1, v1, 2, s32
	; GFX10-PAL-NEXT: scratch_store_dword v0, v2, off			; GFX10-PAL-NEXT: scratch_store_dword v0, v2, off
	; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-PAL-NEXT: scratch_load_dword v0, v1, off glc dlc			; GFX10-PAL-NEXT: scratch_load_dword v0, v1, off glc dlc
	; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX10-PAL-NEXT: v_mov_b32_e32 v0, s32
	; GFX10-PAL-NEXT: ;;#ASMSTART
	; GFX10-PAL-NEXT: ; use v0
	; GFX10-PAL-NEXT: ;;#ASMEND
	; GFX10-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX10-PAL-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-PAL-LABEL: store_load_vindex_foo:			; GFX11-PAL-LABEL: store_load_vindex_foo:
	; GFX11-PAL: ; %bb.0: ; %bb			; GFX11-PAL: ; %bb.0: ; %bb
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: v_dual_mov_b32 v2, 15 :: v_dual_and_b32 v1, 15, v0			; GFX11-PAL-NEXT: v_dual_mov_b32 v2, 15 :: v_dual_and_b32 v1, 15, v0
	; GFX11-PAL-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX11-PAL-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX11-PAL-NEXT: s_delay_alu instid0(VALU_DEP_2)			; GFX11-PAL-NEXT: s_delay_alu instid0(VALU_DEP_2)
	; GFX11-PAL-NEXT: v_lshlrev_b32_e32 v1, 2, v1			; GFX11-PAL-NEXT: v_lshlrev_b32_e32 v1, 2, v1
	; GFX11-PAL-NEXT: scratch_store_b32 v0, v2, s32 dlc			; GFX11-PAL-NEXT: scratch_store_b32 v0, v2, s32 dlc
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: scratch_load_b32 v0, v1, s32 glc dlc			; GFX11-PAL-NEXT: scratch_load_b32 v0, v1, s32 glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: v_mov_b32_e32 v0, s32
	; GFX11-PAL-NEXT: ;;#ASMSTART
	; GFX11-PAL-NEXT: ; use v0
	; GFX11-PAL-NEXT: ;;#ASMEND
	; GFX11-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX11-PAL-NEXT: s_setpc_b64 s[30:31]
				; GCN-LABEL: store_load_vindex_foo:
				; GCN: ; %bb.0: ; %bb
				; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GCN-NEXT: v_mov_b32_e32 v2, 15
				; GCN-NEXT: v_lshlrev_b32_e32 v1, 2, v0
				; GCN-NEXT: v_and_b32_e32 v0, v0, v2
				; GCN-NEXT: scratch_store_dword v1, v2, s32 sc0 sc1
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: v_lshlrev_b32_e32 v0, 2, v0
				; GCN-NEXT: scratch_load_dword v0, v0, s32 sc0 sc1
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: s_setpc_b64 s[30:31]
	bb:			bb:
	%i = alloca [32 x float], align 4, addrspace(5)			%i = alloca [32 x float], align 4, addrspace(5)
	%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*			%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*
	%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %idx			%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %idx
	%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*			%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*
	store volatile i32 15, i32 addrspace(5)* %i8, align 4			store volatile i32 15, i32 addrspace(5)* %i8, align 4
	%i9 = and i32 %idx, 15			%i9 = and i32 %idx, 15
	%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9			%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9
	%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*			%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*
	%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4			%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4
	call void asm sideeffect "; use $0", "s"([32 x float] addrspace(5)* %i) #0
	ret void			ret void
	}			}

	define void @private_ptr_foo(float addrspace(5)* nocapture %arg) {			define void @private_ptr_foo(float addrspace(5)* nocapture %arg) {
	; GFX9-LABEL: private_ptr_foo:			; GFX9-LABEL: private_ptr_foo:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: v_mov_b32_e32 v1, 0x41200000			; GFX9-NEXT: v_mov_b32_e32 v1, 0x41200000
	[... 47 lines not shown ...]
	; GFX11-PAL-LABEL: private_ptr_foo:			; GFX11-PAL-LABEL: private_ptr_foo:
	; GFX11-PAL: ; %bb.0:			; GFX11-PAL: ; %bb.0:
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: v_mov_b32_e32 v1, 0x41200000			; GFX11-PAL-NEXT: v_mov_b32_e32 v1, 0x41200000
	; GFX11-PAL-NEXT: scratch_store_b32 v0, v1, off offset:4			; GFX11-PAL-NEXT: scratch_store_b32 v0, v1, off offset:4
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX11-PAL-NEXT: s_setpc_b64 s[30:31]
				; GCN-LABEL: private_ptr_foo:
				; GCN: ; %bb.0:
				; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GCN-NEXT: v_mov_b32_e32 v1, 0x41200000
				; GCN-NEXT: scratch_store_dword v0, v1, off offset:4
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: s_setpc_b64 s[30:31]
	%gep = getelementptr inbounds float, float addrspace(5)* %arg, i32 1			%gep = getelementptr inbounds float, float addrspace(5)* %arg, i32 1
	store float 1.000000e+01, float addrspace(5)* %gep, align 4			store float 1.000000e+01, float addrspace(5)* %gep, align 4
	ret void			ret void
	}			}

	define amdgpu_kernel void @zero_init_small_offset_kernel() {			define amdgpu_kernel void @zero_init_small_offset_kernel() {
	; GFX9-LABEL: zero_init_small_offset_kernel:			; GFX9-LABEL: zero_init_small_offset_kernel:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3			; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3
	; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0			; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0
	; GFX9-NEXT: s_mov_b32 s4, 0			; GFX9-NEXT: s_mov_b32 s4, 0
	; GFX9-NEXT: scratch_load_dword v0, off, s4 offset:4 glc			; GFX9-NEXT: scratch_load_dword v0, off, s4 offset:4 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_mov_b32 s0, 0			; GFX9-NEXT: s_mov_b32 s0, 0
	; GFX9-NEXT: s_mov_b32 s1, s0			; GFX9-NEXT: s_mov_b32 s1, s0
	; GFX9-NEXT: s_mov_b32 s2, s0			; GFX9-NEXT: s_mov_b32 s2, s0
	; GFX9-NEXT: s_mov_b32 s3, s0			; GFX9-NEXT: s_mov_b32 s3, s0
	; GFX9-NEXT: v_mov_b32_e32 v0, s0			; GFX9-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-NEXT: v_mov_b32_e32 v1, s1			; GFX9-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-NEXT: v_mov_b32_e32 v2, s2			; GFX9-NEXT: v_mov_b32_e32 v2, s2
	; GFX9-NEXT: v_mov_b32_e32 v3, s3			; GFX9-NEXT: v_mov_b32_e32 v3, s3
	; GFX9-NEXT: s_mov_b32 s1, 0			; GFX9-NEXT: s_mov_b32 s1, 0
	; GFX9-NEXT: s_mov_b32 s0, 0
	; GFX9-NEXT: s_mov_b32 vcc_lo, 0			; GFX9-NEXT: s_mov_b32 vcc_lo, 0
	; GFX9-NEXT: s_mov_b32 vcc_hi, 0			; GFX9-NEXT: s_mov_b32 vcc_hi, 0
	; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], s1 offset:260			; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], s1 offset:260
	; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], s0 offset:276			; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], s0 offset:276
	; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:292			; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:292
	; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_hi offset:308			; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_hi offset:308
	; GFX9-NEXT: s_nop 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 4
	; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v0
	; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x104
	; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v0
	; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: zero_init_small_offset_kernel:			; GFX10-LABEL: zero_init_small_offset_kernel:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_add_u32 s0, s0, s3			; GFX10-NEXT: s_add_u32 s0, s0, s3
	; GFX10-NEXT: s_addc_u32 s1, s1, 0			; GFX10-NEXT: s_addc_u32 s1, s1, 0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1
	; GFX10-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_mov_b32 s0, 0			; GFX10-NEXT: s_mov_b32 s0, 0
	; GFX10-NEXT: v_mov_b32_e32 v4, 4
	; GFX10-NEXT: s_mov_b32 s1, s0			; GFX10-NEXT: s_mov_b32 s1, s0
	; GFX10-NEXT: s_mov_b32 s2, s0			; GFX10-NEXT: s_mov_b32 s2, s0
	; GFX10-NEXT: s_mov_b32 s3, s0			; GFX10-NEXT: s_mov_b32 s3, s0
	; GFX10-NEXT: v_mov_b32_e32 v0, s0			; GFX10-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-NEXT: v_mov_b32_e32 v1, s1			; GFX10-NEXT: v_mov_b32_e32 v1, s1
	; GFX10-NEXT: v_mov_b32_e32 v2, s2			; GFX10-NEXT: v_mov_b32_e32 v2, s2
	; GFX10-NEXT: v_mov_b32_e32 v3, s3			; GFX10-NEXT: v_mov_b32_e32 v3, s3
	; GFX10-NEXT: v_mov_b32_e32 v5, 0x104
	; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:260			; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:260
	; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:276			; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:276
	; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:292			; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:292
	; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:308			; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:308
	; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v4
	; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v5
	; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: zero_init_small_offset_kernel:			; GFX11-LABEL: zero_init_small_offset_kernel:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: scratch_load_b32 v0, off, off offset:4 glc dlc			; GFX11-NEXT: scratch_load_b32 v0, off, off offset:4 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, 0			; GFX11-NEXT: s_mov_b32 s0, 0
	; GFX11-NEXT: v_dual_mov_b32 v4, 4 :: v_dual_mov_b32 v5, 0x104			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_mov_b32 s1, s0			; GFX11-NEXT: s_mov_b32 s1, s0
	; GFX11-NEXT: s_mov_b32 s2, s0			; GFX11-NEXT: s_mov_b32 s2, s0
	; GFX11-NEXT: s_mov_b32 s3, s0			; GFX11-NEXT: s_mov_b32 s3, s0
	; GFX11-NEXT: v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, s1			; GFX11-NEXT: v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, s1
	; GFX11-NEXT: v_dual_mov_b32 v2, s2 :: v_dual_mov_b32 v3, s3			; GFX11-NEXT: v_dual_mov_b32 v2, s2 :: v_dual_mov_b32 v3, s3
	; GFX11-NEXT: s_clause 0x3			; GFX11-NEXT: s_clause 0x3
	; GFX11-NEXT: scratch_store_b128 off, v[0:3], off offset:260			; GFX11-NEXT: scratch_store_b128 off, v[0:3], off offset:260
	; GFX11-NEXT: scratch_store_b128 off, v[0:3], off offset:276			; GFX11-NEXT: scratch_store_b128 off, v[0:3], off offset:276
	; GFX11-NEXT: scratch_store_b128 off, v[0:3], off offset:292			; GFX11-NEXT: scratch_store_b128 off, v[0:3], off offset:292
	; GFX11-NEXT: scratch_store_b128 off, v[0:3], off offset:308			; GFX11-NEXT: scratch_store_b128 off, v[0:3], off offset:308
	; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v4
	; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v5
	; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	;			;
	; GFX9-PAL-LABEL: zero_init_small_offset_kernel:			; GFX9-PAL-LABEL: zero_init_small_offset_kernel:
	; GFX9-PAL: ; %bb.0:			; GFX9-PAL: ; %bb.0:
	; GFX9-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX9-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX9-PAL-NEXT: s_mov_b32 s2, s0			; GFX9-PAL-NEXT: s_mov_b32 s2, s0
	; GFX9-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX9-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	(10 lines collapsed in the diff view)
	; GFX9-PAL-NEXT: s_mov_b32 s1, s0			; GFX9-PAL-NEXT: s_mov_b32 s1, s0
	; GFX9-PAL-NEXT: s_mov_b32 s2, s0			; GFX9-PAL-NEXT: s_mov_b32 s2, s0
	; GFX9-PAL-NEXT: s_mov_b32 s3, s0			; GFX9-PAL-NEXT: s_mov_b32 s3, s0
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-PAL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-PAL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-PAL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-PAL-NEXT: v_mov_b32_e32 v2, s2			; GFX9-PAL-NEXT: v_mov_b32_e32 v2, s2
	; GFX9-PAL-NEXT: v_mov_b32_e32 v3, s3			; GFX9-PAL-NEXT: v_mov_b32_e32 v3, s3
	; GFX9-PAL-NEXT: s_mov_b32 s1, 0			; GFX9-PAL-NEXT: s_mov_b32 s1, 0
	; GFX9-PAL-NEXT: s_mov_b32 s0, 0
	; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s1 offset:260			; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s1 offset:260
	; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s0 offset:276			; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s0 offset:276
	; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:292			; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:292
	; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_hi offset:308			; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_hi offset:308
	; GFX9-PAL-NEXT: s_nop 0
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX9-PAL-NEXT: ;;#ASMSTART
	; GFX9-PAL-NEXT: ; use v0
	; GFX9-PAL-NEXT: ;;#ASMEND
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 0x104
	; GFX9-PAL-NEXT: ;;#ASMSTART
	; GFX9-PAL-NEXT: ; use v0
	; GFX9-PAL-NEXT: ;;#ASMEND
	; GFX9-PAL-NEXT: s_endpgm			; GFX9-PAL-NEXT: s_endpgm
	;			;
	; GFX940-LABEL: zero_init_small_offset_kernel:			; GFX940-LABEL: zero_init_small_offset_kernel:
	; GFX940: ; %bb.0:			; GFX940: ; %bb.0:
	; GFX940-NEXT: scratch_load_dword v0, off, off offset:4 sc0 sc1			; GFX940-NEXT: scratch_load_dword v0, off, off offset:4 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: s_mov_b32 s0, 0			; GFX940-NEXT: s_mov_b32 s0, 0
	; GFX940-NEXT: s_mov_b32 s1, s0			; GFX940-NEXT: s_mov_b32 s1, s0
	; GFX940-NEXT: s_mov_b32 s2, s0			; GFX940-NEXT: s_mov_b32 s2, s0
	; GFX940-NEXT: s_mov_b32 s3, s0			; GFX940-NEXT: s_mov_b32 s3, s0
	; GFX940-NEXT: v_mov_b64_e32 v[0:1], s[0:1]			; GFX940-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
	; GFX940-NEXT: v_mov_b64_e32 v[2:3], s[2:3]			; GFX940-NEXT: v_mov_b64_e32 v[2:3], s[2:3]
	; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:260			; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:260
	; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:276			; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:276
	; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:292			; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:292
	; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:308			; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:308
	; GFX940-NEXT: s_nop 1
	; GFX940-NEXT: v_mov_b32_e32 v0, 4
	; GFX940-NEXT: ;;#ASMSTART
	; GFX940-NEXT: ; use v0
	; GFX940-NEXT: ;;#ASMEND
	; GFX940-NEXT: v_mov_b32_e32 v0, 0x104
	; GFX940-NEXT: ;;#ASMSTART
	; GFX940-NEXT: ; use v0
	; GFX940-NEXT: ;;#ASMEND
	; GFX940-NEXT: s_endpgm			; GFX940-NEXT: s_endpgm
	;			;
	; GFX1010-PAL-LABEL: zero_init_small_offset_kernel:			; GFX1010-PAL-LABEL: zero_init_small_offset_kernel:
	; GFX1010-PAL: ; %bb.0:			; GFX1010-PAL: ; %bb.0:
	; GFX1010-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX1010-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX1010-PAL-NEXT: s_mov_b32 s2, s0			; GFX1010-PAL-NEXT: s_mov_b32 s2, s0
	; GFX1010-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX1010-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)
	(10 lines collapsed in the diff view)
	; GFX1010-PAL-NEXT: s_mov_b32 s2, s0			; GFX1010-PAL-NEXT: s_mov_b32 s2, s0
	; GFX1010-PAL-NEXT: s_mov_b32 s3, s0			; GFX1010-PAL-NEXT: s_mov_b32 s3, s0
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v0, s0			; GFX1010-PAL-NEXT: v_mov_b32_e32 v0, s0
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v1, s1			; GFX1010-PAL-NEXT: v_mov_b32_e32 v1, s1
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v2, s2			; GFX1010-PAL-NEXT: v_mov_b32_e32 v2, s2
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v3, s3			; GFX1010-PAL-NEXT: v_mov_b32_e32 v3, s3
	; GFX1010-PAL-NEXT: s_mov_b32 s2, 0			; GFX1010-PAL-NEXT: s_mov_b32 s2, 0
	; GFX1010-PAL-NEXT: s_mov_b32 s1, 0			; GFX1010-PAL-NEXT: s_mov_b32 s1, 0
	; GFX1010-PAL-NEXT: s_mov_b32 s0, 0
	; GFX1010-PAL-NEXT: s_mov_b32 vcc_lo, 0			; GFX1010-PAL-NEXT: s_mov_b32 vcc_lo, 0
	; GFX1010-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s2 offset:260			; GFX1010-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s2 offset:260
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v4, 4
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v5, 0x104
	; GFX1010-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s1 offset:276			; GFX1010-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s1 offset:276
	; GFX1010-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s0 offset:292			; GFX1010-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s0 offset:292
	; GFX1010-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:308			; GFX1010-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:308
	; GFX1010-PAL-NEXT: ;;#ASMSTART
	; GFX1010-PAL-NEXT: ; use v4
	; GFX1010-PAL-NEXT: ;;#ASMEND
	; GFX1010-PAL-NEXT: ;;#ASMSTART
	; GFX1010-PAL-NEXT: ; use v5
	; GFX1010-PAL-NEXT: ;;#ASMEND
	; GFX1010-PAL-NEXT: s_endpgm			; GFX1010-PAL-NEXT: s_endpgm
	;			;
	; GFX1030-PAL-LABEL: zero_init_small_offset_kernel:			; GFX1030-PAL-LABEL: zero_init_small_offset_kernel:
	; GFX1030-PAL: ; %bb.0:			; GFX1030-PAL: ; %bb.0:
	; GFX1030-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX1030-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX1030-PAL-NEXT: s_mov_b32 s2, s0			; GFX1030-PAL-NEXT: s_mov_b32 s2, s0
	; GFX1030-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX1030-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1030-PAL-NEXT: s_and_b32 s3, s3, 0xffff			; GFX1030-PAL-NEXT: s_and_b32 s3, s3, 0xffff
	; GFX1030-PAL-NEXT: s_add_u32 s2, s2, s1			; GFX1030-PAL-NEXT: s_add_u32 s2, s2, s1
	; GFX1030-PAL-NEXT: s_addc_u32 s3, s3, 0			; GFX1030-PAL-NEXT: s_addc_u32 s3, s3, 0
	; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2			; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
	; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3			; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
	; GFX1030-PAL-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc			; GFX1030-PAL-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc
	; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1030-PAL-NEXT: s_mov_b32 s0, 0			; GFX1030-PAL-NEXT: s_mov_b32 s0, 0
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v4, 4
	; GFX1030-PAL-NEXT: s_mov_b32 s1, s0			; GFX1030-PAL-NEXT: s_mov_b32 s1, s0
	; GFX1030-PAL-NEXT: s_mov_b32 s2, s0			; GFX1030-PAL-NEXT: s_mov_b32 s2, s0
	; GFX1030-PAL-NEXT: s_mov_b32 s3, s0			; GFX1030-PAL-NEXT: s_mov_b32 s3, s0
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v0, s0			; GFX1030-PAL-NEXT: v_mov_b32_e32 v0, s0
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v1, s1			; GFX1030-PAL-NEXT: v_mov_b32_e32 v1, s1
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v2, s2			; GFX1030-PAL-NEXT: v_mov_b32_e32 v2, s2
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v3, s3			; GFX1030-PAL-NEXT: v_mov_b32_e32 v3, s3
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v5, 0x104
	; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:260			; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:260
	; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:276			; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:276
	; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:292			; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:292
	; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:308			; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:308
	; GFX1030-PAL-NEXT: ;;#ASMSTART
	; GFX1030-PAL-NEXT: ; use v4
	; GFX1030-PAL-NEXT: ;;#ASMEND
	; GFX1030-PAL-NEXT: ;;#ASMSTART
	; GFX1030-PAL-NEXT: ; use v5
	; GFX1030-PAL-NEXT: ;;#ASMEND
	; GFX1030-PAL-NEXT: s_endpgm			; GFX1030-PAL-NEXT: s_endpgm
	;			;
	; GFX11-PAL-LABEL: zero_init_small_offset_kernel:			; GFX11-PAL-LABEL: zero_init_small_offset_kernel:
	; GFX11-PAL: ; %bb.0:			; GFX11-PAL: ; %bb.0:
	; GFX11-PAL-NEXT: scratch_load_b32 v0, off, off offset:4 glc dlc			; GFX11-PAL-NEXT: scratch_load_b32 v0, off, off offset:4 glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: s_mov_b32 s0, 0			; GFX11-PAL-NEXT: s_mov_b32 s0, 0
	; GFX11-PAL-NEXT: v_dual_mov_b32 v4, 4 :: v_dual_mov_b32 v5, 0x104			; GFX11-PAL-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-PAL-NEXT: s_mov_b32 s1, s0			; GFX11-PAL-NEXT: s_mov_b32 s1, s0
	; GFX11-PAL-NEXT: s_mov_b32 s2, s0			; GFX11-PAL-NEXT: s_mov_b32 s2, s0
	; GFX11-PAL-NEXT: s_mov_b32 s3, s0			; GFX11-PAL-NEXT: s_mov_b32 s3, s0
	; GFX11-PAL-NEXT: v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, s1			; GFX11-PAL-NEXT: v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, s1
	; GFX11-PAL-NEXT: v_dual_mov_b32 v2, s2 :: v_dual_mov_b32 v3, s3			; GFX11-PAL-NEXT: v_dual_mov_b32 v2, s2 :: v_dual_mov_b32 v3, s3
	; GFX11-PAL-NEXT: s_clause 0x3			; GFX11-PAL-NEXT: s_clause 0x3
	; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], off offset:260			; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], off offset:260
	; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], off offset:276			; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], off offset:276
	; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], off offset:292			; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], off offset:292
	; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], off offset:308			; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], off offset:308
	; GFX11-PAL-NEXT: ;;#ASMSTART
	; GFX11-PAL-NEXT: ; use v4
	; GFX11-PAL-NEXT: ;;#ASMEND
	; GFX11-PAL-NEXT: ;;#ASMSTART
	; GFX11-PAL-NEXT: ; use v5
	; GFX11-PAL-NEXT: ;;#ASMEND
	; GFX11-PAL-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX11-PAL-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-PAL-NEXT: s_endpgm			; GFX11-PAL-NEXT: s_endpgm
	%padding = alloca [64 x i32], align 4, addrspace(5)			%padding = alloca [64 x i32], align 4, addrspace(5)
	%alloca = alloca [32 x i16], align 2, addrspace(5)			%alloca = alloca [32 x i16], align 2, addrspace(5)
	%pad_gep = getelementptr inbounds [64 x i32], [64 x i32] addrspace(5)* %padding, i32 0, i32 undef			%pad_gep = getelementptr inbounds [64 x i32], [64 x i32] addrspace(5)* %padding, i32 0, i32 undef
	%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4			%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4
	%cast = bitcast [32 x i16] addrspace(5)* %alloca to i8 addrspace(5)*			%cast = bitcast [32 x i16] addrspace(5)* %alloca to i8 addrspace(5)*
	call void @llvm.memset.p5i8.i64(i8 addrspace(5)* align 2 dereferenceable(64) %cast, i8 0, i64 64, i1 false)			call void @llvm.memset.p5i8.i64(i8 addrspace(5)* align 2 dereferenceable(64) %cast, i8 0, i64 64, i1 false)
	call void asm sideeffect "; use $0", "s"([64 x i32] addrspace(5)* %padding) #0
	call void asm sideeffect "; use $0", "s"([32 x i16] addrspace(5)* %alloca) #0
	ret void			ret void
	}			}

	define void @zero_init_small_offset_foo() {			define void @zero_init_small_offset_foo() {
	; GFX9-LABEL: zero_init_small_offset_foo:			; GFX9-LABEL: zero_init_small_offset_foo:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: scratch_load_dword v0, off, s32 glc			; GFX9-NEXT: scratch_load_dword v0, off, s32 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_mov_b32 s0, 0			; GFX9-NEXT: s_mov_b32 s0, 0
	; GFX9-NEXT: s_mov_b32 s1, s0			; GFX9-NEXT: s_mov_b32 s1, s0
	; GFX9-NEXT: s_mov_b32 s2, s0			; GFX9-NEXT: s_mov_b32 s2, s0
	; GFX9-NEXT: s_mov_b32 s3, s0			; GFX9-NEXT: s_mov_b32 s3, s0
	; GFX9-NEXT: v_mov_b32_e32 v0, s0			; GFX9-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-NEXT: v_mov_b32_e32 v1, s1			; GFX9-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-NEXT: v_mov_b32_e32 v2, s2			; GFX9-NEXT: v_mov_b32_e32 v2, s2
	; GFX9-NEXT: v_mov_b32_e32 v3, s3			; GFX9-NEXT: v_mov_b32_e32 v3, s3
	; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:256			; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:256
	; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:272			; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:272
	; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:288			; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:288
	; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:304			; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:304
	; GFX9-NEXT: s_add_i32 vcc_hi, s32, 0x100
	; GFX9-NEXT: v_mov_b32_e32 v0, s32
	; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v0
	; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: v_mov_b32_e32 v0, vcc_hi
	; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v0
	; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: zero_init_small_offset_foo:			; GFX10-LABEL: zero_init_small_offset_foo:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_load_dword v0, off, s32 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, s32 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_mov_b32 s0, 0			; GFX10-NEXT: s_mov_b32 s0, 0
	; GFX10-NEXT: s_add_i32 vcc_lo, s32, 0x100
	; GFX10-NEXT: s_mov_b32 s1, s0			; GFX10-NEXT: s_mov_b32 s1, s0
	; GFX10-NEXT: s_mov_b32 s2, s0			; GFX10-NEXT: s_mov_b32 s2, s0
	; GFX10-NEXT: s_mov_b32 s3, s0			; GFX10-NEXT: s_mov_b32 s3, s0
	; GFX10-NEXT: v_mov_b32_e32 v0, s0			; GFX10-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-NEXT: v_mov_b32_e32 v1, s1			; GFX10-NEXT: v_mov_b32_e32 v1, s1
	; GFX10-NEXT: v_mov_b32_e32 v2, s2			; GFX10-NEXT: v_mov_b32_e32 v2, s2
	; GFX10-NEXT: v_mov_b32_e32 v3, s3			; GFX10-NEXT: v_mov_b32_e32 v3, s3
	; GFX10-NEXT: v_mov_b32_e32 v4, s32
	; GFX10-NEXT: v_mov_b32_e32 v5, vcc_lo
	; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:256			; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:256
	; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:272			; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:272
	; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:288			; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:288
	; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:304			; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:304
	; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v4
	; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v5
	; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: zero_init_small_offset_foo:			; GFX11-LABEL: zero_init_small_offset_foo:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: scratch_load_b32 v0, off, s32 glc dlc			; GFX11-NEXT: scratch_load_b32 v0, off, s32 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, 0			; GFX11-NEXT: s_mov_b32 s0, 0
	; GFX11-NEXT: s_add_i32 vcc_lo, s32, 0x100			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_mov_b32 s1, s0			; GFX11-NEXT: s_mov_b32 s1, s0
	; GFX11-NEXT: s_mov_b32 s2, s0			; GFX11-NEXT: s_mov_b32 s2, s0
	; GFX11-NEXT: s_mov_b32 s3, s0			; GFX11-NEXT: s_mov_b32 s3, s0
	; GFX11-NEXT: v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, s1			; GFX11-NEXT: v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, s1
	; GFX11-NEXT: v_dual_mov_b32 v2, s2 :: v_dual_mov_b32 v3, s3			; GFX11-NEXT: v_dual_mov_b32 v2, s2 :: v_dual_mov_b32 v3, s3
	; GFX11-NEXT: v_dual_mov_b32 v4, s32 :: v_dual_mov_b32 v5, vcc_lo
	; GFX11-NEXT: s_clause 0x3			; GFX11-NEXT: s_clause 0x3
	; GFX11-NEXT: scratch_store_b128 off, v[0:3], s32 offset:256			; GFX11-NEXT: scratch_store_b128 off, v[0:3], s32 offset:256
	; GFX11-NEXT: scratch_store_b128 off, v[0:3], s32 offset:272			; GFX11-NEXT: scratch_store_b128 off, v[0:3], s32 offset:272
	; GFX11-NEXT: scratch_store_b128 off, v[0:3], s32 offset:288			; GFX11-NEXT: scratch_store_b128 off, v[0:3], s32 offset:288
	; GFX11-NEXT: scratch_store_b128 off, v[0:3], s32 offset:304			; GFX11-NEXT: scratch_store_b128 off, v[0:3], s32 offset:304
	; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v4
	; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v5
	; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-PAL-LABEL: zero_init_small_offset_foo:			; GFX9-PAL-LABEL: zero_init_small_offset_foo:
	; GFX9-PAL: ; %bb.0:			; GFX9-PAL: ; %bb.0:
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-PAL-NEXT: scratch_load_dword v0, off, s32 glc			; GFX9-PAL-NEXT: scratch_load_dword v0, off, s32 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_mov_b32 s0, 0			; GFX9-PAL-NEXT: s_mov_b32 s0, 0
	; GFX9-PAL-NEXT: s_mov_b32 s1, s0			; GFX9-PAL-NEXT: s_mov_b32 s1, s0
	; GFX9-PAL-NEXT: s_mov_b32 s2, s0			; GFX9-PAL-NEXT: s_mov_b32 s2, s0
	; GFX9-PAL-NEXT: s_mov_b32 s3, s0			; GFX9-PAL-NEXT: s_mov_b32 s3, s0
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-PAL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-PAL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-PAL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-PAL-NEXT: v_mov_b32_e32 v2, s2			; GFX9-PAL-NEXT: v_mov_b32_e32 v2, s2
	; GFX9-PAL-NEXT: v_mov_b32_e32 v3, s3			; GFX9-PAL-NEXT: v_mov_b32_e32 v3, s3
	; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:256			; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:256
	; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:272			; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:272
	; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:288			; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:288
	; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:304			; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:304
	; GFX9-PAL-NEXT: s_add_i32 vcc_hi, s32, 0x100
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, s32
	; GFX9-PAL-NEXT: ;;#ASMSTART
	; GFX9-PAL-NEXT: ; use v0
	; GFX9-PAL-NEXT: ;;#ASMEND
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, vcc_hi
	; GFX9-PAL-NEXT: ;;#ASMSTART
	; GFX9-PAL-NEXT: ; use v0
	; GFX9-PAL-NEXT: ;;#ASMEND
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX9-PAL-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX940-LABEL: zero_init_small_offset_foo:			; GFX940-LABEL: zero_init_small_offset_foo:
	; GFX940: ; %bb.0:			; GFX940: ; %bb.0:
	; GFX940-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX940-NEXT: scratch_load_dword v0, off, s32 sc0 sc1			; GFX940-NEXT: scratch_load_dword v0, off, s32 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: s_mov_b32 s0, 0			; GFX940-NEXT: s_mov_b32 s0, 0
	; GFX940-NEXT: s_mov_b32 s1, s0			; GFX940-NEXT: s_mov_b32 s1, s0
	; GFX940-NEXT: s_mov_b32 s2, s0			; GFX940-NEXT: s_mov_b32 s2, s0
	; GFX940-NEXT: s_mov_b32 s3, s0			; GFX940-NEXT: s_mov_b32 s3, s0
	; GFX940-NEXT: v_mov_b64_e32 v[0:1], s[0:1]			; GFX940-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
	; GFX940-NEXT: v_mov_b64_e32 v[2:3], s[2:3]			; GFX940-NEXT: v_mov_b64_e32 v[2:3], s[2:3]
	; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:256			; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:256
	; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:272			; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:272
	; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:288			; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:288
	; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:304			; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:304
	; GFX940-NEXT: s_add_i32 vcc_hi, s32, 0x100
	; GFX940-NEXT: s_nop 0
	; GFX940-NEXT: v_mov_b32_e32 v0, s32
	; GFX940-NEXT: ;;#ASMSTART
	; GFX940-NEXT: ; use v0
	; GFX940-NEXT: ;;#ASMEND
	; GFX940-NEXT: v_mov_b32_e32 v0, vcc_hi
	; GFX940-NEXT: ;;#ASMSTART
	; GFX940-NEXT: ; use v0
	; GFX940-NEXT: ;;#ASMEND
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: s_setpc_b64 s[30:31]			; GFX940-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-PAL-LABEL: zero_init_small_offset_foo:			; GFX10-PAL-LABEL: zero_init_small_offset_foo:
	; GFX10-PAL: ; %bb.0:			; GFX10-PAL: ; %bb.0:
	; GFX10-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-PAL-NEXT: scratch_load_dword v0, off, s32 glc dlc			; GFX10-PAL-NEXT: scratch_load_dword v0, off, s32 glc dlc
	; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX10-PAL-NEXT: s_mov_b32 s0, 0			; GFX10-PAL-NEXT: s_mov_b32 s0, 0
	; GFX10-PAL-NEXT: s_add_i32 vcc_lo, s32, 0x100
	; GFX10-PAL-NEXT: s_mov_b32 s1, s0			; GFX10-PAL-NEXT: s_mov_b32 s1, s0
	; GFX10-PAL-NEXT: s_mov_b32 s2, s0			; GFX10-PAL-NEXT: s_mov_b32 s2, s0
	; GFX10-PAL-NEXT: s_mov_b32 s3, s0			; GFX10-PAL-NEXT: s_mov_b32 s3, s0
	; GFX10-PAL-NEXT: v_mov_b32_e32 v0, s0			; GFX10-PAL-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-PAL-NEXT: v_mov_b32_e32 v1, s1			; GFX10-PAL-NEXT: v_mov_b32_e32 v1, s1
	; GFX10-PAL-NEXT: v_mov_b32_e32 v2, s2			; GFX10-PAL-NEXT: v_mov_b32_e32 v2, s2
	; GFX10-PAL-NEXT: v_mov_b32_e32 v3, s3			; GFX10-PAL-NEXT: v_mov_b32_e32 v3, s3
	; GFX10-PAL-NEXT: v_mov_b32_e32 v4, s32
	; GFX10-PAL-NEXT: v_mov_b32_e32 v5, vcc_lo
	; GFX10-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:256			; GFX10-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:256
	; GFX10-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:272			; GFX10-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:272
	; GFX10-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:288			; GFX10-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:288
	; GFX10-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:304			; GFX10-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:304
	; GFX10-PAL-NEXT: ;;#ASMSTART
	; GFX10-PAL-NEXT: ; use v4
	; GFX10-PAL-NEXT: ;;#ASMEND
	; GFX10-PAL-NEXT: ;;#ASMSTART
	; GFX10-PAL-NEXT: ; use v5
	; GFX10-PAL-NEXT: ;;#ASMEND
	; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX10-PAL-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-PAL-LABEL: zero_init_small_offset_foo:			; GFX11-PAL-LABEL: zero_init_small_offset_foo:
	; GFX11-PAL: ; %bb.0:			; GFX11-PAL: ; %bb.0:
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: scratch_load_b32 v0, off, s32 glc dlc			; GFX11-PAL-NEXT: scratch_load_b32 v0, off, s32 glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: s_mov_b32 s0, 0			; GFX11-PAL-NEXT: s_mov_b32 s0, 0
	; GFX11-PAL-NEXT: s_add_i32 vcc_lo, s32, 0x100			; GFX11-PAL-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-PAL-NEXT: s_mov_b32 s1, s0			; GFX11-PAL-NEXT: s_mov_b32 s1, s0
	; GFX11-PAL-NEXT: s_mov_b32 s2, s0			; GFX11-PAL-NEXT: s_mov_b32 s2, s0
	; GFX11-PAL-NEXT: s_mov_b32 s3, s0			; GFX11-PAL-NEXT: s_mov_b32 s3, s0
	; GFX11-PAL-NEXT: v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, s1			; GFX11-PAL-NEXT: v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, s1
	; GFX11-PAL-NEXT: v_dual_mov_b32 v2, s2 :: v_dual_mov_b32 v3, s3			; GFX11-PAL-NEXT: v_dual_mov_b32 v2, s2 :: v_dual_mov_b32 v3, s3
	; GFX11-PAL-NEXT: v_dual_mov_b32 v4, s32 :: v_dual_mov_b32 v5, vcc_lo
	; GFX11-PAL-NEXT: s_clause 0x3			; GFX11-PAL-NEXT: s_clause 0x3
	; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], s32 offset:256			; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], s32 offset:256
	; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], s32 offset:272			; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], s32 offset:272
	; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], s32 offset:288			; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], s32 offset:288
	; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], s32 offset:304			; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], s32 offset:304
	; GFX11-PAL-NEXT: ;;#ASMSTART
	; GFX11-PAL-NEXT: ; use v4
	; GFX11-PAL-NEXT: ;;#ASMEND
	; GFX11-PAL-NEXT: ;;#ASMSTART
	; GFX11-PAL-NEXT: ; use v5
	; GFX11-PAL-NEXT: ;;#ASMEND
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX11-PAL-NEXT: s_setpc_b64 s[30:31]
				; GCN-LABEL: zero_init_small_offset_foo:
				; GCN: ; %bb.0:
				; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GCN-NEXT: scratch_load_dword v0, off, s32 sc0 sc1
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: s_mov_b32 s0, 0
				; GCN-NEXT: s_mov_b32 s1, s0
				; GCN-NEXT: s_mov_b32 s2, s0
				; GCN-NEXT: s_mov_b32 s3, s0
				; GCN-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
				; GCN-NEXT: v_mov_b64_e32 v[2:3], s[2:3]
				; GCN-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:256
				; GCN-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:272
				; GCN-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:288
				; GCN-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:304
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: s_setpc_b64 s[30:31]
	%padding = alloca [64 x i32], align 4, addrspace(5)			%padding = alloca [64 x i32], align 4, addrspace(5)
	%alloca = alloca [32 x i16], align 2, addrspace(5)			%alloca = alloca [32 x i16], align 2, addrspace(5)
	%pad_gep = getelementptr inbounds [64 x i32], [64 x i32] addrspace(5)* %padding, i32 0, i32 undef			%pad_gep = getelementptr inbounds [64 x i32], [64 x i32] addrspace(5)* %padding, i32 0, i32 undef
	%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4			%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4
	%cast = bitcast [32 x i16] addrspace(5)* %alloca to i8 addrspace(5)*			%cast = bitcast [32 x i16] addrspace(5)* %alloca to i8 addrspace(5)*
	call void @llvm.memset.p5i8.i64(i8 addrspace(5)* align 2 dereferenceable(64) %cast, i8 0, i64 64, i1 false)			call void @llvm.memset.p5i8.i64(i8 addrspace(5)* align 2 dereferenceable(64) %cast, i8 0, i64 64, i1 false)
	call void asm sideeffect "; use $0", "s"([64 x i32] addrspace(5)* %padding) #0
	call void asm sideeffect "; use $0", "s"([32 x i16] addrspace(5)* %alloca) #0
	ret void			ret void
	}			}

	define amdgpu_kernel void @store_load_sindex_small_offset_kernel(i32 %idx) {			define amdgpu_kernel void @store_load_sindex_small_offset_kernel(i32 %idx) {
	; GFX9-LABEL: store_load_sindex_small_offset_kernel:			; GFX9-LABEL: store_load_sindex_small_offset_kernel:
	; GFX9: ; %bb.0: ; %bb			; GFX9: ; %bb.0: ; %bb
	; GFX9-NEXT: s_load_dword s0, s[0:1], 0x24			; GFX9-NEXT: s_load_dword s0, s[0:1], 0x24
	; GFX9-NEXT: s_add_u32 flat_scratch_lo, s2, s5			; GFX9-NEXT: s_add_u32 flat_scratch_lo, s2, s5
	; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s3, 0			; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s3, 0
	; GFX9-NEXT: s_mov_b32 vcc_hi, 0			; GFX9-NEXT: s_mov_b32 vcc_hi, 0
	; GFX9-NEXT: scratch_load_dword v0, off, vcc_hi offset:4 glc			; GFX9-NEXT: scratch_load_dword v0, off, vcc_hi offset:4 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_lshl_b32 s1, s0, 2			; GFX9-NEXT: s_lshl_b32 s1, s0, 2
	; GFX9-NEXT: s_and_b32 s0, s0, 15			; GFX9-NEXT: s_and_b32 s0, s0, 15
	; GFX9-NEXT: v_mov_b32_e32 v0, 15			; GFX9-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-NEXT: s_addk_i32 s1, 0x104			; GFX9-NEXT: s_addk_i32 s1, 0x104
	; GFX9-NEXT: s_lshl_b32 s0, s0, 2			; GFX9-NEXT: s_lshl_b32 s0, s0, 2
	; GFX9-NEXT: scratch_store_dword off, v0, s1			; GFX9-NEXT: scratch_store_dword off, v0, s1
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_addk_i32 s0, 0x104			; GFX9-NEXT: s_addk_i32 s0, 0x104
	; GFX9-NEXT: scratch_load_dword v0, off, s0 glc			; GFX9-NEXT: scratch_load_dword v0, off, s0 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_mov_b32_e32 v0, 4
	; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v0
	; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x104
	; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v0
	; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: store_load_sindex_small_offset_kernel:			; GFX10-LABEL: store_load_sindex_small_offset_kernel:
	; GFX10: ; %bb.0: ; %bb			; GFX10: ; %bb.0: ; %bb
	; GFX10-NEXT: s_add_u32 s2, s2, s5			; GFX10-NEXT: s_add_u32 s2, s2, s5
	; GFX10-NEXT: s_addc_u32 s3, s3, 0			; GFX10-NEXT: s_addc_u32 s3, s3, 0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
	; GFX10-NEXT: s_load_dword s0, s[0:1], 0x24			; GFX10-NEXT: s_load_dword s0, s[0:1], 0x24
	; GFX10-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_mov_b32_e32 v0, 15			; GFX10-NEXT: v_mov_b32_e32 v0, 15
	; GFX10-NEXT: v_mov_b32_e32 v1, 0x104
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_and_b32 s1, s0, 15			; GFX10-NEXT: s_and_b32 s1, s0, 15
	; GFX10-NEXT: s_lshl_b32 s0, s0, 2			; GFX10-NEXT: s_lshl_b32 s0, s0, 2
	; GFX10-NEXT: s_lshl_b32 s1, s1, 2			; GFX10-NEXT: s_lshl_b32 s1, s1, 2
	; GFX10-NEXT: s_addk_i32 s0, 0x104			; GFX10-NEXT: s_addk_i32 s0, 0x104
	; GFX10-NEXT: s_addk_i32 s1, 0x104			; GFX10-NEXT: s_addk_i32 s1, 0x104
	; GFX10-NEXT: scratch_store_dword off, v0, s0			; GFX10-NEXT: scratch_store_dword off, v0, s0
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_load_dword v0, off, s1 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, s1 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_mov_b32_e32 v0, 4
	; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v0
	; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v1
	; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: store_load_sindex_small_offset_kernel:			; GFX11-LABEL: store_load_sindex_small_offset_kernel:
	; GFX11: ; %bb.0: ; %bb			; GFX11: ; %bb.0: ; %bb
	; GFX11-NEXT: s_load_b32 s0, s[0:1], 0x24			; GFX11-NEXT: s_load_b32 s0, s[0:1], 0x24
	; GFX11-NEXT: scratch_load_b32 v0, off, off offset:4 glc dlc			; GFX11-NEXT: scratch_load_b32 v0, off, off offset:4 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_dual_mov_b32 v0, 15 :: v_dual_mov_b32 v1, 0x104			; GFX11-NEXT: v_mov_b32_e32 v0, 15
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_and_b32 s1, s0, 15			; GFX11-NEXT: s_and_b32 s1, s0, 15
	; GFX11-NEXT: s_lshl_b32 s0, s0, 2			; GFX11-NEXT: s_lshl_b32 s0, s0, 2
	; GFX11-NEXT: s_lshl_b32 s1, s1, 2			; GFX11-NEXT: s_lshl_b32 s1, s1, 2
	; GFX11-NEXT: s_addk_i32 s0, 0x104			; GFX11-NEXT: s_addk_i32 s0, 0x104
	; GFX11-NEXT: s_addk_i32 s1, 0x104			; GFX11-NEXT: s_addk_i32 s1, 0x104
	; GFX11-NEXT: scratch_store_b32 off, v0, s0 dlc			; GFX11-NEXT: scratch_store_b32 off, v0, s0 dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: scratch_load_b32 v0, off, s1 glc dlc			; GFX11-NEXT: scratch_load_b32 v0, off, s1 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_mov_b32_e32 v0, 4
	; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v0
	; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v1
	; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	;			;
	; GFX9-PAL-LABEL: store_load_sindex_small_offset_kernel:			; GFX9-PAL-LABEL: store_load_sindex_small_offset_kernel:
	; GFX9-PAL: ; %bb.0: ; %bb			; GFX9-PAL: ; %bb.0: ; %bb
	; GFX9-PAL-NEXT: s_getpc_b64 s[4:5]			; GFX9-PAL-NEXT: s_getpc_b64 s[4:5]
	; GFX9-PAL-NEXT: s_mov_b32 s4, s0			; GFX9-PAL-NEXT: s_mov_b32 s4, s0
	; GFX9-PAL-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX9-PAL-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX9-PAL-NEXT: s_mov_b32 vcc_hi, 0			; GFX9-PAL-NEXT: s_mov_b32 vcc_hi, 0
	[... 9 unchanged lines not shown ...]
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 15			; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-PAL-NEXT: s_addk_i32 s1, 0x104			; GFX9-PAL-NEXT: s_addk_i32 s1, 0x104
	; GFX9-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX9-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX9-PAL-NEXT: scratch_store_dword off, v0, s1			; GFX9-PAL-NEXT: scratch_store_dword off, v0, s1
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_addk_i32 s0, 0x104			; GFX9-PAL-NEXT: s_addk_i32 s0, 0x104
	; GFX9-PAL-NEXT: scratch_load_dword v0, off, s0 glc			; GFX9-PAL-NEXT: scratch_load_dword v0, off, s0 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX9-PAL-NEXT: ;;#ASMSTART
	; GFX9-PAL-NEXT: ; use v0
	; GFX9-PAL-NEXT: ;;#ASMEND
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 0x104
	; GFX9-PAL-NEXT: ;;#ASMSTART
	; GFX9-PAL-NEXT: ; use v0
	; GFX9-PAL-NEXT: ;;#ASMEND
	; GFX9-PAL-NEXT: s_endpgm			; GFX9-PAL-NEXT: s_endpgm
	;			;
	; GFX940-LABEL: store_load_sindex_small_offset_kernel:			; GFX940-LABEL: store_load_sindex_small_offset_kernel:
	; GFX940: ; %bb.0: ; %bb			; GFX940: ; %bb.0: ; %bb
	; GFX940-NEXT: s_load_dword s0, s[0:1], 0x24			; GFX940-NEXT: s_load_dword s0, s[0:1], 0x24
	; GFX940-NEXT: scratch_load_dword v0, off, off offset:4 sc0 sc1			; GFX940-NEXT: scratch_load_dword v0, off, off offset:4 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: v_mov_b32_e32 v0, 15			; GFX940-NEXT: v_mov_b32_e32 v0, 15
	; GFX940-NEXT: s_waitcnt lgkmcnt(0)			; GFX940-NEXT: s_waitcnt lgkmcnt(0)
	; GFX940-NEXT: s_lshl_b32 s1, s0, 2			; GFX940-NEXT: s_lshl_b32 s1, s0, 2
	; GFX940-NEXT: s_and_b32 s0, s0, 15			; GFX940-NEXT: s_and_b32 s0, s0, 15
	; GFX940-NEXT: s_addk_i32 s1, 0x104			; GFX940-NEXT: s_addk_i32 s1, 0x104
	; GFX940-NEXT: s_lshl_b32 s0, s0, 2			; GFX940-NEXT: s_lshl_b32 s0, s0, 2
	; GFX940-NEXT: scratch_store_dword off, v0, s1 sc0 sc1			; GFX940-NEXT: scratch_store_dword off, v0, s1 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: s_addk_i32 s0, 0x104			; GFX940-NEXT: s_addk_i32 s0, 0x104
	; GFX940-NEXT: scratch_load_dword v0, off, s0 sc0 sc1			; GFX940-NEXT: scratch_load_dword v0, off, s0 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: v_mov_b32_e32 v0, 4
	; GFX940-NEXT: ;;#ASMSTART
	; GFX940-NEXT: ; use v0
	; GFX940-NEXT: ;;#ASMEND
	; GFX940-NEXT: v_mov_b32_e32 v0, 0x104
	; GFX940-NEXT: ;;#ASMSTART
	; GFX940-NEXT: ; use v0
	; GFX940-NEXT: ;;#ASMEND
	; GFX940-NEXT: s_endpgm			; GFX940-NEXT: s_endpgm
	;			;
	; GFX1010-PAL-LABEL: store_load_sindex_small_offset_kernel:			; GFX1010-PAL-LABEL: store_load_sindex_small_offset_kernel:
	; GFX1010-PAL: ; %bb.0: ; %bb			; GFX1010-PAL: ; %bb.0: ; %bb
	; GFX1010-PAL-NEXT: s_getpc_b64 s[4:5]			; GFX1010-PAL-NEXT: s_getpc_b64 s[4:5]
	; GFX1010-PAL-NEXT: s_mov_b32 s4, s0			; GFX1010-PAL-NEXT: s_mov_b32 s4, s0
	; GFX1010-PAL-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX1010-PAL-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1010-PAL-NEXT: s_and_b32 s5, s5, 0xffff			; GFX1010-PAL-NEXT: s_and_b32 s5, s5, 0xffff
	; GFX1010-PAL-NEXT: s_add_u32 s4, s4, s3			; GFX1010-PAL-NEXT: s_add_u32 s4, s4, s3
	; GFX1010-PAL-NEXT: s_addc_u32 s5, s5, 0			; GFX1010-PAL-NEXT: s_addc_u32 s5, s5, 0
	; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s4			; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s4
	; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s5			; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s5
	; GFX1010-PAL-NEXT: s_load_dword s0, s[0:1], 0x0			; GFX1010-PAL-NEXT: s_load_dword s0, s[0:1], 0x0
	; GFX1010-PAL-NEXT: s_mov_b32 vcc_lo, 0			; GFX1010-PAL-NEXT: s_mov_b32 vcc_lo, 0
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v1, 0x104
	; GFX1010-PAL-NEXT: scratch_load_dword v0, off, vcc_lo offset:4 glc dlc			; GFX1010-PAL-NEXT: scratch_load_dword v0, off, vcc_lo offset:4 glc dlc
	; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v0, 15			; GFX1010-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1010-PAL-NEXT: s_and_b32 s1, s0, 15			; GFX1010-PAL-NEXT: s_and_b32 s1, s0, 15
	; GFX1010-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX1010-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX1010-PAL-NEXT: s_lshl_b32 s1, s1, 2			; GFX1010-PAL-NEXT: s_lshl_b32 s1, s1, 2
	; GFX1010-PAL-NEXT: s_addk_i32 s0, 0x104			; GFX1010-PAL-NEXT: s_addk_i32 s0, 0x104
	; GFX1010-PAL-NEXT: s_addk_i32 s1, 0x104			; GFX1010-PAL-NEXT: s_addk_i32 s1, 0x104
	; GFX1010-PAL-NEXT: scratch_store_dword off, v0, s0			; GFX1010-PAL-NEXT: scratch_store_dword off, v0, s0
	; GFX1010-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1010-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1010-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc			; GFX1010-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc
	; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX1010-PAL-NEXT: ;;#ASMSTART
	; GFX1010-PAL-NEXT: ; use v0
	; GFX1010-PAL-NEXT: ;;#ASMEND
	; GFX1010-PAL-NEXT: ;;#ASMSTART
	; GFX1010-PAL-NEXT: ; use v1
	; GFX1010-PAL-NEXT: ;;#ASMEND
	; GFX1010-PAL-NEXT: s_endpgm			; GFX1010-PAL-NEXT: s_endpgm
	;			;
	; GFX1030-PAL-LABEL: store_load_sindex_small_offset_kernel:			; GFX1030-PAL-LABEL: store_load_sindex_small_offset_kernel:
	; GFX1030-PAL: ; %bb.0: ; %bb			; GFX1030-PAL: ; %bb.0: ; %bb
	; GFX1030-PAL-NEXT: s_getpc_b64 s[4:5]			; GFX1030-PAL-NEXT: s_getpc_b64 s[4:5]
	; GFX1030-PAL-NEXT: s_mov_b32 s4, s0			; GFX1030-PAL-NEXT: s_mov_b32 s4, s0
	; GFX1030-PAL-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX1030-PAL-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1030-PAL-NEXT: s_and_b32 s5, s5, 0xffff			; GFX1030-PAL-NEXT: s_and_b32 s5, s5, 0xffff
	; GFX1030-PAL-NEXT: s_add_u32 s4, s4, s3			; GFX1030-PAL-NEXT: s_add_u32 s4, s4, s3
	; GFX1030-PAL-NEXT: s_addc_u32 s5, s5, 0			; GFX1030-PAL-NEXT: s_addc_u32 s5, s5, 0
	; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s4			; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s4
	; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s5			; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s5
	; GFX1030-PAL-NEXT: s_load_dword s0, s[0:1], 0x0			; GFX1030-PAL-NEXT: s_load_dword s0, s[0:1], 0x0
	; GFX1030-PAL-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc			; GFX1030-PAL-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc
	; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v0, 15			; GFX1030-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v1, 0x104
	; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1030-PAL-NEXT: s_and_b32 s1, s0, 15			; GFX1030-PAL-NEXT: s_and_b32 s1, s0, 15
	; GFX1030-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX1030-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX1030-PAL-NEXT: s_lshl_b32 s1, s1, 2			; GFX1030-PAL-NEXT: s_lshl_b32 s1, s1, 2
	; GFX1030-PAL-NEXT: s_addk_i32 s0, 0x104			; GFX1030-PAL-NEXT: s_addk_i32 s0, 0x104
	; GFX1030-PAL-NEXT: s_addk_i32 s1, 0x104			; GFX1030-PAL-NEXT: s_addk_i32 s1, 0x104
	; GFX1030-PAL-NEXT: scratch_store_dword off, v0, s0			; GFX1030-PAL-NEXT: scratch_store_dword off, v0, s0
	; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1030-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc			; GFX1030-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc
	; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX1030-PAL-NEXT: ;;#ASMSTART
	; GFX1030-PAL-NEXT: ; use v0
	; GFX1030-PAL-NEXT: ;;#ASMEND
	; GFX1030-PAL-NEXT: ;;#ASMSTART
	; GFX1030-PAL-NEXT: ; use v1
	; GFX1030-PAL-NEXT: ;;#ASMEND
	; GFX1030-PAL-NEXT: s_endpgm			; GFX1030-PAL-NEXT: s_endpgm
	;			;
	; GFX11-PAL-LABEL: store_load_sindex_small_offset_kernel:			; GFX11-PAL-LABEL: store_load_sindex_small_offset_kernel:
	; GFX11-PAL: ; %bb.0: ; %bb			; GFX11-PAL: ; %bb.0: ; %bb
	; GFX11-PAL-NEXT: s_load_b32 s0, s[0:1], 0x0			; GFX11-PAL-NEXT: s_load_b32 s0, s[0:1], 0x0
	; GFX11-PAL-NEXT: scratch_load_b32 v0, off, off offset:4 glc dlc			; GFX11-PAL-NEXT: scratch_load_b32 v0, off, off offset:4 glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: v_dual_mov_b32 v0, 15 :: v_dual_mov_b32 v1, 0x104			; GFX11-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX11-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-PAL-NEXT: s_and_b32 s1, s0, 15			; GFX11-PAL-NEXT: s_and_b32 s1, s0, 15
	; GFX11-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX11-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX11-PAL-NEXT: s_lshl_b32 s1, s1, 2			; GFX11-PAL-NEXT: s_lshl_b32 s1, s1, 2
	; GFX11-PAL-NEXT: s_addk_i32 s0, 0x104			; GFX11-PAL-NEXT: s_addk_i32 s0, 0x104
	; GFX11-PAL-NEXT: s_addk_i32 s1, 0x104			; GFX11-PAL-NEXT: s_addk_i32 s1, 0x104
	; GFX11-PAL-NEXT: scratch_store_b32 off, v0, s0 dlc			; GFX11-PAL-NEXT: scratch_store_b32 off, v0, s0 dlc
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: scratch_load_b32 v0, off, s1 glc dlc			; GFX11-PAL-NEXT: scratch_load_b32 v0, off, s1 glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX11-PAL-NEXT: ;;#ASMSTART
	; GFX11-PAL-NEXT: ; use v0
	; GFX11-PAL-NEXT: ;;#ASMEND
	; GFX11-PAL-NEXT: ;;#ASMSTART
	; GFX11-PAL-NEXT: ; use v1
	; GFX11-PAL-NEXT: ;;#ASMEND
	; GFX11-PAL-NEXT: s_endpgm			; GFX11-PAL-NEXT: s_endpgm
	bb:			bb:
	%padding = alloca [64 x i32], align 4, addrspace(5)			%padding = alloca [64 x i32], align 4, addrspace(5)
	%i = alloca [32 x float], align 4, addrspace(5)			%i = alloca [32 x float], align 4, addrspace(5)
	%pad_gep = getelementptr inbounds [64 x i32], [64 x i32] addrspace(5)* %padding, i32 0, i32 undef			%pad_gep = getelementptr inbounds [64 x i32], [64 x i32] addrspace(5)* %padding, i32 0, i32 undef
	%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4			%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4
	%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*			%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*
	%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %idx			%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %idx
	%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*			%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*
	store volatile i32 15, i32 addrspace(5)* %i8, align 4			store volatile i32 15, i32 addrspace(5)* %i8, align 4
	%i9 = and i32 %idx, 15			%i9 = and i32 %idx, 15
	%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9			%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9
	%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*			%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*
	%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4			%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4
	call void asm sideeffect "; use $0", "s"([64 x i32] addrspace(5)* %padding) #0
	call void asm sideeffect "; use $0", "s"([32 x float] addrspace(5)* %i) #0
	ret void			ret void
	}			}

	define amdgpu_ps void @store_load_sindex_small_offset_foo(i32 inreg %idx) {			define amdgpu_ps void @store_load_sindex_small_offset_foo(i32 inreg %idx) {
	; GFX9-LABEL: store_load_sindex_small_offset_foo:			; GFX9-LABEL: store_load_sindex_small_offset_foo:
	; GFX9: ; %bb.0: ; %bb			; GFX9: ; %bb.0: ; %bb
	; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3			; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3
	; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0			; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0
	; GFX9-NEXT: s_mov_b32 vcc_hi, 0			; GFX9-NEXT: s_mov_b32 vcc_hi, 0
	; GFX9-NEXT: scratch_load_dword v0, off, vcc_hi offset:4 glc			; GFX9-NEXT: scratch_load_dword v0, off, vcc_hi offset:4 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_lshl_b32 s0, s2, 2			; GFX9-NEXT: s_lshl_b32 s0, s2, 2
	; GFX9-NEXT: s_addk_i32 s0, 0x104			; GFX9-NEXT: s_addk_i32 s0, 0x104
	; GFX9-NEXT: v_mov_b32_e32 v0, 15			; GFX9-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-NEXT: scratch_store_dword off, v0, s0			; GFX9-NEXT: scratch_store_dword off, v0, s0
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_and_b32 s0, s2, 15			; GFX9-NEXT: s_and_b32 s0, s2, 15
	; GFX9-NEXT: s_lshl_b32 s0, s0, 2			; GFX9-NEXT: s_lshl_b32 s0, s0, 2
	; GFX9-NEXT: s_addk_i32 s0, 0x104			; GFX9-NEXT: s_addk_i32 s0, 0x104
	; GFX9-NEXT: scratch_load_dword v0, off, s0 glc			; GFX9-NEXT: scratch_load_dword v0, off, s0 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_mov_b32_e32 v0, 4
	; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v0
	; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x104
	; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v0
	; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: store_load_sindex_small_offset_foo:			; GFX10-LABEL: store_load_sindex_small_offset_foo:
	; GFX10: ; %bb.0: ; %bb			; GFX10: ; %bb.0: ; %bb
	; GFX10-NEXT: s_add_u32 s0, s0, s3			; GFX10-NEXT: s_add_u32 s0, s0, s3
	; GFX10-NEXT: s_addc_u32 s1, s1, 0			; GFX10-NEXT: s_addc_u32 s1, s1, 0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1
	; GFX10-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_mov_b32_e32 v0, 15			; GFX10-NEXT: v_mov_b32_e32 v0, 15
	; GFX10-NEXT: s_and_b32 s0, s2, 15			; GFX10-NEXT: s_and_b32 s0, s2, 15
	; GFX10-NEXT: s_lshl_b32 s1, s2, 2			; GFX10-NEXT: s_lshl_b32 s1, s2, 2
	; GFX10-NEXT: s_lshl_b32 s0, s0, 2			; GFX10-NEXT: s_lshl_b32 s0, s0, 2
	; GFX10-NEXT: s_addk_i32 s1, 0x104			; GFX10-NEXT: s_addk_i32 s1, 0x104
	; GFX10-NEXT: s_addk_i32 s0, 0x104			; GFX10-NEXT: s_addk_i32 s0, 0x104
	; GFX10-NEXT: scratch_store_dword off, v0, s1			; GFX10-NEXT: scratch_store_dword off, v0, s1
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_load_dword v0, off, s0 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, s0 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_mov_b32_e32 v0, 4
	; GFX10-NEXT: v_mov_b32_e32 v1, 0x104
	; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v0
	; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v1
	; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: store_load_sindex_small_offset_foo:			; GFX11-LABEL: store_load_sindex_small_offset_foo:
	; GFX11: ; %bb.0: ; %bb			; GFX11: ; %bb.0: ; %bb
	; GFX11-NEXT: scratch_load_b32 v0, off, off offset:4 glc dlc			; GFX11-NEXT: scratch_load_b32 v0, off, off offset:4 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_dual_mov_b32 v0, 15 :: v_dual_mov_b32 v1, 0x104			; GFX11-NEXT: v_mov_b32_e32 v0, 15
	; GFX11-NEXT: s_and_b32 s1, s0, 15			; GFX11-NEXT: s_and_b32 s1, s0, 15
	; GFX11-NEXT: s_lshl_b32 s0, s0, 2			; GFX11-NEXT: s_lshl_b32 s0, s0, 2
	; GFX11-NEXT: s_lshl_b32 s1, s1, 2			; GFX11-NEXT: s_lshl_b32 s1, s1, 2
	; GFX11-NEXT: s_addk_i32 s0, 0x104			; GFX11-NEXT: s_addk_i32 s0, 0x104
	; GFX11-NEXT: s_addk_i32 s1, 0x104			; GFX11-NEXT: s_addk_i32 s1, 0x104
	; GFX11-NEXT: scratch_store_b32 off, v0, s0 dlc			; GFX11-NEXT: scratch_store_b32 off, v0, s0 dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: scratch_load_b32 v0, off, s1 glc dlc			; GFX11-NEXT: scratch_load_b32 v0, off, s1 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_mov_b32_e32 v0, 4
	; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v0
	; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v1
	; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	;			;
	; GFX9-PAL-LABEL: store_load_sindex_small_offset_foo:			; GFX9-PAL-LABEL: store_load_sindex_small_offset_foo:
	; GFX9-PAL: ; %bb.0: ; %bb			; GFX9-PAL: ; %bb.0: ; %bb
	; GFX9-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX9-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX9-PAL-NEXT: s_mov_b32 s2, s0			; GFX9-PAL-NEXT: s_mov_b32 s2, s0
	; GFX9-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX9-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX9-PAL-NEXT: s_mov_b32 vcc_hi, 0			; GFX9-PAL-NEXT: s_mov_b32 vcc_hi, 0
	; GFX9-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-PAL-NEXT: s_and_b32 s3, s3, 0xffff			; GFX9-PAL-NEXT: s_and_b32 s3, s3, 0xffff
	; GFX9-PAL-NEXT: s_add_u32 flat_scratch_lo, s2, s1			; GFX9-PAL-NEXT: s_add_u32 flat_scratch_lo, s2, s1
	; GFX9-PAL-NEXT: s_addc_u32 flat_scratch_hi, s3, 0			; GFX9-PAL-NEXT: s_addc_u32 flat_scratch_hi, s3, 0
	; GFX9-PAL-NEXT: scratch_load_dword v0, off, vcc_hi offset:4 glc			; GFX9-PAL-NEXT: scratch_load_dword v0, off, vcc_hi offset:4 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_lshl_b32 s1, s0, 2			; GFX9-PAL-NEXT: s_lshl_b32 s1, s0, 2
	; GFX9-PAL-NEXT: s_and_b32 s0, s0, 15			; GFX9-PAL-NEXT: s_and_b32 s0, s0, 15
	; GFX9-PAL-NEXT: s_addk_i32 s1, 0x104			; GFX9-PAL-NEXT: s_addk_i32 s1, 0x104
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 15			; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX9-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX9-PAL-NEXT: scratch_store_dword off, v0, s1			; GFX9-PAL-NEXT: scratch_store_dword off, v0, s1
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_addk_i32 s0, 0x104			; GFX9-PAL-NEXT: s_addk_i32 s0, 0x104
	; GFX9-PAL-NEXT: scratch_load_dword v0, off, s0 glc			; GFX9-PAL-NEXT: scratch_load_dword v0, off, s0 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX9-PAL-NEXT: ;;#ASMSTART
	; GFX9-PAL-NEXT: ; use v0
	; GFX9-PAL-NEXT: ;;#ASMEND
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 0x104
	; GFX9-PAL-NEXT: ;;#ASMSTART
	; GFX9-PAL-NEXT: ; use v0
	; GFX9-PAL-NEXT: ;;#ASMEND
	; GFX9-PAL-NEXT: s_endpgm			; GFX9-PAL-NEXT: s_endpgm
	;			;
	; GFX940-LABEL: store_load_sindex_small_offset_foo:			; GFX940-LABEL: store_load_sindex_small_offset_foo:
	; GFX940: ; %bb.0: ; %bb			; GFX940: ; %bb.0: ; %bb
	; GFX940-NEXT: scratch_load_dword v0, off, off offset:4 sc0 sc1			; GFX940-NEXT: scratch_load_dword v0, off, off offset:4 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: s_lshl_b32 s1, s0, 2			; GFX940-NEXT: s_lshl_b32 s1, s0, 2
	; GFX940-NEXT: s_and_b32 s0, s0, 15			; GFX940-NEXT: s_and_b32 s0, s0, 15
	; GFX940-NEXT: s_addk_i32 s1, 0x104			; GFX940-NEXT: s_addk_i32 s1, 0x104
	; GFX940-NEXT: v_mov_b32_e32 v0, 15			; GFX940-NEXT: v_mov_b32_e32 v0, 15
	; GFX940-NEXT: s_lshl_b32 s0, s0, 2			; GFX940-NEXT: s_lshl_b32 s0, s0, 2
	; GFX940-NEXT: scratch_store_dword off, v0, s1 sc0 sc1			; GFX940-NEXT: scratch_store_dword off, v0, s1 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: s_addk_i32 s0, 0x104			; GFX940-NEXT: s_addk_i32 s0, 0x104
	; GFX940-NEXT: scratch_load_dword v0, off, s0 sc0 sc1			; GFX940-NEXT: scratch_load_dword v0, off, s0 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: v_mov_b32_e32 v0, 4
	; GFX940-NEXT: ;;#ASMSTART
	; GFX940-NEXT: ; use v0
	; GFX940-NEXT: ;;#ASMEND
	; GFX940-NEXT: v_mov_b32_e32 v0, 0x104
	; GFX940-NEXT: ;;#ASMSTART
	; GFX940-NEXT: ; use v0
	; GFX940-NEXT: ;;#ASMEND
	; GFX940-NEXT: s_endpgm			; GFX940-NEXT: s_endpgm
	;			;
	; GFX1010-PAL-LABEL: store_load_sindex_small_offset_foo:			; GFX1010-PAL-LABEL: store_load_sindex_small_offset_foo:
	; GFX1010-PAL: ; %bb.0: ; %bb			; GFX1010-PAL: ; %bb.0: ; %bb
	; GFX1010-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX1010-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX1010-PAL-NEXT: s_mov_b32 s2, s0			; GFX1010-PAL-NEXT: s_mov_b32 s2, s0
	; GFX1010-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX1010-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)
	[... 10 unchanged lines not shown ...]
	; GFX1010-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX1010-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX1010-PAL-NEXT: s_lshl_b32 s1, s1, 2			; GFX1010-PAL-NEXT: s_lshl_b32 s1, s1, 2
	; GFX1010-PAL-NEXT: s_addk_i32 s0, 0x104			; GFX1010-PAL-NEXT: s_addk_i32 s0, 0x104
	; GFX1010-PAL-NEXT: s_addk_i32 s1, 0x104			; GFX1010-PAL-NEXT: s_addk_i32 s1, 0x104
	; GFX1010-PAL-NEXT: scratch_store_dword off, v0, s0			; GFX1010-PAL-NEXT: scratch_store_dword off, v0, s0
	; GFX1010-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1010-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1010-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc			; GFX1010-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc
	; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v1, 0x104
	; GFX1010-PAL-NEXT: ;;#ASMSTART
	; GFX1010-PAL-NEXT: ; use v0
	; GFX1010-PAL-NEXT: ;;#ASMEND
	; GFX1010-PAL-NEXT: ;;#ASMSTART
	; GFX1010-PAL-NEXT: ; use v1
	; GFX1010-PAL-NEXT: ;;#ASMEND
	; GFX1010-PAL-NEXT: s_endpgm			; GFX1010-PAL-NEXT: s_endpgm
	;			;
	; GFX1030-PAL-LABEL: store_load_sindex_small_offset_foo:			; GFX1030-PAL-LABEL: store_load_sindex_small_offset_foo:
	; GFX1030-PAL: ; %bb.0: ; %bb			; GFX1030-PAL: ; %bb.0: ; %bb
	; GFX1030-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX1030-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX1030-PAL-NEXT: s_mov_b32 s2, s0			; GFX1030-PAL-NEXT: s_mov_b32 s2, s0
	; GFX1030-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX1030-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)
	[... 9 unchanged lines not shown ...]
	; GFX1030-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX1030-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX1030-PAL-NEXT: s_lshl_b32 s1, s1, 2			; GFX1030-PAL-NEXT: s_lshl_b32 s1, s1, 2
	; GFX1030-PAL-NEXT: s_addk_i32 s0, 0x104			; GFX1030-PAL-NEXT: s_addk_i32 s0, 0x104
	; GFX1030-PAL-NEXT: s_addk_i32 s1, 0x104			; GFX1030-PAL-NEXT: s_addk_i32 s1, 0x104
	; GFX1030-PAL-NEXT: scratch_store_dword off, v0, s0			; GFX1030-PAL-NEXT: scratch_store_dword off, v0, s0
	; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1030-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc			; GFX1030-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc
	; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v1, 0x104
	; GFX1030-PAL-NEXT: ;;#ASMSTART
	; GFX1030-PAL-NEXT: ; use v0
	; GFX1030-PAL-NEXT: ;;#ASMEND
	; GFX1030-PAL-NEXT: ;;#ASMSTART
	; GFX1030-PAL-NEXT: ; use v1
	; GFX1030-PAL-NEXT: ;;#ASMEND
	; GFX1030-PAL-NEXT: s_endpgm			; GFX1030-PAL-NEXT: s_endpgm
	;			;
	; GFX11-PAL-LABEL: store_load_sindex_small_offset_foo:			; GFX11-PAL-LABEL: store_load_sindex_small_offset_foo:
	; GFX11-PAL: ; %bb.0: ; %bb			; GFX11-PAL: ; %bb.0: ; %bb
	; GFX11-PAL-NEXT: scratch_load_b32 v0, off, off offset:4 glc dlc			; GFX11-PAL-NEXT: scratch_load_b32 v0, off, off offset:4 glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: v_dual_mov_b32 v0, 15 :: v_dual_mov_b32 v1, 0x104			; GFX11-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX11-PAL-NEXT: s_and_b32 s1, s0, 15			; GFX11-PAL-NEXT: s_and_b32 s1, s0, 15
	; GFX11-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX11-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX11-PAL-NEXT: s_lshl_b32 s1, s1, 2			; GFX11-PAL-NEXT: s_lshl_b32 s1, s1, 2
	; GFX11-PAL-NEXT: s_addk_i32 s0, 0x104			; GFX11-PAL-NEXT: s_addk_i32 s0, 0x104
	; GFX11-PAL-NEXT: s_addk_i32 s1, 0x104			; GFX11-PAL-NEXT: s_addk_i32 s1, 0x104
	; GFX11-PAL-NEXT: scratch_store_b32 off, v0, s0 dlc			; GFX11-PAL-NEXT: scratch_store_b32 off, v0, s0 dlc
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: scratch_load_b32 v0, off, s1 glc dlc			; GFX11-PAL-NEXT: scratch_load_b32 v0, off, s1 glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX11-PAL-NEXT: ;;#ASMSTART
	; GFX11-PAL-NEXT: ; use v0
	; GFX11-PAL-NEXT: ;;#ASMEND
	; GFX11-PAL-NEXT: ;;#ASMSTART
	; GFX11-PAL-NEXT: ; use v1
	; GFX11-PAL-NEXT: ;;#ASMEND
	; GFX11-PAL-NEXT: s_endpgm			; GFX11-PAL-NEXT: s_endpgm
	bb:			bb:
	%padding = alloca [64 x i32], align 4, addrspace(5)			%padding = alloca [64 x i32], align 4, addrspace(5)
	%i = alloca [32 x float], align 4, addrspace(5)			%i = alloca [32 x float], align 4, addrspace(5)
	%pad_gep = getelementptr inbounds [64 x i32], [64 x i32] addrspace(5)* %padding, i32 0, i32 undef			%pad_gep = getelementptr inbounds [64 x i32], [64 x i32] addrspace(5)* %padding, i32 0, i32 undef
	%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4			%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4
	%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*			%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*
	%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %idx			%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %idx
	%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*			%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*
	store volatile i32 15, i32 addrspace(5)* %i8, align 4			store volatile i32 15, i32 addrspace(5)* %i8, align 4
	%i9 = and i32 %idx, 15			%i9 = and i32 %idx, 15
	%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9			%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9
	%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*			%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*
	%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4			%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4
	call void asm sideeffect "; use $0", "s"([64 x i32] addrspace(5)* %padding) #0
	call void asm sideeffect "; use $0", "s"([32 x float] addrspace(5)* %i) #0
	ret void			ret void
	}			}

	define amdgpu_kernel void @store_load_vindex_small_offset_kernel() {			define amdgpu_kernel void @store_load_vindex_small_offset_kernel() {
	; GFX9-LABEL: store_load_vindex_small_offset_kernel:			; GFX9-LABEL: store_load_vindex_small_offset_kernel:
	; GFX9: ; %bb.0: ; %bb			; GFX9: ; %bb.0: ; %bb
	; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3			; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3
	; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0			; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0
	; GFX9-NEXT: s_mov_b32 vcc_hi, 0			; GFX9-NEXT: s_mov_b32 vcc_hi, 0
	; GFX9-NEXT: scratch_load_dword v1, off, vcc_hi offset:4 glc			; GFX9-NEXT: scratch_load_dword v1, off, vcc_hi offset:4 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX9-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX9-NEXT: v_add_u32_e32 v1, 0x104, v0			; GFX9-NEXT: v_add_u32_e32 v1, 0x104, v0
	; GFX9-NEXT: v_mov_b32_e32 v2, 15			; GFX9-NEXT: v_mov_b32_e32 v2, 15
	; GFX9-NEXT: scratch_store_dword v1, v2, off			; GFX9-NEXT: scratch_store_dword v1, v2, off
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_sub_u32_e32 v0, 0x104, v0			; GFX9-NEXT: v_sub_u32_e32 v0, 0x104, v0
	; GFX9-NEXT: scratch_load_dword v0, v0, off offset:124 glc			; GFX9-NEXT: scratch_load_dword v0, v0, off offset:124 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x104
	; GFX9-NEXT: v_mov_b32_e32 v1, 4
	; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v1
	; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v0
	; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: store_load_vindex_small_offset_kernel:			; GFX10-LABEL: store_load_vindex_small_offset_kernel:
	; GFX10: ; %bb.0: ; %bb			; GFX10: ; %bb.0: ; %bb
	; GFX10-NEXT: s_add_u32 s0, s0, s3			; GFX10-NEXT: s_add_u32 s0, s0, s3
	; GFX10-NEXT: s_addc_u32 s1, s1, 0			; GFX10-NEXT: s_addc_u32 s1, s1, 0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1
	; GFX10-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX10-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX10-NEXT: v_mov_b32_e32 v2, 15			; GFX10-NEXT: v_mov_b32_e32 v2, 15
	; GFX10-NEXT: scratch_load_dword v3, off, off offset:4 glc dlc			; GFX10-NEXT: scratch_load_dword v3, off, off offset:4 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_add_nc_u32_e32 v1, 0x104, v0			; GFX10-NEXT: v_add_nc_u32_e32 v1, 0x104, v0
	; GFX10-NEXT: v_sub_nc_u32_e32 v0, 0x104, v0			; GFX10-NEXT: v_sub_nc_u32_e32 v0, 0x104, v0
	; GFX10-NEXT: scratch_store_dword v1, v2, off			; GFX10-NEXT: scratch_store_dword v1, v2, off
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_load_dword v0, v0, off offset:124 glc dlc			; GFX10-NEXT: scratch_load_dword v0, v0, off offset:124 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_mov_b32_e32 v0, 4
	; GFX10-NEXT: v_mov_b32_e32 v1, 0x104
	; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v0
	; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v1
	; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: store_load_vindex_small_offset_kernel:			; GFX11-LABEL: store_load_vindex_small_offset_kernel:
	; GFX11: ; %bb.0: ; %bb			; GFX11: ; %bb.0: ; %bb
	; GFX11-NEXT: v_dual_mov_b32 v1, 15 :: v_dual_lshlrev_b32 v0, 2, v0			; GFX11-NEXT: v_dual_mov_b32 v1, 15 :: v_dual_lshlrev_b32 v0, 2, v0
	; GFX11-NEXT: scratch_load_b32 v3, off, off offset:4 glc dlc			; GFX11-NEXT: scratch_load_b32 v3, off, off offset:4 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_sub_nc_u32_e32 v2, 0x104, v0			; GFX11-NEXT: v_sub_nc_u32_e32 v2, 0x104, v0
	; GFX11-NEXT: scratch_store_b32 v0, v1, off offset:260 dlc			; GFX11-NEXT: scratch_store_b32 v0, v1, off offset:260 dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: v_mov_b32_e32 v1, 0x104
	; GFX11-NEXT: scratch_load_b32 v0, v2, off offset:124 glc dlc			; GFX11-NEXT: scratch_load_b32 v0, v2, off offset:124 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_mov_b32_e32 v0, 4
	; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v0
	; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v1
	; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	;			;
	; GFX9-PAL-LABEL: store_load_vindex_small_offset_kernel:			; GFX9-PAL-LABEL: store_load_vindex_small_offset_kernel:
	; GFX9-PAL: ; %bb.0: ; %bb			; GFX9-PAL: ; %bb.0: ; %bb
	; GFX9-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX9-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX9-PAL-NEXT: s_mov_b32 s2, s0			; GFX9-PAL-NEXT: s_mov_b32 s2, s0
	; GFX9-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX9-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX9-PAL-NEXT: s_mov_b32 vcc_hi, 0			; GFX9-PAL-NEXT: s_mov_b32 vcc_hi, 0
	; GFX9-PAL-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX9-PAL-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX9-PAL-NEXT: v_mov_b32_e32 v2, 15			; GFX9-PAL-NEXT: v_mov_b32_e32 v2, 15
	; GFX9-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-PAL-NEXT: s_and_b32 s3, s3, 0xffff			; GFX9-PAL-NEXT: s_and_b32 s3, s3, 0xffff
	; GFX9-PAL-NEXT: s_add_u32 flat_scratch_lo, s2, s1			; GFX9-PAL-NEXT: s_add_u32 flat_scratch_lo, s2, s1
	; GFX9-PAL-NEXT: s_addc_u32 flat_scratch_hi, s3, 0			; GFX9-PAL-NEXT: s_addc_u32 flat_scratch_hi, s3, 0
	; GFX9-PAL-NEXT: scratch_load_dword v1, off, vcc_hi offset:4 glc			; GFX9-PAL-NEXT: scratch_load_dword v1, off, vcc_hi offset:4 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: v_add_u32_e32 v1, 0x104, v0			; GFX9-PAL-NEXT: v_add_u32_e32 v1, 0x104, v0
	; GFX9-PAL-NEXT: scratch_store_dword v1, v2, off			; GFX9-PAL-NEXT: scratch_store_dword v1, v2, off
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: v_sub_u32_e32 v0, 0x104, v0			; GFX9-PAL-NEXT: v_sub_u32_e32 v0, 0x104, v0
	; GFX9-PAL-NEXT: scratch_load_dword v0, v0, off offset:124 glc			; GFX9-PAL-NEXT: scratch_load_dword v0, v0, off offset:124 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 0x104
	; GFX9-PAL-NEXT: v_mov_b32_e32 v1, 4
	; GFX9-PAL-NEXT: ;;#ASMSTART
	; GFX9-PAL-NEXT: ; use v1
	; GFX9-PAL-NEXT: ;;#ASMEND
	; GFX9-PAL-NEXT: ;;#ASMSTART
	; GFX9-PAL-NEXT: ; use v0
	; GFX9-PAL-NEXT: ;;#ASMEND
	; GFX9-PAL-NEXT: s_endpgm			; GFX9-PAL-NEXT: s_endpgm
	;			;
	; GFX940-LABEL: store_load_vindex_small_offset_kernel:			; GFX940-LABEL: store_load_vindex_small_offset_kernel:
	; GFX940: ; %bb.0: ; %bb			; GFX940: ; %bb.0: ; %bb
	; GFX940-NEXT: scratch_load_dword v1, off, off offset:4 sc0 sc1			; GFX940-NEXT: scratch_load_dword v1, off, off offset:4 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX940-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX940-NEXT: v_mov_b32_e32 v1, 15			; GFX940-NEXT: v_mov_b32_e32 v1, 15
	; GFX940-NEXT: scratch_store_dword v0, v1, off offset:260 sc0 sc1			; GFX940-NEXT: scratch_store_dword v0, v1, off offset:260 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: v_sub_u32_e32 v0, 0x104, v0			; GFX940-NEXT: v_sub_u32_e32 v0, 0x104, v0
	; GFX940-NEXT: scratch_load_dword v0, v0, off offset:124 sc0 sc1			; GFX940-NEXT: scratch_load_dword v0, v0, off offset:124 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: v_mov_b32_e32 v0, 0x104
	; GFX940-NEXT: v_mov_b32_e32 v1, 4
	; GFX940-NEXT: ;;#ASMSTART
	; GFX940-NEXT: ; use v1
	; GFX940-NEXT: ;;#ASMEND
	; GFX940-NEXT: ;;#ASMSTART
	; GFX940-NEXT: ; use v0
	; GFX940-NEXT: ;;#ASMEND
	; GFX940-NEXT: s_endpgm			; GFX940-NEXT: s_endpgm
	;			;
	; GFX1010-PAL-LABEL: store_load_vindex_small_offset_kernel:			; GFX1010-PAL-LABEL: store_load_vindex_small_offset_kernel:
	; GFX1010-PAL: ; %bb.0: ; %bb			; GFX1010-PAL: ; %bb.0: ; %bb
	; GFX1010-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX1010-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX1010-PAL-NEXT: s_mov_b32 s2, s0			; GFX1010-PAL-NEXT: s_mov_b32 s2, s0
	; GFX1010-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX1010-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1010-PAL-NEXT: s_and_b32 s3, s3, 0xffff			; GFX1010-PAL-NEXT: s_and_b32 s3, s3, 0xffff
	; GFX1010-PAL-NEXT: s_add_u32 s2, s2, s1			; GFX1010-PAL-NEXT: s_add_u32 s2, s2, s1
	; GFX1010-PAL-NEXT: s_addc_u32 s3, s3, 0			; GFX1010-PAL-NEXT: s_addc_u32 s3, s3, 0
	; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2			; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
	; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3			; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
	; GFX1010-PAL-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX1010-PAL-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v2, 15			; GFX1010-PAL-NEXT: v_mov_b32_e32 v2, 15
	; GFX1010-PAL-NEXT: s_mov_b32 vcc_lo, 0			; GFX1010-PAL-NEXT: s_mov_b32 vcc_lo, 0
	; GFX1010-PAL-NEXT: scratch_load_dword v3, off, vcc_lo offset:4 glc dlc			; GFX1010-PAL-NEXT: scratch_load_dword v3, off, vcc_lo offset:4 glc dlc
	; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1010-PAL-NEXT: v_add_nc_u32_e32 v1, 0x104, v0			; GFX1010-PAL-NEXT: v_add_nc_u32_e32 v1, 0x104, v0
	; GFX1010-PAL-NEXT: v_sub_nc_u32_e32 v0, 0x104, v0			; GFX1010-PAL-NEXT: v_sub_nc_u32_e32 v0, 0x104, v0
	; GFX1010-PAL-NEXT: scratch_store_dword v1, v2, off			; GFX1010-PAL-NEXT: scratch_store_dword v1, v2, off
	; GFX1010-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1010-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1010-PAL-NEXT: scratch_load_dword v0, v0, off offset:124 glc dlc			; GFX1010-PAL-NEXT: scratch_load_dword v0, v0, off offset:124 glc dlc
	; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v1, 0x104
	; GFX1010-PAL-NEXT: ;;#ASMSTART
	; GFX1010-PAL-NEXT: ; use v0
	; GFX1010-PAL-NEXT: ;;#ASMEND
	; GFX1010-PAL-NEXT: ;;#ASMSTART
	; GFX1010-PAL-NEXT: ; use v1
	; GFX1010-PAL-NEXT: ;;#ASMEND
	; GFX1010-PAL-NEXT: s_endpgm			; GFX1010-PAL-NEXT: s_endpgm
	;			;
	; GFX1030-PAL-LABEL: store_load_vindex_small_offset_kernel:			; GFX1030-PAL-LABEL: store_load_vindex_small_offset_kernel:
	; GFX1030-PAL: ; %bb.0: ; %bb			; GFX1030-PAL: ; %bb.0: ; %bb
	; GFX1030-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX1030-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX1030-PAL-NEXT: s_mov_b32 s2, s0			; GFX1030-PAL-NEXT: s_mov_b32 s2, s0
	; GFX1030-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX1030-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1030-PAL-NEXT: s_and_b32 s3, s3, 0xffff			; GFX1030-PAL-NEXT: s_and_b32 s3, s3, 0xffff
	; GFX1030-PAL-NEXT: s_add_u32 s2, s2, s1			; GFX1030-PAL-NEXT: s_add_u32 s2, s2, s1
	; GFX1030-PAL-NEXT: s_addc_u32 s3, s3, 0			; GFX1030-PAL-NEXT: s_addc_u32 s3, s3, 0
	; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2			; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
	; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3			; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
	; GFX1030-PAL-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX1030-PAL-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v2, 15			; GFX1030-PAL-NEXT: v_mov_b32_e32 v2, 15
	; GFX1030-PAL-NEXT: scratch_load_dword v3, off, off offset:4 glc dlc			; GFX1030-PAL-NEXT: scratch_load_dword v3, off, off offset:4 glc dlc
	; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1030-PAL-NEXT: v_add_nc_u32_e32 v1, 0x104, v0			; GFX1030-PAL-NEXT: v_add_nc_u32_e32 v1, 0x104, v0
	; GFX1030-PAL-NEXT: v_sub_nc_u32_e32 v0, 0x104, v0			; GFX1030-PAL-NEXT: v_sub_nc_u32_e32 v0, 0x104, v0
	; GFX1030-PAL-NEXT: scratch_store_dword v1, v2, off			; GFX1030-PAL-NEXT: scratch_store_dword v1, v2, off
	; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1030-PAL-NEXT: scratch_load_dword v0, v0, off offset:124 glc dlc			; GFX1030-PAL-NEXT: scratch_load_dword v0, v0, off offset:124 glc dlc
	; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v1, 0x104
	; GFX1030-PAL-NEXT: ;;#ASMSTART
	; GFX1030-PAL-NEXT: ; use v0
	; GFX1030-PAL-NEXT: ;;#ASMEND
	; GFX1030-PAL-NEXT: ;;#ASMSTART
	; GFX1030-PAL-NEXT: ; use v1
	; GFX1030-PAL-NEXT: ;;#ASMEND
	; GFX1030-PAL-NEXT: s_endpgm			; GFX1030-PAL-NEXT: s_endpgm
	;			;
	; GFX11-PAL-LABEL: store_load_vindex_small_offset_kernel:			; GFX11-PAL-LABEL: store_load_vindex_small_offset_kernel:
	; GFX11-PAL: ; %bb.0: ; %bb			; GFX11-PAL: ; %bb.0: ; %bb
	; GFX11-PAL-NEXT: v_dual_mov_b32 v1, 15 :: v_dual_lshlrev_b32 v0, 2, v0			; GFX11-PAL-NEXT: v_dual_mov_b32 v1, 15 :: v_dual_lshlrev_b32 v0, 2, v0
	; GFX11-PAL-NEXT: scratch_load_b32 v3, off, off offset:4 glc dlc			; GFX11-PAL-NEXT: scratch_load_b32 v3, off, off offset:4 glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: v_sub_nc_u32_e32 v2, 0x104, v0			; GFX11-PAL-NEXT: v_sub_nc_u32_e32 v2, 0x104, v0
	; GFX11-PAL-NEXT: scratch_store_b32 v0, v1, off offset:260 dlc			; GFX11-PAL-NEXT: scratch_store_b32 v0, v1, off offset:260 dlc
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: v_mov_b32_e32 v1, 0x104
	; GFX11-PAL-NEXT: scratch_load_b32 v0, v2, off offset:124 glc dlc			; GFX11-PAL-NEXT: scratch_load_b32 v0, v2, off offset:124 glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX11-PAL-NEXT: ;;#ASMSTART
	; GFX11-PAL-NEXT: ; use v0
	; GFX11-PAL-NEXT: ;;#ASMEND
	; GFX11-PAL-NEXT: ;;#ASMSTART
	; GFX11-PAL-NEXT: ; use v1
	; GFX11-PAL-NEXT: ;;#ASMEND
	; GFX11-PAL-NEXT: s_endpgm			; GFX11-PAL-NEXT: s_endpgm
	bb:			bb:
	%padding = alloca [64 x i32], align 4, addrspace(5)			%padding = alloca [64 x i32], align 4, addrspace(5)
	%i = alloca [32 x float], align 4, addrspace(5)			%i = alloca [32 x float], align 4, addrspace(5)
	%pad_gep = getelementptr inbounds [64 x i32], [64 x i32] addrspace(5)* %padding, i32 0, i32 undef			%pad_gep = getelementptr inbounds [64 x i32], [64 x i32] addrspace(5)* %padding, i32 0, i32 undef
	%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4			%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4
	%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*			%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*
	%i2 = tail call i32 @llvm.amdgcn.workitem.id.x()			%i2 = tail call i32 @llvm.amdgcn.workitem.id.x()
	%i3 = zext i32 %i2 to i64			%i3 = zext i32 %i2 to i64
	%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i2			%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i2
	%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*			%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*
	store volatile i32 15, i32 addrspace(5)* %i8, align 4			store volatile i32 15, i32 addrspace(5)* %i8, align 4
	%i9 = sub nsw i32 31, %i2			%i9 = sub nsw i32 31, %i2
	%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9			%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9
	%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*			%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*
	%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4			%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4
	call void asm sideeffect "; use $0", "s"([64 x i32] addrspace(5)* %padding) #0
	call void asm sideeffect "; use $0", "s"([32 x float] addrspace(5)* %i) #0
	ret void			ret void
	}			}

	define void @store_load_vindex_small_offset_foo(i32 %idx) {			define void @store_load_vindex_small_offset_foo(i32 %idx) {
	; GFX9-LABEL: store_load_vindex_small_offset_foo:			; GFX9-LABEL: store_load_vindex_small_offset_foo:
	; GFX9: ; %bb.0: ; %bb			; GFX9: ; %bb.0: ; %bb
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: scratch_load_dword v1, off, s32 glc			; GFX9-NEXT: scratch_load_dword v1, off, s32 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_add_i32 vcc_hi, s32, 0x100			; GFX9-NEXT: s_add_i32 vcc_hi, s32, 0x100
	; GFX9-NEXT: v_mov_b32_e32 v1, vcc_hi			; GFX9-NEXT: v_mov_b32_e32 v1, vcc_hi
	; GFX9-NEXT: v_lshl_add_u32 v2, v0, 2, v1			; GFX9-NEXT: v_lshl_add_u32 v2, v0, 2, v1
	; GFX9-NEXT: v_mov_b32_e32 v3, 15			; GFX9-NEXT: v_mov_b32_e32 v3, 15
	; GFX9-NEXT: v_and_b32_e32 v0, 15, v0			; GFX9-NEXT: v_and_b32_e32 v0, 15, v0
	; GFX9-NEXT: scratch_store_dword v2, v3, off			; GFX9-NEXT: scratch_store_dword v2, v3, off
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_lshl_add_u32 v0, v0, 2, v1			; GFX9-NEXT: v_lshl_add_u32 v0, v0, 2, v1
	; GFX9-NEXT: scratch_load_dword v0, v0, off glc			; GFX9-NEXT: scratch_load_dword v0, v0, off glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_mov_b32_e32 v0, s32
	; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v0
	; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v1
	; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: store_load_vindex_small_offset_foo:			; GFX10-LABEL: store_load_vindex_small_offset_foo:
	; GFX10: ; %bb.0: ; %bb			; GFX10: ; %bb.0: ; %bb
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: v_and_b32_e32 v1, 15, v0			; GFX10-NEXT: v_and_b32_e32 v1, 15, v0
	; GFX10-NEXT: s_add_i32 s1, s32, 0x100
	; GFX10-NEXT: s_add_i32 s0, s32, 0x100			; GFX10-NEXT: s_add_i32 s0, s32, 0x100
	; GFX10-NEXT: v_lshl_add_u32 v0, v0, 2, s1			; GFX10-NEXT: s_add_i32 vcc_lo, s32, 0x100
				; GFX10-NEXT: v_lshl_add_u32 v0, v0, 2, s0
	; GFX10-NEXT: v_mov_b32_e32 v2, 15			; GFX10-NEXT: v_mov_b32_e32 v2, 15
	; GFX10-NEXT: v_lshl_add_u32 v1, v1, 2, s0			; GFX10-NEXT: v_lshl_add_u32 v1, v1, 2, vcc_lo
	; GFX10-NEXT: scratch_load_dword v3, off, s32 glc dlc			; GFX10-NEXT: scratch_load_dword v3, off, s32 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_add_i32 vcc_lo, s32, 0x100
	; GFX10-NEXT: scratch_store_dword v0, v2, off			; GFX10-NEXT: scratch_store_dword v0, v2, off
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_load_dword v0, v1, off glc dlc			; GFX10-NEXT: scratch_load_dword v0, v1, off glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_mov_b32_e32 v0, s32
	; GFX10-NEXT: v_mov_b32_e32 v1, vcc_lo
	; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v0
	; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v1
	; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: store_load_vindex_small_offset_foo:			; GFX11-LABEL: store_load_vindex_small_offset_foo:
	; GFX11: ; %bb.0: ; %bb			; GFX11: ; %bb.0: ; %bb
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: v_dual_mov_b32 v2, 15 :: v_dual_and_b32 v1, 15, v0			; GFX11-NEXT: v_dual_mov_b32 v2, 15 :: v_dual_and_b32 v1, 15, v0
	; GFX11-NEXT: s_add_i32 vcc_lo, s32, 0x100
	; GFX11-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX11-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX11-NEXT: scratch_load_b32 v3, off, s32 glc dlc			; GFX11-NEXT: scratch_load_b32 v3, off, s32 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_lshlrev_b32_e32 v1, 2, v1			; GFX11-NEXT: v_lshlrev_b32_e32 v1, 2, v1
	; GFX11-NEXT: scratch_store_b32 v0, v2, s32 offset:256 dlc			; GFX11-NEXT: scratch_store_b32 v0, v2, s32 offset:256 dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: scratch_load_b32 v0, v1, s32 offset:256 glc dlc			; GFX11-NEXT: scratch_load_b32 v0, v1, s32 offset:256 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_dual_mov_b32 v0, s32 :: v_dual_mov_b32 v1, vcc_lo
	; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v0
	; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v1
	; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-PAL-LABEL: store_load_vindex_small_offset_foo:			; GFX9-PAL-LABEL: store_load_vindex_small_offset_foo:
	; GFX9-PAL: ; %bb.0: ; %bb			; GFX9-PAL: ; %bb.0: ; %bb
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-PAL-NEXT: scratch_load_dword v1, off, s32 glc			; GFX9-PAL-NEXT: scratch_load_dword v1, off, s32 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_add_i32 vcc_hi, s32, 0x100			; GFX9-PAL-NEXT: s_add_i32 vcc_hi, s32, 0x100
	; GFX9-PAL-NEXT: v_mov_b32_e32 v1, vcc_hi			; GFX9-PAL-NEXT: v_mov_b32_e32 v1, vcc_hi
	; GFX9-PAL-NEXT: v_lshl_add_u32 v2, v0, 2, v1			; GFX9-PAL-NEXT: v_lshl_add_u32 v2, v0, 2, v1
	; GFX9-PAL-NEXT: v_mov_b32_e32 v3, 15			; GFX9-PAL-NEXT: v_mov_b32_e32 v3, 15
	; GFX9-PAL-NEXT: v_and_b32_e32 v0, 15, v0			; GFX9-PAL-NEXT: v_and_b32_e32 v0, 15, v0
	; GFX9-PAL-NEXT: scratch_store_dword v2, v3, off			; GFX9-PAL-NEXT: scratch_store_dword v2, v3, off
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: v_lshl_add_u32 v0, v0, 2, v1			; GFX9-PAL-NEXT: v_lshl_add_u32 v0, v0, 2, v1
	; GFX9-PAL-NEXT: scratch_load_dword v0, v0, off glc			; GFX9-PAL-NEXT: scratch_load_dword v0, v0, off glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, s32
	; GFX9-PAL-NEXT: ;;#ASMSTART
	; GFX9-PAL-NEXT: ; use v0
	; GFX9-PAL-NEXT: ;;#ASMEND
	; GFX9-PAL-NEXT: ;;#ASMSTART
	; GFX9-PAL-NEXT: ; use v1
	; GFX9-PAL-NEXT: ;;#ASMEND
	; GFX9-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX9-PAL-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX940-LABEL: store_load_vindex_small_offset_foo:			; GFX940-LABEL: store_load_vindex_small_offset_foo:
	; GFX940: ; %bb.0: ; %bb			; GFX940: ; %bb.0: ; %bb
	; GFX940-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX940-NEXT: scratch_load_dword v1, off, s32 sc0 sc1			; GFX940-NEXT: scratch_load_dword v1, off, s32 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: v_lshlrev_b32_e32 v1, 2, v0			; GFX940-NEXT: v_lshlrev_b32_e32 v1, 2, v0
	; GFX940-NEXT: v_mov_b32_e32 v2, 15			; GFX940-NEXT: v_mov_b32_e32 v2, 15
	; GFX940-NEXT: v_and_b32_e32 v0, 15, v0			; GFX940-NEXT: v_and_b32_e32 v0, 15, v0
	; GFX940-NEXT: scratch_store_dword v1, v2, s32 offset:256 sc0 sc1			; GFX940-NEXT: scratch_store_dword v1, v2, s32 offset:256 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX940-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX940-NEXT: scratch_load_dword v0, v0, s32 offset:256 sc0 sc1			; GFX940-NEXT: scratch_load_dword v0, v0, s32 offset:256 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: v_mov_b32_e32 v0, s32
	; GFX940-NEXT: s_add_i32 vcc_hi, s32, 0x100
	; GFX940-NEXT: ;;#ASMSTART
	; GFX940-NEXT: ; use v0
	; GFX940-NEXT: ;;#ASMEND
	; GFX940-NEXT: v_mov_b32_e32 v0, vcc_hi
	; GFX940-NEXT: ;;#ASMSTART
	; GFX940-NEXT: ; use v0
	; GFX940-NEXT: ;;#ASMEND
	; GFX940-NEXT: s_setpc_b64 s[30:31]			; GFX940-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-PAL-LABEL: store_load_vindex_small_offset_foo:			; GFX10-PAL-LABEL: store_load_vindex_small_offset_foo:
	; GFX10-PAL: ; %bb.0: ; %bb			; GFX10-PAL: ; %bb.0: ; %bb
	; GFX10-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-PAL-NEXT: v_and_b32_e32 v1, 15, v0			; GFX10-PAL-NEXT: v_and_b32_e32 v1, 15, v0
	; GFX10-PAL-NEXT: s_add_i32 s1, s32, 0x100
	; GFX10-PAL-NEXT: s_add_i32 s0, s32, 0x100			; GFX10-PAL-NEXT: s_add_i32 s0, s32, 0x100
	; GFX10-PAL-NEXT: v_lshl_add_u32 v0, v0, 2, s1			; GFX10-PAL-NEXT: s_add_i32 vcc_lo, s32, 0x100
				; GFX10-PAL-NEXT: v_lshl_add_u32 v0, v0, 2, s0
	; GFX10-PAL-NEXT: v_mov_b32_e32 v2, 15			; GFX10-PAL-NEXT: v_mov_b32_e32 v2, 15
	; GFX10-PAL-NEXT: v_lshl_add_u32 v1, v1, 2, s0			; GFX10-PAL-NEXT: v_lshl_add_u32 v1, v1, 2, vcc_lo
	; GFX10-PAL-NEXT: scratch_load_dword v3, off, s32 glc dlc			; GFX10-PAL-NEXT: scratch_load_dword v3, off, s32 glc dlc
	; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX10-PAL-NEXT: s_add_i32 vcc_lo, s32, 0x100
	; GFX10-PAL-NEXT: scratch_store_dword v0, v2, off			; GFX10-PAL-NEXT: scratch_store_dword v0, v2, off
	; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-PAL-NEXT: scratch_load_dword v0, v1, off glc dlc			; GFX10-PAL-NEXT: scratch_load_dword v0, v1, off glc dlc
	; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX10-PAL-NEXT: v_mov_b32_e32 v0, s32
	; GFX10-PAL-NEXT: v_mov_b32_e32 v1, vcc_lo
	; GFX10-PAL-NEXT: ;;#ASMSTART
	; GFX10-PAL-NEXT: ; use v0
	; GFX10-PAL-NEXT: ;;#ASMEND
	; GFX10-PAL-NEXT: ;;#ASMSTART
	; GFX10-PAL-NEXT: ; use v1
	; GFX10-PAL-NEXT: ;;#ASMEND
	; GFX10-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX10-PAL-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-PAL-LABEL: store_load_vindex_small_offset_foo:			; GFX11-PAL-LABEL: store_load_vindex_small_offset_foo:
	; GFX11-PAL: ; %bb.0: ; %bb			; GFX11-PAL: ; %bb.0: ; %bb
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: v_dual_mov_b32 v2, 15 :: v_dual_and_b32 v1, 15, v0			; GFX11-PAL-NEXT: v_dual_mov_b32 v2, 15 :: v_dual_and_b32 v1, 15, v0
	; GFX11-PAL-NEXT: s_add_i32 vcc_lo, s32, 0x100
	; GFX11-PAL-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX11-PAL-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX11-PAL-NEXT: scratch_load_b32 v3, off, s32 glc dlc			; GFX11-PAL-NEXT: scratch_load_b32 v3, off, s32 glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: v_lshlrev_b32_e32 v1, 2, v1			; GFX11-PAL-NEXT: v_lshlrev_b32_e32 v1, 2, v1
	; GFX11-PAL-NEXT: scratch_store_b32 v0, v2, s32 offset:256 dlc			; GFX11-PAL-NEXT: scratch_store_b32 v0, v2, s32 offset:256 dlc
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: scratch_load_b32 v0, v1, s32 offset:256 glc dlc			; GFX11-PAL-NEXT: scratch_load_b32 v0, v1, s32 offset:256 glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: v_dual_mov_b32 v0, s32 :: v_dual_mov_b32 v1, vcc_lo
	; GFX11-PAL-NEXT: ;;#ASMSTART
	; GFX11-PAL-NEXT: ; use v0
	; GFX11-PAL-NEXT: ;;#ASMEND
	; GFX11-PAL-NEXT: ;;#ASMSTART
	; GFX11-PAL-NEXT: ; use v1
	; GFX11-PAL-NEXT: ;;#ASMEND
	; GFX11-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX11-PAL-NEXT: s_setpc_b64 s[30:31]
				; GCN-LABEL: store_load_vindex_small_offset_foo:
				; GCN: ; %bb.0: ; %bb
				; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GCN-NEXT: scratch_load_dword v1, off, s32 sc0 sc1
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: v_mov_b32_e32 v2, 15
				; GCN-NEXT: v_lshlrev_b32_e32 v1, 2, v0
				; GCN-NEXT: v_and_b32_e32 v0, v0, v2
				; GCN-NEXT: scratch_store_dword v1, v2, s32 offset:256 sc0 sc1
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: v_lshlrev_b32_e32 v0, 2, v0
				; GCN-NEXT: scratch_load_dword v0, v0, s32 offset:256 sc0 sc1
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: s_setpc_b64 s[30:31]
	bb:			bb:
	%padding = alloca [64 x i32], align 4, addrspace(5)			%padding = alloca [64 x i32], align 4, addrspace(5)
	%i = alloca [32 x float], align 4, addrspace(5)			%i = alloca [32 x float], align 4, addrspace(5)
	%pad_gep = getelementptr inbounds [64 x i32], [64 x i32] addrspace(5)* %padding, i32 0, i32 undef			%pad_gep = getelementptr inbounds [64 x i32], [64 x i32] addrspace(5)* %padding, i32 0, i32 undef
	%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4			%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4
	%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*			%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*
	%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %idx			%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %idx
	%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*			%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*
	store volatile i32 15, i32 addrspace(5)* %i8, align 4			store volatile i32 15, i32 addrspace(5)* %i8, align 4
	%i9 = and i32 %idx, 15			%i9 = and i32 %idx, 15
	%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9			%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9
	%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*			%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*
	%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4			%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4
	call void asm sideeffect "; use $0", "s"([64 x i32] addrspace(5)* %padding) #0
	call void asm sideeffect "; use $0", "s"([32 x float] addrspace(5)* %i) #0
	ret void			ret void
	}			}

	define amdgpu_kernel void @zero_init_large_offset_kernel() {			define amdgpu_kernel void @zero_init_large_offset_kernel() {
	; GFX9-LABEL: zero_init_large_offset_kernel:			; GFX9-LABEL: zero_init_large_offset_kernel:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3			; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3
	; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0			; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0
	; GFX9-NEXT: s_movk_i32 s1, 0x4004			; GFX9-NEXT: s_movk_i32 s1, 0x4004
	; GFX9-NEXT: s_movk_i32 s0, 0x4004			; GFX9-NEXT: s_movk_i32 s0, 0x4004
	; GFX9-NEXT: s_movk_i32 vcc_lo, 0x4004			; GFX9-NEXT: s_movk_i32 vcc_lo, 0x4004
	; GFX9-NEXT: s_movk_i32 vcc_hi, 0x4004			; GFX9-NEXT: s_movk_i32 vcc_hi, 0x4004
	; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], s1			; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], s1
	; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], s0 offset:16			; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], s0 offset:16
	; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:32			; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:32
	; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_hi offset:48			; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_hi offset:48
	; GFX9-NEXT: s_nop 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 4
	; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v0
	; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x4004
	; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v0
	; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: zero_init_large_offset_kernel:			; GFX10-LABEL: zero_init_large_offset_kernel:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_add_u32 s0, s0, s3			; GFX10-NEXT: s_add_u32 s0, s0, s3
	; GFX10-NEXT: s_addc_u32 s1, s1, 0			; GFX10-NEXT: s_addc_u32 s1, s1, 0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1
	; GFX10-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_mov_b32 s0, 0			; GFX10-NEXT: s_mov_b32 s0, 0
	; GFX10-NEXT: s_movk_i32 vcc_lo, 0x4004			; GFX10-NEXT: s_movk_i32 vcc_lo, 0x4004
	; GFX10-NEXT: s_mov_b32 s1, s0			; GFX10-NEXT: s_mov_b32 s1, s0
	; GFX10-NEXT: s_mov_b32 s2, s0			; GFX10-NEXT: s_mov_b32 s2, s0
	; GFX10-NEXT: s_mov_b32 s3, s0			; GFX10-NEXT: s_mov_b32 s3, s0
	; GFX10-NEXT: v_mov_b32_e32 v0, s0			; GFX10-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-NEXT: v_mov_b32_e32 v1, s1			; GFX10-NEXT: v_mov_b32_e32 v1, s1
	; GFX10-NEXT: v_mov_b32_e32 v2, s2			; GFX10-NEXT: v_mov_b32_e32 v2, s2
	; GFX10-NEXT: v_mov_b32_e32 v3, s3			; GFX10-NEXT: v_mov_b32_e32 v3, s3
	; GFX10-NEXT: s_movk_i32 s2, 0x4004			; GFX10-NEXT: s_movk_i32 s2, 0x4004
	; GFX10-NEXT: s_movk_i32 s1, 0x4004			; GFX10-NEXT: s_movk_i32 s1, 0x4004
	; GFX10-NEXT: s_movk_i32 s0, 0x4004			; GFX10-NEXT: s_movk_i32 s0, 0x4004
	; GFX10-NEXT: v_mov_b32_e32 v4, 4
	; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], s2			; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], s2
	; GFX10-NEXT: v_mov_b32_e32 v5, 0x4004
	; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], s1 offset:16			; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], s1 offset:16
	; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], s0 offset:32			; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], s0 offset:32
	; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:48			; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:48
	; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v4
	; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v5
	; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: zero_init_large_offset_kernel:			; GFX11-LABEL: zero_init_large_offset_kernel:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: scratch_load_b32 v0, off, off offset:4 glc dlc			; GFX11-NEXT: scratch_load_b32 v0, off, off offset:4 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, 0			; GFX11-NEXT: s_mov_b32 s0, 0
	; GFX11-NEXT: s_movk_i32 vcc_lo, 0x4004			; GFX11-NEXT: s_movk_i32 vcc_lo, 0x4004
	; GFX11-NEXT: s_mov_b32 s1, s0			; GFX11-NEXT: s_mov_b32 s1, s0
	; GFX11-NEXT: s_mov_b32 s2, s0			; GFX11-NEXT: s_mov_b32 s2, s0
	; GFX11-NEXT: s_mov_b32 s3, s0			; GFX11-NEXT: s_mov_b32 s3, s0
	; GFX11-NEXT: v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, s1			; GFX11-NEXT: v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, s1
	; GFX11-NEXT: v_dual_mov_b32 v2, s2 :: v_dual_mov_b32 v3, s3			; GFX11-NEXT: v_dual_mov_b32 v2, s2 :: v_dual_mov_b32 v3, s3
	; GFX11-NEXT: s_movk_i32 s2, 0x4004			; GFX11-NEXT: s_movk_i32 s2, 0x4004
	; GFX11-NEXT: s_movk_i32 s1, 0x4004			; GFX11-NEXT: s_movk_i32 s1, 0x4004
	; GFX11-NEXT: s_movk_i32 s0, 0x4004			; GFX11-NEXT: s_movk_i32 s0, 0x4004
	; GFX11-NEXT: v_dual_mov_b32 v4, 4 :: v_dual_mov_b32 v5, 0x4004
	; GFX11-NEXT: s_clause 0x3			; GFX11-NEXT: s_clause 0x3
	; GFX11-NEXT: scratch_store_b128 off, v[0:3], s2			; GFX11-NEXT: scratch_store_b128 off, v[0:3], s2
	; GFX11-NEXT: scratch_store_b128 off, v[0:3], s1 offset:16			; GFX11-NEXT: scratch_store_b128 off, v[0:3], s1 offset:16
	; GFX11-NEXT: scratch_store_b128 off, v[0:3], s0 offset:32			; GFX11-NEXT: scratch_store_b128 off, v[0:3], s0 offset:32
	; GFX11-NEXT: scratch_store_b128 off, v[0:3], vcc_lo offset:48			; GFX11-NEXT: scratch_store_b128 off, v[0:3], vcc_lo offset:48
	; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v4
	; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v5
	; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	;			;
	; GFX9-PAL-LABEL: zero_init_large_offset_kernel:			; GFX9-PAL-LABEL: zero_init_large_offset_kernel:
	; GFX9-PAL: ; %bb.0:			; GFX9-PAL: ; %bb.0:
	; GFX9-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX9-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX9-PAL-NEXT: s_mov_b32 s2, s0			; GFX9-PAL-NEXT: s_mov_b32 s2, s0
	; GFX9-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX9-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX9-PAL-NEXT: v_mov_b32_e32 v2, s2			; GFX9-PAL-NEXT: v_mov_b32_e32 v2, s2
	; GFX9-PAL-NEXT: v_mov_b32_e32 v3, s3			; GFX9-PAL-NEXT: v_mov_b32_e32 v3, s3
	; GFX9-PAL-NEXT: s_movk_i32 s1, 0x4004			; GFX9-PAL-NEXT: s_movk_i32 s1, 0x4004
	; GFX9-PAL-NEXT: s_movk_i32 s0, 0x4004			; GFX9-PAL-NEXT: s_movk_i32 s0, 0x4004
	; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s1			; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s1
	; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s0 offset:16			; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s0 offset:16
	; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:32			; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:32
	; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_hi offset:48			; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_hi offset:48
	; GFX9-PAL-NEXT: s_nop 0
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX9-PAL-NEXT: ;;#ASMSTART
	; GFX9-PAL-NEXT: ; use v0
	; GFX9-PAL-NEXT: ;;#ASMEND
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 0x4004
	; GFX9-PAL-NEXT: ;;#ASMSTART
	; GFX9-PAL-NEXT: ; use v0
	; GFX9-PAL-NEXT: ;;#ASMEND
	; GFX9-PAL-NEXT: s_endpgm			; GFX9-PAL-NEXT: s_endpgm
	;			;
	; GFX940-LABEL: zero_init_large_offset_kernel:			; GFX940-LABEL: zero_init_large_offset_kernel:
	; GFX940: ; %bb.0:			; GFX940: ; %bb.0:
	; GFX940-NEXT: scratch_load_dword v0, off, off offset:4 sc0 sc1			; GFX940-NEXT: scratch_load_dword v0, off, off offset:4 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: s_mov_b32 s0, 0			; GFX940-NEXT: s_mov_b32 s0, 0
	; GFX940-NEXT: s_mov_b32 s1, s0			; GFX940-NEXT: s_mov_b32 s1, s0
	; GFX940-NEXT: s_mov_b32 s2, s0			; GFX940-NEXT: s_mov_b32 s2, s0
	; GFX940-NEXT: s_mov_b32 s3, s0			; GFX940-NEXT: s_mov_b32 s3, s0
	; GFX940-NEXT: v_mov_b64_e32 v[0:1], s[0:1]			; GFX940-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
	; GFX940-NEXT: v_mov_b64_e32 v[2:3], s[2:3]			; GFX940-NEXT: v_mov_b64_e32 v[2:3], s[2:3]
	; GFX940-NEXT: s_movk_i32 s1, 0x4004			; GFX940-NEXT: s_movk_i32 s1, 0x4004
	; GFX940-NEXT: s_movk_i32 s0, 0x4004			; GFX940-NEXT: s_movk_i32 s0, 0x4004
	; GFX940-NEXT: s_movk_i32 vcc_lo, 0x4004			; GFX940-NEXT: s_movk_i32 vcc_lo, 0x4004
	; GFX940-NEXT: s_movk_i32 vcc_hi, 0x4004			; GFX940-NEXT: s_movk_i32 vcc_hi, 0x4004
	; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], s1			; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], s1
	; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], s0 offset:16			; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], s0 offset:16
	; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:32			; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:32
	; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_hi offset:48			; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_hi offset:48
	; GFX940-NEXT: s_nop 1
	; GFX940-NEXT: v_mov_b32_e32 v0, 4
	; GFX940-NEXT: ;;#ASMSTART
	; GFX940-NEXT: ; use v0
	; GFX940-NEXT: ;;#ASMEND
	; GFX940-NEXT: v_mov_b32_e32 v0, 0x4004
	; GFX940-NEXT: ;;#ASMSTART
	; GFX940-NEXT: ; use v0
	; GFX940-NEXT: ;;#ASMEND
	; GFX940-NEXT: s_endpgm			; GFX940-NEXT: s_endpgm
	;			;
	; GFX1010-PAL-LABEL: zero_init_large_offset_kernel:			; GFX1010-PAL-LABEL: zero_init_large_offset_kernel:
	; GFX1010-PAL: ; %bb.0:			; GFX1010-PAL: ; %bb.0:
	; GFX1010-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX1010-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX1010-PAL-NEXT: s_mov_b32 s2, s0			; GFX1010-PAL-NEXT: s_mov_b32 s2, s0
	; GFX1010-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX1010-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v1, s1			; GFX1010-PAL-NEXT: v_mov_b32_e32 v1, s1
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v2, s2			; GFX1010-PAL-NEXT: v_mov_b32_e32 v2, s2
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v3, s3			; GFX1010-PAL-NEXT: v_mov_b32_e32 v3, s3
	; GFX1010-PAL-NEXT: s_movk_i32 s2, 0x4004			; GFX1010-PAL-NEXT: s_movk_i32 s2, 0x4004
	; GFX1010-PAL-NEXT: s_movk_i32 s1, 0x4004			; GFX1010-PAL-NEXT: s_movk_i32 s1, 0x4004
	; GFX1010-PAL-NEXT: s_movk_i32 s0, 0x4004			; GFX1010-PAL-NEXT: s_movk_i32 s0, 0x4004
	; GFX1010-PAL-NEXT: s_movk_i32 vcc_lo, 0x4004			; GFX1010-PAL-NEXT: s_movk_i32 vcc_lo, 0x4004
	; GFX1010-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s2			; GFX1010-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s2
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v4, 4
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v5, 0x4004
	; GFX1010-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s1 offset:16			; GFX1010-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s1 offset:16
	; GFX1010-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s0 offset:32			; GFX1010-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s0 offset:32
	; GFX1010-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:48			; GFX1010-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:48
	; GFX1010-PAL-NEXT: ;;#ASMSTART
	; GFX1010-PAL-NEXT: ; use v4
	; GFX1010-PAL-NEXT: ;;#ASMEND
	; GFX1010-PAL-NEXT: ;;#ASMSTART
	; GFX1010-PAL-NEXT: ; use v5
	; GFX1010-PAL-NEXT: ;;#ASMEND
	; GFX1010-PAL-NEXT: s_endpgm			; GFX1010-PAL-NEXT: s_endpgm
	;			;
	; GFX1030-PAL-LABEL: zero_init_large_offset_kernel:			; GFX1030-PAL-LABEL: zero_init_large_offset_kernel:
	; GFX1030-PAL: ; %bb.0:			; GFX1030-PAL: ; %bb.0:
	; GFX1030-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX1030-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX1030-PAL-NEXT: s_mov_b32 s2, s0			; GFX1030-PAL-NEXT: s_mov_b32 s2, s0
	; GFX1030-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX1030-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1030-PAL-NEXT: s_mov_b32 s3, s0			; GFX1030-PAL-NEXT: s_mov_b32 s3, s0
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v0, s0			; GFX1030-PAL-NEXT: v_mov_b32_e32 v0, s0
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v1, s1			; GFX1030-PAL-NEXT: v_mov_b32_e32 v1, s1
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v2, s2			; GFX1030-PAL-NEXT: v_mov_b32_e32 v2, s2
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v3, s3			; GFX1030-PAL-NEXT: v_mov_b32_e32 v3, s3
	; GFX1030-PAL-NEXT: s_movk_i32 s2, 0x4004			; GFX1030-PAL-NEXT: s_movk_i32 s2, 0x4004
	; GFX1030-PAL-NEXT: s_movk_i32 s1, 0x4004			; GFX1030-PAL-NEXT: s_movk_i32 s1, 0x4004
	; GFX1030-PAL-NEXT: s_movk_i32 s0, 0x4004			; GFX1030-PAL-NEXT: s_movk_i32 s0, 0x4004
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v4, 4
	; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s2			; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s2
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v5, 0x4004
	; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s1 offset:16			; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s1 offset:16
	; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s0 offset:32			; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s0 offset:32
	; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:48			; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:48
	; GFX1030-PAL-NEXT: ;;#ASMSTART
	; GFX1030-PAL-NEXT: ; use v4
	; GFX1030-PAL-NEXT: ;;#ASMEND
	; GFX1030-PAL-NEXT: ;;#ASMSTART
	; GFX1030-PAL-NEXT: ; use v5
	; GFX1030-PAL-NEXT: ;;#ASMEND
	; GFX1030-PAL-NEXT: s_endpgm			; GFX1030-PAL-NEXT: s_endpgm
	;			;
	; GFX11-PAL-LABEL: zero_init_large_offset_kernel:			; GFX11-PAL-LABEL: zero_init_large_offset_kernel:
	; GFX11-PAL: ; %bb.0:			; GFX11-PAL: ; %bb.0:
	; GFX11-PAL-NEXT: scratch_load_b32 v0, off, off offset:4 glc dlc			; GFX11-PAL-NEXT: scratch_load_b32 v0, off, off offset:4 glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: s_mov_b32 s0, 0			; GFX11-PAL-NEXT: s_mov_b32 s0, 0
	; GFX11-PAL-NEXT: s_movk_i32 vcc_lo, 0x4004			; GFX11-PAL-NEXT: s_movk_i32 vcc_lo, 0x4004
	; GFX11-PAL-NEXT: s_mov_b32 s1, s0			; GFX11-PAL-NEXT: s_mov_b32 s1, s0
	; GFX11-PAL-NEXT: s_mov_b32 s2, s0			; GFX11-PAL-NEXT: s_mov_b32 s2, s0
	; GFX11-PAL-NEXT: s_mov_b32 s3, s0			; GFX11-PAL-NEXT: s_mov_b32 s3, s0
	; GFX11-PAL-NEXT: v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, s1			; GFX11-PAL-NEXT: v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, s1
	; GFX11-PAL-NEXT: v_dual_mov_b32 v2, s2 :: v_dual_mov_b32 v3, s3			; GFX11-PAL-NEXT: v_dual_mov_b32 v2, s2 :: v_dual_mov_b32 v3, s3
	; GFX11-PAL-NEXT: s_movk_i32 s2, 0x4004			; GFX11-PAL-NEXT: s_movk_i32 s2, 0x4004
	; GFX11-PAL-NEXT: s_movk_i32 s1, 0x4004			; GFX11-PAL-NEXT: s_movk_i32 s1, 0x4004
	; GFX11-PAL-NEXT: s_movk_i32 s0, 0x4004			; GFX11-PAL-NEXT: s_movk_i32 s0, 0x4004
	; GFX11-PAL-NEXT: v_dual_mov_b32 v4, 4 :: v_dual_mov_b32 v5, 0x4004
	; GFX11-PAL-NEXT: s_clause 0x3			; GFX11-PAL-NEXT: s_clause 0x3
	; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], s2			; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], s2
	; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], s1 offset:16			; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], s1 offset:16
	; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], s0 offset:32			; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], s0 offset:32
	; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], vcc_lo offset:48			; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], vcc_lo offset:48
	; GFX11-PAL-NEXT: ;;#ASMSTART
	; GFX11-PAL-NEXT: ; use v4
	; GFX11-PAL-NEXT: ;;#ASMEND
	; GFX11-PAL-NEXT: ;;#ASMSTART
	; GFX11-PAL-NEXT: ; use v5
	; GFX11-PAL-NEXT: ;;#ASMEND
	; GFX11-PAL-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX11-PAL-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-PAL-NEXT: s_endpgm			; GFX11-PAL-NEXT: s_endpgm
	%padding = alloca [4096 x i32], align 4, addrspace(5)			%padding = alloca [4096 x i32], align 4, addrspace(5)
	%alloca = alloca [32 x i16], align 2, addrspace(5)			%alloca = alloca [32 x i16], align 2, addrspace(5)
	%pad_gep = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %padding, i32 0, i32 undef			%pad_gep = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %padding, i32 0, i32 undef
	%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4			%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4
	%cast = bitcast [32 x i16] addrspace(5)* %alloca to i8 addrspace(5)*			%cast = bitcast [32 x i16] addrspace(5)* %alloca to i8 addrspace(5)*
	call void @llvm.memset.p5i8.i64(i8 addrspace(5)* align 2 dereferenceable(64) %cast, i8 0, i64 64, i1 false)			call void @llvm.memset.p5i8.i64(i8 addrspace(5)* align 2 dereferenceable(64) %cast, i8 0, i64 64, i1 false)
	call void asm sideeffect "; use $0", "s"([4096 x i32] addrspace(5)* %padding) #0
	call void asm sideeffect "; use $0", "s"([32 x i16] addrspace(5)* %alloca) #0
	ret void			ret void
	}			}

	define void @zero_init_large_offset_foo() {			define void @zero_init_large_offset_foo() {
	; GFX9-LABEL: zero_init_large_offset_foo:			; GFX9-LABEL: zero_init_large_offset_foo:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: scratch_load_dword v0, off, s32 offset:4 glc			; GFX9-NEXT: scratch_load_dword v0, off, s32 offset:4 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_mov_b32 s0, 0			; GFX9-NEXT: s_mov_b32 s0, 0
	; GFX9-NEXT: s_mov_b32 s1, s0			; GFX9-NEXT: s_mov_b32 s1, s0
	; GFX9-NEXT: s_mov_b32 s2, s0			; GFX9-NEXT: s_mov_b32 s2, s0
	; GFX9-NEXT: s_mov_b32 s3, s0			; GFX9-NEXT: s_mov_b32 s3, s0
	; GFX9-NEXT: v_mov_b32_e32 v0, s0			; GFX9-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-NEXT: v_mov_b32_e32 v1, s1			; GFX9-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-NEXT: v_mov_b32_e32 v2, s2			; GFX9-NEXT: v_mov_b32_e32 v2, s2
	; GFX9-NEXT: v_mov_b32_e32 v3, s3			; GFX9-NEXT: v_mov_b32_e32 v3, s3
	; GFX9-NEXT: s_add_i32 s3, s32, 0x4004
	; GFX9-NEXT: s_add_i32 s2, s32, 0x4004
	; GFX9-NEXT: s_add_i32 s1, s32, 0x4004			; GFX9-NEXT: s_add_i32 s1, s32, 0x4004
	; GFX9-NEXT: s_add_i32 s0, s32, 0x4004			; GFX9-NEXT: s_add_i32 s0, s32, 0x4004
	; GFX9-NEXT: s_add_i32 vcc_lo, s32, 4			; GFX9-NEXT: s_add_i32 vcc_lo, s32, 0x4004
	; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], s3
	; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], s2 offset:16
	; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], s1 offset:32
	; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], s0 offset:48
	; GFX9-NEXT: s_add_i32 vcc_hi, s32, 0x4004			; GFX9-NEXT: s_add_i32 vcc_hi, s32, 0x4004
	; GFX9-NEXT: v_mov_b32_e32 v0, vcc_lo			; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], s1
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], s0 offset:16
	; GFX9-NEXT: ; use v0			; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:32
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_hi offset:48
	; GFX9-NEXT: v_mov_b32_e32 v0, vcc_hi
	; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v0
	; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: zero_init_large_offset_foo:			; GFX10-LABEL: zero_init_large_offset_foo:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_load_dword v0, off, s32 offset:4 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, s32 offset:4 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_mov_b32 s0, 0			; GFX10-NEXT: s_mov_b32 s0, 0
	; GFX10-NEXT: s_add_i32 s4, s32, 0x4004
	; GFX10-NEXT: s_mov_b32 s1, s0			; GFX10-NEXT: s_mov_b32 s1, s0
	; GFX10-NEXT: s_mov_b32 s2, s0			; GFX10-NEXT: s_mov_b32 s2, s0
	; GFX10-NEXT: s_mov_b32 s3, s0			; GFX10-NEXT: s_mov_b32 s3, s0
	; GFX10-NEXT: v_mov_b32_e32 v0, s0			; GFX10-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-NEXT: v_mov_b32_e32 v1, s1			; GFX10-NEXT: v_mov_b32_e32 v1, s1
	; GFX10-NEXT: v_mov_b32_e32 v2, s2			; GFX10-NEXT: v_mov_b32_e32 v2, s2
	; GFX10-NEXT: v_mov_b32_e32 v3, s3			; GFX10-NEXT: v_mov_b32_e32 v3, s3
	; GFX10-NEXT: s_add_i32 s3, s32, 4
	; GFX10-NEXT: s_add_i32 s2, s32, 0x4004			; GFX10-NEXT: s_add_i32 s2, s32, 0x4004
	; GFX10-NEXT: s_add_i32 s1, s32, 0x4004			; GFX10-NEXT: s_add_i32 s1, s32, 0x4004
	; GFX10-NEXT: s_add_i32 s0, s32, 0x4004			; GFX10-NEXT: s_add_i32 s0, s32, 0x4004
	; GFX10-NEXT: s_add_i32 vcc_lo, s32, 0x4004			; GFX10-NEXT: s_add_i32 vcc_lo, s32, 0x4004
	; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], s4			; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], s2
	; GFX10-NEXT: v_mov_b32_e32 v4, s3
	; GFX10-NEXT: v_mov_b32_e32 v5, s2
	; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], s1 offset:16			; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], s1 offset:16
	; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], s0 offset:32			; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], s0 offset:32
	; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:48			; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:48
	; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v4
	; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v5
	; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: zero_init_large_offset_foo:			; GFX11-LABEL: zero_init_large_offset_foo:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: scratch_load_b32 v0, off, s32 offset:4 glc dlc			; GFX11-NEXT: scratch_load_b32 v0, off, s32 offset:4 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, 0			; GFX11-NEXT: s_mov_b32 s0, 0
	; GFX11-NEXT: s_add_i32 s4, s32, 4			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_mov_b32 s1, s0			; GFX11-NEXT: s_mov_b32 s1, s0
	; GFX11-NEXT: s_mov_b32 s2, s0			; GFX11-NEXT: s_mov_b32 s2, s0
	; GFX11-NEXT: s_mov_b32 s3, s0			; GFX11-NEXT: s_mov_b32 s3, s0
	; GFX11-NEXT: v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, s1			; GFX11-NEXT: v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, s1
	; GFX11-NEXT: v_dual_mov_b32 v2, s2 :: v_dual_mov_b32 v3, s3			; GFX11-NEXT: v_dual_mov_b32 v2, s2 :: v_dual_mov_b32 v3, s3
	; GFX11-NEXT: s_add_i32 s3, s32, 0x4004
	; GFX11-NEXT: s_add_i32 s2, s32, 0x4004			; GFX11-NEXT: s_add_i32 s2, s32, 0x4004
	; GFX11-NEXT: s_add_i32 s1, s32, 0x4004			; GFX11-NEXT: s_add_i32 s1, s32, 0x4004
	; GFX11-NEXT: s_add_i32 s0, s32, 0x4004			; GFX11-NEXT: s_add_i32 s0, s32, 0x4004
	; GFX11-NEXT: s_add_i32 vcc_lo, s32, 0x4004			; GFX11-NEXT: s_add_i32 vcc_lo, s32, 0x4004
	; GFX11-NEXT: v_dual_mov_b32 v4, s4 :: v_dual_mov_b32 v5, s3
	; GFX11-NEXT: s_clause 0x3			; GFX11-NEXT: s_clause 0x3
	; GFX11-NEXT: scratch_store_b128 off, v[0:3], s2			; GFX11-NEXT: scratch_store_b128 off, v[0:3], s2
	; GFX11-NEXT: scratch_store_b128 off, v[0:3], s1 offset:16			; GFX11-NEXT: scratch_store_b128 off, v[0:3], s1 offset:16
	; GFX11-NEXT: scratch_store_b128 off, v[0:3], s0 offset:32			; GFX11-NEXT: scratch_store_b128 off, v[0:3], s0 offset:32
	; GFX11-NEXT: scratch_store_b128 off, v[0:3], vcc_lo offset:48			; GFX11-NEXT: scratch_store_b128 off, v[0:3], vcc_lo offset:48
	; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v4
	; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v5
	; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-PAL-LABEL: zero_init_large_offset_foo:			; GFX9-PAL-LABEL: zero_init_large_offset_foo:
	; GFX9-PAL: ; %bb.0:			; GFX9-PAL: ; %bb.0:
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-PAL-NEXT: scratch_load_dword v0, off, s32 offset:4 glc			; GFX9-PAL-NEXT: scratch_load_dword v0, off, s32 offset:4 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_mov_b32 s0, 0			; GFX9-PAL-NEXT: s_mov_b32 s0, 0
	; GFX9-PAL-NEXT: s_mov_b32 s1, s0			; GFX9-PAL-NEXT: s_mov_b32 s1, s0
	; GFX9-PAL-NEXT: s_mov_b32 s2, s0			; GFX9-PAL-NEXT: s_mov_b32 s2, s0
	; GFX9-PAL-NEXT: s_mov_b32 s3, s0			; GFX9-PAL-NEXT: s_mov_b32 s3, s0
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-PAL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-PAL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-PAL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-PAL-NEXT: v_mov_b32_e32 v2, s2			; GFX9-PAL-NEXT: v_mov_b32_e32 v2, s2
	; GFX9-PAL-NEXT: v_mov_b32_e32 v3, s3			; GFX9-PAL-NEXT: v_mov_b32_e32 v3, s3
	; GFX9-PAL-NEXT: s_add_i32 s3, s32, 0x4004
	; GFX9-PAL-NEXT: s_add_i32 s2, s32, 0x4004
	; GFX9-PAL-NEXT: s_add_i32 s1, s32, 0x4004			; GFX9-PAL-NEXT: s_add_i32 s1, s32, 0x4004
	; GFX9-PAL-NEXT: s_add_i32 s0, s32, 0x4004			; GFX9-PAL-NEXT: s_add_i32 s0, s32, 0x4004
	; GFX9-PAL-NEXT: s_add_i32 vcc_lo, s32, 4			; GFX9-PAL-NEXT: s_add_i32 vcc_lo, s32, 0x4004
	; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s3
	; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s2 offset:16
	; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s1 offset:32
	; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s0 offset:48
	; GFX9-PAL-NEXT: s_add_i32 vcc_hi, s32, 0x4004			; GFX9-PAL-NEXT: s_add_i32 vcc_hi, s32, 0x4004
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, vcc_lo			; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s1
	; GFX9-PAL-NEXT: ;;#ASMSTART			; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s0 offset:16
	; GFX9-PAL-NEXT: ; use v0			; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:32
	; GFX9-PAL-NEXT: ;;#ASMEND			; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_hi offset:48
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, vcc_hi
	; GFX9-PAL-NEXT: ;;#ASMSTART
	; GFX9-PAL-NEXT: ; use v0
	; GFX9-PAL-NEXT: ;;#ASMEND
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX9-PAL-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX940-LABEL: zero_init_large_offset_foo:			; GFX940-LABEL: zero_init_large_offset_foo:
	; GFX940: ; %bb.0:			; GFX940: ; %bb.0:
	; GFX940-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX940-NEXT: scratch_load_dword v0, off, s32 offset:4 sc0 sc1			; GFX940-NEXT: scratch_load_dword v0, off, s32 offset:4 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: s_mov_b32 s0, 0			; GFX940-NEXT: s_mov_b32 s0, 0
	; GFX940-NEXT: s_mov_b32 s1, s0			; GFX940-NEXT: s_mov_b32 s1, s0
	; GFX940-NEXT: s_mov_b32 s2, s0			; GFX940-NEXT: s_mov_b32 s2, s0
	; GFX940-NEXT: s_mov_b32 s3, s0			; GFX940-NEXT: s_mov_b32 s3, s0
	; GFX940-NEXT: v_mov_b64_e32 v[0:1], s[0:1]			; GFX940-NEXT: v_mov_b64_e32 v[0:1], s[0:1]
	; GFX940-NEXT: v_mov_b64_e32 v[2:3], s[2:3]			; GFX940-NEXT: v_mov_b64_e32 v[2:3], s[2:3]
	; GFX940-NEXT: s_add_i32 s3, s32, 0x4004
	; GFX940-NEXT: s_add_i32 s2, s32, 0x4004
	; GFX940-NEXT: s_add_i32 s1, s32, 0x4004			; GFX940-NEXT: s_add_i32 s1, s32, 0x4004
	; GFX940-NEXT: s_add_i32 s0, s32, 0x4004			; GFX940-NEXT: s_add_i32 s0, s32, 0x4004
	; GFX940-NEXT: s_add_i32 vcc_lo, s32, 4			; GFX940-NEXT: s_add_i32 vcc_lo, s32, 0x4004
	; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], s3
	; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], s2 offset:16
	; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], s1 offset:32
	; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], s0 offset:48
	; GFX940-NEXT: s_add_i32 vcc_hi, s32, 0x4004			; GFX940-NEXT: s_add_i32 vcc_hi, s32, 0x4004
	; GFX940-NEXT: s_nop 0			; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], s1
	; GFX940-NEXT: v_mov_b32_e32 v0, vcc_lo			; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], s0 offset:16
	; GFX940-NEXT: ;;#ASMSTART			; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:32
	; GFX940-NEXT: ; use v0			; GFX940-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_hi offset:48
	; GFX940-NEXT: ;;#ASMEND
	; GFX940-NEXT: v_mov_b32_e32 v0, vcc_hi
	; GFX940-NEXT: ;;#ASMSTART
	; GFX940-NEXT: ; use v0
	; GFX940-NEXT: ;;#ASMEND
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: s_setpc_b64 s[30:31]			; GFX940-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-PAL-LABEL: zero_init_large_offset_foo:			; GFX10-PAL-LABEL: zero_init_large_offset_foo:
	; GFX10-PAL: ; %bb.0:			; GFX10-PAL: ; %bb.0:
	; GFX10-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-PAL-NEXT: scratch_load_dword v0, off, s32 offset:4 glc dlc			; GFX10-PAL-NEXT: scratch_load_dword v0, off, s32 offset:4 glc dlc
	; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX10-PAL-NEXT: s_mov_b32 s0, 0			; GFX10-PAL-NEXT: s_mov_b32 s0, 0
	; GFX10-PAL-NEXT: s_add_i32 s4, s32, 0x4004
	; GFX10-PAL-NEXT: s_mov_b32 s1, s0			; GFX10-PAL-NEXT: s_mov_b32 s1, s0
	; GFX10-PAL-NEXT: s_mov_b32 s2, s0			; GFX10-PAL-NEXT: s_mov_b32 s2, s0
	; GFX10-PAL-NEXT: s_mov_b32 s3, s0			; GFX10-PAL-NEXT: s_mov_b32 s3, s0
	; GFX10-PAL-NEXT: v_mov_b32_e32 v0, s0			; GFX10-PAL-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-PAL-NEXT: v_mov_b32_e32 v1, s1			; GFX10-PAL-NEXT: v_mov_b32_e32 v1, s1
	; GFX10-PAL-NEXT: v_mov_b32_e32 v2, s2			; GFX10-PAL-NEXT: v_mov_b32_e32 v2, s2
	; GFX10-PAL-NEXT: v_mov_b32_e32 v3, s3			; GFX10-PAL-NEXT: v_mov_b32_e32 v3, s3
	; GFX10-PAL-NEXT: s_add_i32 s3, s32, 4
	; GFX10-PAL-NEXT: s_add_i32 s2, s32, 0x4004			; GFX10-PAL-NEXT: s_add_i32 s2, s32, 0x4004
	; GFX10-PAL-NEXT: s_add_i32 s1, s32, 0x4004			; GFX10-PAL-NEXT: s_add_i32 s1, s32, 0x4004
	; GFX10-PAL-NEXT: s_add_i32 s0, s32, 0x4004			; GFX10-PAL-NEXT: s_add_i32 s0, s32, 0x4004
	; GFX10-PAL-NEXT: s_add_i32 vcc_lo, s32, 0x4004			; GFX10-PAL-NEXT: s_add_i32 vcc_lo, s32, 0x4004
	; GFX10-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s4			; GFX10-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s2
	; GFX10-PAL-NEXT: v_mov_b32_e32 v4, s3
	; GFX10-PAL-NEXT: v_mov_b32_e32 v5, s2
	; GFX10-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s1 offset:16			; GFX10-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s1 offset:16
	; GFX10-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s0 offset:32			; GFX10-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], s0 offset:32
	; GFX10-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:48			; GFX10-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:48
	; GFX10-PAL-NEXT: ;;#ASMSTART
	; GFX10-PAL-NEXT: ; use v4
	; GFX10-PAL-NEXT: ;;#ASMEND
	; GFX10-PAL-NEXT: ;;#ASMSTART
	; GFX10-PAL-NEXT: ; use v5
	; GFX10-PAL-NEXT: ;;#ASMEND
	; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX10-PAL-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-PAL-LABEL: zero_init_large_offset_foo:			; GFX11-PAL-LABEL: zero_init_large_offset_foo:
	; GFX11-PAL: ; %bb.0:			; GFX11-PAL: ; %bb.0:
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: scratch_load_b32 v0, off, s32 offset:4 glc dlc			; GFX11-PAL-NEXT: scratch_load_b32 v0, off, s32 offset:4 glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: s_mov_b32 s0, 0			; GFX11-PAL-NEXT: s_mov_b32 s0, 0
	; GFX11-PAL-NEXT: s_add_i32 s4, s32, 4			; GFX11-PAL-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-PAL-NEXT: s_mov_b32 s1, s0			; GFX11-PAL-NEXT: s_mov_b32 s1, s0
	; GFX11-PAL-NEXT: s_mov_b32 s2, s0			; GFX11-PAL-NEXT: s_mov_b32 s2, s0
	; GFX11-PAL-NEXT: s_mov_b32 s3, s0			; GFX11-PAL-NEXT: s_mov_b32 s3, s0
	; GFX11-PAL-NEXT: v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, s1			; GFX11-PAL-NEXT: v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, s1
	; GFX11-PAL-NEXT: v_dual_mov_b32 v2, s2 :: v_dual_mov_b32 v3, s3			; GFX11-PAL-NEXT: v_dual_mov_b32 v2, s2 :: v_dual_mov_b32 v3, s3
	; GFX11-PAL-NEXT: s_add_i32 s3, s32, 0x4004
	; GFX11-PAL-NEXT: s_add_i32 s2, s32, 0x4004			; GFX11-PAL-NEXT: s_add_i32 s2, s32, 0x4004
	; GFX11-PAL-NEXT: s_add_i32 s1, s32, 0x4004			; GFX11-PAL-NEXT: s_add_i32 s1, s32, 0x4004
	; GFX11-PAL-NEXT: s_add_i32 s0, s32, 0x4004			; GFX11-PAL-NEXT: s_add_i32 s0, s32, 0x4004
	; GFX11-PAL-NEXT: s_add_i32 vcc_lo, s32, 0x4004			; GFX11-PAL-NEXT: s_add_i32 vcc_lo, s32, 0x4004
	; GFX11-PAL-NEXT: v_dual_mov_b32 v4, s4 :: v_dual_mov_b32 v5, s3
	; GFX11-PAL-NEXT: s_clause 0x3			; GFX11-PAL-NEXT: s_clause 0x3
	; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], s2			; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], s2
	; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], s1 offset:16			; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], s1 offset:16
	; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], s0 offset:32			; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], s0 offset:32
	; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], vcc_lo offset:48			; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], vcc_lo offset:48
	; GFX11-PAL-NEXT: ;;#ASMSTART
	; GFX11-PAL-NEXT: ; use v4
	; GFX11-PAL-NEXT: ;;#ASMEND
	; GFX11-PAL-NEXT: ;;#ASMSTART
	; GFX11-PAL-NEXT: ; use v5
	; GFX11-PAL-NEXT: ;;#ASMEND
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX11-PAL-NEXT: s_setpc_b64 s[30:31]
	%padding = alloca [4096 x i32], align 4, addrspace(5)			%padding = alloca [4096 x i32], align 4, addrspace(5)
	%alloca = alloca [32 x i16], align 2, addrspace(5)			%alloca = alloca [32 x i16], align 2, addrspace(5)
	%pad_gep = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %padding, i32 0, i32 undef			%pad_gep = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %padding, i32 0, i32 undef
	%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4			%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4
	%cast = bitcast [32 x i16] addrspace(5)* %alloca to i8 addrspace(5)*			%cast = bitcast [32 x i16] addrspace(5)* %alloca to i8 addrspace(5)*
	call void @llvm.memset.p5i8.i64(i8 addrspace(5)* align 2 dereferenceable(64) %cast, i8 0, i64 64, i1 false)			call void @llvm.memset.p5i8.i64(i8 addrspace(5)* align 2 dereferenceable(64) %cast, i8 0, i64 64, i1 false)
	call void asm sideeffect "; use $0", "s"([4096 x i32] addrspace(5)* %padding) #0
	call void asm sideeffect "; use $0", "s"([32 x i16] addrspace(5)* %alloca) #0
	ret void			ret void
	}			}

	define amdgpu_kernel void @store_load_sindex_large_offset_kernel(i32 %idx) {			define amdgpu_kernel void @store_load_sindex_large_offset_kernel(i32 %idx) {
	; GFX9-LABEL: store_load_sindex_large_offset_kernel:			; GFX9-LABEL: store_load_sindex_large_offset_kernel:
	; GFX9: ; %bb.0: ; %bb			; GFX9: ; %bb.0: ; %bb
	; GFX9-NEXT: s_load_dword s0, s[0:1], 0x24			; GFX9-NEXT: s_load_dword s0, s[0:1], 0x24
	; GFX9-NEXT: s_add_u32 flat_scratch_lo, s2, s5			; GFX9-NEXT: s_add_u32 flat_scratch_lo, s2, s5
	; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s3, 0			; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s3, 0
	; GFX9-NEXT: s_mov_b32 vcc_hi, 0			; GFX9-NEXT: s_mov_b32 vcc_hi, 0
	; GFX9-NEXT: scratch_load_dword v0, off, vcc_hi offset:4 glc			; GFX9-NEXT: scratch_load_dword v0, off, vcc_hi offset:4 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_lshl_b32 s1, s0, 2			; GFX9-NEXT: s_lshl_b32 s1, s0, 2
	; GFX9-NEXT: s_and_b32 s0, s0, 15			; GFX9-NEXT: s_and_b32 s0, s0, 15
	; GFX9-NEXT: v_mov_b32_e32 v0, 15			; GFX9-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-NEXT: s_addk_i32 s1, 0x4004			; GFX9-NEXT: s_addk_i32 s1, 0x4004
	; GFX9-NEXT: s_lshl_b32 s0, s0, 2			; GFX9-NEXT: s_lshl_b32 s0, s0, 2
	; GFX9-NEXT: scratch_store_dword off, v0, s1			; GFX9-NEXT: scratch_store_dword off, v0, s1
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_addk_i32 s0, 0x4004			; GFX9-NEXT: s_addk_i32 s0, 0x4004
	; GFX9-NEXT: scratch_load_dword v0, off, s0 glc			; GFX9-NEXT: scratch_load_dword v0, off, s0 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_mov_b32_e32 v0, 4
	; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v0
	; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x4004
	; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v0
	; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: store_load_sindex_large_offset_kernel:			; GFX10-LABEL: store_load_sindex_large_offset_kernel:
	; GFX10: ; %bb.0: ; %bb			; GFX10: ; %bb.0: ; %bb
	; GFX10-NEXT: s_add_u32 s2, s2, s5			; GFX10-NEXT: s_add_u32 s2, s2, s5
	; GFX10-NEXT: s_addc_u32 s3, s3, 0			; GFX10-NEXT: s_addc_u32 s3, s3, 0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
	; GFX10-NEXT: s_load_dword s0, s[0:1], 0x24			; GFX10-NEXT: s_load_dword s0, s[0:1], 0x24
	; GFX10-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_mov_b32_e32 v0, 15			; GFX10-NEXT: v_mov_b32_e32 v0, 15
	; GFX10-NEXT: v_mov_b32_e32 v1, 0x4004
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_and_b32 s1, s0, 15			; GFX10-NEXT: s_and_b32 s1, s0, 15
	; GFX10-NEXT: s_lshl_b32 s0, s0, 2			; GFX10-NEXT: s_lshl_b32 s0, s0, 2
	; GFX10-NEXT: s_lshl_b32 s1, s1, 2			; GFX10-NEXT: s_lshl_b32 s1, s1, 2
	; GFX10-NEXT: s_addk_i32 s0, 0x4004			; GFX10-NEXT: s_addk_i32 s0, 0x4004
	; GFX10-NEXT: s_addk_i32 s1, 0x4004			; GFX10-NEXT: s_addk_i32 s1, 0x4004
	; GFX10-NEXT: scratch_store_dword off, v0, s0			; GFX10-NEXT: scratch_store_dword off, v0, s0
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_load_dword v0, off, s1 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, s1 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_mov_b32_e32 v0, 4
	; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v0
	; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v1
	; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: store_load_sindex_large_offset_kernel:			; GFX11-LABEL: store_load_sindex_large_offset_kernel:
	; GFX11: ; %bb.0: ; %bb			; GFX11: ; %bb.0: ; %bb
	; GFX11-NEXT: s_load_b32 s0, s[0:1], 0x24			; GFX11-NEXT: s_load_b32 s0, s[0:1], 0x24
	; GFX11-NEXT: scratch_load_b32 v0, off, off offset:4 glc dlc			; GFX11-NEXT: scratch_load_b32 v0, off, off offset:4 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_dual_mov_b32 v0, 15 :: v_dual_mov_b32 v1, 0x4004			; GFX11-NEXT: v_mov_b32_e32 v0, 15
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_and_b32 s1, s0, 15			; GFX11-NEXT: s_and_b32 s1, s0, 15
	; GFX11-NEXT: s_lshl_b32 s0, s0, 2			; GFX11-NEXT: s_lshl_b32 s0, s0, 2
	; GFX11-NEXT: s_lshl_b32 s1, s1, 2			; GFX11-NEXT: s_lshl_b32 s1, s1, 2
	; GFX11-NEXT: s_addk_i32 s0, 0x4004			; GFX11-NEXT: s_addk_i32 s0, 0x4004
	; GFX11-NEXT: s_addk_i32 s1, 0x4004			; GFX11-NEXT: s_addk_i32 s1, 0x4004
	; GFX11-NEXT: scratch_store_b32 off, v0, s0 dlc			; GFX11-NEXT: scratch_store_b32 off, v0, s0 dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: scratch_load_b32 v0, off, s1 glc dlc			; GFX11-NEXT: scratch_load_b32 v0, off, s1 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_mov_b32_e32 v0, 4
	; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v0
	; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v1
	; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	;			;
	; GFX9-PAL-LABEL: store_load_sindex_large_offset_kernel:			; GFX9-PAL-LABEL: store_load_sindex_large_offset_kernel:
	; GFX9-PAL: ; %bb.0: ; %bb			; GFX9-PAL: ; %bb.0: ; %bb
	; GFX9-PAL-NEXT: s_getpc_b64 s[4:5]			; GFX9-PAL-NEXT: s_getpc_b64 s[4:5]
	; GFX9-PAL-NEXT: s_mov_b32 s4, s0			; GFX9-PAL-NEXT: s_mov_b32 s4, s0
	; GFX9-PAL-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX9-PAL-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX9-PAL-NEXT: s_mov_b32 vcc_hi, 0			; GFX9-PAL-NEXT: s_mov_b32 vcc_hi, 0
	[... 9 unchanged lines not shown ...]
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 15			; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-PAL-NEXT: s_addk_i32 s1, 0x4004			; GFX9-PAL-NEXT: s_addk_i32 s1, 0x4004
	; GFX9-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX9-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX9-PAL-NEXT: scratch_store_dword off, v0, s1			; GFX9-PAL-NEXT: scratch_store_dword off, v0, s1
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_addk_i32 s0, 0x4004			; GFX9-PAL-NEXT: s_addk_i32 s0, 0x4004
	; GFX9-PAL-NEXT: scratch_load_dword v0, off, s0 glc			; GFX9-PAL-NEXT: scratch_load_dword v0, off, s0 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX9-PAL-NEXT: ;;#ASMSTART
	; GFX9-PAL-NEXT: ; use v0
	; GFX9-PAL-NEXT: ;;#ASMEND
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 0x4004
	; GFX9-PAL-NEXT: ;;#ASMSTART
	; GFX9-PAL-NEXT: ; use v0
	; GFX9-PAL-NEXT: ;;#ASMEND
	; GFX9-PAL-NEXT: s_endpgm			; GFX9-PAL-NEXT: s_endpgm
	;			;
	; GFX940-LABEL: store_load_sindex_large_offset_kernel:			; GFX940-LABEL: store_load_sindex_large_offset_kernel:
	; GFX940: ; %bb.0: ; %bb			; GFX940: ; %bb.0: ; %bb
	; GFX940-NEXT: s_load_dword s0, s[0:1], 0x24			; GFX940-NEXT: s_load_dword s0, s[0:1], 0x24
	; GFX940-NEXT: scratch_load_dword v0, off, off offset:4 sc0 sc1			; GFX940-NEXT: scratch_load_dword v0, off, off offset:4 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: v_mov_b32_e32 v0, 15			; GFX940-NEXT: v_mov_b32_e32 v0, 15
	; GFX940-NEXT: s_waitcnt lgkmcnt(0)			; GFX940-NEXT: s_waitcnt lgkmcnt(0)
	; GFX940-NEXT: s_lshl_b32 s1, s0, 2			; GFX940-NEXT: s_lshl_b32 s1, s0, 2
	; GFX940-NEXT: s_and_b32 s0, s0, 15			; GFX940-NEXT: s_and_b32 s0, s0, 15
	; GFX940-NEXT: s_addk_i32 s1, 0x4004			; GFX940-NEXT: s_addk_i32 s1, 0x4004
	; GFX940-NEXT: s_lshl_b32 s0, s0, 2			; GFX940-NEXT: s_lshl_b32 s0, s0, 2
	; GFX940-NEXT: scratch_store_dword off, v0, s1 sc0 sc1			; GFX940-NEXT: scratch_store_dword off, v0, s1 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: s_addk_i32 s0, 0x4004			; GFX940-NEXT: s_addk_i32 s0, 0x4004
	; GFX940-NEXT: scratch_load_dword v0, off, s0 sc0 sc1			; GFX940-NEXT: scratch_load_dword v0, off, s0 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: v_mov_b32_e32 v0, 4
	; GFX940-NEXT: ;;#ASMSTART
	; GFX940-NEXT: ; use v0
	; GFX940-NEXT: ;;#ASMEND
	; GFX940-NEXT: v_mov_b32_e32 v0, 0x4004
	; GFX940-NEXT: ;;#ASMSTART
	; GFX940-NEXT: ; use v0
	; GFX940-NEXT: ;;#ASMEND
	; GFX940-NEXT: s_endpgm			; GFX940-NEXT: s_endpgm
	;			;
	; GFX1010-PAL-LABEL: store_load_sindex_large_offset_kernel:			; GFX1010-PAL-LABEL: store_load_sindex_large_offset_kernel:
	; GFX1010-PAL: ; %bb.0: ; %bb			; GFX1010-PAL: ; %bb.0: ; %bb
	; GFX1010-PAL-NEXT: s_getpc_b64 s[4:5]			; GFX1010-PAL-NEXT: s_getpc_b64 s[4:5]
	; GFX1010-PAL-NEXT: s_mov_b32 s4, s0			; GFX1010-PAL-NEXT: s_mov_b32 s4, s0
	; GFX1010-PAL-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX1010-PAL-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1010-PAL-NEXT: s_and_b32 s5, s5, 0xffff			; GFX1010-PAL-NEXT: s_and_b32 s5, s5, 0xffff
	; GFX1010-PAL-NEXT: s_add_u32 s4, s4, s3			; GFX1010-PAL-NEXT: s_add_u32 s4, s4, s3
	; GFX1010-PAL-NEXT: s_addc_u32 s5, s5, 0			; GFX1010-PAL-NEXT: s_addc_u32 s5, s5, 0
	; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s4			; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s4
	; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s5			; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s5
	; GFX1010-PAL-NEXT: s_load_dword s0, s[0:1], 0x0			; GFX1010-PAL-NEXT: s_load_dword s0, s[0:1], 0x0
	; GFX1010-PAL-NEXT: s_mov_b32 vcc_lo, 0			; GFX1010-PAL-NEXT: s_mov_b32 vcc_lo, 0
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v1, 0x4004
	; GFX1010-PAL-NEXT: scratch_load_dword v0, off, vcc_lo offset:4 glc dlc			; GFX1010-PAL-NEXT: scratch_load_dword v0, off, vcc_lo offset:4 glc dlc
	; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v0, 15			; GFX1010-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1010-PAL-NEXT: s_and_b32 s1, s0, 15			; GFX1010-PAL-NEXT: s_and_b32 s1, s0, 15
	; GFX1010-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX1010-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX1010-PAL-NEXT: s_lshl_b32 s1, s1, 2			; GFX1010-PAL-NEXT: s_lshl_b32 s1, s1, 2
	; GFX1010-PAL-NEXT: s_addk_i32 s0, 0x4004			; GFX1010-PAL-NEXT: s_addk_i32 s0, 0x4004
	; GFX1010-PAL-NEXT: s_addk_i32 s1, 0x4004			; GFX1010-PAL-NEXT: s_addk_i32 s1, 0x4004
	; GFX1010-PAL-NEXT: scratch_store_dword off, v0, s0			; GFX1010-PAL-NEXT: scratch_store_dword off, v0, s0
	; GFX1010-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1010-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1010-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc			; GFX1010-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc
	; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX1010-PAL-NEXT: ;;#ASMSTART
	; GFX1010-PAL-NEXT: ; use v0
	; GFX1010-PAL-NEXT: ;;#ASMEND
	; GFX1010-PAL-NEXT: ;;#ASMSTART
	; GFX1010-PAL-NEXT: ; use v1
	; GFX1010-PAL-NEXT: ;;#ASMEND
	; GFX1010-PAL-NEXT: s_endpgm			; GFX1010-PAL-NEXT: s_endpgm
	;			;
	; GFX1030-PAL-LABEL: store_load_sindex_large_offset_kernel:			; GFX1030-PAL-LABEL: store_load_sindex_large_offset_kernel:
	; GFX1030-PAL: ; %bb.0: ; %bb			; GFX1030-PAL: ; %bb.0: ; %bb
	; GFX1030-PAL-NEXT: s_getpc_b64 s[4:5]			; GFX1030-PAL-NEXT: s_getpc_b64 s[4:5]
	; GFX1030-PAL-NEXT: s_mov_b32 s4, s0			; GFX1030-PAL-NEXT: s_mov_b32 s4, s0
	; GFX1030-PAL-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX1030-PAL-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1030-PAL-NEXT: s_and_b32 s5, s5, 0xffff			; GFX1030-PAL-NEXT: s_and_b32 s5, s5, 0xffff
	; GFX1030-PAL-NEXT: s_add_u32 s4, s4, s3			; GFX1030-PAL-NEXT: s_add_u32 s4, s4, s3
	; GFX1030-PAL-NEXT: s_addc_u32 s5, s5, 0			; GFX1030-PAL-NEXT: s_addc_u32 s5, s5, 0
	; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s4			; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s4
	; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s5			; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s5
	; GFX1030-PAL-NEXT: s_load_dword s0, s[0:1], 0x0			; GFX1030-PAL-NEXT: s_load_dword s0, s[0:1], 0x0
	; GFX1030-PAL-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc			; GFX1030-PAL-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc
	; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v0, 15			; GFX1030-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v1, 0x4004
	; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1030-PAL-NEXT: s_and_b32 s1, s0, 15			; GFX1030-PAL-NEXT: s_and_b32 s1, s0, 15
	; GFX1030-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX1030-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX1030-PAL-NEXT: s_lshl_b32 s1, s1, 2			; GFX1030-PAL-NEXT: s_lshl_b32 s1, s1, 2
	; GFX1030-PAL-NEXT: s_addk_i32 s0, 0x4004			; GFX1030-PAL-NEXT: s_addk_i32 s0, 0x4004
	; GFX1030-PAL-NEXT: s_addk_i32 s1, 0x4004			; GFX1030-PAL-NEXT: s_addk_i32 s1, 0x4004
	; GFX1030-PAL-NEXT: scratch_store_dword off, v0, s0			; GFX1030-PAL-NEXT: scratch_store_dword off, v0, s0
	; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1030-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc			; GFX1030-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc
	; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX1030-PAL-NEXT: ;;#ASMSTART
	; GFX1030-PAL-NEXT: ; use v0
	; GFX1030-PAL-NEXT: ;;#ASMEND
	; GFX1030-PAL-NEXT: ;;#ASMSTART
	; GFX1030-PAL-NEXT: ; use v1
	; GFX1030-PAL-NEXT: ;;#ASMEND
	; GFX1030-PAL-NEXT: s_endpgm			; GFX1030-PAL-NEXT: s_endpgm
	;			;
	; GFX11-PAL-LABEL: store_load_sindex_large_offset_kernel:			; GFX11-PAL-LABEL: store_load_sindex_large_offset_kernel:
	; GFX11-PAL: ; %bb.0: ; %bb			; GFX11-PAL: ; %bb.0: ; %bb
	; GFX11-PAL-NEXT: s_load_b32 s0, s[0:1], 0x0			; GFX11-PAL-NEXT: s_load_b32 s0, s[0:1], 0x0
	; GFX11-PAL-NEXT: scratch_load_b32 v0, off, off offset:4 glc dlc			; GFX11-PAL-NEXT: scratch_load_b32 v0, off, off offset:4 glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: v_dual_mov_b32 v0, 15 :: v_dual_mov_b32 v1, 0x4004			; GFX11-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX11-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-PAL-NEXT: s_and_b32 s1, s0, 15			; GFX11-PAL-NEXT: s_and_b32 s1, s0, 15
	; GFX11-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX11-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX11-PAL-NEXT: s_lshl_b32 s1, s1, 2			; GFX11-PAL-NEXT: s_lshl_b32 s1, s1, 2
	; GFX11-PAL-NEXT: s_addk_i32 s0, 0x4004			; GFX11-PAL-NEXT: s_addk_i32 s0, 0x4004
	; GFX11-PAL-NEXT: s_addk_i32 s1, 0x4004			; GFX11-PAL-NEXT: s_addk_i32 s1, 0x4004
	; GFX11-PAL-NEXT: scratch_store_b32 off, v0, s0 dlc			; GFX11-PAL-NEXT: scratch_store_b32 off, v0, s0 dlc
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: scratch_load_b32 v0, off, s1 glc dlc			; GFX11-PAL-NEXT: scratch_load_b32 v0, off, s1 glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX11-PAL-NEXT: ;;#ASMSTART
	; GFX11-PAL-NEXT: ; use v0
	; GFX11-PAL-NEXT: ;;#ASMEND
	; GFX11-PAL-NEXT: ;;#ASMSTART
	; GFX11-PAL-NEXT: ; use v1
	; GFX11-PAL-NEXT: ;;#ASMEND
	; GFX11-PAL-NEXT: s_endpgm			; GFX11-PAL-NEXT: s_endpgm
	bb:			bb:
	%padding = alloca [4096 x i32], align 4, addrspace(5)			%padding = alloca [4096 x i32], align 4, addrspace(5)
	%i = alloca [32 x float], align 4, addrspace(5)			%i = alloca [32 x float], align 4, addrspace(5)
	%pad_gep = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %padding, i32 0, i32 undef			%pad_gep = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %padding, i32 0, i32 undef
	%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4			%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4
	%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*			%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*
	%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %idx			%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %idx
	%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*			%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*
	store volatile i32 15, i32 addrspace(5)* %i8, align 4			store volatile i32 15, i32 addrspace(5)* %i8, align 4
	%i9 = and i32 %idx, 15			%i9 = and i32 %idx, 15
	%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9			%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9
	%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*			%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*
	%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4			%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4
	call void asm sideeffect "; use $0", "s"([4096 x i32] addrspace(5)* %padding) #0
	call void asm sideeffect "; use $0", "s"([32 x float] addrspace(5)* %i) #0
	ret void			ret void
	}			}

	define amdgpu_ps void @store_load_sindex_large_offset_foo(i32 inreg %idx) {			define amdgpu_ps void @store_load_sindex_large_offset_foo(i32 inreg %idx) {
	; GFX9-LABEL: store_load_sindex_large_offset_foo:			; GFX9-LABEL: store_load_sindex_large_offset_foo:
	; GFX9: ; %bb.0: ; %bb			; GFX9: ; %bb.0: ; %bb
	; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3			; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3
	; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0			; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0
	; GFX9-NEXT: s_mov_b32 vcc_hi, 0			; GFX9-NEXT: s_mov_b32 vcc_hi, 0
	; GFX9-NEXT: scratch_load_dword v0, off, vcc_hi offset:4 glc			; GFX9-NEXT: scratch_load_dword v0, off, vcc_hi offset:4 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_lshl_b32 s0, s2, 2			; GFX9-NEXT: s_lshl_b32 s0, s2, 2
	; GFX9-NEXT: s_addk_i32 s0, 0x4004			; GFX9-NEXT: s_addk_i32 s0, 0x4004
	; GFX9-NEXT: v_mov_b32_e32 v0, 15			; GFX9-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-NEXT: scratch_store_dword off, v0, s0			; GFX9-NEXT: scratch_store_dword off, v0, s0
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_and_b32 s0, s2, 15			; GFX9-NEXT: s_and_b32 s0, s2, 15
	; GFX9-NEXT: s_lshl_b32 s0, s0, 2			; GFX9-NEXT: s_lshl_b32 s0, s0, 2
	; GFX9-NEXT: s_addk_i32 s0, 0x4004			; GFX9-NEXT: s_addk_i32 s0, 0x4004
	; GFX9-NEXT: scratch_load_dword v0, off, s0 glc			; GFX9-NEXT: scratch_load_dword v0, off, s0 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_mov_b32_e32 v0, 4
	; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v0
	; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x4004
	; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v0
	; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: store_load_sindex_large_offset_foo:			; GFX10-LABEL: store_load_sindex_large_offset_foo:
	; GFX10: ; %bb.0: ; %bb			; GFX10: ; %bb.0: ; %bb
	; GFX10-NEXT: s_add_u32 s0, s0, s3			; GFX10-NEXT: s_add_u32 s0, s0, s3
	; GFX10-NEXT: s_addc_u32 s1, s1, 0			; GFX10-NEXT: s_addc_u32 s1, s1, 0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1
	; GFX10-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_mov_b32_e32 v0, 15			; GFX10-NEXT: v_mov_b32_e32 v0, 15
	; GFX10-NEXT: s_and_b32 s0, s2, 15			; GFX10-NEXT: s_and_b32 s0, s2, 15
	; GFX10-NEXT: s_lshl_b32 s1, s2, 2			; GFX10-NEXT: s_lshl_b32 s1, s2, 2
	; GFX10-NEXT: s_lshl_b32 s0, s0, 2			; GFX10-NEXT: s_lshl_b32 s0, s0, 2
	; GFX10-NEXT: s_addk_i32 s1, 0x4004			; GFX10-NEXT: s_addk_i32 s1, 0x4004
	; GFX10-NEXT: s_addk_i32 s0, 0x4004			; GFX10-NEXT: s_addk_i32 s0, 0x4004
	; GFX10-NEXT: scratch_store_dword off, v0, s1			; GFX10-NEXT: scratch_store_dword off, v0, s1
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_load_dword v0, off, s0 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, s0 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_mov_b32_e32 v0, 4
	; GFX10-NEXT: v_mov_b32_e32 v1, 0x4004
	; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v0
	; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v1
	; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: store_load_sindex_large_offset_foo:			; GFX11-LABEL: store_load_sindex_large_offset_foo:
	; GFX11: ; %bb.0: ; %bb			; GFX11: ; %bb.0: ; %bb
	; GFX11-NEXT: scratch_load_b32 v0, off, off offset:4 glc dlc			; GFX11-NEXT: scratch_load_b32 v0, off, off offset:4 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_dual_mov_b32 v0, 15 :: v_dual_mov_b32 v1, 0x4004			; GFX11-NEXT: v_mov_b32_e32 v0, 15
	; GFX11-NEXT: s_and_b32 s1, s0, 15			; GFX11-NEXT: s_and_b32 s1, s0, 15
	; GFX11-NEXT: s_lshl_b32 s0, s0, 2			; GFX11-NEXT: s_lshl_b32 s0, s0, 2
	; GFX11-NEXT: s_lshl_b32 s1, s1, 2			; GFX11-NEXT: s_lshl_b32 s1, s1, 2
	; GFX11-NEXT: s_addk_i32 s0, 0x4004			; GFX11-NEXT: s_addk_i32 s0, 0x4004
	; GFX11-NEXT: s_addk_i32 s1, 0x4004			; GFX11-NEXT: s_addk_i32 s1, 0x4004
	; GFX11-NEXT: scratch_store_b32 off, v0, s0 dlc			; GFX11-NEXT: scratch_store_b32 off, v0, s0 dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: scratch_load_b32 v0, off, s1 glc dlc			; GFX11-NEXT: scratch_load_b32 v0, off, s1 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_mov_b32_e32 v0, 4
	; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v0
	; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v1
	; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	;			;
	; GFX9-PAL-LABEL: store_load_sindex_large_offset_foo:			; GFX9-PAL-LABEL: store_load_sindex_large_offset_foo:
	; GFX9-PAL: ; %bb.0: ; %bb			; GFX9-PAL: ; %bb.0: ; %bb
	; GFX9-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX9-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX9-PAL-NEXT: s_mov_b32 s2, s0			; GFX9-PAL-NEXT: s_mov_b32 s2, s0
	; GFX9-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX9-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX9-PAL-NEXT: s_mov_b32 vcc_hi, 0			; GFX9-PAL-NEXT: s_mov_b32 vcc_hi, 0
	; GFX9-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-PAL-NEXT: s_and_b32 s3, s3, 0xffff			; GFX9-PAL-NEXT: s_and_b32 s3, s3, 0xffff
	; GFX9-PAL-NEXT: s_add_u32 flat_scratch_lo, s2, s1			; GFX9-PAL-NEXT: s_add_u32 flat_scratch_lo, s2, s1
	; GFX9-PAL-NEXT: s_addc_u32 flat_scratch_hi, s3, 0			; GFX9-PAL-NEXT: s_addc_u32 flat_scratch_hi, s3, 0
	; GFX9-PAL-NEXT: scratch_load_dword v0, off, vcc_hi offset:4 glc			; GFX9-PAL-NEXT: scratch_load_dword v0, off, vcc_hi offset:4 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_lshl_b32 s1, s0, 2			; GFX9-PAL-NEXT: s_lshl_b32 s1, s0, 2
	; GFX9-PAL-NEXT: s_and_b32 s0, s0, 15			; GFX9-PAL-NEXT: s_and_b32 s0, s0, 15
	; GFX9-PAL-NEXT: s_addk_i32 s1, 0x4004			; GFX9-PAL-NEXT: s_addk_i32 s1, 0x4004
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 15			; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX9-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX9-PAL-NEXT: scratch_store_dword off, v0, s1			; GFX9-PAL-NEXT: scratch_store_dword off, v0, s1
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_addk_i32 s0, 0x4004			; GFX9-PAL-NEXT: s_addk_i32 s0, 0x4004
	; GFX9-PAL-NEXT: scratch_load_dword v0, off, s0 glc			; GFX9-PAL-NEXT: scratch_load_dword v0, off, s0 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX9-PAL-NEXT: ;;#ASMSTART
	; GFX9-PAL-NEXT: ; use v0
	; GFX9-PAL-NEXT: ;;#ASMEND
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 0x4004
	; GFX9-PAL-NEXT: ;;#ASMSTART
	; GFX9-PAL-NEXT: ; use v0
	; GFX9-PAL-NEXT: ;;#ASMEND
	; GFX9-PAL-NEXT: s_endpgm			; GFX9-PAL-NEXT: s_endpgm
	;			;
	; GFX940-LABEL: store_load_sindex_large_offset_foo:			; GFX940-LABEL: store_load_sindex_large_offset_foo:
	; GFX940: ; %bb.0: ; %bb			; GFX940: ; %bb.0: ; %bb
	; GFX940-NEXT: scratch_load_dword v0, off, off offset:4 sc0 sc1			; GFX940-NEXT: scratch_load_dword v0, off, off offset:4 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: s_lshl_b32 s1, s0, 2			; GFX940-NEXT: s_lshl_b32 s1, s0, 2
	; GFX940-NEXT: s_and_b32 s0, s0, 15			; GFX940-NEXT: s_and_b32 s0, s0, 15
	; GFX940-NEXT: s_addk_i32 s1, 0x4004			; GFX940-NEXT: s_addk_i32 s1, 0x4004
	; GFX940-NEXT: v_mov_b32_e32 v0, 15			; GFX940-NEXT: v_mov_b32_e32 v0, 15
	; GFX940-NEXT: s_lshl_b32 s0, s0, 2			; GFX940-NEXT: s_lshl_b32 s0, s0, 2
	; GFX940-NEXT: scratch_store_dword off, v0, s1 sc0 sc1			; GFX940-NEXT: scratch_store_dword off, v0, s1 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: s_addk_i32 s0, 0x4004			; GFX940-NEXT: s_addk_i32 s0, 0x4004
	; GFX940-NEXT: scratch_load_dword v0, off, s0 sc0 sc1			; GFX940-NEXT: scratch_load_dword v0, off, s0 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: v_mov_b32_e32 v0, 4
	; GFX940-NEXT: ;;#ASMSTART
	; GFX940-NEXT: ; use v0
	; GFX940-NEXT: ;;#ASMEND
	; GFX940-NEXT: v_mov_b32_e32 v0, 0x4004
	; GFX940-NEXT: ;;#ASMSTART
	; GFX940-NEXT: ; use v0
	; GFX940-NEXT: ;;#ASMEND
	; GFX940-NEXT: s_endpgm			; GFX940-NEXT: s_endpgm
	;			;
	; GFX1010-PAL-LABEL: store_load_sindex_large_offset_foo:			; GFX1010-PAL-LABEL: store_load_sindex_large_offset_foo:
	; GFX1010-PAL: ; %bb.0: ; %bb			; GFX1010-PAL: ; %bb.0: ; %bb
	; GFX1010-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX1010-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX1010-PAL-NEXT: s_mov_b32 s2, s0			; GFX1010-PAL-NEXT: s_mov_b32 s2, s0
	; GFX1010-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX1010-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)
	[... 10 unchanged lines not shown ...]
	; GFX1010-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX1010-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX1010-PAL-NEXT: s_lshl_b32 s1, s1, 2			; GFX1010-PAL-NEXT: s_lshl_b32 s1, s1, 2
	; GFX1010-PAL-NEXT: s_addk_i32 s0, 0x4004			; GFX1010-PAL-NEXT: s_addk_i32 s0, 0x4004
	; GFX1010-PAL-NEXT: s_addk_i32 s1, 0x4004			; GFX1010-PAL-NEXT: s_addk_i32 s1, 0x4004
	; GFX1010-PAL-NEXT: scratch_store_dword off, v0, s0			; GFX1010-PAL-NEXT: scratch_store_dword off, v0, s0
	; GFX1010-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1010-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1010-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc			; GFX1010-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc
	; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v1, 0x4004
	; GFX1010-PAL-NEXT: ;;#ASMSTART
	; GFX1010-PAL-NEXT: ; use v0
	; GFX1010-PAL-NEXT: ;;#ASMEND
	; GFX1010-PAL-NEXT: ;;#ASMSTART
	; GFX1010-PAL-NEXT: ; use v1
	; GFX1010-PAL-NEXT: ;;#ASMEND
	; GFX1010-PAL-NEXT: s_endpgm			; GFX1010-PAL-NEXT: s_endpgm
	;			;
	; GFX1030-PAL-LABEL: store_load_sindex_large_offset_foo:			; GFX1030-PAL-LABEL: store_load_sindex_large_offset_foo:
	; GFX1030-PAL: ; %bb.0: ; %bb			; GFX1030-PAL: ; %bb.0: ; %bb
	; GFX1030-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX1030-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX1030-PAL-NEXT: s_mov_b32 s2, s0			; GFX1030-PAL-NEXT: s_mov_b32 s2, s0
	; GFX1030-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX1030-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)
	[... 9 unchanged lines not shown ...]
	; GFX1030-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX1030-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX1030-PAL-NEXT: s_lshl_b32 s1, s1, 2			; GFX1030-PAL-NEXT: s_lshl_b32 s1, s1, 2
	; GFX1030-PAL-NEXT: s_addk_i32 s0, 0x4004			; GFX1030-PAL-NEXT: s_addk_i32 s0, 0x4004
	; GFX1030-PAL-NEXT: s_addk_i32 s1, 0x4004			; GFX1030-PAL-NEXT: s_addk_i32 s1, 0x4004
	; GFX1030-PAL-NEXT: scratch_store_dword off, v0, s0			; GFX1030-PAL-NEXT: scratch_store_dword off, v0, s0
	; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1030-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc			; GFX1030-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc
	; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v1, 0x4004
	; GFX1030-PAL-NEXT: ;;#ASMSTART
	; GFX1030-PAL-NEXT: ; use v0
	; GFX1030-PAL-NEXT: ;;#ASMEND
	; GFX1030-PAL-NEXT: ;;#ASMSTART
	; GFX1030-PAL-NEXT: ; use v1
	; GFX1030-PAL-NEXT: ;;#ASMEND
	; GFX1030-PAL-NEXT: s_endpgm			; GFX1030-PAL-NEXT: s_endpgm
	;			;
	; GFX11-PAL-LABEL: store_load_sindex_large_offset_foo:			; GFX11-PAL-LABEL: store_load_sindex_large_offset_foo:
	; GFX11-PAL: ; %bb.0: ; %bb			; GFX11-PAL: ; %bb.0: ; %bb
	; GFX11-PAL-NEXT: scratch_load_b32 v0, off, off offset:4 glc dlc			; GFX11-PAL-NEXT: scratch_load_b32 v0, off, off offset:4 glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: v_dual_mov_b32 v0, 15 :: v_dual_mov_b32 v1, 0x4004			; GFX11-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX11-PAL-NEXT: s_and_b32 s1, s0, 15			; GFX11-PAL-NEXT: s_and_b32 s1, s0, 15
	; GFX11-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX11-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX11-PAL-NEXT: s_lshl_b32 s1, s1, 2			; GFX11-PAL-NEXT: s_lshl_b32 s1, s1, 2
	; GFX11-PAL-NEXT: s_addk_i32 s0, 0x4004			; GFX11-PAL-NEXT: s_addk_i32 s0, 0x4004
	; GFX11-PAL-NEXT: s_addk_i32 s1, 0x4004			; GFX11-PAL-NEXT: s_addk_i32 s1, 0x4004
	; GFX11-PAL-NEXT: scratch_store_b32 off, v0, s0 dlc			; GFX11-PAL-NEXT: scratch_store_b32 off, v0, s0 dlc
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: scratch_load_b32 v0, off, s1 glc dlc			; GFX11-PAL-NEXT: scratch_load_b32 v0, off, s1 glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX11-PAL-NEXT: ;;#ASMSTART
	; GFX11-PAL-NEXT: ; use v0
	; GFX11-PAL-NEXT: ;;#ASMEND
	; GFX11-PAL-NEXT: ;;#ASMSTART
	; GFX11-PAL-NEXT: ; use v1
	; GFX11-PAL-NEXT: ;;#ASMEND
	; GFX11-PAL-NEXT: s_endpgm			; GFX11-PAL-NEXT: s_endpgm
	bb:			bb:
	%padding = alloca [4096 x i32], align 4, addrspace(5)			%padding = alloca [4096 x i32], align 4, addrspace(5)
	%i = alloca [32 x float], align 4, addrspace(5)			%i = alloca [32 x float], align 4, addrspace(5)
	%pad_gep = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %padding, i32 0, i32 undef			%pad_gep = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %padding, i32 0, i32 undef
	%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4			%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4
	%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*			%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*
	%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %idx			%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %idx
	%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*			%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*
	store volatile i32 15, i32 addrspace(5)* %i8, align 4			store volatile i32 15, i32 addrspace(5)* %i8, align 4
	%i9 = and i32 %idx, 15			%i9 = and i32 %idx, 15
	%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9			%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9
	%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*			%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*
	%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4			%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4
	call void asm sideeffect "; use $0", "s"([4096 x i32] addrspace(5)* %padding) #0
	call void asm sideeffect "; use $0", "s"([32 x float] addrspace(5)* %i) #0
	ret void			ret void
	}			}

	define amdgpu_kernel void @store_load_vindex_large_offset_kernel() {			define amdgpu_kernel void @store_load_vindex_large_offset_kernel() {
	; GFX9-LABEL: store_load_vindex_large_offset_kernel:			; GFX9-LABEL: store_load_vindex_large_offset_kernel:
	; GFX9: ; %bb.0: ; %bb			; GFX9: ; %bb.0: ; %bb
	; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3			; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3
	; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0			; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0
	; GFX9-NEXT: s_mov_b32 vcc_hi, 0			; GFX9-NEXT: s_mov_b32 vcc_hi, 0
	; GFX9-NEXT: scratch_load_dword v1, off, vcc_hi offset:4 glc			; GFX9-NEXT: scratch_load_dword v1, off, vcc_hi offset:4 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX9-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX9-NEXT: v_add_u32_e32 v1, 0x4004, v0			; GFX9-NEXT: v_add_u32_e32 v1, 0x4004, v0
	; GFX9-NEXT: v_mov_b32_e32 v2, 15			; GFX9-NEXT: v_mov_b32_e32 v2, 15
	; GFX9-NEXT: scratch_store_dword v1, v2, off			; GFX9-NEXT: scratch_store_dword v1, v2, off
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_sub_u32_e32 v0, 0x4004, v0			; GFX9-NEXT: v_sub_u32_e32 v0, 0x4004, v0
	; GFX9-NEXT: scratch_load_dword v0, v0, off offset:124 glc			; GFX9-NEXT: scratch_load_dword v0, v0, off offset:124 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x4004
	; GFX9-NEXT: v_mov_b32_e32 v1, 4
	; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v1
	; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v0
	; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: store_load_vindex_large_offset_kernel:			; GFX10-LABEL: store_load_vindex_large_offset_kernel:
	; GFX10: ; %bb.0: ; %bb			; GFX10: ; %bb.0: ; %bb
	; GFX10-NEXT: s_add_u32 s0, s0, s3			; GFX10-NEXT: s_add_u32 s0, s0, s3
	; GFX10-NEXT: s_addc_u32 s1, s1, 0			; GFX10-NEXT: s_addc_u32 s1, s1, 0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1
	; GFX10-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX10-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX10-NEXT: v_mov_b32_e32 v2, 15			; GFX10-NEXT: v_mov_b32_e32 v2, 15
	; GFX10-NEXT: scratch_load_dword v3, off, off offset:4 glc dlc			; GFX10-NEXT: scratch_load_dword v3, off, off offset:4 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_add_nc_u32_e32 v1, 0x4004, v0			; GFX10-NEXT: v_add_nc_u32_e32 v1, 0x4004, v0
	; GFX10-NEXT: v_sub_nc_u32_e32 v0, 0x4004, v0			; GFX10-NEXT: v_sub_nc_u32_e32 v0, 0x4004, v0
	; GFX10-NEXT: scratch_store_dword v1, v2, off			; GFX10-NEXT: scratch_store_dword v1, v2, off
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_load_dword v0, v0, off offset:124 glc dlc			; GFX10-NEXT: scratch_load_dword v0, v0, off offset:124 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_mov_b32_e32 v0, 4
	; GFX10-NEXT: v_mov_b32_e32 v1, 0x4004
	; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v0
	; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v1
	; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: store_load_vindex_large_offset_kernel:			; GFX11-LABEL: store_load_vindex_large_offset_kernel:
	; GFX11: ; %bb.0: ; %bb			; GFX11: ; %bb.0: ; %bb
	; GFX11-NEXT: v_dual_mov_b32 v1, 15 :: v_dual_lshlrev_b32 v0, 2, v0			; GFX11-NEXT: v_dual_mov_b32 v1, 15 :: v_dual_lshlrev_b32 v0, 2, v0
	; GFX11-NEXT: s_movk_i32 vcc_lo, 0x4004			; GFX11-NEXT: s_movk_i32 vcc_lo, 0x4004
	; GFX11-NEXT: scratch_load_b32 v3, off, off offset:4 glc dlc			; GFX11-NEXT: scratch_load_b32 v3, off, off offset:4 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_sub_nc_u32_e32 v2, 0x4004, v0			; GFX11-NEXT: v_sub_nc_u32_e32 v2, 0x4004, v0
	; GFX11-NEXT: scratch_store_b32 v0, v1, vcc_lo dlc			; GFX11-NEXT: scratch_store_b32 v0, v1, vcc_lo dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: v_mov_b32_e32 v1, 0x4004
	; GFX11-NEXT: scratch_load_b32 v0, v2, off offset:124 glc dlc			; GFX11-NEXT: scratch_load_b32 v0, v2, off offset:124 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_mov_b32_e32 v0, 4
	; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v0
	; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v1
	; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	;			;
	; GFX9-PAL-LABEL: store_load_vindex_large_offset_kernel:			; GFX9-PAL-LABEL: store_load_vindex_large_offset_kernel:
	; GFX9-PAL: ; %bb.0: ; %bb			; GFX9-PAL: ; %bb.0: ; %bb
	; GFX9-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX9-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX9-PAL-NEXT: s_mov_b32 s2, s0			; GFX9-PAL-NEXT: s_mov_b32 s2, s0
	; GFX9-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX9-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX9-PAL-NEXT: s_mov_b32 vcc_hi, 0			; GFX9-PAL-NEXT: s_mov_b32 vcc_hi, 0
	; GFX9-PAL-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX9-PAL-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX9-PAL-NEXT: v_mov_b32_e32 v2, 15			; GFX9-PAL-NEXT: v_mov_b32_e32 v2, 15
	; GFX9-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-PAL-NEXT: s_and_b32 s3, s3, 0xffff			; GFX9-PAL-NEXT: s_and_b32 s3, s3, 0xffff
	; GFX9-PAL-NEXT: s_add_u32 flat_scratch_lo, s2, s1			; GFX9-PAL-NEXT: s_add_u32 flat_scratch_lo, s2, s1
	; GFX9-PAL-NEXT: s_addc_u32 flat_scratch_hi, s3, 0			; GFX9-PAL-NEXT: s_addc_u32 flat_scratch_hi, s3, 0
	; GFX9-PAL-NEXT: scratch_load_dword v1, off, vcc_hi offset:4 glc			; GFX9-PAL-NEXT: scratch_load_dword v1, off, vcc_hi offset:4 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: v_add_u32_e32 v1, 0x4004, v0			; GFX9-PAL-NEXT: v_add_u32_e32 v1, 0x4004, v0
	; GFX9-PAL-NEXT: scratch_store_dword v1, v2, off			; GFX9-PAL-NEXT: scratch_store_dword v1, v2, off
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: v_sub_u32_e32 v0, 0x4004, v0			; GFX9-PAL-NEXT: v_sub_u32_e32 v0, 0x4004, v0
	; GFX9-PAL-NEXT: scratch_load_dword v0, v0, off offset:124 glc			; GFX9-PAL-NEXT: scratch_load_dword v0, v0, off offset:124 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 0x4004
	; GFX9-PAL-NEXT: v_mov_b32_e32 v1, 4
	; GFX9-PAL-NEXT: ;;#ASMSTART
	; GFX9-PAL-NEXT: ; use v1
	; GFX9-PAL-NEXT: ;;#ASMEND
	; GFX9-PAL-NEXT: ;;#ASMSTART
	; GFX9-PAL-NEXT: ; use v0
	; GFX9-PAL-NEXT: ;;#ASMEND
	; GFX9-PAL-NEXT: s_endpgm			; GFX9-PAL-NEXT: s_endpgm
	;			;
	; GFX940-LABEL: store_load_vindex_large_offset_kernel:			; GFX940-LABEL: store_load_vindex_large_offset_kernel:
	; GFX940: ; %bb.0: ; %bb			; GFX940: ; %bb.0: ; %bb
	; GFX940-NEXT: scratch_load_dword v1, off, off offset:4 sc0 sc1			; GFX940-NEXT: scratch_load_dword v1, off, off offset:4 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX940-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX940-NEXT: v_mov_b32_e32 v1, 15			; GFX940-NEXT: v_mov_b32_e32 v1, 15
	; GFX940-NEXT: s_movk_i32 vcc_hi, 0x4004			; GFX940-NEXT: s_movk_i32 vcc_hi, 0x4004
	; GFX940-NEXT: scratch_store_dword v0, v1, vcc_hi sc0 sc1			; GFX940-NEXT: scratch_store_dword v0, v1, vcc_hi sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: v_sub_u32_e32 v0, 0x4004, v0			; GFX940-NEXT: v_sub_u32_e32 v0, 0x4004, v0
	; GFX940-NEXT: scratch_load_dword v0, v0, off offset:124 sc0 sc1			; GFX940-NEXT: scratch_load_dword v0, v0, off offset:124 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: v_mov_b32_e32 v0, 0x4004
	; GFX940-NEXT: v_mov_b32_e32 v1, 4
	; GFX940-NEXT: ;;#ASMSTART
	; GFX940-NEXT: ; use v1
	; GFX940-NEXT: ;;#ASMEND
	; GFX940-NEXT: ;;#ASMSTART
	; GFX940-NEXT: ; use v0
	; GFX940-NEXT: ;;#ASMEND
	; GFX940-NEXT: s_endpgm			; GFX940-NEXT: s_endpgm
	;			;
	; GFX1010-PAL-LABEL: store_load_vindex_large_offset_kernel:			; GFX1010-PAL-LABEL: store_load_vindex_large_offset_kernel:
	; GFX1010-PAL: ; %bb.0: ; %bb			; GFX1010-PAL: ; %bb.0: ; %bb
	; GFX1010-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX1010-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX1010-PAL-NEXT: s_mov_b32 s2, s0			; GFX1010-PAL-NEXT: s_mov_b32 s2, s0
	; GFX1010-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX1010-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1010-PAL-NEXT: s_and_b32 s3, s3, 0xffff			; GFX1010-PAL-NEXT: s_and_b32 s3, s3, 0xffff
	; GFX1010-PAL-NEXT: s_add_u32 s2, s2, s1			; GFX1010-PAL-NEXT: s_add_u32 s2, s2, s1
	; GFX1010-PAL-NEXT: s_addc_u32 s3, s3, 0			; GFX1010-PAL-NEXT: s_addc_u32 s3, s3, 0
	; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2			; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
	; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3			; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
	; GFX1010-PAL-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX1010-PAL-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v2, 15			; GFX1010-PAL-NEXT: v_mov_b32_e32 v2, 15
	; GFX1010-PAL-NEXT: s_mov_b32 vcc_lo, 0			; GFX1010-PAL-NEXT: s_mov_b32 vcc_lo, 0
	; GFX1010-PAL-NEXT: scratch_load_dword v3, off, vcc_lo offset:4 glc dlc			; GFX1010-PAL-NEXT: scratch_load_dword v3, off, vcc_lo offset:4 glc dlc
	; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1010-PAL-NEXT: v_add_nc_u32_e32 v1, 0x4004, v0			; GFX1010-PAL-NEXT: v_add_nc_u32_e32 v1, 0x4004, v0
	; GFX1010-PAL-NEXT: v_sub_nc_u32_e32 v0, 0x4004, v0			; GFX1010-PAL-NEXT: v_sub_nc_u32_e32 v0, 0x4004, v0
	; GFX1010-PAL-NEXT: scratch_store_dword v1, v2, off			; GFX1010-PAL-NEXT: scratch_store_dword v1, v2, off
	; GFX1010-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1010-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1010-PAL-NEXT: scratch_load_dword v0, v0, off offset:124 glc dlc			; GFX1010-PAL-NEXT: scratch_load_dword v0, v0, off offset:124 glc dlc
	; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v1, 0x4004
	; GFX1010-PAL-NEXT: ;;#ASMSTART
	; GFX1010-PAL-NEXT: ; use v0
	; GFX1010-PAL-NEXT: ;;#ASMEND
	; GFX1010-PAL-NEXT: ;;#ASMSTART
	; GFX1010-PAL-NEXT: ; use v1
	; GFX1010-PAL-NEXT: ;;#ASMEND
	; GFX1010-PAL-NEXT: s_endpgm			; GFX1010-PAL-NEXT: s_endpgm
	;			;
	; GFX1030-PAL-LABEL: store_load_vindex_large_offset_kernel:			; GFX1030-PAL-LABEL: store_load_vindex_large_offset_kernel:
	; GFX1030-PAL: ; %bb.0: ; %bb			; GFX1030-PAL: ; %bb.0: ; %bb
	; GFX1030-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX1030-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX1030-PAL-NEXT: s_mov_b32 s2, s0			; GFX1030-PAL-NEXT: s_mov_b32 s2, s0
	; GFX1030-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX1030-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1030-PAL-NEXT: s_and_b32 s3, s3, 0xffff			; GFX1030-PAL-NEXT: s_and_b32 s3, s3, 0xffff
	; GFX1030-PAL-NEXT: s_add_u32 s2, s2, s1			; GFX1030-PAL-NEXT: s_add_u32 s2, s2, s1
	; GFX1030-PAL-NEXT: s_addc_u32 s3, s3, 0			; GFX1030-PAL-NEXT: s_addc_u32 s3, s3, 0
	; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2			; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
	; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3			; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
	; GFX1030-PAL-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX1030-PAL-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v2, 15			; GFX1030-PAL-NEXT: v_mov_b32_e32 v2, 15
	; GFX1030-PAL-NEXT: scratch_load_dword v3, off, off offset:4 glc dlc			; GFX1030-PAL-NEXT: scratch_load_dword v3, off, off offset:4 glc dlc
	; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1030-PAL-NEXT: v_add_nc_u32_e32 v1, 0x4004, v0			; GFX1030-PAL-NEXT: v_add_nc_u32_e32 v1, 0x4004, v0
	; GFX1030-PAL-NEXT: v_sub_nc_u32_e32 v0, 0x4004, v0			; GFX1030-PAL-NEXT: v_sub_nc_u32_e32 v0, 0x4004, v0
	; GFX1030-PAL-NEXT: scratch_store_dword v1, v2, off			; GFX1030-PAL-NEXT: scratch_store_dword v1, v2, off
	; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1030-PAL-NEXT: scratch_load_dword v0, v0, off offset:124 glc dlc			; GFX1030-PAL-NEXT: scratch_load_dword v0, v0, off offset:124 glc dlc
	; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v1, 0x4004
	; GFX1030-PAL-NEXT: ;;#ASMSTART
	; GFX1030-PAL-NEXT: ; use v0
	; GFX1030-PAL-NEXT: ;;#ASMEND
	; GFX1030-PAL-NEXT: ;;#ASMSTART
	; GFX1030-PAL-NEXT: ; use v1
	; GFX1030-PAL-NEXT: ;;#ASMEND
	; GFX1030-PAL-NEXT: s_endpgm			; GFX1030-PAL-NEXT: s_endpgm
	;			;
	; GFX11-PAL-LABEL: store_load_vindex_large_offset_kernel:			; GFX11-PAL-LABEL: store_load_vindex_large_offset_kernel:
	; GFX11-PAL: ; %bb.0: ; %bb			; GFX11-PAL: ; %bb.0: ; %bb
	; GFX11-PAL-NEXT: v_dual_mov_b32 v1, 15 :: v_dual_lshlrev_b32 v0, 2, v0			; GFX11-PAL-NEXT: v_dual_mov_b32 v1, 15 :: v_dual_lshlrev_b32 v0, 2, v0
	; GFX11-PAL-NEXT: s_movk_i32 vcc_lo, 0x4004			; GFX11-PAL-NEXT: s_movk_i32 vcc_lo, 0x4004
	; GFX11-PAL-NEXT: scratch_load_b32 v3, off, off offset:4 glc dlc			; GFX11-PAL-NEXT: scratch_load_b32 v3, off, off offset:4 glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: v_sub_nc_u32_e32 v2, 0x4004, v0			; GFX11-PAL-NEXT: v_sub_nc_u32_e32 v2, 0x4004, v0
	; GFX11-PAL-NEXT: scratch_store_b32 v0, v1, vcc_lo dlc			; GFX11-PAL-NEXT: scratch_store_b32 v0, v1, vcc_lo dlc
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: v_mov_b32_e32 v1, 0x4004
	; GFX11-PAL-NEXT: scratch_load_b32 v0, v2, off offset:124 glc dlc			; GFX11-PAL-NEXT: scratch_load_b32 v0, v2, off offset:124 glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX11-PAL-NEXT: ;;#ASMSTART
	; GFX11-PAL-NEXT: ; use v0
	; GFX11-PAL-NEXT: ;;#ASMEND
	; GFX11-PAL-NEXT: ;;#ASMSTART
	; GFX11-PAL-NEXT: ; use v1
	; GFX11-PAL-NEXT: ;;#ASMEND
	; GFX11-PAL-NEXT: s_endpgm			; GFX11-PAL-NEXT: s_endpgm
	bb:			bb:
	%padding = alloca [4096 x i32], align 4, addrspace(5)			%padding = alloca [4096 x i32], align 4, addrspace(5)
	%i = alloca [32 x float], align 4, addrspace(5)			%i = alloca [32 x float], align 4, addrspace(5)
	%pad_gep = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %padding, i32 0, i32 undef			%pad_gep = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %padding, i32 0, i32 undef
	%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4			%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4
	%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*			%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*
	%i2 = tail call i32 @llvm.amdgcn.workitem.id.x()			%i2 = tail call i32 @llvm.amdgcn.workitem.id.x()
	%i3 = zext i32 %i2 to i64			%i3 = zext i32 %i2 to i64
	%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i2			%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i2
	%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*			%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*
	store volatile i32 15, i32 addrspace(5)* %i8, align 4			store volatile i32 15, i32 addrspace(5)* %i8, align 4
	%i9 = sub nsw i32 31, %i2			%i9 = sub nsw i32 31, %i2
	%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9			%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9
	%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*			%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*
	%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4			%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4
	call void asm sideeffect "; use $0", "s"([4096 x i32] addrspace(5)* %padding) #0
	call void asm sideeffect "; use $0", "s"([32 x float] addrspace(5)* %i) #0
	ret void			ret void
	}			}

	define void @store_load_vindex_large_offset_foo(i32 %idx) {			define void @store_load_vindex_large_offset_foo(i32 %idx) {
	; GFX9-LABEL: store_load_vindex_large_offset_foo:			; GFX9-LABEL: store_load_vindex_large_offset_foo:
	; GFX9: ; %bb.0: ; %bb			; GFX9: ; %bb.0: ; %bb
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: scratch_load_dword v1, off, s32 offset:4 glc			; GFX9-NEXT: scratch_load_dword v1, off, s32 offset:4 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_add_i32 vcc_lo, s32, 0x4004			; GFX9-NEXT: s_add_i32 vcc_hi, s32, 0x4004
	; GFX9-NEXT: v_mov_b32_e32 v1, vcc_lo			; GFX9-NEXT: v_mov_b32_e32 v1, vcc_hi
	; GFX9-NEXT: v_lshl_add_u32 v2, v0, 2, v1			; GFX9-NEXT: v_lshl_add_u32 v2, v0, 2, v1
	; GFX9-NEXT: v_mov_b32_e32 v3, 15			; GFX9-NEXT: v_mov_b32_e32 v3, 15
	; GFX9-NEXT: v_and_b32_e32 v0, 15, v0			; GFX9-NEXT: v_and_b32_e32 v0, 15, v0
	; GFX9-NEXT: scratch_store_dword v2, v3, off			; GFX9-NEXT: scratch_store_dword v2, v3, off
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_lshl_add_u32 v0, v0, 2, v1			; GFX9-NEXT: v_lshl_add_u32 v0, v0, 2, v1
	; GFX9-NEXT: scratch_load_dword v0, v0, off glc			; GFX9-NEXT: scratch_load_dword v0, v0, off glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_add_i32 vcc_hi, s32, 4
	; GFX9-NEXT: v_mov_b32_e32 v0, vcc_hi
	; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v0
	; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v1
	; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: store_load_vindex_large_offset_foo:			; GFX10-LABEL: store_load_vindex_large_offset_foo:
	; GFX10: ; %bb.0: ; %bb			; GFX10: ; %bb.0: ; %bb
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: v_and_b32_e32 v1, 15, v0			; GFX10-NEXT: v_and_b32_e32 v1, 15, v0
	; GFX10-NEXT: s_add_i32 s2, s32, 0x4004			; GFX10-NEXT: s_add_i32 s0, s32, 0x4004
	; GFX10-NEXT: s_add_i32 s1, s32, 0x4004			; GFX10-NEXT: s_add_i32 vcc_lo, s32, 0x4004
	; GFX10-NEXT: v_lshl_add_u32 v0, v0, 2, s2			; GFX10-NEXT: v_lshl_add_u32 v0, v0, 2, s0
	; GFX10-NEXT: v_mov_b32_e32 v2, 15			; GFX10-NEXT: v_mov_b32_e32 v2, 15
	; GFX10-NEXT: v_lshl_add_u32 v1, v1, 2, s1			; GFX10-NEXT: v_lshl_add_u32 v1, v1, 2, vcc_lo
	; GFX10-NEXT: scratch_load_dword v3, off, s32 offset:4 glc dlc			; GFX10-NEXT: scratch_load_dword v3, off, s32 offset:4 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_add_i32 s0, s32, 4
	; GFX10-NEXT: s_add_i32 vcc_lo, s32, 0x4004
	; GFX10-NEXT: scratch_store_dword v0, v2, off			; GFX10-NEXT: scratch_store_dword v0, v2, off
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_load_dword v0, v1, off glc dlc			; GFX10-NEXT: scratch_load_dword v0, v1, off glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-NEXT: v_mov_b32_e32 v1, vcc_lo
	; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v0
	; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v1
	; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: store_load_vindex_large_offset_foo:			; GFX11-LABEL: store_load_vindex_large_offset_foo:
	; GFX11: ; %bb.0: ; %bb			; GFX11: ; %bb.0: ; %bb
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: v_dual_mov_b32 v2, 15 :: v_dual_and_b32 v1, 15, v0			; GFX11-NEXT: v_dual_mov_b32 v2, 15 :: v_dual_and_b32 v1, 15, v0
	; GFX11-NEXT: s_add_i32 s2, s32, 0x4004
	; GFX11-NEXT: s_add_i32 s1, s32, 0x4004
	; GFX11-NEXT: s_add_i32 s0, s32, 4
	; GFX11-NEXT: s_add_i32 vcc_lo, s32, 0x4004
	; GFX11-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX11-NEXT: v_lshlrev_b32_e32 v0, 2, v0
				; GFX11-NEXT: s_add_i32 s0, s32, 0x4004
				; GFX11-NEXT: s_add_i32 vcc_lo, s32, 0x4004
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2)
	; GFX11-NEXT: v_lshlrev_b32_e32 v1, 2, v1			; GFX11-NEXT: v_lshlrev_b32_e32 v1, 2, v1
	; GFX11-NEXT: scratch_load_b32 v3, off, s32 offset:4 glc dlc			; GFX11-NEXT: scratch_load_b32 v3, off, s32 offset:4 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: scratch_store_b32 v0, v2, s2 dlc			; GFX11-NEXT: scratch_store_b32 v0, v2, s0 dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: scratch_load_b32 v0, v1, s1 glc dlc			; GFX11-NEXT: scratch_load_b32 v0, v1, vcc_lo glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, vcc_lo
	; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v0
	; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v1
	; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-PAL-LABEL: store_load_vindex_large_offset_foo:			; GFX9-PAL-LABEL: store_load_vindex_large_offset_foo:
	; GFX9-PAL: ; %bb.0: ; %bb			; GFX9-PAL: ; %bb.0: ; %bb
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-PAL-NEXT: scratch_load_dword v1, off, s32 offset:4 glc			; GFX9-PAL-NEXT: scratch_load_dword v1, off, s32 offset:4 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_add_i32 vcc_lo, s32, 0x4004			; GFX9-PAL-NEXT: s_add_i32 vcc_hi, s32, 0x4004
	; GFX9-PAL-NEXT: v_mov_b32_e32 v1, vcc_lo			; GFX9-PAL-NEXT: v_mov_b32_e32 v1, vcc_hi
	; GFX9-PAL-NEXT: v_lshl_add_u32 v2, v0, 2, v1			; GFX9-PAL-NEXT: v_lshl_add_u32 v2, v0, 2, v1
	; GFX9-PAL-NEXT: v_mov_b32_e32 v3, 15			; GFX9-PAL-NEXT: v_mov_b32_e32 v3, 15
	; GFX9-PAL-NEXT: v_and_b32_e32 v0, 15, v0			; GFX9-PAL-NEXT: v_and_b32_e32 v0, 15, v0
	; GFX9-PAL-NEXT: scratch_store_dword v2, v3, off			; GFX9-PAL-NEXT: scratch_store_dword v2, v3, off
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: v_lshl_add_u32 v0, v0, 2, v1			; GFX9-PAL-NEXT: v_lshl_add_u32 v0, v0, 2, v1
	; GFX9-PAL-NEXT: scratch_load_dword v0, v0, off glc			; GFX9-PAL-NEXT: scratch_load_dword v0, v0, off glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_add_i32 vcc_hi, s32, 4
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, vcc_hi
	; GFX9-PAL-NEXT: ;;#ASMSTART
	; GFX9-PAL-NEXT: ; use v0
	; GFX9-PAL-NEXT: ;;#ASMEND
	; GFX9-PAL-NEXT: ;;#ASMSTART
	; GFX9-PAL-NEXT: ; use v1
	; GFX9-PAL-NEXT: ;;#ASMEND
	; GFX9-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX9-PAL-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX940-LABEL: store_load_vindex_large_offset_foo:			; GFX940-LABEL: store_load_vindex_large_offset_foo:
	; GFX940: ; %bb.0: ; %bb			; GFX940: ; %bb.0: ; %bb
	; GFX940-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX940-NEXT: scratch_load_dword v1, off, s32 offset:4 sc0 sc1			; GFX940-NEXT: scratch_load_dword v1, off, s32 offset:4 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: v_lshlrev_b32_e32 v1, 2, v0			; GFX940-NEXT: v_lshlrev_b32_e32 v1, 2, v0
	; GFX940-NEXT: v_mov_b32_e32 v2, 15			; GFX940-NEXT: v_mov_b32_e32 v2, 15
	; GFX940-NEXT: s_add_i32 s1, s32, 0x4004			; GFX940-NEXT: s_add_i32 vcc_lo, s32, 0x4004
	; GFX940-NEXT: v_and_b32_e32 v0, 15, v0			; GFX940-NEXT: v_and_b32_e32 v0, 15, v0
	; GFX940-NEXT: scratch_store_dword v1, v2, s1 sc0 sc1			; GFX940-NEXT: scratch_store_dword v1, v2, vcc_lo sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX940-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX940-NEXT: s_add_i32 s0, s32, 0x4004
	; GFX940-NEXT: scratch_load_dword v0, v0, s0 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: s_add_i32 vcc_lo, s32, 4
	; GFX940-NEXT: v_mov_b32_e32 v0, vcc_lo
	; GFX940-NEXT: s_add_i32 vcc_hi, s32, 0x4004			; GFX940-NEXT: s_add_i32 vcc_hi, s32, 0x4004
	; GFX940-NEXT: ;;#ASMSTART			; GFX940-NEXT: scratch_load_dword v0, v0, vcc_hi sc0 sc1
	; GFX940-NEXT: ; use v0			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: ;;#ASMEND
	; GFX940-NEXT: v_mov_b32_e32 v0, vcc_hi
	; GFX940-NEXT: ;;#ASMSTART
	; GFX940-NEXT: ; use v0
	; GFX940-NEXT: ;;#ASMEND
	; GFX940-NEXT: s_setpc_b64 s[30:31]			; GFX940-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-PAL-LABEL: store_load_vindex_large_offset_foo:			; GFX10-PAL-LABEL: store_load_vindex_large_offset_foo:
	; GFX10-PAL: ; %bb.0: ; %bb			; GFX10-PAL: ; %bb.0: ; %bb
	; GFX10-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-PAL-NEXT: v_and_b32_e32 v1, 15, v0			; GFX10-PAL-NEXT: v_and_b32_e32 v1, 15, v0
	; GFX10-PAL-NEXT: s_add_i32 s2, s32, 0x4004			; GFX10-PAL-NEXT: s_add_i32 s0, s32, 0x4004
	; GFX10-PAL-NEXT: s_add_i32 s1, s32, 0x4004			; GFX10-PAL-NEXT: s_add_i32 vcc_lo, s32, 0x4004
	; GFX10-PAL-NEXT: v_lshl_add_u32 v0, v0, 2, s2			; GFX10-PAL-NEXT: v_lshl_add_u32 v0, v0, 2, s0
	; GFX10-PAL-NEXT: v_mov_b32_e32 v2, 15			; GFX10-PAL-NEXT: v_mov_b32_e32 v2, 15
	; GFX10-PAL-NEXT: v_lshl_add_u32 v1, v1, 2, s1			; GFX10-PAL-NEXT: v_lshl_add_u32 v1, v1, 2, vcc_lo
	; GFX10-PAL-NEXT: scratch_load_dword v3, off, s32 offset:4 glc dlc			; GFX10-PAL-NEXT: scratch_load_dword v3, off, s32 offset:4 glc dlc
	; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX10-PAL-NEXT: s_add_i32 s0, s32, 4
	; GFX10-PAL-NEXT: s_add_i32 vcc_lo, s32, 0x4004
	; GFX10-PAL-NEXT: scratch_store_dword v0, v2, off			; GFX10-PAL-NEXT: scratch_store_dword v0, v2, off
	; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-PAL-NEXT: scratch_load_dword v0, v1, off glc dlc			; GFX10-PAL-NEXT: scratch_load_dword v0, v1, off glc dlc
	; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX10-PAL-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-PAL-NEXT: v_mov_b32_e32 v1, vcc_lo
	; GFX10-PAL-NEXT: ;;#ASMSTART
	; GFX10-PAL-NEXT: ; use v0
	; GFX10-PAL-NEXT: ;;#ASMEND
	; GFX10-PAL-NEXT: ;;#ASMSTART
	; GFX10-PAL-NEXT: ; use v1
	; GFX10-PAL-NEXT: ;;#ASMEND
	; GFX10-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX10-PAL-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-PAL-LABEL: store_load_vindex_large_offset_foo:			; GFX11-PAL-LABEL: store_load_vindex_large_offset_foo:
	; GFX11-PAL: ; %bb.0: ; %bb			; GFX11-PAL: ; %bb.0: ; %bb
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: v_dual_mov_b32 v2, 15 :: v_dual_and_b32 v1, 15, v0			; GFX11-PAL-NEXT: v_dual_mov_b32 v2, 15 :: v_dual_and_b32 v1, 15, v0
	; GFX11-PAL-NEXT: s_add_i32 s2, s32, 0x4004
	; GFX11-PAL-NEXT: s_add_i32 s1, s32, 0x4004
	; GFX11-PAL-NEXT: s_add_i32 s0, s32, 4
	; GFX11-PAL-NEXT: s_add_i32 vcc_lo, s32, 0x4004
	; GFX11-PAL-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX11-PAL-NEXT: v_lshlrev_b32_e32 v0, 2, v0
				; GFX11-PAL-NEXT: s_add_i32 s0, s32, 0x4004
				; GFX11-PAL-NEXT: s_add_i32 vcc_lo, s32, 0x4004
				; GFX11-PAL-NEXT: s_delay_alu instid0(VALU_DEP_2)
	; GFX11-PAL-NEXT: v_lshlrev_b32_e32 v1, 2, v1			; GFX11-PAL-NEXT: v_lshlrev_b32_e32 v1, 2, v1
	; GFX11-PAL-NEXT: scratch_load_b32 v3, off, s32 offset:4 glc dlc			; GFX11-PAL-NEXT: scratch_load_b32 v3, off, s32 offset:4 glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: scratch_store_b32 v0, v2, s2 dlc			; GFX11-PAL-NEXT: scratch_store_b32 v0, v2, s0 dlc
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: scratch_load_b32 v0, v1, s1 glc dlc			; GFX11-PAL-NEXT: scratch_load_b32 v0, v1, vcc_lo glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, vcc_lo
	; GFX11-PAL-NEXT: ;;#ASMSTART
	; GFX11-PAL-NEXT: ; use v0
	; GFX11-PAL-NEXT: ;;#ASMEND
	; GFX11-PAL-NEXT: ;;#ASMSTART
	; GFX11-PAL-NEXT: ; use v1
	; GFX11-PAL-NEXT: ;;#ASMEND
	; GFX11-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX11-PAL-NEXT: s_setpc_b64 s[30:31]
				; GCN-LABEL: store_load_vindex_large_offset_foo:
				; GCN: ; %bb.0: ; %bb
				; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GCN-NEXT: scratch_load_dword v1, off, s32 sc0 sc1
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: v_mov_b32_e32 v2, 15
				; GCN-NEXT: v_lshlrev_b32_e32 v1, 2, v0
				; GCN-NEXT: v_and_b32_e32 v0, v0, v2
				; GCN-NEXT: s_add_u32 vcc_hi, s32, 0x4000
				; GCN-NEXT: scratch_store_dword v1, v2, vcc_hi sc0 sc1
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: v_lshlrev_b32_e32 v0, 2, v0
				; GCN-NEXT: s_add_u32 vcc_hi, s32, 0x4000
				; GCN-NEXT: scratch_load_dword v0, v0, vcc_hi sc0 sc1
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: s_setpc_b64 s[30:31]
	bb:			bb:
	%padding = alloca [4096 x i32], align 4, addrspace(5)			%padding = alloca [4096 x i32], align 4, addrspace(5)
	%i = alloca [32 x float], align 4, addrspace(5)			%i = alloca [32 x float], align 4, addrspace(5)
	%pad_gep = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %padding, i32 0, i32 undef			%pad_gep = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %padding, i32 0, i32 undef
	%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4			%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4
	%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*			%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*
	%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %idx			%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %idx
	%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*			%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*
	store volatile i32 15, i32 addrspace(5)* %i8, align 4			store volatile i32 15, i32 addrspace(5)* %i8, align 4
	%i9 = and i32 %idx, 15			%i9 = and i32 %idx, 15
	%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9			%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9
	%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*			%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*
	%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4			%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4
	call void asm sideeffect "; use $0", "s"([4096 x i32] addrspace(5)* %padding) #0
	call void asm sideeffect "; use $0", "s"([32 x float] addrspace(5)* %i) #0
	ret void			ret void
	}			}

	define amdgpu_kernel void @store_load_large_imm_offset_kernel() {			define amdgpu_kernel void @store_load_large_imm_offset_kernel() {
	; GFX9-LABEL: store_load_large_imm_offset_kernel:			; GFX9-LABEL: store_load_large_imm_offset_kernel:
	; GFX9: ; %bb.0: ; %bb			; GFX9: ; %bb.0: ; %bb
	; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3			; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3
	; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0			; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 13			; GFX9-NEXT: v_mov_b32_e32 v0, 13
	; GFX9-NEXT: s_mov_b32 vcc_hi, 0			; GFX9-NEXT: s_mov_b32 vcc_hi, 0
	; GFX9-NEXT: s_movk_i32 s0, 0x3000			; GFX9-NEXT: s_movk_i32 s0, 0x3000
	; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:4			; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:4
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_add_i32 s0, s0, 4			; GFX9-NEXT: s_add_i32 s0, s0, 4
	; GFX9-NEXT: v_mov_b32_e32 v0, 15			; GFX9-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-NEXT: scratch_store_dword off, v0, s0 offset:3712			; GFX9-NEXT: scratch_store_dword off, v0, s0 offset:3712
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: scratch_load_dword v0, off, s0 offset:3712 glc			; GFX9-NEXT: scratch_load_dword v0, off, s0 offset:3712 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_mov_b32_e32 v0, 4
	; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v0
	; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: store_load_large_imm_offset_kernel:			; GFX10-LABEL: store_load_large_imm_offset_kernel:
	; GFX10: ; %bb.0: ; %bb			; GFX10: ; %bb.0: ; %bb
	; GFX10-NEXT: s_add_u32 s0, s0, s3			; GFX10-NEXT: s_add_u32 s0, s0, s3
	; GFX10-NEXT: s_addc_u32 s1, s1, 0			; GFX10-NEXT: s_addc_u32 s1, s1, 0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1
	; GFX10-NEXT: v_mov_b32_e32 v0, 13			; GFX10-NEXT: v_mov_b32_e32 v0, 13
	; GFX10-NEXT: v_mov_b32_e32 v1, 15			; GFX10-NEXT: v_mov_b32_e32 v1, 15
	; GFX10-NEXT: s_movk_i32 s0, 0x3800			; GFX10-NEXT: s_movk_i32 s0, 0x3800
	; GFX10-NEXT: s_add_i32 s0, s0, 4			; GFX10-NEXT: s_add_i32 s0, s0, 4
	; GFX10-NEXT: scratch_store_dword off, v0, off offset:4			; GFX10-NEXT: scratch_store_dword off, v0, off offset:4
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_store_dword off, v1, s0 offset:1664			; GFX10-NEXT: scratch_store_dword off, v1, s0 offset:1664
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_load_dword v0, off, s0 offset:1664 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, s0 offset:1664 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_mov_b32_e32 v0, 4
	; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v0
	; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: store_load_large_imm_offset_kernel:			; GFX11-LABEL: store_load_large_imm_offset_kernel:
	; GFX11: ; %bb.0: ; %bb			; GFX11: ; %bb.0: ; %bb
	; GFX11-NEXT: v_dual_mov_b32 v0, 13 :: v_dual_mov_b32 v1, 0x3000			; GFX11-NEXT: v_dual_mov_b32 v0, 13 :: v_dual_mov_b32 v1, 0x3000
	; GFX11-NEXT: v_mov_b32_e32 v2, 15			; GFX11-NEXT: v_mov_b32_e32 v2, 15
	; GFX11-NEXT: scratch_store_b32 off, v0, off offset:4 dlc			; GFX11-NEXT: scratch_store_b32 off, v0, off offset:4 dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: scratch_store_b32 v1, v2, off offset:3716 dlc			; GFX11-NEXT: scratch_store_b32 v1, v2, off offset:3716 dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: scratch_load_b32 v0, v1, off offset:3716 glc dlc			; GFX11-NEXT: scratch_load_b32 v0, v1, off offset:3716 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_mov_b32_e32 v0, 4
	; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v0
	; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	;			;
	; GFX9-PAL-LABEL: store_load_large_imm_offset_kernel:			; GFX9-PAL-LABEL: store_load_large_imm_offset_kernel:
	; GFX9-PAL: ; %bb.0: ; %bb			; GFX9-PAL: ; %bb.0: ; %bb
	; GFX9-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX9-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX9-PAL-NEXT: s_mov_b32 s2, s0			; GFX9-PAL-NEXT: s_mov_b32 s2, s0
	; GFX9-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX9-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 13			; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 13
	; GFX9-PAL-NEXT: s_mov_b32 vcc_hi, 0			; GFX9-PAL-NEXT: s_mov_b32 vcc_hi, 0
	; GFX9-PAL-NEXT: s_movk_i32 s0, 0x3000			; GFX9-PAL-NEXT: s_movk_i32 s0, 0x3000
	; GFX9-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-PAL-NEXT: s_and_b32 s3, s3, 0xffff			; GFX9-PAL-NEXT: s_and_b32 s3, s3, 0xffff
	; GFX9-PAL-NEXT: s_add_u32 flat_scratch_lo, s2, s1			; GFX9-PAL-NEXT: s_add_u32 flat_scratch_lo, s2, s1
	; GFX9-PAL-NEXT: s_addc_u32 flat_scratch_hi, s3, 0			; GFX9-PAL-NEXT: s_addc_u32 flat_scratch_hi, s3, 0
	; GFX9-PAL-NEXT: scratch_store_dword off, v0, vcc_hi offset:4			; GFX9-PAL-NEXT: scratch_store_dword off, v0, vcc_hi offset:4
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_add_i32 s0, s0, 4			; GFX9-PAL-NEXT: s_add_i32 s0, s0, 4
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 15			; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-PAL-NEXT: scratch_store_dword off, v0, s0 offset:3712			; GFX9-PAL-NEXT: scratch_store_dword off, v0, s0 offset:3712
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: scratch_load_dword v0, off, s0 offset:3712 glc			; GFX9-PAL-NEXT: scratch_load_dword v0, off, s0 offset:3712 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX9-PAL-NEXT: ;;#ASMSTART
	; GFX9-PAL-NEXT: ; use v0
	; GFX9-PAL-NEXT: ;;#ASMEND
	; GFX9-PAL-NEXT: s_endpgm			; GFX9-PAL-NEXT: s_endpgm
	;			;
	; GFX940-LABEL: store_load_large_imm_offset_kernel:			; GFX940-LABEL: store_load_large_imm_offset_kernel:
	; GFX940: ; %bb.0: ; %bb			; GFX940: ; %bb.0: ; %bb
	; GFX940-NEXT: v_mov_b32_e32 v0, 13			; GFX940-NEXT: v_mov_b32_e32 v0, 13
	; GFX940-NEXT: scratch_store_dword off, v0, off offset:4 sc0 sc1			; GFX940-NEXT: scratch_store_dword off, v0, off offset:4 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: v_mov_b32_e32 v0, 0x3000			; GFX940-NEXT: v_mov_b32_e32 v0, 0x3000
	; GFX940-NEXT: v_mov_b32_e32 v1, 15			; GFX940-NEXT: v_mov_b32_e32 v1, 15
	; GFX940-NEXT: scratch_store_dword v0, v1, off offset:3716 sc0 sc1			; GFX940-NEXT: scratch_store_dword v0, v1, off offset:3716 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: scratch_load_dword v0, v0, off offset:3716 sc0 sc1			; GFX940-NEXT: scratch_load_dword v0, v0, off offset:3716 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: v_mov_b32_e32 v0, 4
	; GFX940-NEXT: ;;#ASMSTART
	; GFX940-NEXT: ; use v0
	; GFX940-NEXT: ;;#ASMEND
	; GFX940-NEXT: s_endpgm			; GFX940-NEXT: s_endpgm
	;			;
	; GFX1010-PAL-LABEL: store_load_large_imm_offset_kernel:			; GFX1010-PAL-LABEL: store_load_large_imm_offset_kernel:
	; GFX1010-PAL: ; %bb.0: ; %bb			; GFX1010-PAL: ; %bb.0: ; %bb
	; GFX1010-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX1010-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX1010-PAL-NEXT: s_mov_b32 s2, s0			; GFX1010-PAL-NEXT: s_mov_b32 s2, s0
	; GFX1010-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX1010-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1010-PAL-NEXT: s_and_b32 s3, s3, 0xffff			; GFX1010-PAL-NEXT: s_and_b32 s3, s3, 0xffff
	; GFX1010-PAL-NEXT: s_add_u32 s2, s2, s1			; GFX1010-PAL-NEXT: s_add_u32 s2, s2, s1
	; GFX1010-PAL-NEXT: s_addc_u32 s3, s3, 0			; GFX1010-PAL-NEXT: s_addc_u32 s3, s3, 0
	; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2			; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
	; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3			; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v0, 13			; GFX1010-PAL-NEXT: v_mov_b32_e32 v0, 13
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v1, 15			; GFX1010-PAL-NEXT: v_mov_b32_e32 v1, 15
	; GFX1010-PAL-NEXT: s_movk_i32 s0, 0x3800			; GFX1010-PAL-NEXT: s_movk_i32 s0, 0x3800
	; GFX1010-PAL-NEXT: s_mov_b32 vcc_lo, 0			; GFX1010-PAL-NEXT: s_mov_b32 vcc_lo, 0
	; GFX1010-PAL-NEXT: s_add_i32 s0, s0, 4			; GFX1010-PAL-NEXT: s_add_i32 s0, s0, 4
	; GFX1010-PAL-NEXT: scratch_store_dword off, v0, vcc_lo offset:4			; GFX1010-PAL-NEXT: scratch_store_dword off, v0, vcc_lo offset:4
	; GFX1010-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1010-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1010-PAL-NEXT: scratch_store_dword off, v1, s0 offset:1664			; GFX1010-PAL-NEXT: scratch_store_dword off, v1, s0 offset:1664
	; GFX1010-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1010-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1010-PAL-NEXT: scratch_load_dword v0, off, s0 offset:1664 glc dlc			; GFX1010-PAL-NEXT: scratch_load_dword v0, off, s0 offset:1664 glc dlc
	; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX1010-PAL-NEXT: ;;#ASMSTART
	; GFX1010-PAL-NEXT: ; use v0
	; GFX1010-PAL-NEXT: ;;#ASMEND
	; GFX1010-PAL-NEXT: s_endpgm			; GFX1010-PAL-NEXT: s_endpgm
	;			;
	; GFX1030-PAL-LABEL: store_load_large_imm_offset_kernel:			; GFX1030-PAL-LABEL: store_load_large_imm_offset_kernel:
	; GFX1030-PAL: ; %bb.0: ; %bb			; GFX1030-PAL: ; %bb.0: ; %bb
	; GFX1030-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX1030-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX1030-PAL-NEXT: s_mov_b32 s2, s0			; GFX1030-PAL-NEXT: s_mov_b32 s2, s0
	; GFX1030-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX1030-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1030-PAL-NEXT: s_and_b32 s3, s3, 0xffff			; GFX1030-PAL-NEXT: s_and_b32 s3, s3, 0xffff
	; GFX1030-PAL-NEXT: s_add_u32 s2, s2, s1			; GFX1030-PAL-NEXT: s_add_u32 s2, s2, s1
	; GFX1030-PAL-NEXT: s_addc_u32 s3, s3, 0			; GFX1030-PAL-NEXT: s_addc_u32 s3, s3, 0
	; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2			; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
	; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3			; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v0, 13			; GFX1030-PAL-NEXT: v_mov_b32_e32 v0, 13
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v1, 15			; GFX1030-PAL-NEXT: v_mov_b32_e32 v1, 15
	; GFX1030-PAL-NEXT: s_movk_i32 s0, 0x3800			; GFX1030-PAL-NEXT: s_movk_i32 s0, 0x3800
	; GFX1030-PAL-NEXT: s_add_i32 s0, s0, 4			; GFX1030-PAL-NEXT: s_add_i32 s0, s0, 4
	; GFX1030-PAL-NEXT: scratch_store_dword off, v0, off offset:4			; GFX1030-PAL-NEXT: scratch_store_dword off, v0, off offset:4
	; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1030-PAL-NEXT: scratch_store_dword off, v1, s0 offset:1664			; GFX1030-PAL-NEXT: scratch_store_dword off, v1, s0 offset:1664
	; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1030-PAL-NEXT: scratch_load_dword v0, off, s0 offset:1664 glc dlc			; GFX1030-PAL-NEXT: scratch_load_dword v0, off, s0 offset:1664 glc dlc
	; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX1030-PAL-NEXT: ;;#ASMSTART
	; GFX1030-PAL-NEXT: ; use v0
	; GFX1030-PAL-NEXT: ;;#ASMEND
	; GFX1030-PAL-NEXT: s_endpgm			; GFX1030-PAL-NEXT: s_endpgm
	;			;
	; GFX11-PAL-LABEL: store_load_large_imm_offset_kernel:			; GFX11-PAL-LABEL: store_load_large_imm_offset_kernel:
	; GFX11-PAL: ; %bb.0: ; %bb			; GFX11-PAL: ; %bb.0: ; %bb
	; GFX11-PAL-NEXT: v_dual_mov_b32 v0, 13 :: v_dual_mov_b32 v1, 0x3000			; GFX11-PAL-NEXT: v_dual_mov_b32 v0, 13 :: v_dual_mov_b32 v1, 0x3000
	; GFX11-PAL-NEXT: v_mov_b32_e32 v2, 15			; GFX11-PAL-NEXT: v_mov_b32_e32 v2, 15
	; GFX11-PAL-NEXT: scratch_store_b32 off, v0, off offset:4 dlc			; GFX11-PAL-NEXT: scratch_store_b32 off, v0, off offset:4 dlc
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: scratch_store_b32 v1, v2, off offset:3716 dlc			; GFX11-PAL-NEXT: scratch_store_b32 v1, v2, off offset:3716 dlc
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: scratch_load_b32 v0, v1, off offset:3716 glc dlc			; GFX11-PAL-NEXT: scratch_load_b32 v0, v1, off offset:3716 glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX11-PAL-NEXT: ;;#ASMSTART
	; GFX11-PAL-NEXT: ; use v0
	; GFX11-PAL-NEXT: ;;#ASMEND
	; GFX11-PAL-NEXT: s_endpgm			; GFX11-PAL-NEXT: s_endpgm
	bb:			bb:
	%i = alloca [4096 x i32], align 4, addrspace(5)			%i = alloca [4096 x i32], align 4, addrspace(5)
	%i1 = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %i, i32 0, i32 undef			%i1 = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %i, i32 0, i32 undef
	store volatile i32 13, i32 addrspace(5)* %i1, align 4			store volatile i32 13, i32 addrspace(5)* %i1, align 4
	%i7 = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %i, i32 0, i32 4000			%i7 = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %i, i32 0, i32 4000
	store volatile i32 15, i32 addrspace(5)* %i7, align 4			store volatile i32 15, i32 addrspace(5)* %i7, align 4
	%i10 = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %i, i32 0, i32 4000			%i10 = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %i, i32 0, i32 4000
	%i12 = load volatile i32, i32 addrspace(5)* %i10, align 4			%i12 = load volatile i32, i32 addrspace(5)* %i10, align 4
	call void asm sideeffect "; use $0", "s"([4096 x i32] addrspace(5)* %i) #0
	ret void			ret void
	}			}

	define void @store_load_large_imm_offset_foo() {			define void @store_load_large_imm_offset_foo() {
	; GFX9-LABEL: store_load_large_imm_offset_foo:			; GFX9-LABEL: store_load_large_imm_offset_foo:
	; GFX9: ; %bb.0: ; %bb			; GFX9: ; %bb.0: ; %bb
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: v_mov_b32_e32 v0, 13			; GFX9-NEXT: v_mov_b32_e32 v0, 13
	; GFX9-NEXT: s_movk_i32 s0, 0x3000			; GFX9-NEXT: s_movk_i32 s0, 0x3000
	; GFX9-NEXT: s_add_i32 vcc_lo, s32, 4			; GFX9-NEXT: s_add_i32 vcc_hi, s32, 4
	; GFX9-NEXT: scratch_store_dword off, v0, s32 offset:4			; GFX9-NEXT: scratch_store_dword off, v0, s32 offset:4
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_add_i32 s0, s0, vcc_lo			; GFX9-NEXT: s_add_i32 s0, s0, vcc_hi
	; GFX9-NEXT: v_mov_b32_e32 v0, 15			; GFX9-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-NEXT: scratch_store_dword off, v0, s0 offset:3712			; GFX9-NEXT: scratch_store_dword off, v0, s0 offset:3712
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: scratch_load_dword v0, off, s0 offset:3712 glc			; GFX9-NEXT: scratch_load_dword v0, off, s0 offset:3712 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_add_i32 vcc_hi, s32, 4
	; GFX9-NEXT: v_mov_b32_e32 v0, vcc_hi
	; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v0
	; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: store_load_large_imm_offset_foo:			; GFX10-LABEL: store_load_large_imm_offset_foo:
	; GFX10: ; %bb.0: ; %bb			; GFX10: ; %bb.0: ; %bb
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: v_mov_b32_e32 v0, 13			; GFX10-NEXT: v_mov_b32_e32 v0, 13
	; GFX10-NEXT: v_mov_b32_e32 v1, 15			; GFX10-NEXT: v_mov_b32_e32 v1, 15
	; GFX10-NEXT: s_movk_i32 s0, 0x3800			; GFX10-NEXT: s_movk_i32 s0, 0x3800
	; GFX10-NEXT: s_add_i32 s1, s32, 4			; GFX10-NEXT: s_add_i32 vcc_lo, s32, 4
	; GFX10-NEXT: s_add_i32 s0, s0, s1			; GFX10-NEXT: s_add_i32 s0, s0, vcc_lo
	; GFX10-NEXT: scratch_store_dword off, v0, s32 offset:4			; GFX10-NEXT: scratch_store_dword off, v0, s32 offset:4
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_store_dword off, v1, s0 offset:1664			; GFX10-NEXT: scratch_store_dword off, v1, s0 offset:1664
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_load_dword v0, off, s0 offset:1664 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, s0 offset:1664 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_add_i32 vcc_lo, s32, 4
	; GFX10-NEXT: v_mov_b32_e32 v0, vcc_lo
	; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v0
	; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: store_load_large_imm_offset_foo:			; GFX11-LABEL: store_load_large_imm_offset_foo:
	; GFX11: ; %bb.0: ; %bb			; GFX11: ; %bb.0: ; %bb
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: v_dual_mov_b32 v0, 13 :: v_dual_mov_b32 v1, 0x3000			; GFX11-NEXT: v_dual_mov_b32 v0, 13 :: v_dual_mov_b32 v1, 0x3000
	; GFX11-NEXT: v_mov_b32_e32 v2, 15			; GFX11-NEXT: v_mov_b32_e32 v2, 15
	; GFX11-NEXT: s_add_i32 vcc_lo, s32, 4
	; GFX11-NEXT: scratch_store_b32 off, v0, s32 offset:4 dlc			; GFX11-NEXT: scratch_store_b32 off, v0, s32 offset:4 dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: scratch_store_b32 v1, v2, s32 offset:3716 dlc			; GFX11-NEXT: scratch_store_b32 v1, v2, s32 offset:3716 dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: scratch_load_b32 v0, v1, s32 offset:3716 glc dlc			; GFX11-NEXT: scratch_load_b32 v0, v1, s32 offset:3716 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_mov_b32_e32 v0, vcc_lo
	; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v0
	; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-PAL-LABEL: store_load_large_imm_offset_foo:			; GFX9-PAL-LABEL: store_load_large_imm_offset_foo:
	; GFX9-PAL: ; %bb.0: ; %bb			; GFX9-PAL: ; %bb.0: ; %bb
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 13			; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 13
	; GFX9-PAL-NEXT: s_movk_i32 s0, 0x3000			; GFX9-PAL-NEXT: s_movk_i32 s0, 0x3000
	; GFX9-PAL-NEXT: s_add_i32 vcc_lo, s32, 4			; GFX9-PAL-NEXT: s_add_i32 vcc_hi, s32, 4
	; GFX9-PAL-NEXT: scratch_store_dword off, v0, s32 offset:4			; GFX9-PAL-NEXT: scratch_store_dword off, v0, s32 offset:4
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_add_i32 s0, s0, vcc_lo			; GFX9-PAL-NEXT: s_add_i32 s0, s0, vcc_hi
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 15			; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-PAL-NEXT: scratch_store_dword off, v0, s0 offset:3712			; GFX9-PAL-NEXT: scratch_store_dword off, v0, s0 offset:3712
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: scratch_load_dword v0, off, s0 offset:3712 glc			; GFX9-PAL-NEXT: scratch_load_dword v0, off, s0 offset:3712 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_add_i32 vcc_hi, s32, 4
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, vcc_hi
	; GFX9-PAL-NEXT: ;;#ASMSTART
	; GFX9-PAL-NEXT: ; use v0
	; GFX9-PAL-NEXT: ;;#ASMEND
	; GFX9-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX9-PAL-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX940-LABEL: store_load_large_imm_offset_foo:			; GFX940-LABEL: store_load_large_imm_offset_foo:
	; GFX940: ; %bb.0: ; %bb			; GFX940: ; %bb.0: ; %bb
	; GFX940-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX940-NEXT: v_mov_b32_e32 v0, 13			; GFX940-NEXT: v_mov_b32_e32 v0, 13
	; GFX940-NEXT: scratch_store_dword off, v0, s32 offset:4 sc0 sc1			; GFX940-NEXT: scratch_store_dword off, v0, s32 offset:4 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: v_mov_b32_e32 v0, 0x3000			; GFX940-NEXT: v_mov_b32_e32 v0, 0x3000
	; GFX940-NEXT: v_mov_b32_e32 v1, 15			; GFX940-NEXT: v_mov_b32_e32 v1, 15
	; GFX940-NEXT: scratch_store_dword v0, v1, s32 offset:3716 sc0 sc1			; GFX940-NEXT: scratch_store_dword v0, v1, s32 offset:3716 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: scratch_load_dword v0, v0, s32 offset:3716 sc0 sc1			; GFX940-NEXT: scratch_load_dword v0, v0, s32 offset:3716 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: s_add_i32 vcc_hi, s32, 4
	; GFX940-NEXT: v_mov_b32_e32 v0, vcc_hi
	; GFX940-NEXT: ;;#ASMSTART
	; GFX940-NEXT: ; use v0
	; GFX940-NEXT: ;;#ASMEND
	; GFX940-NEXT: s_setpc_b64 s[30:31]			; GFX940-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-PAL-LABEL: store_load_large_imm_offset_foo:			; GFX10-PAL-LABEL: store_load_large_imm_offset_foo:
	; GFX10-PAL: ; %bb.0: ; %bb			; GFX10-PAL: ; %bb.0: ; %bb
	; GFX10-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-PAL-NEXT: v_mov_b32_e32 v0, 13			; GFX10-PAL-NEXT: v_mov_b32_e32 v0, 13
	; GFX10-PAL-NEXT: v_mov_b32_e32 v1, 15			; GFX10-PAL-NEXT: v_mov_b32_e32 v1, 15
	; GFX10-PAL-NEXT: s_movk_i32 s0, 0x3800			; GFX10-PAL-NEXT: s_movk_i32 s0, 0x3800
	; GFX10-PAL-NEXT: s_add_i32 s1, s32, 4			; GFX10-PAL-NEXT: s_add_i32 vcc_lo, s32, 4
	; GFX10-PAL-NEXT: s_add_i32 s0, s0, s1			; GFX10-PAL-NEXT: s_add_i32 s0, s0, vcc_lo
	; GFX10-PAL-NEXT: scratch_store_dword off, v0, s32 offset:4			; GFX10-PAL-NEXT: scratch_store_dword off, v0, s32 offset:4
	; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-PAL-NEXT: scratch_store_dword off, v1, s0 offset:1664			; GFX10-PAL-NEXT: scratch_store_dword off, v1, s0 offset:1664
	; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-PAL-NEXT: scratch_load_dword v0, off, s0 offset:1664 glc dlc			; GFX10-PAL-NEXT: scratch_load_dword v0, off, s0 offset:1664 glc dlc
	; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX10-PAL-NEXT: s_add_i32 vcc_lo, s32, 4
	; GFX10-PAL-NEXT: v_mov_b32_e32 v0, vcc_lo
	; GFX10-PAL-NEXT: ;;#ASMSTART
	; GFX10-PAL-NEXT: ; use v0
	; GFX10-PAL-NEXT: ;;#ASMEND
	; GFX10-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX10-PAL-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-PAL-LABEL: store_load_large_imm_offset_foo:			; GFX11-PAL-LABEL: store_load_large_imm_offset_foo:
	; GFX11-PAL: ; %bb.0: ; %bb			; GFX11-PAL: ; %bb.0: ; %bb
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: v_dual_mov_b32 v0, 13 :: v_dual_mov_b32 v1, 0x3000			; GFX11-PAL-NEXT: v_dual_mov_b32 v0, 13 :: v_dual_mov_b32 v1, 0x3000
	; GFX11-PAL-NEXT: v_mov_b32_e32 v2, 15			; GFX11-PAL-NEXT: v_mov_b32_e32 v2, 15
	; GFX11-PAL-NEXT: s_add_i32 vcc_lo, s32, 4
	; GFX11-PAL-NEXT: scratch_store_b32 off, v0, s32 offset:4 dlc			; GFX11-PAL-NEXT: scratch_store_b32 off, v0, s32 offset:4 dlc
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: scratch_store_b32 v1, v2, s32 offset:3716 dlc			; GFX11-PAL-NEXT: scratch_store_b32 v1, v2, s32 offset:3716 dlc
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: scratch_load_b32 v0, v1, s32 offset:3716 glc dlc			; GFX11-PAL-NEXT: scratch_load_b32 v0, v1, s32 offset:3716 glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: v_mov_b32_e32 v0, vcc_lo
	; GFX11-PAL-NEXT: ;;#ASMSTART
	; GFX11-PAL-NEXT: ; use v0
	; GFX11-PAL-NEXT: ;;#ASMEND
	; GFX11-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX11-PAL-NEXT: s_setpc_b64 s[30:31]
				; GCN-LABEL: store_load_large_imm_offset_foo:
				; GCN: ; %bb.0: ; %bb
				; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GCN-NEXT: v_mov_b32_e32 v0, 13
				; GCN-NEXT: scratch_store_dword off, v0, s32 sc0 sc1
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: v_mov_b32_e32 v0, 0x3000
				; GCN-NEXT: v_mov_b32_e32 v1, 15
				; GCN-NEXT: scratch_store_dword v0, v1, s32 offset:3712 sc0 sc1
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: scratch_load_dword v0, v0, s32 offset:3712 sc0 sc1
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: s_setpc_b64 s[30:31]
	bb:			bb:
	%i = alloca [4096 x i32], align 4, addrspace(5)			%i = alloca [4096 x i32], align 4, addrspace(5)
	%i1 = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %i, i32 0, i32 undef			%i1 = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %i, i32 0, i32 undef
	store volatile i32 13, i32 addrspace(5)* %i1, align 4			store volatile i32 13, i32 addrspace(5)* %i1, align 4
	%i7 = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %i, i32 0, i32 4000			%i7 = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %i, i32 0, i32 4000
	store volatile i32 15, i32 addrspace(5)* %i7, align 4			store volatile i32 15, i32 addrspace(5)* %i7, align 4
	%i10 = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %i, i32 0, i32 4000			%i10 = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %i, i32 0, i32 4000
	%i12 = load volatile i32, i32 addrspace(5)* %i10, align 4			%i12 = load volatile i32, i32 addrspace(5)* %i10, align 4
	call void asm sideeffect "; use $0", "s"([4096 x i32] addrspace(5)* %i) #0
	ret void			ret void
	}			}

	define amdgpu_kernel void @store_load_vidx_sidx_offset(i32 %sidx) {			define amdgpu_kernel void @store_load_vidx_sidx_offset(i32 %sidx) {
	; GFX9-LABEL: store_load_vidx_sidx_offset:			; GFX9-LABEL: store_load_vidx_sidx_offset:
	; GFX9: ; %bb.0: ; %bb			; GFX9: ; %bb.0: ; %bb
	; GFX9-NEXT: s_load_dword s0, s[0:1], 0x24			; GFX9-NEXT: s_load_dword s0, s[0:1], 0x24
	; GFX9-NEXT: s_add_u32 flat_scratch_lo, s2, s5			; GFX9-NEXT: s_add_u32 flat_scratch_lo, s2, s5
	; GFX9-NEXT: v_mov_b32_e32 v1, 4			; GFX9-NEXT: v_mov_b32_e32 v1, 4
	; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s3, 0			; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s3, 0
	; GFX9-NEXT: v_mov_b32_e32 v2, 15
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: v_add_u32_e32 v0, s0, v0			; GFX9-NEXT: v_add_u32_e32 v0, s0, v0
	; GFX9-NEXT: v_lshl_add_u32 v0, v0, 2, v1			; GFX9-NEXT: v_lshl_add_u32 v0, v0, 2, v1
	; GFX9-NEXT: scratch_store_dword v0, v2, off offset:1024			; GFX9-NEXT: v_mov_b32_e32 v1, 15
				; GFX9-NEXT: scratch_store_dword v0, v1, off offset:1024
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: scratch_load_dword v0, v0, off offset:1024 glc			; GFX9-NEXT: scratch_load_dword v0, v0, off offset:1024 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v1
	; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: store_load_vidx_sidx_offset:			; GFX10-LABEL: store_load_vidx_sidx_offset:
	; GFX10: ; %bb.0: ; %bb			; GFX10: ; %bb.0: ; %bb
	; GFX10-NEXT: s_add_u32 s2, s2, s5			; GFX10-NEXT: s_add_u32 s2, s2, s5
	; GFX10-NEXT: s_addc_u32 s3, s3, 0			; GFX10-NEXT: s_addc_u32 s3, s3, 0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
	; GFX10-NEXT: s_load_dword s0, s[0:1], 0x24			; GFX10-NEXT: s_load_dword s0, s[0:1], 0x24
	; GFX10-NEXT: v_mov_b32_e32 v1, 15			; GFX10-NEXT: v_mov_b32_e32 v1, 15
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: v_add_nc_u32_e32 v0, s0, v0			; GFX10-NEXT: v_add_nc_u32_e32 v0, s0, v0
	; GFX10-NEXT: v_lshl_add_u32 v0, v0, 2, 4			; GFX10-NEXT: v_lshl_add_u32 v0, v0, 2, 4
	; GFX10-NEXT: scratch_store_dword v0, v1, off offset:1024			; GFX10-NEXT: scratch_store_dword v0, v1, off offset:1024
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_load_dword v0, v0, off offset:1024 glc dlc			; GFX10-NEXT: scratch_load_dword v0, v0, off offset:1024 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_mov_b32_e32 v0, 4
	; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v0
	; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: store_load_vidx_sidx_offset:			; GFX11-LABEL: store_load_vidx_sidx_offset:
	; GFX11: ; %bb.0: ; %bb			; GFX11: ; %bb.0: ; %bb
	; GFX11-NEXT: s_load_b32 s0, s[0:1], 0x24			; GFX11-NEXT: s_load_b32 s0, s[0:1], 0x24
	; GFX11-NEXT: v_mov_b32_e32 v1, 15			; GFX11-NEXT: v_mov_b32_e32 v1, 15
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: v_add_lshl_u32 v0, s0, v0, 2			; GFX11-NEXT: v_add_lshl_u32 v0, s0, v0, 2
	; GFX11-NEXT: scratch_store_b32 v0, v1, off offset:1028 dlc			; GFX11-NEXT: scratch_store_b32 v0, v1, off offset:1028 dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: scratch_load_b32 v0, v0, off offset:1028 glc dlc			; GFX11-NEXT: scratch_load_b32 v0, v0, off offset:1028 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_mov_b32_e32 v0, 4
	; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; use v0
	; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	;			;
	; GFX9-PAL-LABEL: store_load_vidx_sidx_offset:			; GFX9-PAL-LABEL: store_load_vidx_sidx_offset:
	; GFX9-PAL: ; %bb.0: ; %bb			; GFX9-PAL: ; %bb.0: ; %bb
	; GFX9-PAL-NEXT: s_getpc_b64 s[4:5]			; GFX9-PAL-NEXT: s_getpc_b64 s[4:5]
	; GFX9-PAL-NEXT: s_mov_b32 s4, s0			; GFX9-PAL-NEXT: s_mov_b32 s4, s0
	; GFX9-PAL-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX9-PAL-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX9-PAL-NEXT: v_mov_b32_e32 v1, 4			; GFX9-PAL-NEXT: v_mov_b32_e32 v1, 4
	; GFX9-PAL-NEXT: s_load_dword s0, s[0:1], 0x0			; GFX9-PAL-NEXT: s_load_dword s0, s[0:1], 0x0
	; GFX9-PAL-NEXT: v_mov_b32_e32 v2, 15
	; GFX9-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-PAL-NEXT: s_and_b32 s5, s5, 0xffff			; GFX9-PAL-NEXT: s_and_b32 s5, s5, 0xffff
	; GFX9-PAL-NEXT: s_add_u32 flat_scratch_lo, s4, s3			; GFX9-PAL-NEXT: s_add_u32 flat_scratch_lo, s4, s3
	; GFX9-PAL-NEXT: v_add_u32_e32 v0, s0, v0			; GFX9-PAL-NEXT: v_add_u32_e32 v0, s0, v0
	; GFX9-PAL-NEXT: s_addc_u32 flat_scratch_hi, s5, 0			; GFX9-PAL-NEXT: s_addc_u32 flat_scratch_hi, s5, 0
	; GFX9-PAL-NEXT: v_lshl_add_u32 v0, v0, 2, v1			; GFX9-PAL-NEXT: v_lshl_add_u32 v0, v0, 2, v1
	; GFX9-PAL-NEXT: scratch_store_dword v0, v2, off offset:1024			; GFX9-PAL-NEXT: v_mov_b32_e32 v1, 15
				; GFX9-PAL-NEXT: scratch_store_dword v0, v1, off offset:1024
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: scratch_load_dword v0, v0, off offset:1024 glc			; GFX9-PAL-NEXT: scratch_load_dword v0, v0, off offset:1024 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: ;;#ASMSTART
	; GFX9-PAL-NEXT: ; use v1
	; GFX9-PAL-NEXT: ;;#ASMEND
	; GFX9-PAL-NEXT: s_endpgm			; GFX9-PAL-NEXT: s_endpgm
	;			;
	; GFX940-LABEL: store_load_vidx_sidx_offset:			; GFX940-LABEL: store_load_vidx_sidx_offset:
	; GFX940: ; %bb.0: ; %bb			; GFX940: ; %bb.0: ; %bb
	; GFX940-NEXT: s_load_dword s0, s[0:1], 0x24			; GFX940-NEXT: s_load_dword s0, s[0:1], 0x24
	; GFX940-NEXT: v_mov_b32_e32 v1, 15			; GFX940-NEXT: v_mov_b32_e32 v1, 15
	; GFX940-NEXT: s_waitcnt lgkmcnt(0)			; GFX940-NEXT: s_waitcnt lgkmcnt(0)
	; GFX940-NEXT: v_add_lshl_u32 v0, s0, v0, 2			; GFX940-NEXT: v_add_lshl_u32 v0, s0, v0, 2
	; GFX940-NEXT: scratch_store_dword v0, v1, off offset:1028 sc0 sc1			; GFX940-NEXT: scratch_store_dword v0, v1, off offset:1028 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: scratch_load_dword v0, v0, off offset:1028 sc0 sc1			; GFX940-NEXT: scratch_load_dword v0, v0, off offset:1028 sc0 sc1
	; GFX940-NEXT: s_waitcnt vmcnt(0)			; GFX940-NEXT: s_waitcnt vmcnt(0)
	; GFX940-NEXT: v_mov_b32_e32 v0, 4
	; GFX940-NEXT: ;;#ASMSTART
	; GFX940-NEXT: ; use v0
	; GFX940-NEXT: ;;#ASMEND
	; GFX940-NEXT: s_endpgm			; GFX940-NEXT: s_endpgm
	;			;
	; GFX10-PAL-LABEL: store_load_vidx_sidx_offset:			; GFX10-PAL-LABEL: store_load_vidx_sidx_offset:
	; GFX10-PAL: ; %bb.0: ; %bb			; GFX10-PAL: ; %bb.0: ; %bb
	; GFX10-PAL-NEXT: s_getpc_b64 s[4:5]			; GFX10-PAL-NEXT: s_getpc_b64 s[4:5]
	; GFX10-PAL-NEXT: s_mov_b32 s4, s0			; GFX10-PAL-NEXT: s_mov_b32 s4, s0
	; GFX10-PAL-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX10-PAL-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX10-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-PAL-NEXT: s_and_b32 s5, s5, 0xffff			; GFX10-PAL-NEXT: s_and_b32 s5, s5, 0xffff
	; GFX10-PAL-NEXT: s_add_u32 s4, s4, s3			; GFX10-PAL-NEXT: s_add_u32 s4, s4, s3
	; GFX10-PAL-NEXT: s_addc_u32 s5, s5, 0			; GFX10-PAL-NEXT: s_addc_u32 s5, s5, 0
	; GFX10-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s4			; GFX10-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s4
	; GFX10-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s5			; GFX10-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s5
	; GFX10-PAL-NEXT: s_load_dword s0, s[0:1], 0x0			; GFX10-PAL-NEXT: s_load_dword s0, s[0:1], 0x0
	; GFX10-PAL-NEXT: v_mov_b32_e32 v1, 15			; GFX10-PAL-NEXT: v_mov_b32_e32 v1, 15
	; GFX10-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-PAL-NEXT: v_add_nc_u32_e32 v0, s0, v0			; GFX10-PAL-NEXT: v_add_nc_u32_e32 v0, s0, v0
	; GFX10-PAL-NEXT: v_lshl_add_u32 v0, v0, 2, 4			; GFX10-PAL-NEXT: v_lshl_add_u32 v0, v0, 2, 4
	; GFX10-PAL-NEXT: scratch_store_dword v0, v1, off offset:1024			; GFX10-PAL-NEXT: scratch_store_dword v0, v1, off offset:1024
	; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-PAL-NEXT: scratch_load_dword v0, v0, off offset:1024 glc dlc			; GFX10-PAL-NEXT: scratch_load_dword v0, v0, off offset:1024 glc dlc
	; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX10-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX10-PAL-NEXT: ;;#ASMSTART
	; GFX10-PAL-NEXT: ; use v0
	; GFX10-PAL-NEXT: ;;#ASMEND
	; GFX10-PAL-NEXT: s_endpgm			; GFX10-PAL-NEXT: s_endpgm
	;			;
	; GFX11-PAL-LABEL: store_load_vidx_sidx_offset:			; GFX11-PAL-LABEL: store_load_vidx_sidx_offset:
	; GFX11-PAL: ; %bb.0: ; %bb			; GFX11-PAL: ; %bb.0: ; %bb
	; GFX11-PAL-NEXT: s_load_b32 s0, s[0:1], 0x0			; GFX11-PAL-NEXT: s_load_b32 s0, s[0:1], 0x0
	; GFX11-PAL-NEXT: v_mov_b32_e32 v1, 15			; GFX11-PAL-NEXT: v_mov_b32_e32 v1, 15
	; GFX11-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-PAL-NEXT: v_add_lshl_u32 v0, s0, v0, 2			; GFX11-PAL-NEXT: v_add_lshl_u32 v0, s0, v0, 2
	; GFX11-PAL-NEXT: scratch_store_b32 v0, v1, off offset:1028 dlc			; GFX11-PAL-NEXT: scratch_store_b32 v0, v1, off offset:1028 dlc
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: scratch_load_b32 v0, v0, off offset:1028 glc dlc			; GFX11-PAL-NEXT: scratch_load_b32 v0, v0, off offset:1028 glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: v_mov_b32_e32 v0, 4
	; GFX11-PAL-NEXT: ;;#ASMSTART
	; GFX11-PAL-NEXT: ; use v0
	; GFX11-PAL-NEXT: ;;#ASMEND
	; GFX11-PAL-NEXT: s_endpgm			; GFX11-PAL-NEXT: s_endpgm
				; GCN-LABEL: store_load_vidx_sidx_offset:
				; GCN: ; %bb.0: ; %bb
				; GCN-NEXT: s_load_dword s0, s[0:1], 0x24
				; GCN-NEXT: v_mov_b32_e32 v1, 15
				; GCN-NEXT: s_waitcnt lgkmcnt(0)
				; GCN-NEXT: v_add_lshl_u32 v0, s0, v0, 2
				; GCN-NEXT: scratch_store_dword v0, v1, off offset:1028 sc0 sc1
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: scratch_load_dword v0, v0, off offset:1028 sc0 sc1
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: s_endpgm
	bb:			bb:
	%alloca = alloca [32 x i32], align 4, addrspace(5)			%alloca = alloca [32 x i32], align 4, addrspace(5)
	%vidx = tail call i32 @llvm.amdgcn.workitem.id.x()			%vidx = tail call i32 @llvm.amdgcn.workitem.id.x()
	%add1 = add nsw i32 %sidx, %vidx			%add1 = add nsw i32 %sidx, %vidx
	%add2 = add nsw i32 %add1, 256			%add2 = add nsw i32 %add1, 256
	%gep = getelementptr inbounds [32 x i32], [32 x i32] addrspace(5)* %alloca, i32 0, i32 %add2			%gep = getelementptr inbounds [32 x i32], [32 x i32] addrspace(5)* %alloca, i32 0, i32 %add2
	store volatile i32 15, i32 addrspace(5)* %gep, align 4			store volatile i32 15, i32 addrspace(5)* %gep, align 4
	%load = load volatile i32, i32 addrspace(5)* %gep, align 4			%load = load volatile i32, i32 addrspace(5)* %gep, align 4
	call void asm sideeffect "; use $0", "s"([32 x i32] addrspace(5)* %alloca) #0
	ret void			ret void
	}			}

	define void @store_load_i64_aligned(i64 addrspace(5)* nocapture %arg) {			define void @store_load_i64_aligned(i64 addrspace(5)* nocapture %arg) {
	; GFX9-LABEL: store_load_i64_aligned:			; GFX9-LABEL: store_load_i64_aligned:
	; GFX9: ; %bb.0: ; %bb			; GFX9: ; %bb.0: ; %bb
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: v_mov_b32_e32 v1, 15			; GFX9-NEXT: v_mov_b32_e32 v1, 15
	[... 66 lines not shown ...]
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: v_dual_mov_b32 v1, 15 :: v_dual_mov_b32 v2, 0			; GFX11-PAL-NEXT: v_dual_mov_b32 v1, 15 :: v_dual_mov_b32 v2, 0
	; GFX11-PAL-NEXT: scratch_store_b64 v0, v[1:2], off dlc			; GFX11-PAL-NEXT: scratch_store_b64 v0, v[1:2], off dlc
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: scratch_load_b64 v[0:1], v0, off glc dlc			; GFX11-PAL-NEXT: scratch_load_b64 v[0:1], v0, off glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX11-PAL-NEXT: s_setpc_b64 s[30:31]
				; GCN-LABEL: store_load_i64_aligned:
				; GCN: ; %bb.0: ; %bb
				; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GCN-NEXT: v_mov_b32_e32 v2, 15
				; GCN-NEXT: v_mov_b32_e32 v3, 0
				; GCN-NEXT: scratch_store_dwordx2 v0, v[2:3], off sc0 sc1
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: scratch_load_dwordx2 v[0:1], v0, off sc0 sc1
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: s_setpc_b64 s[30:31]
	bb:			bb:
	store volatile i64 15, i64 addrspace(5)* %arg, align 8			store volatile i64 15, i64 addrspace(5)* %arg, align 8
	%load = load volatile i64, i64 addrspace(5)* %arg, align 8			%load = load volatile i64, i64 addrspace(5)* %arg, align 8
	ret void			ret void
	}			}

	define void @store_load_i64_unaligned(i64 addrspace(5)* nocapture %arg) {			define void @store_load_i64_unaligned(i64 addrspace(5)* nocapture %arg) {
	; GFX9-LABEL: store_load_i64_unaligned:			; GFX9-LABEL: store_load_i64_unaligned:
	[... 69 lines not shown ...]
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: v_dual_mov_b32 v1, 15 :: v_dual_mov_b32 v2, 0			; GFX11-PAL-NEXT: v_dual_mov_b32 v1, 15 :: v_dual_mov_b32 v2, 0
	; GFX11-PAL-NEXT: scratch_store_b64 v0, v[1:2], off dlc			; GFX11-PAL-NEXT: scratch_store_b64 v0, v[1:2], off dlc
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: scratch_load_b64 v[0:1], v0, off glc dlc			; GFX11-PAL-NEXT: scratch_load_b64 v[0:1], v0, off glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX11-PAL-NEXT: s_setpc_b64 s[30:31]
				; GCN-LABEL: store_load_i64_unaligned:
				; GCN: ; %bb.0: ; %bb
				; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GCN-NEXT: v_mov_b32_e32 v2, 15
				; GCN-NEXT: v_mov_b32_e32 v3, 0
				; GCN-NEXT: scratch_store_dwordx2 v0, v[2:3], off sc0 sc1
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: scratch_load_dwordx2 v[0:1], v0, off sc0 sc1
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: s_setpc_b64 s[30:31]
	bb:			bb:
	store volatile i64 15, i64 addrspace(5)* %arg, align 1			store volatile i64 15, i64 addrspace(5)* %arg, align 1
	%load = load volatile i64, i64 addrspace(5)* %arg, align 1			%load = load volatile i64, i64 addrspace(5)* %arg, align 1
	ret void			ret void
	}			}

	define void @store_load_v3i32_unaligned(<3 x i32> addrspace(5)* nocapture %arg) {			define void @store_load_v3i32_unaligned(<3 x i32> addrspace(5)* nocapture %arg) {
	; GFX9-LABEL: store_load_v3i32_unaligned:			; GFX9-LABEL: store_load_v3i32_unaligned:
	[... 76 lines not shown ...]
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: v_dual_mov_b32 v1, 1 :: v_dual_mov_b32 v2, 2			; GFX11-PAL-NEXT: v_dual_mov_b32 v1, 1 :: v_dual_mov_b32 v2, 2
	; GFX11-PAL-NEXT: v_mov_b32_e32 v3, 3			; GFX11-PAL-NEXT: v_mov_b32_e32 v3, 3
	; GFX11-PAL-NEXT: scratch_store_b96 v0, v[1:3], off dlc			; GFX11-PAL-NEXT: scratch_store_b96 v0, v[1:3], off dlc
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: scratch_load_b96 v[0:2], v0, off glc dlc			; GFX11-PAL-NEXT: scratch_load_b96 v[0:2], v0, off glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX11-PAL-NEXT: s_setpc_b64 s[30:31]
				; GCN-LABEL: store_load_v3i32_unaligned:
				; GCN: ; %bb.0: ; %bb
				; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GCN-NEXT: v_mov_b32_e32 v2, 1
				; GCN-NEXT: v_mov_b32_e32 v3, 2
				; GCN-NEXT: v_mov_b32_e32 v4, 3
				; GCN-NEXT: scratch_store_dwordx3 v0, v[2:4], off sc0 sc1
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: scratch_load_dwordx3 v[0:2], v0, off sc0 sc1
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: s_setpc_b64 s[30:31]
	bb:			bb:
	store volatile <3 x i32> <i32 1, i32 2, i32 3>, <3 x i32> addrspace(5)* %arg, align 1			store volatile <3 x i32> <i32 1, i32 2, i32 3>, <3 x i32> addrspace(5)* %arg, align 1
	%load = load volatile <3 x i32>, <3 x i32> addrspace(5)* %arg, align 1			%load = load volatile <3 x i32>, <3 x i32> addrspace(5)* %arg, align 1
	ret void			ret void
	}			}

	define void @store_load_v4i32_unaligned(<4 x i32> addrspace(5)* nocapture %arg) {			define void @store_load_v4i32_unaligned(<4 x i32> addrspace(5)* nocapture %arg) {
	; GFX9-LABEL: store_load_v4i32_unaligned:			; GFX9-LABEL: store_load_v4i32_unaligned:
	[... 81 lines not shown ...]
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: v_dual_mov_b32 v1, 1 :: v_dual_mov_b32 v2, 2			; GFX11-PAL-NEXT: v_dual_mov_b32 v1, 1 :: v_dual_mov_b32 v2, 2
	; GFX11-PAL-NEXT: v_dual_mov_b32 v3, 3 :: v_dual_mov_b32 v4, 4			; GFX11-PAL-NEXT: v_dual_mov_b32 v3, 3 :: v_dual_mov_b32 v4, 4
	; GFX11-PAL-NEXT: scratch_store_b128 v0, v[1:4], off dlc			; GFX11-PAL-NEXT: scratch_store_b128 v0, v[1:4], off dlc
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: scratch_load_b128 v[0:3], v0, off glc dlc			; GFX11-PAL-NEXT: scratch_load_b128 v[0:3], v0, off glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX11-PAL-NEXT: s_setpc_b64 s[30:31]
				; GCN-LABEL: store_load_v4i32_unaligned:
				; GCN: ; %bb.0: ; %bb
				; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GCN-NEXT: v_mov_b32_e32 v2, 1
				; GCN-NEXT: v_mov_b32_e32 v3, 2
				; GCN-NEXT: v_mov_b32_e32 v4, 3
				; GCN-NEXT: v_mov_b32_e32 v5, 4
				; GCN-NEXT: scratch_store_dwordx4 v0, v[2:5], off sc0 sc1
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: scratch_load_dwordx4 v[0:3], v0, off sc0 sc1
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: s_setpc_b64 s[30:31]
	bb:			bb:
	store volatile <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32> addrspace(5)* %arg, align 1			store volatile <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32> addrspace(5)* %arg, align 1
	%load = load volatile <4 x i32>, <4 x i32> addrspace(5)* %arg, align 1			%load = load volatile <4 x i32>, <4 x i32> addrspace(5)* %arg, align 1
	ret void			ret void
	}			}

	define void @store_load_i32_negative_unaligned(i8 addrspace(5)* nocapture %arg) {			define void @store_load_i32_negative_unaligned(i8 addrspace(5)* nocapture %arg) {
	; GFX9-LABEL: store_load_i32_negative_unaligned:			; GFX9-LABEL: store_load_i32_negative_unaligned:
	[... 375 lines not shown ...]

llvm/test/CodeGen/AMDGPU/llc-pipeline.ll

	[... 369 lines not shown ...]
	; GCN-O1-NEXT: MachineDominator Tree Construction			; GCN-O1-NEXT: MachineDominator Tree Construction
	; GCN-O1-NEXT: Machine Natural Loop Construction			; GCN-O1-NEXT: Machine Natural Loop Construction
	; GCN-O1-NEXT: Machine Block Frequency Analysis			; GCN-O1-NEXT: Machine Block Frequency Analysis
	; GCN-O1-NEXT: MachinePostDominator Tree Construction			; GCN-O1-NEXT: MachinePostDominator Tree Construction
	; GCN-O1-NEXT: Lazy Machine Block Frequency Analysis			; GCN-O1-NEXT: Lazy Machine Block Frequency Analysis
	; GCN-O1-NEXT: Machine Optimization Remark Emitter			; GCN-O1-NEXT: Machine Optimization Remark Emitter
	; GCN-O1-NEXT: Shrink Wrapping analysis			; GCN-O1-NEXT: Shrink Wrapping analysis
	; GCN-O1-NEXT: Prologue/Epilogue Insertion & Frame Finalization			; GCN-O1-NEXT: Prologue/Epilogue Insertion & Frame Finalization
				; GCN-O1-NEXT: Machine Late Instructions Cleanup Pass
	; GCN-O1-NEXT: Control Flow Optimizer			; GCN-O1-NEXT: Control Flow Optimizer
	; GCN-O1-NEXT: Lazy Machine Block Frequency Analysis			; GCN-O1-NEXT: Lazy Machine Block Frequency Analysis
	; GCN-O1-NEXT: Tail Duplication			; GCN-O1-NEXT: Tail Duplication
	; GCN-O1-NEXT: Machine Copy Propagation Pass			; GCN-O1-NEXT: Machine Copy Propagation Pass
	; GCN-O1-NEXT: Post-RA pseudo instruction expansion pass			; GCN-O1-NEXT: Post-RA pseudo instruction expansion pass
	; GCN-O1-NEXT: SI Shrink Instructions			; GCN-O1-NEXT: SI Shrink Instructions
	; GCN-O1-NEXT: SI post-RA bundler			; GCN-O1-NEXT: SI post-RA bundler
	; GCN-O1-NEXT: MachineDominator Tree Construction			; GCN-O1-NEXT: MachineDominator Tree Construction
	[... 281 lines not shown ...]
	; GCN-O1-OPTS-NEXT: MachineDominator Tree Construction			; GCN-O1-OPTS-NEXT: MachineDominator Tree Construction
	; GCN-O1-OPTS-NEXT: Machine Natural Loop Construction			; GCN-O1-OPTS-NEXT: Machine Natural Loop Construction
	; GCN-O1-OPTS-NEXT: Machine Block Frequency Analysis			; GCN-O1-OPTS-NEXT: Machine Block Frequency Analysis
	; GCN-O1-OPTS-NEXT: MachinePostDominator Tree Construction			; GCN-O1-OPTS-NEXT: MachinePostDominator Tree Construction
	; GCN-O1-OPTS-NEXT: Lazy Machine Block Frequency Analysis			; GCN-O1-OPTS-NEXT: Lazy Machine Block Frequency Analysis
	; GCN-O1-OPTS-NEXT: Machine Optimization Remark Emitter			; GCN-O1-OPTS-NEXT: Machine Optimization Remark Emitter
	; GCN-O1-OPTS-NEXT: Shrink Wrapping analysis			; GCN-O1-OPTS-NEXT: Shrink Wrapping analysis
	; GCN-O1-OPTS-NEXT: Prologue/Epilogue Insertion & Frame Finalization			; GCN-O1-OPTS-NEXT: Prologue/Epilogue Insertion & Frame Finalization
				; GCN-O1-OPTS-NEXT: Machine Late Instructions Cleanup Pass
	; GCN-O1-OPTS-NEXT: Control Flow Optimizer			; GCN-O1-OPTS-NEXT: Control Flow Optimizer
	; GCN-O1-OPTS-NEXT: Lazy Machine Block Frequency Analysis			; GCN-O1-OPTS-NEXT: Lazy Machine Block Frequency Analysis
	; GCN-O1-OPTS-NEXT: Tail Duplication			; GCN-O1-OPTS-NEXT: Tail Duplication
	; GCN-O1-OPTS-NEXT: Machine Copy Propagation Pass			; GCN-O1-OPTS-NEXT: Machine Copy Propagation Pass
	; GCN-O1-OPTS-NEXT: Post-RA pseudo instruction expansion pass			; GCN-O1-OPTS-NEXT: Post-RA pseudo instruction expansion pass
	; GCN-O1-OPTS-NEXT: SI Shrink Instructions			; GCN-O1-OPTS-NEXT: SI Shrink Instructions
	; GCN-O1-OPTS-NEXT: SI post-RA bundler			; GCN-O1-OPTS-NEXT: SI post-RA bundler
	; GCN-O1-OPTS-NEXT: MachineDominator Tree Construction			; GCN-O1-OPTS-NEXT: MachineDominator Tree Construction
	[... 283 lines not shown ...]
	; GCN-O2-NEXT: MachineDominator Tree Construction			; GCN-O2-NEXT: MachineDominator Tree Construction
	; GCN-O2-NEXT: Machine Natural Loop Construction			; GCN-O2-NEXT: Machine Natural Loop Construction
	; GCN-O2-NEXT: Machine Block Frequency Analysis			; GCN-O2-NEXT: Machine Block Frequency Analysis
	; GCN-O2-NEXT: MachinePostDominator Tree Construction			; GCN-O2-NEXT: MachinePostDominator Tree Construction
	; GCN-O2-NEXT: Lazy Machine Block Frequency Analysis			; GCN-O2-NEXT: Lazy Machine Block Frequency Analysis
	; GCN-O2-NEXT: Machine Optimization Remark Emitter			; GCN-O2-NEXT: Machine Optimization Remark Emitter
	; GCN-O2-NEXT: Shrink Wrapping analysis			; GCN-O2-NEXT: Shrink Wrapping analysis
	; GCN-O2-NEXT: Prologue/Epilogue Insertion & Frame Finalization			; GCN-O2-NEXT: Prologue/Epilogue Insertion & Frame Finalization
				; GCN-O2-NEXT: Machine Late Instructions Cleanup Pass
	; GCN-O2-NEXT: Control Flow Optimizer			; GCN-O2-NEXT: Control Flow Optimizer
	; GCN-O2-NEXT: Lazy Machine Block Frequency Analysis			; GCN-O2-NEXT: Lazy Machine Block Frequency Analysis
	; GCN-O2-NEXT: Tail Duplication			; GCN-O2-NEXT: Tail Duplication
	; GCN-O2-NEXT: Machine Copy Propagation Pass			; GCN-O2-NEXT: Machine Copy Propagation Pass
	; GCN-O2-NEXT: Post-RA pseudo instruction expansion pass			; GCN-O2-NEXT: Post-RA pseudo instruction expansion pass
	; GCN-O2-NEXT: SI Shrink Instructions			; GCN-O2-NEXT: SI Shrink Instructions
	; GCN-O2-NEXT: SI post-RA bundler			; GCN-O2-NEXT: SI post-RA bundler
	; GCN-O2-NEXT: MachineDominator Tree Construction			; GCN-O2-NEXT: MachineDominator Tree Construction
	[... 296 lines not shown ...]
	; GCN-O3-NEXT: MachineDominator Tree Construction			; GCN-O3-NEXT: MachineDominator Tree Construction
	; GCN-O3-NEXT: Machine Natural Loop Construction			; GCN-O3-NEXT: Machine Natural Loop Construction
	; GCN-O3-NEXT: Machine Block Frequency Analysis			; GCN-O3-NEXT: Machine Block Frequency Analysis
	; GCN-O3-NEXT: MachinePostDominator Tree Construction			; GCN-O3-NEXT: MachinePostDominator Tree Construction
	; GCN-O3-NEXT: Lazy Machine Block Frequency Analysis			; GCN-O3-NEXT: Lazy Machine Block Frequency Analysis
	; GCN-O3-NEXT: Machine Optimization Remark Emitter			; GCN-O3-NEXT: Machine Optimization Remark Emitter
	; GCN-O3-NEXT: Shrink Wrapping analysis			; GCN-O3-NEXT: Shrink Wrapping analysis
	; GCN-O3-NEXT: Prologue/Epilogue Insertion & Frame Finalization			; GCN-O3-NEXT: Prologue/Epilogue Insertion & Frame Finalization
				; GCN-O3-NEXT: Machine Late Instructions Cleanup Pass
	; GCN-O3-NEXT: Control Flow Optimizer			; GCN-O3-NEXT: Control Flow Optimizer
	; GCN-O3-NEXT: Lazy Machine Block Frequency Analysis			; GCN-O3-NEXT: Lazy Machine Block Frequency Analysis
	; GCN-O3-NEXT: Tail Duplication			; GCN-O3-NEXT: Tail Duplication
	; GCN-O3-NEXT: Machine Copy Propagation Pass			; GCN-O3-NEXT: Machine Copy Propagation Pass
	; GCN-O3-NEXT: Post-RA pseudo instruction expansion pass			; GCN-O3-NEXT: Post-RA pseudo instruction expansion pass
	; GCN-O3-NEXT: SI Shrink Instructions			; GCN-O3-NEXT: SI Shrink Instructions
	; GCN-O3-NEXT: SI post-RA bundler			; GCN-O3-NEXT: SI post-RA bundler
	; GCN-O3-NEXT: MachineDominator Tree Construction			; GCN-O3-NEXT: MachineDominator Tree Construction
	[... 37 lines not shown ...]

llvm/test/CodeGen/AMDGPU/multilevel-break.ll

	[... 182 lines not shown ...]
	; GCN-NEXT: v_readfirstlane_b32 s8, v1			; GCN-NEXT: v_readfirstlane_b32 s8, v1
	; GCN-NEXT: s_mov_b64 s[4:5], -1			; GCN-NEXT: s_mov_b64 s[4:5], -1
	; GCN-NEXT: s_cmp_lt_i32 s8, 1			; GCN-NEXT: s_cmp_lt_i32 s8, 1
	; GCN-NEXT: s_mov_b64 s[6:7], -1			; GCN-NEXT: s_mov_b64 s[6:7], -1
	; GCN-NEXT: s_cbranch_scc1 .LBB1_6			; GCN-NEXT: s_cbranch_scc1 .LBB1_6
	; GCN-NEXT: ; %bb.3: ; %LeafBlock1			; GCN-NEXT: ; %bb.3: ; %LeafBlock1
	; GCN-NEXT: ; in Loop: Header=BB1_2 Depth=1			; GCN-NEXT: ; in Loop: Header=BB1_2 Depth=1
	; GCN-NEXT: s_cmp_eq_u32 s8, 1			; GCN-NEXT: s_cmp_eq_u32 s8, 1
	; GCN-NEXT: s_mov_b64 s[4:5], -1
	; GCN-NEXT: s_cbranch_scc0 .LBB1_5			; GCN-NEXT: s_cbranch_scc0 .LBB1_5
	; GCN-NEXT: ; %bb.4: ; %case1			; GCN-NEXT: ; %bb.4: ; %case1
	; GCN-NEXT: ; in Loop: Header=BB1_2 Depth=1			; GCN-NEXT: ; in Loop: Header=BB1_2 Depth=1
	; GCN-NEXT: buffer_load_dword v1, off, s[0:3], 0 glc			; GCN-NEXT: buffer_load_dword v1, off, s[0:3], 0 glc
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_cmp_ge_i32_e32 vcc, v0, v1			; GCN-NEXT: v_cmp_ge_i32_e32 vcc, v0, v1
	; GCN-NEXT: s_orn2_b64 s[4:5], vcc, exec			; GCN-NEXT: s_orn2_b64 s[4:5], vcc, exec
	; GCN-NEXT: .LBB1_5: ; %Flow3			; GCN-NEXT: .LBB1_5: ; %Flow3
	[... 54 lines not shown ...]

llvm/test/CodeGen/AMDGPU/si-annotate-cf.ll

	[... 181 lines not shown ...]
	; SI-NEXT: v_cmp_lt_f32_e64 s[8:9], |s8|, v0			; SI-NEXT: v_cmp_lt_f32_e64 s[8:9], |s8|, v0
	; SI-NEXT: s_and_b64 s[2:3], exec, s[6:7]			; SI-NEXT: s_and_b64 s[2:3], exec, s[6:7]
	; SI-NEXT: s_and_b64 s[4:5], exec, s[4:5]			; SI-NEXT: s_and_b64 s[4:5], exec, s[4:5]
	; SI-NEXT: s_and_b64 s[6:7], exec, s[8:9]			; SI-NEXT: s_and_b64 s[6:7], exec, s[8:9]
	; SI-NEXT: v_mov_b32_e32 v0, 3			; SI-NEXT: v_mov_b32_e32 v0, 3
	; SI-NEXT: s_branch .LBB3_3			; SI-NEXT: s_branch .LBB3_3
	; SI-NEXT: .LBB3_1: ; in Loop: Header=BB3_3 Depth=1			; SI-NEXT: .LBB3_1: ; in Loop: Header=BB3_3 Depth=1
	; SI-NEXT: s_mov_b64 s[8:9], 0			; SI-NEXT: s_mov_b64 s[8:9], 0
	; SI-NEXT: s_mov_b64 s[12:13], -1
	; SI-NEXT: s_mov_b64 s[14:15], -1
	; SI-NEXT: .LBB3_2: ; %Flow			; SI-NEXT: .LBB3_2: ; %Flow
	; SI-NEXT: ; in Loop: Header=BB3_3 Depth=1			; SI-NEXT: ; in Loop: Header=BB3_3 Depth=1
	; SI-NEXT: s_and_b64 vcc, exec, s[14:15]			; SI-NEXT: s_and_b64 vcc, exec, s[14:15]
	; SI-NEXT: s_cbranch_vccnz .LBB3_8			; SI-NEXT: s_cbranch_vccnz .LBB3_8
	; SI-NEXT: .LBB3_3: ; %while.cond			; SI-NEXT: .LBB3_3: ; %while.cond
	; SI-NEXT: ; =>This Inner Loop Header: Depth=1			; SI-NEXT: ; =>This Inner Loop Header: Depth=1
	; SI-NEXT: s_mov_b64 s[12:13], -1			; SI-NEXT: s_mov_b64 s[12:13], -1
	; SI-NEXT: s_mov_b64 s[8:9], -1			; SI-NEXT: s_mov_b64 s[8:9], -1
	; SI-NEXT: s_mov_b64 s[14:15], -1			; SI-NEXT: s_mov_b64 s[14:15], -1
	; SI-NEXT: s_mov_b64 vcc, s[2:3]			; SI-NEXT: s_mov_b64 vcc, s[2:3]
	; SI-NEXT: s_cbranch_vccz .LBB3_2			; SI-NEXT: s_cbranch_vccz .LBB3_2
	; SI-NEXT: ; %bb.4: ; %convex.exit			; SI-NEXT: ; %bb.4: ; %convex.exit
	; SI-NEXT: ; in Loop: Header=BB3_3 Depth=1			; SI-NEXT: ; in Loop: Header=BB3_3 Depth=1
	; SI-NEXT: s_mov_b64 vcc, s[4:5]			; SI-NEXT: s_mov_b64 vcc, s[4:5]
	; SI-NEXT: s_cbranch_vccz .LBB3_1			; SI-NEXT: s_cbranch_vccz .LBB3_1
	; SI-NEXT: ; %bb.5: ; %if.end			; SI-NEXT: ; %bb.5: ; %if.end
	; SI-NEXT: ; in Loop: Header=BB3_3 Depth=1			; SI-NEXT: ; in Loop: Header=BB3_3 Depth=1
	; SI-NEXT: s_mov_b64 s[14:15], -1
	; SI-NEXT: s_mov_b64 vcc, s[6:7]			; SI-NEXT: s_mov_b64 vcc, s[6:7]
	; SI-NEXT: s_cbranch_vccz .LBB3_7			; SI-NEXT: s_cbranch_vccz .LBB3_7
	; SI-NEXT: ; %bb.6: ; %if.else			; SI-NEXT: ; %bb.6: ; %if.else
	; SI-NEXT: ; in Loop: Header=BB3_3 Depth=1			; SI-NEXT: ; in Loop: Header=BB3_3 Depth=1
	; SI-NEXT: buffer_store_dword v0, off, s[8:11], 0			; SI-NEXT: buffer_store_dword v0, off, s[8:11], 0
	; SI-NEXT: s_waitcnt vmcnt(0)			; SI-NEXT: s_waitcnt vmcnt(0)
	; SI-NEXT: s_mov_b64 s[14:15], 0			; SI-NEXT: s_mov_b64 s[14:15], 0
	; SI-NEXT: .LBB3_7: ; %Flow6			; SI-NEXT: .LBB3_7: ; %Flow6
	[... 40 lines not shown ...]
	; FLAT-NEXT: v_cmp_lt_f32_e64 s[8:9], |s8|, v0			; FLAT-NEXT: v_cmp_lt_f32_e64 s[8:9], |s8|, v0
	; FLAT-NEXT: s_and_b64 s[2:3], exec, s[6:7]			; FLAT-NEXT: s_and_b64 s[2:3], exec, s[6:7]
	; FLAT-NEXT: s_and_b64 s[4:5], exec, s[4:5]			; FLAT-NEXT: s_and_b64 s[4:5], exec, s[4:5]
	; FLAT-NEXT: s_and_b64 s[6:7], exec, s[8:9]			; FLAT-NEXT: s_and_b64 s[6:7], exec, s[8:9]
	; FLAT-NEXT: v_mov_b32_e32 v0, 3			; FLAT-NEXT: v_mov_b32_e32 v0, 3
	; FLAT-NEXT: s_branch .LBB3_3			; FLAT-NEXT: s_branch .LBB3_3
	; FLAT-NEXT: .LBB3_1: ; in Loop: Header=BB3_3 Depth=1			; FLAT-NEXT: .LBB3_1: ; in Loop: Header=BB3_3 Depth=1
	; FLAT-NEXT: s_mov_b64 s[8:9], 0			; FLAT-NEXT: s_mov_b64 s[8:9], 0
	; FLAT-NEXT: s_mov_b64 s[12:13], -1
	; FLAT-NEXT: s_mov_b64 s[14:15], -1
	; FLAT-NEXT: .LBB3_2: ; %Flow			; FLAT-NEXT: .LBB3_2: ; %Flow
	; FLAT-NEXT: ; in Loop: Header=BB3_3 Depth=1			; FLAT-NEXT: ; in Loop: Header=BB3_3 Depth=1
	; FLAT-NEXT: s_and_b64 vcc, exec, s[14:15]			; FLAT-NEXT: s_and_b64 vcc, exec, s[14:15]
	; FLAT-NEXT: s_cbranch_vccnz .LBB3_8			; FLAT-NEXT: s_cbranch_vccnz .LBB3_8
	; FLAT-NEXT: .LBB3_3: ; %while.cond			; FLAT-NEXT: .LBB3_3: ; %while.cond
	; FLAT-NEXT: ; =>This Inner Loop Header: Depth=1			; FLAT-NEXT: ; =>This Inner Loop Header: Depth=1
	; FLAT-NEXT: s_mov_b64 s[12:13], -1			; FLAT-NEXT: s_mov_b64 s[12:13], -1
	; FLAT-NEXT: s_mov_b64 s[8:9], -1			; FLAT-NEXT: s_mov_b64 s[8:9], -1
	; FLAT-NEXT: s_mov_b64 s[14:15], -1			; FLAT-NEXT: s_mov_b64 s[14:15], -1
	; FLAT-NEXT: s_mov_b64 vcc, s[2:3]			; FLAT-NEXT: s_mov_b64 vcc, s[2:3]
	; FLAT-NEXT: s_cbranch_vccz .LBB3_2			; FLAT-NEXT: s_cbranch_vccz .LBB3_2
	; FLAT-NEXT: ; %bb.4: ; %convex.exit			; FLAT-NEXT: ; %bb.4: ; %convex.exit
	; FLAT-NEXT: ; in Loop: Header=BB3_3 Depth=1			; FLAT-NEXT: ; in Loop: Header=BB3_3 Depth=1
	; FLAT-NEXT: s_mov_b64 vcc, s[4:5]			; FLAT-NEXT: s_mov_b64 vcc, s[4:5]
	; FLAT-NEXT: s_cbranch_vccz .LBB3_1			; FLAT-NEXT: s_cbranch_vccz .LBB3_1
	; FLAT-NEXT: ; %bb.5: ; %if.end			; FLAT-NEXT: ; %bb.5: ; %if.end
	; FLAT-NEXT: ; in Loop: Header=BB3_3 Depth=1			; FLAT-NEXT: ; in Loop: Header=BB3_3 Depth=1
	; FLAT-NEXT: s_mov_b64 s[14:15], -1
	; FLAT-NEXT: s_mov_b64 vcc, s[6:7]			; FLAT-NEXT: s_mov_b64 vcc, s[6:7]
	; FLAT-NEXT: s_cbranch_vccz .LBB3_7			; FLAT-NEXT: s_cbranch_vccz .LBB3_7
	; FLAT-NEXT: ; %bb.6: ; %if.else			; FLAT-NEXT: ; %bb.6: ; %if.else
	; FLAT-NEXT: ; in Loop: Header=BB3_3 Depth=1			; FLAT-NEXT: ; in Loop: Header=BB3_3 Depth=1
	; FLAT-NEXT: buffer_store_dword v0, off, s[8:11], 0			; FLAT-NEXT: buffer_store_dword v0, off, s[8:11], 0
	; FLAT-NEXT: s_waitcnt vmcnt(0)			; FLAT-NEXT: s_waitcnt vmcnt(0)
	; FLAT-NEXT: s_mov_b64 s[14:15], 0			; FLAT-NEXT: s_mov_b64 s[14:15], 0
	; FLAT-NEXT: .LBB3_7: ; %Flow6			; FLAT-NEXT: .LBB3_7: ; %Flow6
	[... 70 lines not shown ...]

llvm/test/CodeGen/AMDGPU/si-unify-exit-multiple-unreachables.ll

	[... 54 lines not shown ...]
	; CHECK-NEXT: s_mov_b64 s[2:3], 0			; CHECK-NEXT: s_mov_b64 s[2:3], 0
	; CHECK-NEXT: s_mov_b64 s[0:1], 0			; CHECK-NEXT: s_mov_b64 s[0:1], 0
	; CHECK-NEXT: s_and_saveexec_b64 s[8:9], vcc			; CHECK-NEXT: s_and_saveexec_b64 s[8:9], vcc
	; CHECK-NEXT: s_cbranch_execz .LBB0_5			; CHECK-NEXT: s_cbranch_execz .LBB0_5
	; CHECK-NEXT: ; %bb.2: ; %if.then3			; CHECK-NEXT: ; %bb.2: ; %if.then3
	; CHECK-NEXT: s_cmp_lg_u32 s10, 0			; CHECK-NEXT: s_cmp_lg_u32 s10, 0
	; CHECK-NEXT: s_cbranch_scc1 .LBB0_14			; CHECK-NEXT: s_cbranch_scc1 .LBB0_14
	; CHECK-NEXT: ; %bb.3:			; CHECK-NEXT: ; %bb.3:
	; CHECK-NEXT: s_mov_b64 s[2:3], 0
	; CHECK-NEXT: s_mov_b64 s[0:1], -1			; CHECK-NEXT: s_mov_b64 s[0:1], -1
	; CHECK-NEXT: .LBB0_4: ; %Flow3			; CHECK-NEXT: .LBB0_4: ; %Flow3
	; CHECK-NEXT: s_and_b64 s[0:1], s[0:1], exec			; CHECK-NEXT: s_and_b64 s[0:1], s[0:1], exec
	; CHECK-NEXT: s_and_b64 s[2:3], s[2:3], exec			; CHECK-NEXT: s_and_b64 s[2:3], s[2:3], exec
	; CHECK-NEXT: .LBB0_5: ; %Flow2			; CHECK-NEXT: .LBB0_5: ; %Flow2
	; CHECK-NEXT: s_or_b64 exec, exec, s[8:9]			; CHECK-NEXT: s_or_b64 exec, exec, s[8:9]
	; CHECK-NEXT: s_and_b64 vcc, exec, s[6:7]			; CHECK-NEXT: s_and_b64 vcc, exec, s[6:7]
	; CHECK-NEXT: s_cbranch_vccz .LBB0_8			; CHECK-NEXT: s_cbranch_vccz .LBB0_8
	[... 26 lines not shown ...]
	; CHECK-NEXT: s_mov_b64 s[0:1], 0			; CHECK-NEXT: s_mov_b64 s[0:1], 0
	; CHECK-NEXT: s_or_b64 s[2:3], s[2:3], exec			; CHECK-NEXT: s_or_b64 s[2:3], s[2:3], exec
	; CHECK-NEXT: s_trap 2			; CHECK-NEXT: s_trap 2
	; CHECK-NEXT: s_and_saveexec_b64 s[6:7], s[2:3]			; CHECK-NEXT: s_and_saveexec_b64 s[6:7], s[2:3]
	; CHECK-NEXT: s_cbranch_execnz .LBB0_9			; CHECK-NEXT: s_cbranch_execnz .LBB0_9
	; CHECK-NEXT: s_branch .LBB0_10			; CHECK-NEXT: s_branch .LBB0_10
	; CHECK-NEXT: .LBB0_14: ; %cond.false.i8			; CHECK-NEXT: .LBB0_14: ; %cond.false.i8
	; CHECK-NEXT: s_mov_b64 s[2:3], -1			; CHECK-NEXT: s_mov_b64 s[2:3], -1
	; CHECK-NEXT: s_mov_b64 s[0:1], 0
	; CHECK-NEXT: s_trap 2			; CHECK-NEXT: s_trap 2
	; CHECK-NEXT: s_branch .LBB0_4			; CHECK-NEXT: s_branch .LBB0_4
	entry:			entry:
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%cmp = icmp eq i32 %n, 256			%cmp = icmp eq i32 %n, 256
	br i1 %cmp, label %if.then, label %if.else			br i1 %cmp, label %if.then, label %if.else

	if.then:			if.then:

llvm/test/CodeGen/AMDGPU/si-unify-exit-return-unreachable.ll

	; GCN-NEXT: .LBB0_7: ; %Flow			; GCN-NEXT: .LBB0_7: ; %Flow
	; GCN-NEXT: s_andn2_b64 vcc, exec, s[8:9]			; GCN-NEXT: s_andn2_b64 vcc, exec, s[8:9]
	; GCN-NEXT: s_cbranch_vccnz .LBB0_2			; GCN-NEXT: s_cbranch_vccnz .LBB0_2
	; GCN-NEXT: .LBB0_8: ; %LeafBlock			; GCN-NEXT: .LBB0_8: ; %LeafBlock
	; GCN-NEXT: s_cmp_eq_u32 s10, 0			; GCN-NEXT: s_cmp_eq_u32 s10, 0
	; GCN-NEXT: s_cbranch_scc1 .LBB0_10			; GCN-NEXT: s_cbranch_scc1 .LBB0_10
	; GCN-NEXT: ; %bb.9:			; GCN-NEXT: ; %bb.9:
	; GCN-NEXT: s_mov_b64 s[6:7], -1			; GCN-NEXT: s_mov_b64 s[6:7], -1
	; GCN-NEXT: s_mov_b64 s[4:5], 0
	; GCN-NEXT: s_and_saveexec_b64 s[8:9], s[6:7]			; GCN-NEXT: s_and_saveexec_b64 s[8:9], s[6:7]
	; GCN-NEXT: s_cbranch_execnz .LBB0_3			; GCN-NEXT: s_cbranch_execnz .LBB0_3
	; GCN-NEXT: s_branch .LBB0_4			; GCN-NEXT: s_branch .LBB0_4
	; GCN-NEXT: .LBB0_10: ; %NodeBlock7			; GCN-NEXT: .LBB0_10: ; %NodeBlock7
	; GCN-NEXT: v_cmp_lt_i32_e32 vcc, 1, v0			; GCN-NEXT: v_cmp_lt_i32_e32 vcc, 1, v0
	; GCN-NEXT: s_mov_b64 s[8:9], 0			; GCN-NEXT: s_mov_b64 s[8:9], 0
	; GCN-NEXT: s_mov_b64 s[6:7], 0			; GCN-NEXT: s_mov_b64 s[6:7], 0
	; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
	Show All 16 Lines
	; GCN-NEXT: ; %bb.14: ; %Flow14			; GCN-NEXT: ; %bb.14: ; %Flow14
	; GCN-NEXT: s_or_b64 exec, exec, s[10:11]			; GCN-NEXT: s_or_b64 exec, exec, s[10:11]
	; GCN-NEXT: s_mov_b64 s[4:5], 0			; GCN-NEXT: s_mov_b64 s[4:5], 0
	; GCN-NEXT: s_and_saveexec_b64 s[10:11], s[8:9]			; GCN-NEXT: s_and_saveexec_b64 s[10:11], s[8:9]
	; GCN-NEXT: s_cbranch_execz .LBB0_18			; GCN-NEXT: s_cbranch_execz .LBB0_18
	; GCN-NEXT: ; %bb.15: ; %LeafBlock9			; GCN-NEXT: ; %bb.15: ; %LeafBlock9
	; GCN-NEXT: v_cmp_lt_i32_e32 vcc, 1, v0			; GCN-NEXT: v_cmp_lt_i32_e32 vcc, 1, v0
	; GCN-NEXT: s_mov_b64 s[8:9], -1			; GCN-NEXT: s_mov_b64 s[8:9], -1
	; GCN-NEXT: s_mov_b64 s[4:5], 0
	; GCN-NEXT: s_and_saveexec_b64 s[12:13], vcc			; GCN-NEXT: s_and_saveexec_b64 s[12:13], vcc
	; GCN-NEXT: ; %bb.16: ; %do.body.i.i.i.i			; GCN-NEXT: ; %bb.16: ; %do.body.i.i.i.i
	; GCN-NEXT: s_mov_b64 s[4:5], exec			; GCN-NEXT: s_mov_b64 s[4:5], exec
	; GCN-NEXT: s_xor_b64 s[8:9], exec, -1			; GCN-NEXT: s_xor_b64 s[8:9], exec, -1
	; GCN-NEXT: ; %bb.17: ; %Flow16			; GCN-NEXT: ; %bb.17: ; %Flow16
	; GCN-NEXT: s_or_b64 exec, exec, s[12:13]			; GCN-NEXT: s_or_b64 exec, exec, s[12:13]
	; GCN-NEXT: s_and_b64 s[4:5], s[4:5], exec			; GCN-NEXT: s_and_b64 s[4:5], s[4:5], exec
	; GCN-NEXT: s_andn2_b64 s[6:7], s[6:7], exec			; GCN-NEXT: s_andn2_b64 s[6:7], s[6:7], exec

llvm/test/CodeGen/AMDGPU/spill-offset-calculation.ll

	; FLATSCR-NEXT: s_mov_b32 vcc_lo, 0			; FLATSCR-NEXT: s_mov_b32 vcc_lo, 0
	; FLATSCR-NEXT: scratch_load_dword v0, off, vcc_lo offset:8 glc			; FLATSCR-NEXT: scratch_load_dword v0, off, vcc_lo offset:8 glc
	; FLATSCR-NEXT: s_waitcnt vmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; FLATSCR-NEXT: s_movk_i32 s0, 0xffc			; FLATSCR-NEXT: s_movk_i32 s0, 0xffc
	; FLATSCR-NEXT: s_mov_b32 vcc_hi, 0			; FLATSCR-NEXT: s_mov_b32 vcc_hi, 0
	; FLATSCR-NEXT: scratch_store_dword off, v0, s0 ; 4-byte Folded Spill			; FLATSCR-NEXT: scratch_store_dword off, v0, s0 ; 4-byte Folded Spill
	; FLATSCR-NEXT: ;;#ASMSTART			; FLATSCR-NEXT: ;;#ASMSTART
	; FLATSCR-NEXT: ;;#ASMEND			; FLATSCR-NEXT: ;;#ASMEND
	; FLATSCR-NEXT: s_movk_i32 s0, 0xffc
	; FLATSCR-NEXT: scratch_load_dword v0, off, s0 ; 4-byte Folded Reload			; FLATSCR-NEXT: scratch_load_dword v0, off, s0 ; 4-byte Folded Reload
	; FLATSCR-NEXT: s_waitcnt vmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; FLATSCR-NEXT: scratch_store_dword off, v0, vcc_hi offset:8			; FLATSCR-NEXT: scratch_store_dword off, v0, vcc_hi offset:8
	; FLATSCR-NEXT: s_waitcnt vmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; FLATSCR-NEXT: s_endpgm			; FLATSCR-NEXT: s_endpgm
	entry:			entry:
	; Occupy 4092 bytes of scratch, so the offset of the spill of %a just fits in			; Occupy 4092 bytes of scratch, so the offset of the spill of %a just fits in
	; the instruction offset field.			; the instruction offset field.
	Show All 20 Lines
	; MUBUF-NEXT: s_add_u32 s0, s0, s7			; MUBUF-NEXT: s_add_u32 s0, s0, s7
	; MUBUF-NEXT: s_addc_u32 s1, s1, 0			; MUBUF-NEXT: s_addc_u32 s1, s1, 0
	; MUBUF-NEXT: buffer_load_dword v0, off, s[0:3], 0 offset:8 glc			; MUBUF-NEXT: buffer_load_dword v0, off, s[0:3], 0 offset:8 glc
	; MUBUF-NEXT: s_waitcnt vmcnt(0)			; MUBUF-NEXT: s_waitcnt vmcnt(0)
	; MUBUF-NEXT: s_mov_b32 s4, 0x40000			; MUBUF-NEXT: s_mov_b32 s4, 0x40000
	; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s4 ; 4-byte Folded Spill			; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s4 ; 4-byte Folded Spill
	; MUBUF-NEXT: ;;#ASMSTART			; MUBUF-NEXT: ;;#ASMSTART
	; MUBUF-NEXT: ;;#ASMEND			; MUBUF-NEXT: ;;#ASMEND
	; MUBUF-NEXT: s_mov_b32 s4, 0x40000
	; MUBUF-NEXT: buffer_load_dword v0, off, s[0:3], s4 ; 4-byte Folded Reload			; MUBUF-NEXT: buffer_load_dword v0, off, s[0:3], s4 ; 4-byte Folded Reload
	; MUBUF-NEXT: s_waitcnt vmcnt(0)			; MUBUF-NEXT: s_waitcnt vmcnt(0)
	; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:8			; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:8
	; MUBUF-NEXT: s_waitcnt vmcnt(0)			; MUBUF-NEXT: s_waitcnt vmcnt(0)
	; MUBUF-NEXT: s_endpgm			; MUBUF-NEXT: s_endpgm
	;			;
	; FLATSCR-LABEL: test_sgpr_offset_kernel:			; FLATSCR-LABEL: test_sgpr_offset_kernel:
	; FLATSCR: ; %bb.0: ; %entry			; FLATSCR: ; %bb.0: ; %entry
	; FLATSCR-NEXT: s_add_u32 flat_scratch_lo, s0, s3			; FLATSCR-NEXT: s_add_u32 flat_scratch_lo, s0, s3
	; FLATSCR-NEXT: s_addc_u32 flat_scratch_hi, s1, 0			; FLATSCR-NEXT: s_addc_u32 flat_scratch_hi, s1, 0
	; FLATSCR-NEXT: s_mov_b32 vcc_lo, 0			; FLATSCR-NEXT: s_mov_b32 vcc_lo, 0
	; FLATSCR-NEXT: scratch_load_dword v0, off, vcc_lo offset:8 glc			; FLATSCR-NEXT: scratch_load_dword v0, off, vcc_lo offset:8 glc
	; FLATSCR-NEXT: s_waitcnt vmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; FLATSCR-NEXT: s_movk_i32 s0, 0x1000			; FLATSCR-NEXT: s_movk_i32 s0, 0x1000
	; FLATSCR-NEXT: s_mov_b32 vcc_hi, 0			; FLATSCR-NEXT: s_mov_b32 vcc_hi, 0
	; FLATSCR-NEXT: scratch_store_dword off, v0, s0 ; 4-byte Folded Spill			; FLATSCR-NEXT: scratch_store_dword off, v0, s0 ; 4-byte Folded Spill
	; FLATSCR-NEXT: ;;#ASMSTART			; FLATSCR-NEXT: ;;#ASMSTART
	; FLATSCR-NEXT: ;;#ASMEND			; FLATSCR-NEXT: ;;#ASMEND
	; FLATSCR-NEXT: s_movk_i32 s0, 0x1000
	; FLATSCR-NEXT: scratch_load_dword v0, off, s0 ; 4-byte Folded Reload			; FLATSCR-NEXT: scratch_load_dword v0, off, s0 ; 4-byte Folded Reload
	; FLATSCR-NEXT: s_waitcnt vmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; FLATSCR-NEXT: scratch_store_dword off, v0, vcc_hi offset:8			; FLATSCR-NEXT: scratch_store_dword off, v0, vcc_hi offset:8
	; FLATSCR-NEXT: s_waitcnt vmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; FLATSCR-NEXT: s_endpgm			; FLATSCR-NEXT: s_endpgm
	entry:			entry:
	; Occupy 4096 bytes of scratch, so the offset of the spill of %a does not			; Occupy 4096 bytes of scratch, so the offset of the spill of %a does not
	; fit in the instruction, and has to live in the SGPR offset.			; fit in the instruction, and has to live in the SGPR offset.
	▲ Show 20 Lines • Show All 130 Lines • ▼ Show 20 Lines
	; FLATSCR-NEXT: s_movk_i32 s8, 0x1004			; FLATSCR-NEXT: s_movk_i32 s8, 0x1004
	; FLATSCR-NEXT: scratch_store_dword off, v0, s8 ; 4-byte Folded Spill			; FLATSCR-NEXT: scratch_store_dword off, v0, s8 ; 4-byte Folded Spill
	; FLATSCR-NEXT: ;;#ASMSTART			; FLATSCR-NEXT: ;;#ASMSTART
	; FLATSCR-NEXT: ;;#ASMEND			; FLATSCR-NEXT: ;;#ASMEND
	; FLATSCR-NEXT: ;;#ASMSTART			; FLATSCR-NEXT: ;;#ASMSTART
	; FLATSCR-NEXT: ;;#ASMEND			; FLATSCR-NEXT: ;;#ASMEND
	; FLATSCR-NEXT: ;;#ASMSTART			; FLATSCR-NEXT: ;;#ASMSTART
	; FLATSCR-NEXT: ;;#ASMEND			; FLATSCR-NEXT: ;;#ASMEND
	; FLATSCR-NEXT: s_movk_i32 s8, 0x1004
	; FLATSCR-NEXT: scratch_load_dword v0, off, s8 ; 4-byte Folded Reload			; FLATSCR-NEXT: scratch_load_dword v0, off, s8 ; 4-byte Folded Reload
	; FLATSCR-NEXT: s_waitcnt vmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; FLATSCR-NEXT: ;;#ASMSTART			; FLATSCR-NEXT: ;;#ASMSTART
	; FLATSCR-NEXT: ;;#ASMEND			; FLATSCR-NEXT: ;;#ASMEND
	; FLATSCR-NEXT: s_endpgm			; FLATSCR-NEXT: s_endpgm
	entry:			entry:
	; Occupy 4096 bytes of scratch, so the offset of the spill of %a does not			; Occupy 4096 bytes of scratch, so the offset of the spill of %a does not
	; fit in the instruction, and has to live in the SGPR offset.			; fit in the instruction, and has to live in the SGPR offset.
	▲ Show 20 Lines • Show All 66 Lines • ▼ Show 20 Lines
	; FLATSCR-NEXT: s_waitcnt vmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; FLATSCR-NEXT: s_movk_i32 s0, 0xff8			; FLATSCR-NEXT: s_movk_i32 s0, 0xff8
	; FLATSCR-NEXT: s_mov_b32 vcc_hi, 0			; FLATSCR-NEXT: s_mov_b32 vcc_hi, 0
	; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], s0 ; 8-byte Folded Spill			; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], s0 ; 8-byte Folded Spill
	; FLATSCR-NEXT: ;;#ASMSTART			; FLATSCR-NEXT: ;;#ASMSTART
	; FLATSCR-NEXT: ;;#ASMEND			; FLATSCR-NEXT: ;;#ASMEND
	; FLATSCR-NEXT: scratch_load_dword v0, off, vcc_hi offset:8 glc			; FLATSCR-NEXT: scratch_load_dword v0, off, vcc_hi offset:8 glc
	; FLATSCR-NEXT: s_waitcnt vmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; FLATSCR-NEXT: s_movk_i32 s0, 0xff8
	; FLATSCR-NEXT: scratch_load_dwordx2 v[0:1], off, s0 ; 8-byte Folded Reload			; FLATSCR-NEXT: scratch_load_dwordx2 v[0:1], off, s0 ; 8-byte Folded Reload
	; FLATSCR-NEXT: s_waitcnt vmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; FLATSCR-NEXT: ;;#ASMSTART			; FLATSCR-NEXT: ;;#ASMSTART
	; FLATSCR-NEXT: ; v[0:1]			; FLATSCR-NEXT: ; v[0:1]
	; FLATSCR-NEXT: ;;#ASMEND			; FLATSCR-NEXT: ;;#ASMEND
	; FLATSCR-NEXT: s_endpgm			; FLATSCR-NEXT: s_endpgm
	entry:			entry:
	; Occupy 4088 bytes of scratch, so that the spill of the last subreg of %a			; Occupy 4088 bytes of scratch, so that the spill of the last subreg of %a
	Show All 30 Lines
	; MUBUF-NEXT: s_mov_b32 s4, 0x3ff00			; MUBUF-NEXT: s_mov_b32 s4, 0x3ff00
	; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s4 ; 4-byte Folded Spill			; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s4 ; 4-byte Folded Spill
	; MUBUF-NEXT: s_waitcnt vmcnt(0)			; MUBUF-NEXT: s_waitcnt vmcnt(0)
	; MUBUF-NEXT: buffer_store_dword v1, off, s[0:3], s4 offset:4 ; 4-byte Folded Spill			; MUBUF-NEXT: buffer_store_dword v1, off, s[0:3], s4 offset:4 ; 4-byte Folded Spill
	; MUBUF-NEXT: ;;#ASMSTART			; MUBUF-NEXT: ;;#ASMSTART
	; MUBUF-NEXT: ;;#ASMEND			; MUBUF-NEXT: ;;#ASMEND
	; MUBUF-NEXT: buffer_load_dword v0, off, s[0:3], 0 offset:8 glc			; MUBUF-NEXT: buffer_load_dword v0, off, s[0:3], 0 offset:8 glc
	; MUBUF-NEXT: s_waitcnt vmcnt(0)			; MUBUF-NEXT: s_waitcnt vmcnt(0)
	; MUBUF-NEXT: s_mov_b32 s4, 0x3ff00
	; MUBUF-NEXT: buffer_load_dword v0, off, s[0:3], s4 ; 4-byte Folded Reload			; MUBUF-NEXT: buffer_load_dword v0, off, s[0:3], s4 ; 4-byte Folded Reload
	; MUBUF-NEXT: s_nop 0			; MUBUF-NEXT: s_nop 0
	; MUBUF-NEXT: buffer_load_dword v1, off, s[0:3], s4 offset:4 ; 4-byte Folded Reload			; MUBUF-NEXT: buffer_load_dword v1, off, s[0:3], s4 offset:4 ; 4-byte Folded Reload
	; MUBUF-NEXT: s_waitcnt vmcnt(0)			; MUBUF-NEXT: s_waitcnt vmcnt(0)
	; MUBUF-NEXT: ;;#ASMSTART			; MUBUF-NEXT: ;;#ASMSTART
	; MUBUF-NEXT: ; v[0:1]			; MUBUF-NEXT: ; v[0:1]
	; MUBUF-NEXT: ;;#ASMEND			; MUBUF-NEXT: ;;#ASMEND
	; MUBUF-NEXT: s_endpgm			; MUBUF-NEXT: s_endpgm
	;			;
	; FLATSCR-LABEL: test_inst_offset_subregs_kernel:			; FLATSCR-LABEL: test_inst_offset_subregs_kernel:
	; FLATSCR: ; %bb.0: ; %entry			; FLATSCR: ; %bb.0: ; %entry
	; FLATSCR-NEXT: s_add_u32 flat_scratch_lo, s0, s3			; FLATSCR-NEXT: s_add_u32 flat_scratch_lo, s0, s3
	; FLATSCR-NEXT: s_addc_u32 flat_scratch_hi, s1, 0			; FLATSCR-NEXT: s_addc_u32 flat_scratch_hi, s1, 0
	; FLATSCR-NEXT: s_mov_b32 vcc_lo, 0			; FLATSCR-NEXT: s_mov_b32 vcc_lo, 0
	; FLATSCR-NEXT: scratch_load_dwordx2 v[0:1], off, vcc_lo offset:12 glc			; FLATSCR-NEXT: scratch_load_dwordx2 v[0:1], off, vcc_lo offset:12 glc
	; FLATSCR-NEXT: s_waitcnt vmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; FLATSCR-NEXT: s_movk_i32 s0, 0xffc			; FLATSCR-NEXT: s_movk_i32 s0, 0xffc
	; FLATSCR-NEXT: s_mov_b32 vcc_hi, 0			; FLATSCR-NEXT: s_mov_b32 vcc_hi, 0
	; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], s0 ; 8-byte Folded Spill			; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], s0 ; 8-byte Folded Spill
	; FLATSCR-NEXT: ;;#ASMSTART			; FLATSCR-NEXT: ;;#ASMSTART
	; FLATSCR-NEXT: ;;#ASMEND			; FLATSCR-NEXT: ;;#ASMEND
	; FLATSCR-NEXT: scratch_load_dword v0, off, vcc_hi offset:8 glc			; FLATSCR-NEXT: scratch_load_dword v0, off, vcc_hi offset:8 glc
	; FLATSCR-NEXT: s_waitcnt vmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; FLATSCR-NEXT: s_movk_i32 s0, 0xffc
	; FLATSCR-NEXT: scratch_load_dwordx2 v[0:1], off, s0 ; 8-byte Folded Reload			; FLATSCR-NEXT: scratch_load_dwordx2 v[0:1], off, s0 ; 8-byte Folded Reload
	; FLATSCR-NEXT: s_waitcnt vmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; FLATSCR-NEXT: ;;#ASMSTART			; FLATSCR-NEXT: ;;#ASMSTART
	; FLATSCR-NEXT: ; v[0:1]			; FLATSCR-NEXT: ; v[0:1]
	; FLATSCR-NEXT: ;;#ASMEND			; FLATSCR-NEXT: ;;#ASMEND
	; FLATSCR-NEXT: s_endpgm			; FLATSCR-NEXT: s_endpgm
	entry:			entry:
	; Occupy 4092 bytes of scratch, so that the spill of the last subreg of %a			; Occupy 4092 bytes of scratch, so that the spill of the last subreg of %a

llvm/test/CodeGen/AMDGPU/spill-scavenge-offset.ll

	; GFX6-NEXT: buffer_store_dword v17, off, s[40:43], s2 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v17, off, s[40:43], s2 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: buffer_store_dword v18, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v18, off, s[40:43], s2 offset:4 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v19, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v19, off, s[40:43], s2 offset:8 ; 4-byte Folded Spill
	; GFX6-NEXT: buffer_store_dword v20, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill			; GFX6-NEXT: buffer_store_dword v20, off, s[40:43], s2 offset:12 ; 4-byte Folded Spill
	; GFX6-NEXT: s_waitcnt expcnt(0)			; GFX6-NEXT: s_waitcnt expcnt(0)
	; GFX6-NEXT: ;;#ASMSTART			; GFX6-NEXT: ;;#ASMSTART
	; GFX6-NEXT: ;;#ASMEND			; GFX6-NEXT: ;;#ASMEND
	; GFX6-NEXT: s_mov_b32 s2, 0x84800
	; GFX6-NEXT: buffer_load_dword v17, off, s[40:43], s2 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v17, off, s[40:43], s2 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v18, off, s[40:43], s2 offset:4 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v18, off, s[40:43], s2 offset:4 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v19, off, s[40:43], s2 offset:8 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v19, off, s[40:43], s2 offset:8 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v20, off, s[40:43], s2 offset:12 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v20, off, s[40:43], s2 offset:12 ; 4-byte Folded Reload
	; GFX6-NEXT: s_mov_b32 s2, 0x84000			; GFX6-NEXT: s_mov_b32 s2, 0x84000
	; GFX6-NEXT: buffer_load_dword v13, off, s[40:43], s2 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v13, off, s[40:43], s2 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v14, off, s[40:43], s2 offset:4 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v14, off, s[40:43], s2 offset:4 ; 4-byte Folded Reload
	; GFX6-NEXT: buffer_load_dword v15, off, s[40:43], s2 offset:8 ; 4-byte Folded Reload			; GFX6-NEXT: buffer_load_dword v15, off, s[40:43], s2 offset:8 ; 4-byte Folded Reload
	▲ Show 20 Lines • Show All 228 Lines • ▼ Show 20 Lines
	; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20d0			; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20d0
	; GFX9-FLATSCR-NEXT: scratch_store_dwordx4 off, v[0:3], s0 ; 16-byte Folded Spill			; GFX9-FLATSCR-NEXT: scratch_store_dwordx4 off, v[0:3], s0 ; 16-byte Folded Spill
	; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20e0			; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20e0
	; GFX9-FLATSCR-NEXT: scratch_store_dwordx4 off, v[16:19], s0 ; 16-byte Folded Spill			; GFX9-FLATSCR-NEXT: scratch_store_dwordx4 off, v[16:19], s0 ; 16-byte Folded Spill
	; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20f0			; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20f0
	; GFX9-FLATSCR-NEXT: scratch_store_dwordx4 off, v[20:23], s0 ; 16-byte Folded Spill			; GFX9-FLATSCR-NEXT: scratch_store_dwordx4 off, v[20:23], s0 ; 16-byte Folded Spill
	; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x2100			; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x2100
	; GFX9-FLATSCR-NEXT: scratch_store_dwordx4 off, v[8:11], s0 ; 16-byte Folded Spill			; GFX9-FLATSCR-NEXT: scratch_store_dwordx4 off, v[8:11], s0 ; 16-byte Folded Spill
	; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x2100			; GFX9-FLATSCR-NEXT: s_nop 0
	; GFX9-FLATSCR-NEXT: ;;#ASMSTART			; GFX9-FLATSCR-NEXT: ;;#ASMSTART
	; GFX9-FLATSCR-NEXT: ;;#ASMEND			; GFX9-FLATSCR-NEXT: ;;#ASMEND
	; GFX9-FLATSCR-NEXT: scratch_load_dwordx4 v[8:11], off, s0 ; 16-byte Folded Reload			; GFX9-FLATSCR-NEXT: scratch_load_dwordx4 v[8:11], off, s0 ; 16-byte Folded Reload
	; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20f0			; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20f0
	; GFX9-FLATSCR-NEXT: scratch_load_dwordx4 v[20:23], off, s0 ; 16-byte Folded Reload			; GFX9-FLATSCR-NEXT: scratch_load_dwordx4 v[20:23], off, s0 ; 16-byte Folded Reload
	; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20e0			; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20e0
	; GFX9-FLATSCR-NEXT: scratch_load_dwordx4 v[16:19], off, s0 ; 16-byte Folded Reload			; GFX9-FLATSCR-NEXT: scratch_load_dwordx4 v[16:19], off, s0 ; 16-byte Folded Reload
	; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20d0			; GFX9-FLATSCR-NEXT: s_movk_i32 s0, 0x20d0
	▲ Show 20 Lines • Show All 219 Lines • ▼ Show 20 Lines
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v13, v38			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v13, v38
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v14, v39			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v14, v39
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v15, v40			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v15, v40
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v33, v58			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v33, v58
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v34, v59			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v34, v59
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v35, v60			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v35, v60
	; GFX10-FLATSCR-NEXT: ;;#ASMSTART			; GFX10-FLATSCR-NEXT: ;;#ASMSTART
	; GFX10-FLATSCR-NEXT: ;;#ASMEND			; GFX10-FLATSCR-NEXT: ;;#ASMEND
	; GFX10-FLATSCR-NEXT: s_movk_i32 s0, 0x2010
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v36, v65			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v36, v65
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v37, v66			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v37, v66
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v38, v67			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v38, v67
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v39, v68			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v39, v68
	; GFX10-FLATSCR-NEXT: scratch_load_dwordx4 v[64:67], off, s0 ; 16-byte Folded Reload			; GFX10-FLATSCR-NEXT: scratch_load_dwordx4 v[64:67], off, s0 ; 16-byte Folded Reload
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v60, v89			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v60, v89
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v56, v85			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v56, v85
	; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v52, v81			; GFX10-FLATSCR-NEXT: v_mov_b32_e32 v52, v81

llvm/test/CodeGen/ARM/O3-pipeline.ll

	; CHECK-NEXT: PostRA Machine Sink			; CHECK-NEXT: PostRA Machine Sink
	; CHECK-NEXT: Machine Block Frequency Analysis			; CHECK-NEXT: Machine Block Frequency Analysis
	; CHECK-NEXT: MachineDominator Tree Construction			; CHECK-NEXT: MachineDominator Tree Construction
	; CHECK-NEXT: MachinePostDominator Tree Construction			; CHECK-NEXT: MachinePostDominator Tree Construction
	; CHECK-NEXT: Lazy Machine Block Frequency Analysis			; CHECK-NEXT: Lazy Machine Block Frequency Analysis
	; CHECK-NEXT: Machine Optimization Remark Emitter			; CHECK-NEXT: Machine Optimization Remark Emitter
	; CHECK-NEXT: Shrink Wrapping analysis			; CHECK-NEXT: Shrink Wrapping analysis
	; CHECK-NEXT: Prologue/Epilogue Insertion & Frame Finalization			; CHECK-NEXT: Prologue/Epilogue Insertion & Frame Finalization
				; CHECK-NEXT: Machine Late Instructions Cleanup Pass
	; CHECK-NEXT: Control Flow Optimizer			; CHECK-NEXT: Control Flow Optimizer
	; CHECK-NEXT: Lazy Machine Block Frequency Analysis			; CHECK-NEXT: Lazy Machine Block Frequency Analysis
	; CHECK-NEXT: Tail Duplication			; CHECK-NEXT: Tail Duplication
	; CHECK-NEXT: Machine Copy Propagation Pass			; CHECK-NEXT: Machine Copy Propagation Pass
	; CHECK-NEXT: Post-RA pseudo instruction expansion pass			; CHECK-NEXT: Post-RA pseudo instruction expansion pass
	; CHECK-NEXT: ARM load / store optimization pass			; CHECK-NEXT: ARM load / store optimization pass
	; CHECK-NEXT: ReachingDefAnalysis			; CHECK-NEXT: ReachingDefAnalysis
	; CHECK-NEXT: ARM Execution Domain Fix			; CHECK-NEXT: ARM Execution Domain Fix

llvm/test/CodeGen/ARM/arm-shrink-wrapping.ll

	; ARM-DISABLE-NEXT: LBB11_5: @ %end			; ARM-DISABLE-NEXT: LBB11_5: @ %end
	; ARM-DISABLE-NEXT: bx lr			; ARM-DISABLE-NEXT: bx lr
	;			;
	; THUMB-ENABLE-LABEL: infiniteloop3:			; THUMB-ENABLE-LABEL: infiniteloop3:
	; THUMB-ENABLE: @ %bb.0: @ %entry			; THUMB-ENABLE: @ %bb.0: @ %entry
	; THUMB-ENABLE-NEXT: movs r0, #0			; THUMB-ENABLE-NEXT: movs r0, #0
	; THUMB-ENABLE-NEXT: cbnz r0, LBB11_5			; THUMB-ENABLE-NEXT: cbnz r0, LBB11_5
	; THUMB-ENABLE-NEXT: @ %bb.1: @ %loop2a.preheader			; THUMB-ENABLE-NEXT: @ %bb.1: @ %loop2a.preheader
	; THUMB-ENABLE-NEXT: movs r0, #0
	; THUMB-ENABLE-NEXT: movs r1, #0			; THUMB-ENABLE-NEXT: movs r1, #0
	; THUMB-ENABLE-NEXT: mov r2, r0			; THUMB-ENABLE-NEXT: mov r2, r0
	; THUMB-ENABLE-NEXT: b LBB11_3			; THUMB-ENABLE-NEXT: b LBB11_3
	; THUMB-ENABLE-NEXT: LBB11_2: @ %loop2b			; THUMB-ENABLE-NEXT: LBB11_2: @ %loop2b
	; THUMB-ENABLE-NEXT: @ in Loop: Header=BB11_3 Depth=1			; THUMB-ENABLE-NEXT: @ in Loop: Header=BB11_3 Depth=1
	; THUMB-ENABLE-NEXT: str r1, [r2]			; THUMB-ENABLE-NEXT: str r1, [r2]
	; THUMB-ENABLE-NEXT: mov r2, r1			; THUMB-ENABLE-NEXT: mov r2, r1
	; THUMB-ENABLE-NEXT: mov r1, r3			; THUMB-ENABLE-NEXT: mov r1, r3
	Show All 10 Lines
	; THUMB-ENABLE-NEXT: LBB11_5: @ %end			; THUMB-ENABLE-NEXT: LBB11_5: @ %end
	; THUMB-ENABLE-NEXT: bx lr			; THUMB-ENABLE-NEXT: bx lr
	;			;
	; THUMB-DISABLE-LABEL: infiniteloop3:			; THUMB-DISABLE-LABEL: infiniteloop3:
	; THUMB-DISABLE: @ %bb.0: @ %entry			; THUMB-DISABLE: @ %bb.0: @ %entry
	; THUMB-DISABLE-NEXT: movs r0, #0			; THUMB-DISABLE-NEXT: movs r0, #0
	; THUMB-DISABLE-NEXT: cbnz r0, LBB11_5			; THUMB-DISABLE-NEXT: cbnz r0, LBB11_5
	; THUMB-DISABLE-NEXT: @ %bb.1: @ %loop2a.preheader			; THUMB-DISABLE-NEXT: @ %bb.1: @ %loop2a.preheader
	; THUMB-DISABLE-NEXT: movs r0, #0
	; THUMB-DISABLE-NEXT: movs r1, #0			; THUMB-DISABLE-NEXT: movs r1, #0
	; THUMB-DISABLE-NEXT: mov r2, r0			; THUMB-DISABLE-NEXT: mov r2, r0
	; THUMB-DISABLE-NEXT: b LBB11_3			; THUMB-DISABLE-NEXT: b LBB11_3
	; THUMB-DISABLE-NEXT: LBB11_2: @ %loop2b			; THUMB-DISABLE-NEXT: LBB11_2: @ %loop2b
	; THUMB-DISABLE-NEXT: @ in Loop: Header=BB11_3 Depth=1			; THUMB-DISABLE-NEXT: @ in Loop: Header=BB11_3 Depth=1
	; THUMB-DISABLE-NEXT: str r1, [r2]			; THUMB-DISABLE-NEXT: str r1, [r2]
	; THUMB-DISABLE-NEXT: mov r2, r1			; THUMB-DISABLE-NEXT: mov r2, r1
	; THUMB-DISABLE-NEXT: mov r1, r3			; THUMB-DISABLE-NEXT: mov r1, r3

llvm/test/CodeGen/ARM/fpclamptosat.ll

	; SOFT-NEXT: mvns r6, r3			; SOFT-NEXT: mvns r6, r3
	; SOFT-NEXT: ldr r0, .LCPI48_0			; SOFT-NEXT: ldr r0, .LCPI48_0
	; SOFT-NEXT: cmp r4, r0			; SOFT-NEXT: cmp r4, r0
	; SOFT-NEXT: ldr r3, [sp, #16] @ 4-byte Reload			; SOFT-NEXT: ldr r3, [sp, #16] @ 4-byte Reload
	; SOFT-NEXT: blo .LBB48_19			; SOFT-NEXT: blo .LBB48_19
	; SOFT-NEXT: @ %bb.18: @ %entry			; SOFT-NEXT: @ %bb.18: @ %entry
	; SOFT-NEXT: mov r3, r6			; SOFT-NEXT: mov r3, r6
	; SOFT-NEXT: .LBB48_19: @ %entry			; SOFT-NEXT: .LBB48_19: @ %entry
	; SOFT-NEXT: ldr r0, .LCPI48_0
	; SOFT-NEXT: cmp r4, r0			; SOFT-NEXT: cmp r4, r0
	; SOFT-NEXT: ldr r4, [sp, #16] @ 4-byte Reload			; SOFT-NEXT: ldr r4, [sp, #16] @ 4-byte Reload
	; SOFT-NEXT: beq .LBB48_21			; SOFT-NEXT: beq .LBB48_21
	; SOFT-NEXT: @ %bb.20: @ %entry			; SOFT-NEXT: @ %bb.20: @ %entry
	; SOFT-NEXT: mov r4, r3			; SOFT-NEXT: mov r4, r3
	; SOFT-NEXT: .LBB48_21: @ %entry			; SOFT-NEXT: .LBB48_21: @ %entry
	; SOFT-NEXT: cmp r7, #0			; SOFT-NEXT: cmp r7, #0
	; SOFT-NEXT: bmi .LBB48_23			; SOFT-NEXT: bmi .LBB48_23
	▲ Show 20 Lines • Show All 566 Lines • ▼ Show 20 Lines
	; SOFT-NEXT: mvns r6, r3			; SOFT-NEXT: mvns r6, r3
	; SOFT-NEXT: ldr r0, .LCPI51_0			; SOFT-NEXT: ldr r0, .LCPI51_0
	; SOFT-NEXT: cmp r4, r0			; SOFT-NEXT: cmp r4, r0
	; SOFT-NEXT: ldr r3, [sp, #16] @ 4-byte Reload			; SOFT-NEXT: ldr r3, [sp, #16] @ 4-byte Reload
	; SOFT-NEXT: blo .LBB51_19			; SOFT-NEXT: blo .LBB51_19
	; SOFT-NEXT: @ %bb.18: @ %entry			; SOFT-NEXT: @ %bb.18: @ %entry
	; SOFT-NEXT: mov r3, r6			; SOFT-NEXT: mov r3, r6
	; SOFT-NEXT: .LBB51_19: @ %entry			; SOFT-NEXT: .LBB51_19: @ %entry
	; SOFT-NEXT: ldr r0, .LCPI51_0
	; SOFT-NEXT: cmp r4, r0			; SOFT-NEXT: cmp r4, r0
	; SOFT-NEXT: ldr r4, [sp, #16] @ 4-byte Reload			; SOFT-NEXT: ldr r4, [sp, #16] @ 4-byte Reload
	; SOFT-NEXT: beq .LBB51_21			; SOFT-NEXT: beq .LBB51_21
	; SOFT-NEXT: @ %bb.20: @ %entry			; SOFT-NEXT: @ %bb.20: @ %entry
	; SOFT-NEXT: mov r4, r3			; SOFT-NEXT: mov r4, r3
	; SOFT-NEXT: .LBB51_21: @ %entry			; SOFT-NEXT: .LBB51_21: @ %entry
	; SOFT-NEXT: cmp r7, #0			; SOFT-NEXT: cmp r7, #0
	; SOFT-NEXT: bmi .LBB51_23			; SOFT-NEXT: bmi .LBB51_23

llvm/test/CodeGen/ARM/ifcvt-branch-weight-bug.ll

	; for.body -> lor.lhs.false.i (50%)			; for.body -> lor.lhs.false.i (50%)
	; -> for.cond.backedge (50%)			; -> for.cond.backedge (50%)
	; lor.lhs.false.i -> for.cond.backedge (100%)			; lor.lhs.false.i -> for.cond.backedge (100%)
	; -> cond.false.i (0%)			; -> cond.false.i (0%)
	; Afer if conversion, we have			; Afer if conversion, we have
	; for.body -> for.cond.backedge (100%)			; for.body -> for.cond.backedge (100%)
	; -> cond.false.i (0%)			; -> cond.false.i (0%)
	; CHECK: bb.1.for.body:			; CHECK: bb.1.for.body:
	; CHECK: successors: %bb.2(0x80000000), %bb.4(0x00000000)			; CHECK: successors: %bb.2(0x80000000), %bb.5(0x00000000)
	for.body:			for.body:
	br i1 undef, label %for.cond.backedge, label %lor.lhs.false.i, !prof !1			br i1 undef, label %for.cond.backedge, label %lor.lhs.false.i, !prof !1

	for.cond.backedge:			for.cond.backedge:
	%tobool = icmp eq %classL* %p0, null			%tobool = icmp eq %classL* %p0, null
	br i1 %tobool, label %for.end, label %for.body			br i1 %tobool, label %for.end, label %for.body

	lor.lhs.false.i:			lor.lhs.false.i:

llvm/test/CodeGen/ARM/jump-table-islands.ll

	; RUN: llc -mtriple=armv7-apple-ios8.0 -o - %s | FileCheck %s			; RUN: llc -mtriple=armv7-apple-ios8.0 -o - %s | FileCheck %s

	%BigInt = type i5500			%BigInt = type i8500

	define %BigInt @test_moved_jumptable(i1 %tst, i32 %sw, %BigInt %l) {			define %BigInt @test_moved_jumptable(i1 %tst, i32 %sw, %BigInt %l) {
	; CHECK-LABEL: test_moved_jumptable:			; CHECK-LABEL: test_moved_jumptable:

	; CHECK: adr {{r[0-9]+}}, [[JUMP_TABLE:LJTI[0-9]+_[0-9]+]]			; CHECK: adr {{r[0-9]+}}, [[JUMP_TABLE:LJTI[0-9]+_[0-9]+]]
	; CHECK: b [[SKIP_TABLE:LBB[0-9]+_[0-9]+]]			; CHECK: b [[SKIP_TABLE:LBB[0-9]+_[0-9]+]]

	; CHECK: [[JUMP_TABLE]]:			; CHECK: [[JUMP_TABLE]]:

llvm/test/CodeGen/ARM/reg_sequence.ll

	; CHECK-NEXT: vadd.f32 [[Q1:q[0-9]+]], [[Q8]], [[Q8]]			; CHECK-NEXT: vadd.f32 [[Q1:q[0-9]+]], [[Q8]], [[Q8]]
	; CHECK-NEXT: vmul.f32 [[Q8]], [[Q9]], d1[0]			; CHECK-NEXT: vmul.f32 [[Q8]], [[Q9]], d1[0]
	; CHECK-NEXT: vmul.f32 [[Q8]], [[Q8]], [[Q8]]			; CHECK-NEXT: vmul.f32 [[Q8]], [[Q8]], [[Q8]]
	; CHECK-NEXT: vadd.f32 [[Q8]], [[Q8]], [[Q8]]			; CHECK-NEXT: vadd.f32 [[Q8]], [[Q8]], [[Q8]]
	; CHECK-NEXT: vmul.f32 [[Q8]], [[Q8]], [[Q8]]			; CHECK-NEXT: vmul.f32 [[Q8]], [[Q8]], [[Q8]]
	; CHECK-NEXT: vst1.32 {d17[1]}, [r0:32]			; CHECK-NEXT: vst1.32 {d17[1]}, [r0:32]
	; CHECK-NEXT: mov r0, #0			; CHECK-NEXT: mov r0, #0
	; CHECK-NEXT: cmp r0, #0			; CHECK-NEXT: cmp r0, #0
	; CHECK-NEXT: movne r0, #0
	; CHECK-NEXT: bxne lr			; CHECK-NEXT: bxne lr
	; CHECK-NEXT: LBB9_1:			; CHECK-NEXT: LBB9_1:
	; CHECK-NEXT: trap			; CHECK-NEXT: trap
	entry:			entry:
	%0 = shufflevector <4 x float> zeroinitializer, <4 x float> undef, <4 x i32> zeroinitializer ; <<4 x float>> [#uses=1]			%0 = shufflevector <4 x float> zeroinitializer, <4 x float> undef, <4 x i32> zeroinitializer ; <<4 x float>> [#uses=1]
	%1 = insertelement <4 x float> %0, float %x, i32 1 ; <<4 x float>> [#uses=1]			%1 = insertelement <4 x float> %0, float %x, i32 1 ; <<4 x float>> [#uses=1]
	%2 = insertelement <4 x float> %1, float %x, i32 2 ; <<4 x float>> [#uses=1]			%2 = insertelement <4 x float> %1, float %x, i32 2 ; <<4 x float>> [#uses=1]
	%3 = insertelement <4 x float> %2, float %x, i32 3 ; <<4 x float>> [#uses=1]			%3 = insertelement <4 x float> %2, float %x, i32 3 ; <<4 x float>> [#uses=1]

llvm/test/CodeGen/BPF/objdump_cond_op_2.ll

	; RUN: llc -march=bpfel -filetype=obj -o - %s | llvm-objdump --no-print-imm-hex -d - | FileCheck %s			; RUN: llc -march=bpfel -filetype=obj -o - %s | llvm-objdump --no-print-imm-hex -d - | FileCheck %s

	; Source Code:			; Source Code:
	; int test(int a, int b) {			; int test(int a, int b) {
	; int s = 0;			; int s = 0;
	; while (a < b) { s++; a += s; b -= s; }			; while (a < b) { s++; a += s; b -= s; }
	; return s;			; return s;
	; }			; }

	define i32 @test(i32, i32) local_unnamed_addr #0 {			define i32 @test(i32, i32) local_unnamed_addr #0 {
	; CHECK-LABEL: <test>:			; CHECK-LABEL: <test>:
	%3 = icmp slt i32 %0, %1			%3 = icmp slt i32 %0, %1
	br i1 %3, label %4, label %13			br i1 %3, label %4, label %13

	; <label>:4: ; preds = %2			; <label>:4: ; preds = %2
	br label %5			br label %5
	; CHECK: if r4 s>= r3 goto +11 <LBB0_3>			; CHECK: if r4 s>= r3 goto +10 <LBB0_2>
	; CHECK: r0 = 0			; CHECK-LABEL: <LBB0_1>:
	; CHECK-LABEL: <LBB0_2>:

	; <label>:5: ; preds = %4, %5			; <label>:5: ; preds = %4, %5
	%6 = phi i32 [ %9, %5 ], [ 0, %4 ]			%6 = phi i32 [ %9, %5 ], [ 0, %4 ]
	%7 = phi i32 [ %11, %5 ], [ %1, %4 ]			%7 = phi i32 [ %11, %5 ], [ %1, %4 ]
	%8 = phi i32 [ %10, %5 ], [ %0, %4 ]			%8 = phi i32 [ %10, %5 ], [ %0, %4 ]
	%9 = add nuw nsw i32 %6, 1			%9 = add nuw nsw i32 %6, 1
	%10 = add nsw i32 %9, %8			%10 = add nsw i32 %9, %8
	%11 = sub nsw i32 %7, %9			%11 = sub nsw i32 %7, %9
	%12 = icmp slt i32 %10, %11			%12 = icmp slt i32 %10, %11
	br i1 %12, label %5, label %13			br i1 %12, label %5, label %13
	; CHECK: r1 = r3			; CHECK: r1 = r3
	; CHECK: if r2 s> r3 goto -10 <LBB0_2>			; CHECK: if r2 s> r3 goto -10 <LBB0_1>

	; <label>:13: ; preds = %5, %2			; <label>:13: ; preds = %5, %2
	%14 = phi i32 [ 0, %2 ], [ %9, %5 ]			%14 = phi i32 [ 0, %2 ], [ %9, %5 ]
	ret i32 %14			ret i32 %14
	; CHECK-LABEL: <LBB0_3>:			; CHECK-LABEL: <LBB0_2>:
	; CHECK: exit			; CHECK: exit
	}			}
	attributes #0 = { norecurse nounwind readnone }			attributes #0 = { norecurse nounwind readnone }

llvm/test/CodeGen/M68k/pipeline.ll

	Show First 20 Lines • Show All 104 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: PostRA Machine Sink			; CHECK-NEXT: PostRA Machine Sink
	; CHECK-NEXT: Machine Block Frequency Analysis			; CHECK-NEXT: Machine Block Frequency Analysis
	; CHECK-NEXT: MachineDominator Tree Construction			; CHECK-NEXT: MachineDominator Tree Construction
	; CHECK-NEXT: MachinePostDominator Tree Construction			; CHECK-NEXT: MachinePostDominator Tree Construction
	; CHECK-NEXT: Lazy Machine Block Frequency Analysis			; CHECK-NEXT: Lazy Machine Block Frequency Analysis
	; CHECK-NEXT: Machine Optimization Remark Emitter			; CHECK-NEXT: Machine Optimization Remark Emitter
	; CHECK-NEXT: Shrink Wrapping analysis			; CHECK-NEXT: Shrink Wrapping analysis
	; CHECK-NEXT: Prologue/Epilogue Insertion & Frame Finalization			; CHECK-NEXT: Prologue/Epilogue Insertion & Frame Finalization
				; CHECK-NEXT: Machine Late Instructions Cleanup Pass
	; CHECK-NEXT: Control Flow Optimizer			; CHECK-NEXT: Control Flow Optimizer
	; CHECK-NEXT: Lazy Machine Block Frequency Analysis			; CHECK-NEXT: Lazy Machine Block Frequency Analysis
	; CHECK-NEXT: Tail Duplication			; CHECK-NEXT: Tail Duplication
	; CHECK-NEXT: Machine Copy Propagation Pass			; CHECK-NEXT: Machine Copy Propagation Pass
	; CHECK-NEXT: Post-RA pseudo instruction expansion pass			; CHECK-NEXT: Post-RA pseudo instruction expansion pass
	; CHECK-NEXT: M68k pseudo instruction expansion pass			; CHECK-NEXT: M68k pseudo instruction expansion pass
	; CHECK-NEXT: MachineDominator Tree Construction			; CHECK-NEXT: MachineDominator Tree Construction
	; CHECK-NEXT: Machine Natural Loop Construction			; CHECK-NEXT: Machine Natural Loop Construction
	Show All 17 Lines

llvm/test/CodeGen/Mips/llvm-ir/lshr.ll

	Show First 20 Lines • Show All 835 Lines • ▼ Show 20 Lines
	; MMR3-NEXT: lw $7, 20($sp) # 4-byte Folded Reload			; MMR3-NEXT: lw $7, 20($sp) # 4-byte Folded Reload
	; MMR3-NEXT: movn $4, $9, $7			; MMR3-NEXT: movn $4, $9, $7
	; MMR3-NEXT: lw $6, 4($sp) # 4-byte Folded Reload			; MMR3-NEXT: lw $6, 4($sp) # 4-byte Folded Reload
	; MMR3-NEXT: li16 $7, 0			; MMR3-NEXT: li16 $7, 0
	; MMR3-NEXT: movn $6, $7, $17			; MMR3-NEXT: movn $6, $7, $17
	; MMR3-NEXT: or16 $6, $4			; MMR3-NEXT: or16 $6, $4
	; MMR3-NEXT: lw $4, 8($sp) # 4-byte Folded Reload			; MMR3-NEXT: lw $4, 8($sp) # 4-byte Folded Reload
	; MMR3-NEXT: movn $1, $7, $4			; MMR3-NEXT: movn $1, $7, $4
	; MMR3-NEXT: li16 $7, 0
	; MMR3-NEXT: movn $1, $6, $10			; MMR3-NEXT: movn $1, $6, $10
	; MMR3-NEXT: lw $4, 24($sp) # 4-byte Folded Reload			; MMR3-NEXT: lw $4, 24($sp) # 4-byte Folded Reload
	; MMR3-NEXT: movz $1, $4, $16			; MMR3-NEXT: movz $1, $4, $16
	; MMR3-NEXT: movn $2, $7, $17			; MMR3-NEXT: movn $2, $7, $17
	; MMR3-NEXT: li16 $4, 0			; MMR3-NEXT: li16 $4, 0
	; MMR3-NEXT: movz $2, $4, $10			; MMR3-NEXT: movz $2, $4, $10
	; MMR3-NEXT: move $4, $1			; MMR3-NEXT: move $4, $1
	; MMR3-NEXT: lwp $16, 32($sp)			; MMR3-NEXT: lwp $16, 32($sp)
	▲ Show 20 Lines • Show All 107 Lines • Show Last 20 Lines

llvm/test/CodeGen/Mips/llvm-ir/shl.ll

	Show First 20 Lines • Show All 909 Lines • ▼ Show 20 Lines
	; MMR3-NEXT: lw $7, 24($sp) # 4-byte Folded Reload			; MMR3-NEXT: lw $7, 24($sp) # 4-byte Folded Reload
	; MMR3-NEXT: movn $3, $9, $7			; MMR3-NEXT: movn $3, $9, $7
	; MMR3-NEXT: lw $5, 8($sp) # 4-byte Folded Reload			; MMR3-NEXT: lw $5, 8($sp) # 4-byte Folded Reload
	; MMR3-NEXT: li16 $7, 0			; MMR3-NEXT: li16 $7, 0
	; MMR3-NEXT: movn $5, $7, $17			; MMR3-NEXT: movn $5, $7, $17
	; MMR3-NEXT: or16 $5, $3			; MMR3-NEXT: or16 $5, $3
	; MMR3-NEXT: lw $3, 12($sp) # 4-byte Folded Reload			; MMR3-NEXT: lw $3, 12($sp) # 4-byte Folded Reload
	; MMR3-NEXT: movn $8, $7, $3			; MMR3-NEXT: movn $8, $7, $3
	; MMR3-NEXT: li16 $7, 0
	; MMR3-NEXT: movn $8, $5, $10			; MMR3-NEXT: movn $8, $5, $10
	; MMR3-NEXT: lw $3, 28($sp) # 4-byte Folded Reload			; MMR3-NEXT: lw $3, 28($sp) # 4-byte Folded Reload
	; MMR3-NEXT: movz $8, $3, $16			; MMR3-NEXT: movz $8, $3, $16
	; MMR3-NEXT: movn $6, $7, $17			; MMR3-NEXT: movn $6, $7, $17
	; MMR3-NEXT: li16 $3, 0			; MMR3-NEXT: li16 $3, 0
	; MMR3-NEXT: movz $6, $3, $10			; MMR3-NEXT: movz $6, $3, $10
	; MMR3-NEXT: move $3, $8			; MMR3-NEXT: move $3, $8
	; MMR3-NEXT: move $5, $6			; MMR3-NEXT: move $5, $6
	▲ Show 20 Lines • Show All 95 Lines • Show Last 20 Lines

llvm/test/CodeGen/PowerPC/O3-pipeline.ll

	Show First 20 Lines • Show All 176 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: PostRA Machine Sink			; CHECK-NEXT: PostRA Machine Sink
	; CHECK-NEXT: Machine Block Frequency Analysis			; CHECK-NEXT: Machine Block Frequency Analysis
	; CHECK-NEXT: MachineDominator Tree Construction			; CHECK-NEXT: MachineDominator Tree Construction
	; CHECK-NEXT: MachinePostDominator Tree Construction			; CHECK-NEXT: MachinePostDominator Tree Construction
	; CHECK-NEXT: Lazy Machine Block Frequency Analysis			; CHECK-NEXT: Lazy Machine Block Frequency Analysis
	; CHECK-NEXT: Machine Optimization Remark Emitter			; CHECK-NEXT: Machine Optimization Remark Emitter
	; CHECK-NEXT: Shrink Wrapping analysis			; CHECK-NEXT: Shrink Wrapping analysis
	; CHECK-NEXT: Prologue/Epilogue Insertion & Frame Finalization			; CHECK-NEXT: Prologue/Epilogue Insertion & Frame Finalization
				; CHECK-NEXT: Machine Late Instructions Cleanup Pass
	; CHECK-NEXT: Control Flow Optimizer			; CHECK-NEXT: Control Flow Optimizer
	; CHECK-NEXT: Lazy Machine Block Frequency Analysis			; CHECK-NEXT: Lazy Machine Block Frequency Analysis
	; CHECK-NEXT: Tail Duplication			; CHECK-NEXT: Tail Duplication
	; CHECK-NEXT: Machine Copy Propagation Pass			; CHECK-NEXT: Machine Copy Propagation Pass
	; CHECK-NEXT: Post-RA pseudo instruction expansion pass			; CHECK-NEXT: Post-RA pseudo instruction expansion pass
	; CHECK-NEXT: MachineDominator Tree Construction			; CHECK-NEXT: MachineDominator Tree Construction
	; CHECK-NEXT: Machine Natural Loop Construction			; CHECK-NEXT: Machine Natural Loop Construction
	; CHECK-NEXT: Machine Block Frequency Analysis			; CHECK-NEXT: Machine Block Frequency Analysis
	Show All 28 Lines

llvm/test/CodeGen/PowerPC/cgp-select.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -O3 -mcpu=pwr9 -verify-machineinstrs -mtriple=powerpc64le-unknown-unknown < %s | FileCheck %s			; RUN: llc -O3 -mcpu=pwr9 -verify-machineinstrs -mtriple=powerpc64le-unknown-unknown < %s | FileCheck %s

	define dso_local void @wibble(ptr nocapture readonly %arg, i32 signext %arg1, ptr nocapture %arg2, ptr nocapture %arg3) {			define dso_local void @wibble(ptr nocapture readonly %arg, i32 signext %arg1, ptr nocapture %arg2, ptr nocapture %arg3) {
	; CHECK-LABEL: wibble:			; CHECK-LABEL: wibble:
	; CHECK: # %bb.0: # %bb			; CHECK: # %bb.0: # %bb
	; CHECK-NEXT: lfs 0, 0(3)			; CHECK-NEXT: lfs 0, 0(3)
	; CHECK-NEXT: li 7, 7			; CHECK-NEXT: li 7, 7
	; CHECK-NEXT: cmpwi 4, 2			; CHECK-NEXT: cmpwi 4, 2
	; CHECK-NEXT: xsaddsp 0, 0, 0			; CHECK-NEXT: xsaddsp 0, 0, 0
	; CHECK-NEXT: blt 0, .LBB0_5			; CHECK-NEXT: blt 0, .LBB0_5
	; CHECK-NEXT: # %bb.1: # %bb6			; CHECK-NEXT: # %bb.1: # %bb6
	; CHECK-NEXT: clrldi 4, 4, 32			; CHECK-NEXT: clrldi 4, 4, 32
	; CHECK-NEXT: li 7, 7
	; CHECK-NEXT: addi 4, 4, -1			; CHECK-NEXT: addi 4, 4, -1
	; CHECK-NEXT: mtctr 4			; CHECK-NEXT: mtctr 4
	; CHECK-NEXT: li 4, 8			; CHECK-NEXT: li 4, 8
	; CHECK-NEXT: b .LBB0_3			; CHECK-NEXT: b .LBB0_3
	; CHECK-NEXT: .p2align 5			; CHECK-NEXT: .p2align 5
	; CHECK-NEXT: .LBB0_2: # %bb11			; CHECK-NEXT: .LBB0_2: # %bb11
	; CHECK-NEXT: #			; CHECK-NEXT: #
	; CHECK-NEXT: iselgt 7, 4, 7			; CHECK-NEXT: iselgt 7, 4, 7
	▲ Show 20 Lines • Show All 47 Lines • Show Last 20 Lines

llvm/test/CodeGen/PowerPC/fast-isel-branch.ll

	Show First 20 Lines • Show All 47 Lines • ▼ Show 20 Lines
	; AIX64-NEXT: std 0, 144(1)			; AIX64-NEXT: std 0, 144(1)
	; AIX64-NEXT: li 3, 0			; AIX64-NEXT: li 3, 0
	; AIX64-NEXT: stw 3, 124(1)			; AIX64-NEXT: stw 3, 124(1)
	; AIX64-NEXT: li 3, 0			; AIX64-NEXT: li 3, 0
	; AIX64-NEXT: stw 3, 120(1)			; AIX64-NEXT: stw 3, 120(1)
	; AIX64-NEXT: L..BB0_1: # %for.cond			; AIX64-NEXT: L..BB0_1: # %for.cond
	; AIX64-NEXT: #			; AIX64-NEXT: #
	; AIX64-NEXT: lwz 3, 120(1)			; AIX64-NEXT: lwz 3, 120(1)
	; AIX64-NEXT: ld 4, L..C0(2) # @x			; AIX64-NEXT: ld 4, L..C0(2)
	; AIX64-NEXT: lwz 4, 0(4)			; AIX64-NEXT: lwz 4, 0(4)
	; AIX64-NEXT: cmpw 3, 4			; AIX64-NEXT: cmpw 3, 4
	; AIX64-NEXT: bge 0, L..BB0_4			; AIX64-NEXT: bge 0, L..BB0_4
	; AIX64-NEXT: # %bb.2: # %for.body			; AIX64-NEXT: # %bb.2: # %for.body
	; AIX64-NEXT: #			; AIX64-NEXT: #
	; AIX64-NEXT: bl .foo[PR]			; AIX64-NEXT: bl .foo[PR]
	; AIX64-NEXT: nop			; AIX64-NEXT: nop
	; AIX64-NEXT: # %bb.3: # %for.inc			; AIX64-NEXT: # %bb.3: # %for.inc
	▲ Show 20 Lines • Show All 41 Lines • Show Last 20 Lines

llvm/test/CodeGen/PowerPC/fp-strict-conv-f128.ll

	Show First 20 Lines • Show All 612 Lines • ▼ Show 20 Lines
	; P8-NEXT: .cfi_offset lr, 16			; P8-NEXT: .cfi_offset lr, 16
	; P8-NEXT: .cfi_offset r30, -16			; P8-NEXT: .cfi_offset r30, -16
	; P8-NEXT: addis r3, r2, .LCPI13_0@toc@ha			; P8-NEXT: addis r3, r2, .LCPI13_0@toc@ha
	; P8-NEXT: xxlxor f3, f3, f3			; P8-NEXT: xxlxor f3, f3, f3
	; P8-NEXT: std r30, 112(r1) # 8-byte Folded Spill			; P8-NEXT: std r30, 112(r1) # 8-byte Folded Spill
	; P8-NEXT: lfs f0, .LCPI13_0@toc@l(r3)			; P8-NEXT: lfs f0, .LCPI13_0@toc@l(r3)
	; P8-NEXT: lis r3, -32768			; P8-NEXT: lis r3, -32768
	; P8-NEXT: fcmpo cr0, f2, f3			; P8-NEXT: fcmpo cr0, f2, f3
	; P8-NEXT: xxlxor f3, f3, f3
	; P8-NEXT: fcmpo cr1, f1, f0			; P8-NEXT: fcmpo cr1, f1, f0
	; P8-NEXT: crand 4*cr5+lt, 4*cr1+eq, lt			; P8-NEXT: crand 4*cr5+lt, 4*cr1+eq, lt
	; P8-NEXT: crandc 4*cr5+gt, 4*cr1+lt, 4*cr1+eq			; P8-NEXT: crandc 4*cr5+gt, 4*cr1+lt, 4*cr1+eq
	; P8-NEXT: cror 4*cr5+lt, 4*cr5+gt, 4*cr5+lt			; P8-NEXT: cror 4*cr5+lt, 4*cr5+gt, 4*cr5+lt
	; P8-NEXT: isel r30, 0, r3, 4*cr5+lt			; P8-NEXT: isel r30, 0, r3, 4*cr5+lt
	; P8-NEXT: bc 12, 4*cr5+lt, .LBB13_2			; P8-NEXT: bc 12, 4*cr5+lt, .LBB13_2
	; P8-NEXT: # %bb.1: # %entry			; P8-NEXT: # %bb.1: # %entry
	; P8-NEXT: fmr f3, f0			; P8-NEXT: fmr f3, f0
	Show All 25 Lines
	; P9-NEXT: std r30, -16(r1) # 8-byte Folded Spill			; P9-NEXT: std r30, -16(r1) # 8-byte Folded Spill
	; P9-NEXT: stdu r1, -48(r1)			; P9-NEXT: stdu r1, -48(r1)
	; P9-NEXT: addis r3, r2, .LCPI13_0@toc@ha			; P9-NEXT: addis r3, r2, .LCPI13_0@toc@ha
	; P9-NEXT: xxlxor f3, f3, f3			; P9-NEXT: xxlxor f3, f3, f3
	; P9-NEXT: std r0, 64(r1)			; P9-NEXT: std r0, 64(r1)
	; P9-NEXT: lfs f0, .LCPI13_0@toc@l(r3)			; P9-NEXT: lfs f0, .LCPI13_0@toc@l(r3)
	; P9-NEXT: fcmpo cr1, f2, f3			; P9-NEXT: fcmpo cr1, f2, f3
	; P9-NEXT: lis r3, -32768			; P9-NEXT: lis r3, -32768
	; P9-NEXT: xxlxor f3, f3, f3
	; P9-NEXT: fcmpo cr0, f1, f0			; P9-NEXT: fcmpo cr0, f1, f0
	; P9-NEXT: crand 4*cr5+lt, eq, 4*cr1+lt			; P9-NEXT: crand 4*cr5+lt, eq, 4*cr1+lt
	; P9-NEXT: crandc 4*cr5+gt, lt, eq			; P9-NEXT: crandc 4*cr5+gt, lt, eq
	; P9-NEXT: cror 4*cr5+lt, 4*cr5+gt, 4*cr5+lt			; P9-NEXT: cror 4*cr5+lt, 4*cr5+gt, 4*cr5+lt
	; P9-NEXT: isel r30, 0, r3, 4*cr5+lt			; P9-NEXT: isel r30, 0, r3, 4*cr5+lt
	; P9-NEXT: bc 12, 4*cr5+lt, .LBB13_2			; P9-NEXT: bc 12, 4*cr5+lt, .LBB13_2
	; P9-NEXT: # %bb.1: # %entry			; P9-NEXT: # %bb.1: # %entry
	; P9-NEXT: fmr f3, f0			; P9-NEXT: fmr f3, f0
	▲ Show 20 Lines • Show All 362 Lines • Show Last 20 Lines

llvm/test/CodeGen/PowerPC/ppcf128-constrained-fp-intrinsics.ll

	Show First 20 Lines • Show All 1,289 Lines • ▼ Show 20 Lines
	; PC64LE-NEXT: std 30, -16(1) # 8-byte Folded Spill			; PC64LE-NEXT: std 30, -16(1) # 8-byte Folded Spill
	; PC64LE-NEXT: stdu 1, -48(1)			; PC64LE-NEXT: stdu 1, -48(1)
	; PC64LE-NEXT: addis 3, 2, .LCPI31_0@toc@ha			; PC64LE-NEXT: addis 3, 2, .LCPI31_0@toc@ha
	; PC64LE-NEXT: xxlxor 3, 3, 3			; PC64LE-NEXT: xxlxor 3, 3, 3
	; PC64LE-NEXT: std 0, 64(1)			; PC64LE-NEXT: std 0, 64(1)
	; PC64LE-NEXT: lfs 0, .LCPI31_0@toc@l(3)			; PC64LE-NEXT: lfs 0, .LCPI31_0@toc@l(3)
	; PC64LE-NEXT: lis 3, -32768			; PC64LE-NEXT: lis 3, -32768
	; PC64LE-NEXT: fcmpo 0, 2, 3			; PC64LE-NEXT: fcmpo 0, 2, 3
	; PC64LE-NEXT: xxlxor 3, 3, 3
	; PC64LE-NEXT: fcmpo 1, 1, 0			; PC64LE-NEXT: fcmpo 1, 1, 0
	; PC64LE-NEXT: crand 20, 6, 0			; PC64LE-NEXT: crand 20, 6, 0
	; PC64LE-NEXT: crandc 21, 4, 6			; PC64LE-NEXT: crandc 21, 4, 6
	; PC64LE-NEXT: cror 20, 21, 20			; PC64LE-NEXT: cror 20, 21, 20
	; PC64LE-NEXT: isel 30, 0, 3, 20			; PC64LE-NEXT: isel 30, 0, 3, 20
	; PC64LE-NEXT: bc 12, 20, .LBB31_2			; PC64LE-NEXT: bc 12, 20, .LBB31_2
	; PC64LE-NEXT: # %bb.1: # %entry			; PC64LE-NEXT: # %bb.1: # %entry
	; PC64LE-NEXT: fmr 3, 0			; PC64LE-NEXT: fmr 3, 0
	Show All 21 Lines
	; PC64LE9-NEXT: std 30, -16(1) # 8-byte Folded Spill			; PC64LE9-NEXT: std 30, -16(1) # 8-byte Folded Spill
	; PC64LE9-NEXT: stdu 1, -48(1)			; PC64LE9-NEXT: stdu 1, -48(1)
	; PC64LE9-NEXT: addis 3, 2, .LCPI31_0@toc@ha			; PC64LE9-NEXT: addis 3, 2, .LCPI31_0@toc@ha
	; PC64LE9-NEXT: xxlxor 3, 3, 3			; PC64LE9-NEXT: xxlxor 3, 3, 3
	; PC64LE9-NEXT: std 0, 64(1)			; PC64LE9-NEXT: std 0, 64(1)
	; PC64LE9-NEXT: lfs 0, .LCPI31_0@toc@l(3)			; PC64LE9-NEXT: lfs 0, .LCPI31_0@toc@l(3)
	; PC64LE9-NEXT: fcmpo 1, 2, 3			; PC64LE9-NEXT: fcmpo 1, 2, 3
	; PC64LE9-NEXT: lis 3, -32768			; PC64LE9-NEXT: lis 3, -32768
	; PC64LE9-NEXT: xxlxor 3, 3, 3
	; PC64LE9-NEXT: fcmpo 0, 1, 0			; PC64LE9-NEXT: fcmpo 0, 1, 0
	; PC64LE9-NEXT: crand 20, 2, 4			; PC64LE9-NEXT: crand 20, 2, 4
	; PC64LE9-NEXT: crandc 21, 0, 2			; PC64LE9-NEXT: crandc 21, 0, 2
	; PC64LE9-NEXT: cror 20, 21, 20			; PC64LE9-NEXT: cror 20, 21, 20
	; PC64LE9-NEXT: isel 30, 0, 3, 20			; PC64LE9-NEXT: isel 30, 0, 3, 20
	; PC64LE9-NEXT: bc 12, 20, .LBB31_2			; PC64LE9-NEXT: bc 12, 20, .LBB31_2
	; PC64LE9-NEXT: # %bb.1: # %entry			; PC64LE9-NEXT: # %bb.1: # %entry
	; PC64LE9-NEXT: fmr 3, 0			; PC64LE9-NEXT: fmr 3, 0
	▲ Show 20 Lines • Show All 770 Lines • Show Last 20 Lines

llvm/test/CodeGen/SystemZ/frame-28.mir

This file was added.

				# RUN: llc -mtriple=s390x-linux-gnu -start-before=prologepilog %s -o - -mcpu=z14 \
				# RUN: -verify-machineinstrs 2>&1 | FileCheck %s
				# REQUIRES: asserts
				#
				# Test that redundant frame addressing anchor points are removed by
				# MachineLateInstrsCleanup.

				--- |
				define void @fun1() { ret void }
				define void @fun2() { ret void }
				define void @fun3() { ret void }
				define void @fun4() { ret void }
				define void @fun5() { ret void }
				define void @fun6() { ret void }
				define void @fun7() { ret void }
				define void @fun8() { ret void }

				declare i32 @foo()

				@ptr = external dso_local local_unnamed_addr global ptr
				---

				# Test elimination of redundant LAYs in successor blocks.
				# CHECK-LABEL: fun1:
				# CHECK: lay %r1, 4096(%r15)
				# CHECK: # %bb.1:
				# CHECK-NOT: lay
				# CHECK: .LBB0_2:
				# CHECK-NOT: lay
				---
				name: fun1
				tracksRegLiveness: true
				stack:
				- { id: 0, size: 5000 }
				- { id: 1, size: 2500 }
				- { id: 2, size: 2500 }

				machineFunctionInfo: {}
				body: |
				bb.0 (%ir-block.0):
				liveins: $f16d
				successors: %bb.2(0x00000001), %bb.1(0x7fffffff)

				VST64 renamable $f16d, %stack.0, 0, $noreg
				VST64 renamable $f16d, %stack.0, 0, $noreg
				VST64 renamable $f16d, %stack.0, 0, $noreg
				VST64 renamable $f16d, %stack.0, 0, $noreg
				VST64 renamable $f16d, %stack.0, 0, $noreg
				VST64 renamable $f16d, %stack.1, 0, $noreg
				CHIMux undef $r0l, 3, implicit-def $cc
				BRC 14, 8, %bb.2, implicit killed $cc
				J %bb.1

				bb.1:
				liveins: $f16d
				VST64 renamable $f16d, %stack.2, 0, $noreg
				J %bb.2

				bb.2:
				liveins: $f16d
				VST64 renamable $f16d, %stack.1, 0, $noreg
				Return
				...

				# In this function the LAY in bb.1 will have a different offset, so the first
				# LAY in bb.2 must remain.
				# CHECK-LABEL: fun2:
				# CHECK: lay %r1, 4096(%r15)
				# CHECK: # %bb.1:
				# CHECK: lay %r1, 8192(%r15)
				# CHECK: .LBB1_2:
				# CHECK: lay %r1, 4096(%r15)
				# CHECK-NOT: lay
				---
				name: fun2
				tracksRegLiveness: true
				stack:
				- { id: 0, size: 5000 }
				- { id: 1, size: 5000 }
				- { id: 2, size: 2500 }

				machineFunctionInfo: {}
				body: |
				bb.0 (%ir-block.0):
				liveins: $f16d
				successors: %bb.2(0x00000001), %bb.1(0x7fffffff)

				VST64 renamable $f16d, %stack.0, 0, $noreg
				VST64 renamable $f16d, %stack.0, 0, $noreg
				VST64 renamable $f16d, %stack.0, 0, $noreg
				VST64 renamable $f16d, %stack.0, 0, $noreg
				VST64 renamable $f16d, %stack.0, 0, $noreg
				VST64 renamable $f16d, %stack.1, 0, $noreg
				CHIMux undef $r0l, 3, implicit-def $cc
				BRC 14, 8, %bb.2, implicit killed $cc
				J %bb.1

				bb.1:
				liveins: $f16d
				VST64 renamable $f16d, %stack.2, 0, $noreg
				J %bb.2

				bb.2:
				liveins: $f16d
				VST64 renamable $f16d, %stack.1, 0, $noreg
				VST64 renamable $f16d, %stack.1, 0, $noreg
				Return
				...

				# Test case with a loop (with room for improvement: since %r1 is not clobbered
				# inside the loop only the first LAY is needed).
				# CHECK-LABEL: fun3:
				# CHECK: lay %r1, 4096(%r15)
				# CHECK: .LBB2_1:
				# CHECK: lay %r1, 4096(%r15)
				# CHECK: .LBB2_2:
				# CHECK-NOT: lay %r1, 4096(%r15)
				---
				name: fun3
				tracksRegLiveness: true
				stack:
				- { id: 0, size: 5000 }
				- { id: 1, size: 2500 }
				- { id: 2, size: 2500 }

				machineFunctionInfo: {}
				body: |
				bb.0 (%ir-block.0):
				liveins: $f16d
				successors: %bb.2(0x00000001), %bb.1(0x7fffffff)

				VST64 renamable $f16d, %stack.0, 0, $noreg
				VST64 renamable $f16d, %stack.0, 0, $noreg
				VST64 renamable $f16d, %stack.0, 0, $noreg
				VST64 renamable $f16d, %stack.0, 0, $noreg
				VST64 renamable $f16d, %stack.0, 0, $noreg
				VST64 renamable $f16d, %stack.1, 0, $noreg
				CHIMux undef $r0l, 3, implicit-def $cc
				BRC 14, 8, %bb.2, implicit killed $cc
				J %bb.1

				bb.1:
				liveins: $f16d
				successors: %bb.2(0x00000001), %bb.1(0x7fffffff)

				VST64 renamable $f16d, %stack.2, 0, $noreg
				CHIMux undef $r0l, 3, implicit-def $cc
				BRC 14, 8, %bb.1, implicit killed $cc
				J %bb.2

				bb.2:
				liveins: $f16d
				VST64 renamable $f16d, %stack.1, 0, $noreg
				Return
				...

				# Test case with a call which clobbers r1: the second LAY after the call is needed.
				# CHECK-LABEL: fun4:
				# CHECK: lay %r1, 4096(%r15)
				# CHECK: brasl
				# CHECK: lay %r1, 4096(%r15)
				---
				name: fun4
				tracksRegLiveness: true
				stack:
				- { id: 0, size: 5000 }
				- { id: 1, size: 2500 }

				machineFunctionInfo: {}
				body: |
				bb.0 (%ir-block.0):
				liveins: $f16d

				VST64 renamable $f16d, %stack.0, 0, $noreg
				VST64 renamable $f16d, %stack.0, 0, $noreg
				VST64 renamable $f16d, %stack.0, 0, $noreg
				VST64 renamable $f16d, %stack.0, 0, $noreg
				VST64 renamable $f16d, %stack.0, 0, $noreg
				VST64 renamable $f16d, %stack.1, 0, $noreg
				ADJCALLSTACKDOWN 0, 0
				CallBRASL @foo, csr_systemz_elf, implicit-def dead $r14d, implicit-def dead $cc, implicit $fpc, implicit-def $r2l
				ADJCALLSTACKUP 0, 0
				$f17d = IMPLICIT_DEF
				VST64 renamable $f17d, %stack.1, 0, $noreg
				Return
				...

				# Test case where index reg is loaded instead of using an LAY. Only one LGHI is needed.
				# CHECK-LABEL: fun5:
				# CHECK: lghi %r1, 4096
				# CHECK-NOT: lghi
				---
				name: fun5
				tracksRegLiveness: true
				stack:
				- { id: 0, size: 5000 }
				- { id: 1, size: 2500 }

				machineFunctionInfo: {}
				body: |
				bb.0 (%ir-block.0):
				liveins: $f16d

				VST64 renamable $f16d, %stack.0, 0, $noreg
				VST64 renamable $f16d, %stack.0, 0, $noreg
				VST64 renamable $f16d, %stack.0, 0, $noreg
				VST64 renamable $f16d, %stack.0, 0, $noreg
				VST64 renamable $f16d, %stack.0, 0, $noreg
				$f0q = nofpexcept LXEB %stack.1, 0, $noreg, implicit $fpc
				$f1q = nofpexcept LXEB %stack.1, 0, $noreg, implicit $fpc
				Return
				...

				# Test where the constant is a Global. Only one LARL is needed.
				# CHECK-LABEL: fun6:
				# CHECK: larl %r1, ptr
				# CHECK-NOT: larl
				---
				name: fun6
				alignment: 16
				tracksRegLiveness: true
				tracksDebugUserValues: true
				frameInfo:
				maxAlignment: 1
				maxCallFrameSize: 0
				fixedStack:
				- { id: 0, offset: -160, size: 8, alignment: 8 }
				machineFunctionInfo: {}
				body: |
				bb.0:
				successors: %bb.2(0x30000000), %bb.1(0x50000000)

				renamable $r1d = LARL @ptr
				CGHSI killed renamable $r1d, 0, 0, implicit-def $cc :: (volatile dereferenceable load (s64) from @ptr)
				BRC 14, 8, %bb.2, implicit killed $cc
				J %bb.1

				bb.1:
				renamable $r1d = LARL @ptr
				MVGHI killed renamable $r1d, 0, 0

				bb.2:
				Return

				...

				# Load of an invariant location (GOT). Only one LGRL is needed.
				# CHECK-LABEL: fun7:
				# CHECK: lgrl %r1, ptr
				# CHECK-NOT: lgrl
				---
				name: fun7
				alignment: 16
				tracksRegLiveness: true
				tracksDebugUserValues: true
				frameInfo:
				maxAlignment: 1
				maxCallFrameSize: 0
				fixedStack:
				- { id: 0, offset: -160, size: 8, alignment: 8 }
				machineFunctionInfo: {}
				body: |
				bb.0:
				successors: %bb.2(0x30000000), %bb.1(0x50000000)

				renamable $r1d = LGRL @ptr :: (load (s64) from got)
				CGHSI killed renamable $r1d, 0, 0, implicit-def $cc :: (volatile dereferenceable load (s64) from @ptr)
				BRC 14, 8, %bb.2, implicit killed $cc
				J %bb.1

				bb.1:
				renamable $r1d = LGRL @ptr :: (load (s64) from got)
				MVGHI killed renamable $r1d, 0, 0

				bb.2:
				Return

				...

				# Load from constant pool. Only one LARL is needed.
				# CHECK-LABEL: fun8:
				# CHECK: larl %r1, .LCPI7_0
				# CHECK-NOT: larl
				---
				name: fun8
				alignment: 16
				tracksRegLiveness: true
				tracksDebugUserValues: true
				liveins:
				- { reg: '$f0s' }
				frameInfo:
				maxAlignment: 1
				maxCallFrameSize: 0
				fixedStack:
				- { id: 0, offset: -160, size: 8, alignment: 8 }
				constants:
				- id: 0
				value: float 0x43E0000000000000
				alignment: 4
				machineFunctionInfo: {}
				body: |
				bb.0 (%ir-block.0):
				successors: %bb.1, %bb.2
				liveins: $f0s

				renamable $r1d = LARL %const.0
				renamable $f1s = LE killed renamable $r1d, 0, $noreg :: (load (s32) from constant-pool)
				nofpexcept CEBR renamable $f0s, renamable $f1s, implicit-def $cc, implicit $fpc
				BRC 15, 11, %bb.2, implicit killed $cc

				bb.1:
				liveins: $f0s

				J %bb.3

				bb.2 (%ir-block.0):
				liveins: $f0s, $f1s

				renamable $r1d = LARL %const.0
				renamable $f1s = LE killed renamable $r1d, 0, $noreg :: (load (s32) from constant-pool)

				bb.3 (%ir-block.0):
				liveins: $r2d

				Return

				...
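
The block-local behaviour that fun4 and fun5 check can be illustrated with a toy model. The following is a minimal C++ sketch under stated assumptions, not the code in MachineLateInstrsCleanup.cpp: `ToyInstr` and `cleanupBlock` are hypothetical names, instructions are compared by their printed text, and only a single basic block is handled (the cross-block cases of fun1 to fun3 are left out).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy instruction; this is NOT llvm::MachineInstr.
struct ToyInstr {
  std::string text;               // printed form, e.g. "lay %r1, 4096(%r15)"
  std::string def;                // register defined, empty if none
  bool isConstantDef;             // true for address/immediate definitions
  std::set<std::string> clobbers; // extra registers clobbered (e.g. by a call)
};

// Drop a constant definition when the same register is already known to hold
// the value of an identical earlier definition in this block.
std::vector<ToyInstr> cleanupBlock(const std::vector<ToyInstr> &block) {
  std::map<std::string, std::string> known; // register -> defining text
  std::vector<ToyInstr> kept;
  for (const ToyInstr &mi : block) {
    if (mi.isConstantDef) {
      auto it = known.find(mi.def);
      if (it != known.end() && it->second == mi.text)
        continue; // identical definition still available: redundant
    }
    // Any other definition or clobber invalidates what we knew.
    if (!mi.def.empty())
      known.erase(mi.def);
    for (const std::string &reg : mi.clobbers)
      known.erase(reg);
    // Remember a fresh constant definition for later instructions.
    if (mi.isConstantDef && !mi.def.empty())
      known[mi.def] = mi.text;
    kept.push_back(mi);
  }
  return kept;
}
```

In this model a second `lghi %r1, 4096` with no intervening write to `%r1` is dropped, as in fun5, while a `lay` that follows a call clobbering `%r1` is kept, as in fun4.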

llvm/test/CodeGen/Thumb/frame-access.ll

	Show First 20 Lines • Show All 71 Lines • ▼ Show 20 Lines
	; CHECK: sub sp, #28			; CHECK: sub sp, #28
	; Incoming arguments area is accessed via SP if FP is not available			; Incoming arguments area is accessed via SP if FP is not available
	; CHECK-NOFP: add r0, sp, #36			; CHECK-NOFP: add r0, sp, #36
	; CHECK-NOFP: stm r0!, {r1, r2, r3}			; CHECK-NOFP: stm r0!, {r1, r2, r3}
	; CHECK-FP-ATPCS: mov r0, r7			; CHECK-FP-ATPCS: mov r0, r7
	; CHECK-FP-ATPCS: adds r0, #8			; CHECK-FP-ATPCS: adds r0, #8
	; CHECK-FP-ATPCS: stm r0!, {r1, r2, r3}			; CHECK-FP-ATPCS: stm r0!, {r1, r2, r3}
	; CHECK-FP-AAPCS: mov r0, r11			; CHECK-FP-AAPCS: mov r0, r11
	; CHECK-FP-AAPCS: str r1, [r0, #8]			; CHECK-FP-AAPCS: mov r7, r0
	; CHECK-FP-AAPCS: mov r0, r11			; CHECK-FP-AAPCS: adds r7, #8
	; CHECK-FP-AAPCS: str r2, [r0, #12]			; CHECK-FP-AAPCS: stm r7!, {r1, r2, r3}
	; CHECK-FP-AAPCS: mov r0, r11
	; CHECK-FP-AAPCS: str r3, [r0, #16]

	; Re-aligned stack, access via FP			; Re-aligned stack, access via FP
	; int test_args_realign(int a, int b, int c, int d, int e) {			; int test_args_realign(int a, int b, int c, int d, int e) {
	; __attribute__((aligned(16))) int v[4];			; __attribute__((aligned(16))) int v[4];
	; return g(v, a, b, c, d, e);			; return g(v, a, b, c, d, e);
	; }			; }
	; Function Attrs: nounwind			; Function Attrs: nounwind
	define dso_local i32 @test_args_realign(i32 %a, i32 %b, i32 %c, i32 %d, i32 %e) local_unnamed_addr {			define dso_local i32 @test_args_realign(i32 %a, i32 %b, i32 %c, i32 %d, i32 %e) local_unnamed_addr {
	entry:			entry:
	▲ Show 20 Lines • Show All 49 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: lsrs r4, r4, #4			; CHECK-NEXT: lsrs r4, r4, #4
	; CHECK-NEXT: lsls r4, r4, #4			; CHECK-NEXT: lsls r4, r4, #4
	; CHECK-NEXT: mov sp, r4			; CHECK-NEXT: mov sp, r4
	; Incoming register varargs stored via FP			; Incoming register varargs stored via FP
	; CHECK-ATPCS: mov r0, r7			; CHECK-ATPCS: mov r0, r7
	; CHECK-ATPCS-NEXT: adds r0, #8			; CHECK-ATPCS-NEXT: adds r0, #8
	; CHECK-ATPCS-NEXT: stm r0!, {r1, r2, r3}			; CHECK-ATPCS-NEXT: stm r0!, {r1, r2, r3}
	; CHECK-AAPCS: mov r0, r11			; CHECK-AAPCS: mov r0, r11
	; CHECK-AAPCS: str r1, [r0, #8]			; CHECK-AAPCS: mov r7, r0
	; CHECK-AAPCS: mov r0, r11			; CHECK-AAPCS: adds r7, #8
	; CHECK-AAPCS: str r2, [r0, #12]			; CHECK-AAPCS: stm r7!, {r1, r2, r3}
	; CHECK-AAPCS: mov r0, r11
	; CHECK-AAPCS: str r3, [r0, #16]
	; VLAs present, access via FP			; VLAs present, access via FP
	; int test_args_vla(int a, int b, int c, int d, int e) {			; int test_args_vla(int a, int b, int c, int d, int e) {
	; int v[a];			; int v[a];
	; return g(v, a, b, c, d, e);			; return g(v, a, b, c, d, e);
	; }			; }
	define dso_local i32 @test_args_vla(i32 %a, i32 %b, i32 %c, i32 %d, i32 %e) local_unnamed_addr {			define dso_local i32 @test_args_vla(i32 %a, i32 %b, i32 %c, i32 %d, i32 %e) local_unnamed_addr {
	entry:			entry:
	%vla = alloca i32, i32 %a, align 4			%vla = alloca i32, i32 %a, align 4
	▲ Show 20 Lines • Show All 139 Lines • ▼ Show 20 Lines
	; CHECK-NOFP: mov r0, r6			; CHECK-NOFP: mov r0, r6
	; CHECK-NOFP-NEXT: adds r0, #36			; CHECK-NOFP-NEXT: adds r0, #36
	; CHECK-NOFP-NEXT: stm r0!, {r1, r2, r3}			; CHECK-NOFP-NEXT: stm r0!, {r1, r2, r3}
	; Incoming varargs stored via FP otherwise			; Incoming varargs stored via FP otherwise
	; CHECK-FP-ATPCS: mov r0, r7			; CHECK-FP-ATPCS: mov r0, r7
	; CHECK-FP-ATPCS-NEXT: adds r0, #8			; CHECK-FP-ATPCS-NEXT: adds r0, #8
	; CHECK-FP-ATPCS-NEXT: stm r0!, {r1, r2, r3}			; CHECK-FP-ATPCS-NEXT: stm r0!, {r1, r2, r3}
	; CHECK-FP-AAPCS: mov r0, r11			; CHECK-FP-AAPCS: mov r0, r11
	; CHECK-FP-AAPCS-NEXT: str r1, [r0, #8]			; CHECK-FP-AAPCS-NEXT: mov r5, r0
	; CHECK-FP-AAPCS-NEXT: mov r0, r11			; CHECK-FP-AAPCS-NEXT: adds r5, #8
	; CHECK-FP-AAPCS-NEXT: str r2, [r0, #12]			; CHECK-FP-AAPCS-NEXT: stm r5!, {r1, r2, r3}
	; CHECK-FP-AAPCS-NEXT: mov r0, r11
	; CHECK-FP-AAPCS-NEXT: str r3, [r0, #16]

	; struct S { int x[128]; } s;			; struct S { int x[128]; } s;
	; int test(S a, int b) {			; int test(S a, int b) {
	; return i(b);			; return i(b);
	; }			; }
	define dso_local i32 @test_args_large_offset(%struct.S* byval(%struct.S) align 4 %0, i32 %1) local_unnamed_addr {			define dso_local i32 @test_args_large_offset(%struct.S* byval(%struct.S) align 4 %0, i32 %1) local_unnamed_addr {
	%3 = alloca i32, align 4			%3 = alloca i32, align 4
	store i32 %1, i32* %3, align 4			store i32 %1, i32* %3, align 4
	▲ Show 20 Lines • Show All 209 Lines • Show Last 20 Lines

llvm/test/CodeGen/Thumb2/mve-fpclamptosat_vec.ll

	Show First 20 Lines • Show All 1,884 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: cmp.w r8, #-2147483648			; CHECK-NEXT: cmp.w r8, #-2147483648
	; CHECK-NEXT: mov.w r8, #0			; CHECK-NEXT: mov.w r8, #0
	; CHECK-NEXT: ldr r2, [sp, #16] @ 4-byte Reload			; CHECK-NEXT: ldr r2, [sp, #16] @ 4-byte Reload
	; CHECK-NEXT: csel r6, r5, r8, hi			; CHECK-NEXT: csel r6, r5, r8, hi
	; CHECK-NEXT: csel r6, r5, r6, eq			; CHECK-NEXT: csel r6, r5, r6, eq
	; CHECK-NEXT: cmp.w r2, #-1			; CHECK-NEXT: cmp.w r2, #-1
	; CHECK-NEXT: ldr r2, [sp, #8] @ 4-byte Reload			; CHECK-NEXT: ldr r2, [sp, #8] @ 4-byte Reload
	; CHECK-NEXT: csel r5, r5, r8, gt			; CHECK-NEXT: csel r5, r5, r8, gt
	; CHECK-NEXT: mov.w r8, #0
	; CHECK-NEXT: cmp r2, #0			; CHECK-NEXT: cmp r2, #0
	; CHECK-NEXT: ldr r2, [sp, #4] @ 4-byte Reload			; CHECK-NEXT: ldr r2, [sp, #4] @ 4-byte Reload
	; CHECK-NEXT: csel r5, r6, r5, eq			; CHECK-NEXT: csel r5, r6, r5, eq
	; CHECK-NEXT: cmp r1, r11			; CHECK-NEXT: cmp r1, r11
	; CHECK-NEXT: csel r1, lr, r3, lo			; CHECK-NEXT: csel r1, lr, r3, lo
	; CHECK-NEXT: csel r1, lr, r1, eq			; CHECK-NEXT: csel r1, lr, r1, eq
	; CHECK-NEXT: cmp r0, #0			; CHECK-NEXT: cmp r0, #0
	; CHECK-NEXT: csel r0, lr, r3, mi			; CHECK-NEXT: csel r0, lr, r3, mi
	▲ Show 20 Lines • Show All 245 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: cmp.w r8, #-2147483648			; CHECK-NEXT: cmp.w r8, #-2147483648
	; CHECK-NEXT: mov.w r8, #0			; CHECK-NEXT: mov.w r8, #0
	; CHECK-NEXT: ldr r2, [sp, #20] @ 4-byte Reload			; CHECK-NEXT: ldr r2, [sp, #20] @ 4-byte Reload
	; CHECK-NEXT: csel r6, r5, r8, hi			; CHECK-NEXT: csel r6, r5, r8, hi
	; CHECK-NEXT: csel r6, r5, r6, eq			; CHECK-NEXT: csel r6, r5, r6, eq
	; CHECK-NEXT: cmp.w r2, #-1			; CHECK-NEXT: cmp.w r2, #-1
	; CHECK-NEXT: ldr r2, [sp, #12] @ 4-byte Reload			; CHECK-NEXT: ldr r2, [sp, #12] @ 4-byte Reload
	; CHECK-NEXT: csel r5, r5, r8, gt			; CHECK-NEXT: csel r5, r5, r8, gt
	; CHECK-NEXT: mov.w r8, #0
	; CHECK-NEXT: cmp r2, #0			; CHECK-NEXT: cmp r2, #0
	; CHECK-NEXT: ldr r2, [sp, #8] @ 4-byte Reload			; CHECK-NEXT: ldr r2, [sp, #8] @ 4-byte Reload
	; CHECK-NEXT: csel r5, r6, r5, eq			; CHECK-NEXT: csel r5, r6, r5, eq
	; CHECK-NEXT: cmp r1, r10			; CHECK-NEXT: cmp r1, r10
	; CHECK-NEXT: csel r1, lr, r3, lo			; CHECK-NEXT: csel r1, lr, r3, lo
	; CHECK-NEXT: csel r1, lr, r1, eq			; CHECK-NEXT: csel r1, lr, r1, eq
	; CHECK-NEXT: cmp r0, #0			; CHECK-NEXT: cmp r0, #0
	; CHECK-NEXT: csel r0, lr, r3, mi			; CHECK-NEXT: csel r0, lr, r3, mi
	▲ Show 20 Lines • Show All 241 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: cmp.w r8, #-2147483648			; CHECK-NEXT: cmp.w r8, #-2147483648
	; CHECK-NEXT: mov.w r8, #0			; CHECK-NEXT: mov.w r8, #0
	; CHECK-NEXT: ldr r2, [sp, #16] @ 4-byte Reload			; CHECK-NEXT: ldr r2, [sp, #16] @ 4-byte Reload
	; CHECK-NEXT: csel r6, r5, r8, hi			; CHECK-NEXT: csel r6, r5, r8, hi
	; CHECK-NEXT: csel r6, r5, r6, eq			; CHECK-NEXT: csel r6, r5, r6, eq
	; CHECK-NEXT: cmp.w r2, #-1			; CHECK-NEXT: cmp.w r2, #-1
	; CHECK-NEXT: ldr r2, [sp, #8] @ 4-byte Reload			; CHECK-NEXT: ldr r2, [sp, #8] @ 4-byte Reload
	; CHECK-NEXT: csel r5, r5, r8, gt			; CHECK-NEXT: csel r5, r5, r8, gt
	; CHECK-NEXT: mov.w r8, #0
	; CHECK-NEXT: cmp r2, #0			; CHECK-NEXT: cmp r2, #0
	; CHECK-NEXT: ldr r2, [sp, #4] @ 4-byte Reload			; CHECK-NEXT: ldr r2, [sp, #4] @ 4-byte Reload
	; CHECK-NEXT: csel r5, r6, r5, eq			; CHECK-NEXT: csel r5, r6, r5, eq
	; CHECK-NEXT: cmp r1, r11			; CHECK-NEXT: cmp r1, r11
	; CHECK-NEXT: csel r1, lr, r3, lo			; CHECK-NEXT: csel r1, lr, r3, lo
	; CHECK-NEXT: csel r1, lr, r1, eq			; CHECK-NEXT: csel r1, lr, r1, eq
	; CHECK-NEXT: cmp r0, #0			; CHECK-NEXT: cmp r0, #0
	; CHECK-NEXT: csel r0, lr, r3, mi			; CHECK-NEXT: csel r0, lr, r3, mi
	▲ Show 20 Lines • Show All 163 Lines • Show Last 20 Lines

llvm/test/CodeGen/X86/2008-04-09-BranchFolding.ll

	Show All 12 Lines
	; CHECK: # %bb.0: # %entry			; CHECK: # %bb.0: # %entry
	; CHECK-NEXT: movb $1, %al			; CHECK-NEXT: movb $1, %al
	; CHECK-NEXT: testb %al, %al			; CHECK-NEXT: testb %al, %al
	; CHECK-NEXT: jne .LBB0_1			; CHECK-NEXT: jne .LBB0_1
	; CHECK-NEXT: # %bb.2: # %bb17.i			; CHECK-NEXT: # %bb.2: # %bb17.i
	; CHECK-NEXT: xorl %eax, %eax			; CHECK-NEXT: xorl %eax, %eax
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	; CHECK-NEXT: .LBB0_1: # %bb160			; CHECK-NEXT: .LBB0_1: # %bb160
	; CHECK-NEXT: movb $1, %al
	; CHECK-NEXT: testb %al, %al			; CHECK-NEXT: testb %al, %al
	; CHECK-NEXT: xorl %eax, %eax			; CHECK-NEXT: xorl %eax, %eax
	; CHECK-NEXT: retl			; CHECK-NEXT: retl
	entry:			entry:
	%tmp3.i40 = icmp eq ptr null, null ; <i1> [#uses=2]			%tmp3.i40 = icmp eq ptr null, null ; <i1> [#uses=2]
	br label %bb140			br label %bb140
	bb140: ; preds = %entry			bb140: ; preds = %entry
	br i1 %tmp3.i40, label %bb160, label %bb17.i			br i1 %tmp3.i40, label %bb160, label %bb17.i
	[33 lines not shown]

llvm/test/CodeGen/X86/2008-04-16-ReMatBug.ll

	[21 lines not shown]
	; CHECK-NEXT: movzwl {{[0-9]+}}(%esp), %ecx			; CHECK-NEXT: movzwl {{[0-9]+}}(%esp), %ecx
	; CHECK-NEXT: movzwl {{[0-9]+}}(%esp), %ebx			; CHECK-NEXT: movzwl {{[0-9]+}}(%esp), %ebx
	; CHECK-NEXT: movzwl {{[0-9]+}}(%esp), %ebp			; CHECK-NEXT: movzwl {{[0-9]+}}(%esp), %ebp
	; CHECK-NEXT: movl {{[0-9]+}}(%esp), %edi			; CHECK-NEXT: movl {{[0-9]+}}(%esp), %edi
	; CHECK-NEXT: movw $-2, %si			; CHECK-NEXT: movw $-2, %si
	; CHECK-NEXT: jne LBB0_6			; CHECK-NEXT: jne LBB0_6
	; CHECK-NEXT: ## %bb.4: ## %bb37			; CHECK-NEXT: ## %bb.4: ## %bb37
	; CHECK-NEXT: movw $0, 40(%edi)			; CHECK-NEXT: movw $0, 40(%edi)
	; CHECK-NEXT: movb $1, %al
	; CHECK-NEXT: testb %al, %al			; CHECK-NEXT: testb %al, %al
	; CHECK-NEXT: leal (,%ecx,4), %ecx			; CHECK-NEXT: leal (,%ecx,4), %ecx
	; CHECK-NEXT: leal (,%ebx,4), %edx			; CHECK-NEXT: leal (,%ebx,4), %edx
	; CHECK-NEXT: subl $12, %esp			; CHECK-NEXT: subl $12, %esp
	; CHECK-NEXT: movzwl %bp, %eax			; CHECK-NEXT: movzwl %bp, %eax
	; CHECK-NEXT: movswl %cx, %ecx			; CHECK-NEXT: movswl %cx, %ecx
	; CHECK-NEXT: movswl %dx, %edx			; CHECK-NEXT: movswl %dx, %edx
	; CHECK-NEXT: pushl $87			; CHECK-NEXT: pushl $87
	[91 lines not shown]

llvm/test/CodeGen/X86/AMX/amx-across-func.ll

	[52 lines not shown]
	; CHECK-NEXT: tilestored %tmm2, 64(%rsp,%rax) # 1024-byte Folded Spill			; CHECK-NEXT: tilestored %tmm2, 64(%rsp,%rax) # 1024-byte Folded Spill
	; CHECK-NEXT: vzeroupper			; CHECK-NEXT: vzeroupper
	; CHECK-NEXT: callq foo			; CHECK-NEXT: callq foo
	; CHECK-NEXT: ldtilecfg (%rsp)			; CHECK-NEXT: ldtilecfg (%rsp)
	; CHECK-NEXT: movl $buf+2048, %eax			; CHECK-NEXT: movl $buf+2048, %eax
	; CHECK-NEXT: tileloadd (%rax,%r14), %tmm0			; CHECK-NEXT: tileloadd (%rax,%r14), %tmm0
	; CHECK-NEXT: movabsq $64, %rcx			; CHECK-NEXT: movabsq $64, %rcx
	; CHECK-NEXT: tileloadd 1088(%rsp,%rcx), %tmm1 # 1024-byte Folded Reload			; CHECK-NEXT: tileloadd 1088(%rsp,%rcx), %tmm1 # 1024-byte Folded Reload
	; CHECK-NEXT: movabsq $64, %rcx
	; CHECK-NEXT: tileloadd 64(%rsp,%rcx), %tmm2 # 1024-byte Folded Reload			; CHECK-NEXT: tileloadd 64(%rsp,%rcx), %tmm2 # 1024-byte Folded Reload
	; CHECK-NEXT: tdpbssd %tmm2, %tmm1, %tmm0			; CHECK-NEXT: tdpbssd %tmm2, %tmm1, %tmm0
	; CHECK-NEXT: tilestored %tmm0, (%rax,%r14)			; CHECK-NEXT: tilestored %tmm0, (%rax,%r14)
	; CHECK-NEXT: addq $2120, %rsp # imm = 0x848			; CHECK-NEXT: addq $2120, %rsp # imm = 0x848
	; CHECK-NEXT: popq %rbx			; CHECK-NEXT: popq %rbx
	; CHECK-NEXT: popq %r14			; CHECK-NEXT: popq %r14
	; CHECK-NEXT: popq %r15			; CHECK-NEXT: popq %r15
	; CHECK-NEXT: popq %rbp			; CHECK-NEXT: popq %rbp
	[562 lines not shown]

llvm/test/CodeGen/X86/AMX/amx-spill-merge.ll

	[40 lines not shown]
	; CHECK-NEXT: movl $buf, %eax			; CHECK-NEXT: movl $buf, %eax
	; CHECK-NEXT: movw $8, %cx			; CHECK-NEXT: movw $8, %cx
	; CHECK-NEXT: tileloadd (%rax,%r14), %tmm0			; CHECK-NEXT: tileloadd (%rax,%r14), %tmm0
	; CHECK-NEXT: movl $buf+1024, %eax			; CHECK-NEXT: movl $buf+1024, %eax
	; CHECK-NEXT: tileloadd (%rax,%r14), %tmm1			; CHECK-NEXT: tileloadd (%rax,%r14), %tmm1
	; CHECK-NEXT: movabsq $64, %rax			; CHECK-NEXT: movabsq $64, %rax
	; CHECK-NEXT: tilestored %tmm5, 1088(%rsp,%rax) # 1024-byte Folded Spill			; CHECK-NEXT: tilestored %tmm5, 1088(%rsp,%rax) # 1024-byte Folded Spill
	; CHECK-NEXT: tdpbssd %tmm1, %tmm0, %tmm5			; CHECK-NEXT: tdpbssd %tmm1, %tmm0, %tmm5
	; CHECK-NEXT: movabsq $64, %rax
	; CHECK-NEXT: tilestored %tmm5, 64(%rsp,%rax) # 1024-byte Folded Spill			; CHECK-NEXT: tilestored %tmm5, 64(%rsp,%rax) # 1024-byte Folded Spill
	; CHECK-NEXT: xorl %eax, %eax			; CHECK-NEXT: xorl %eax, %eax
	; CHECK-NEXT: vzeroupper			; CHECK-NEXT: vzeroupper
	; CHECK-NEXT: callq foo			; CHECK-NEXT: callq foo
	; CHECK-NEXT: ldtilecfg (%rsp)			; CHECK-NEXT: ldtilecfg (%rsp)
	; CHECK-NEXT: movabsq $64, %rax			; CHECK-NEXT: movabsq $64, %rax
	; CHECK-NEXT: tileloadd 64(%rsp,%rax), %tmm6 # 1024-byte Folded Reload			; CHECK-NEXT: tileloadd 64(%rsp,%rax), %tmm6 # 1024-byte Folded Reload
	; CHECK-NEXT: jmp .LBB0_3			; CHECK-NEXT: jmp .LBB0_3
	; CHECK-NEXT: .LBB0_2: # %if.false			; CHECK-NEXT: .LBB0_2: # %if.false
	; CHECK-NEXT: movl $buf, %eax			; CHECK-NEXT: movl $buf, %eax
	; CHECK-NEXT: movw $8, %cx			; CHECK-NEXT: movw $8, %cx
	; CHECK-NEXT: tileloadd (%rax,%r14), %tmm2			; CHECK-NEXT: tileloadd (%rax,%r14), %tmm2
	; CHECK-NEXT: movl $buf+1024, %eax			; CHECK-NEXT: movl $buf+1024, %eax
	; CHECK-NEXT: tileloadd (%rax,%r14), %tmm3			; CHECK-NEXT: tileloadd (%rax,%r14), %tmm3
	; CHECK-NEXT: movabsq $64, %rax			; CHECK-NEXT: movabsq $64, %rax
	; CHECK-NEXT: tilestored %tmm5, 1088(%rsp,%rax) # 1024-byte Folded Spill			; CHECK-NEXT: tilestored %tmm5, 1088(%rsp,%rax) # 1024-byte Folded Spill
	; CHECK-NEXT: tdpbssd %tmm3, %tmm2, %tmm5			; CHECK-NEXT: tdpbssd %tmm3, %tmm2, %tmm5
	; CHECK-NEXT: movabsq $64, %rax
	; CHECK-NEXT: tilestored %tmm5, 64(%rsp,%rax) # 1024-byte Folded Spill			; CHECK-NEXT: tilestored %tmm5, 64(%rsp,%rax) # 1024-byte Folded Spill
	; CHECK-NEXT: xorl %eax, %eax			; CHECK-NEXT: xorl %eax, %eax
	; CHECK-NEXT: vzeroupper			; CHECK-NEXT: vzeroupper
	; CHECK-NEXT: callq foo			; CHECK-NEXT: callq foo
	; CHECK-NEXT: ldtilecfg (%rsp)			; CHECK-NEXT: ldtilecfg (%rsp)
	; CHECK-NEXT: movabsq $64, %rax			; CHECK-NEXT: movabsq $64, %rax
	; CHECK-NEXT: tileloadd 64(%rsp,%rax), %tmm6 # 1024-byte Folded Reload			; CHECK-NEXT: tileloadd 64(%rsp,%rax), %tmm6 # 1024-byte Folded Reload
	; CHECK-NEXT: tilestored %tmm6, (%r15,%r14)			; CHECK-NEXT: tilestored %tmm6, (%r15,%r14)
	[127 lines not shown]

llvm/test/CodeGen/X86/fast-isel-stackcheck.ll

	; RUN: llc -o - %s | FileCheck %s			; RUN: llc -o - %s | FileCheck %s
	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx"			target triple = "x86_64-apple-macosx"

	; selectiondag stack protector uses a GuardReg which the fast-isel stack			; selectiondag stack protector uses a GuardReg which the fast-isel stack
	; protection code did not but the state was not reset properly.			; protection code did not but the state was not reset properly.
	; The optnone attribute on @bar forces fast-isel.			; The optnone attribute on @bar forces fast-isel.

	; CHECK-LABEL: foo:			; CHECK-LABEL: foo:
	; CHECK: movq ___stack_chk_guard@GOTPCREL(%rip), %rax			; CHECK: movq ___stack_chk_guard@GOTPCREL(%rip), %rax
	; CHECK: movq ___stack_chk_guard@GOTPCREL(%rip), %rax
	define void @foo() #0 {			define void @foo() #0 {
	entry:			entry:
	%_tags = alloca [3 x i32], align 4			%_tags = alloca [3 x i32], align 4
	ret void			ret void
	}			}

	; CHECK-LABEL: bar:			; CHECK-LABEL: bar:
	; CHECK: movq ___stack_chk_guard@GOTPCREL(%rip), %{{r.x}}			; CHECK: movq ___stack_chk_guard@GOTPCREL(%rip), %{{r.x}}
	[27 lines not shown]

llvm/test/CodeGen/X86/fshl.ll

	[332 lines not shown]
	; X86-SLOW-NEXT: jne .LBB6_1			; X86-SLOW-NEXT: jne .LBB6_1
	; X86-SLOW-NEXT: # %bb.2:			; X86-SLOW-NEXT: # %bb.2:
	; X86-SLOW-NEXT: movl %ebp, %ecx			; X86-SLOW-NEXT: movl %ebp, %ecx
	; X86-SLOW-NEXT: movl %edi, %ebp			; X86-SLOW-NEXT: movl %edi, %ebp
	; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %edi			; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %edi
	; X86-SLOW-NEXT: movl %edx, %ebx			; X86-SLOW-NEXT: movl %edx, %ebx
	; X86-SLOW-NEXT: movl %esi, %edx			; X86-SLOW-NEXT: movl %esi, %edx
	; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %esi			; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %esi
	; X86-SLOW-NEXT: jmp .LBB6_3			; X86-SLOW-NEXT: testb $32, %al
				; X86-SLOW-NEXT: je .LBB6_5
				; X86-SLOW-NEXT: .LBB6_4:
				; X86-SLOW-NEXT: movl %esi, (%esp) # 4-byte Spill
				; X86-SLOW-NEXT: movl %ebp, %esi
				; X86-SLOW-NEXT: movl %edx, %ebp
				; X86-SLOW-NEXT: movl %ecx, %edx
				; X86-SLOW-NEXT: jmp .LBB6_6
	; X86-SLOW-NEXT: .LBB6_1:			; X86-SLOW-NEXT: .LBB6_1:
	; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %ebx			; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %ebx
	; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %ecx			; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X86-SLOW-NEXT: .LBB6_3:
	; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-SLOW-NEXT: testb $32, %al			; X86-SLOW-NEXT: testb $32, %al
	; X86-SLOW-NEXT: jne .LBB6_4			; X86-SLOW-NEXT: jne .LBB6_4
	; X86-SLOW-NEXT: # %bb.5:			; X86-SLOW-NEXT: .LBB6_5:
	; X86-SLOW-NEXT: movl %ecx, %ebx			; X86-SLOW-NEXT: movl %ecx, %ebx
	; X86-SLOW-NEXT: movl %edi, (%esp) # 4-byte Spill			; X86-SLOW-NEXT: movl %edi, (%esp) # 4-byte Spill
	; X86-SLOW-NEXT: jmp .LBB6_6
	; X86-SLOW-NEXT: .LBB6_4:
	; X86-SLOW-NEXT: movl %esi, (%esp) # 4-byte Spill
	; X86-SLOW-NEXT: movl %ebp, %esi
	; X86-SLOW-NEXT: movl %edx, %ebp
	; X86-SLOW-NEXT: movl %ecx, %edx
	; X86-SLOW-NEXT: .LBB6_6:			; X86-SLOW-NEXT: .LBB6_6:
	; X86-SLOW-NEXT: movl %edx, %edi			; X86-SLOW-NEXT: movl %edx, %edi
	; X86-SLOW-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-SLOW-NEXT: movl %eax, %ecx			; X86-SLOW-NEXT: movl %eax, %ecx
	; X86-SLOW-NEXT: shll %cl, %edi			; X86-SLOW-NEXT: shll %cl, %edi
	; X86-SLOW-NEXT: shrl %ebx			; X86-SLOW-NEXT: shrl %ebx
	; X86-SLOW-NEXT: movb %al, %ch			; X86-SLOW-NEXT: movb %al, %ch
	; X86-SLOW-NEXT: notb %ch			; X86-SLOW-NEXT: notb %ch
	; X86-SLOW-NEXT: movb %ch, %cl			; X86-SLOW-NEXT: movb %ch, %cl
	; X86-SLOW-NEXT: shrl %cl, %ebx			; X86-SLOW-NEXT: shrl %cl, %ebx
	; X86-SLOW-NEXT: orl %edi, %ebx			; X86-SLOW-NEXT: orl %edi, %ebx
	[309 lines not shown]

llvm/test/CodeGen/X86/masked_load.ll

	[241 lines not shown]
	; SSE-NEXT: jne LBB3_5			; SSE-NEXT: jne LBB3_5
	; SSE-NEXT: LBB3_6: ## %else5			; SSE-NEXT: LBB3_6: ## %else5
	; SSE-NEXT: testb $8, %al			; SSE-NEXT: testb $8, %al
	; SSE-NEXT: jne LBB3_7			; SSE-NEXT: jne LBB3_7
	; SSE-NEXT: LBB3_8: ## %else8			; SSE-NEXT: LBB3_8: ## %else8
	; SSE-NEXT: retq			; SSE-NEXT: retq
	; SSE-NEXT: LBB3_1: ## %cond.load			; SSE-NEXT: LBB3_1: ## %cond.load
	; SSE-NEXT: movq {{.*#+}} xmm0 = mem[0],zero			; SSE-NEXT: movq {{.*#+}} xmm0 = mem[0],zero
	; SSE-NEXT: xorps %xmm1, %xmm1
	; SSE-NEXT: testb $2, %al			; SSE-NEXT: testb $2, %al
	; SSE-NEXT: je LBB3_4			; SSE-NEXT: je LBB3_4
	; SSE-NEXT: LBB3_3: ## %cond.load1			; SSE-NEXT: LBB3_3: ## %cond.load1
	; SSE-NEXT: movhps {{.*#+}} xmm0 = xmm0[0,1],mem[0,1]			; SSE-NEXT: movhps {{.*#+}} xmm0 = xmm0[0,1],mem[0,1]
	; SSE-NEXT: testb $4, %al			; SSE-NEXT: testb $4, %al
	; SSE-NEXT: je LBB3_6			; SSE-NEXT: je LBB3_6
	; SSE-NEXT: LBB3_5: ## %cond.load4			; SSE-NEXT: LBB3_5: ## %cond.load4
	; SSE-NEXT: movlps {{.*#+}} xmm1 = mem[0,1],xmm1[2,3]			; SSE-NEXT: movlps {{.*#+}} xmm1 = mem[0,1],xmm1[2,3]
	[865 lines not shown]
	; SSE2-NEXT: jne LBB10_13			; SSE2-NEXT: jne LBB10_13
	; SSE2-NEXT: LBB10_14: ## %else17			; SSE2-NEXT: LBB10_14: ## %else17
	; SSE2-NEXT: testb $-128, %al			; SSE2-NEXT: testb $-128, %al
	; SSE2-NEXT: jne LBB10_15			; SSE2-NEXT: jne LBB10_15
	; SSE2-NEXT: LBB10_16: ## %else20			; SSE2-NEXT: LBB10_16: ## %else20
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	; SSE2-NEXT: LBB10_1: ## %cond.load			; SSE2-NEXT: LBB10_1: ## %cond.load
	; SSE2-NEXT: movd {{.*#+}} xmm0 = mem[0],zero,zero,zero			; SSE2-NEXT: movd {{.*#+}} xmm0 = mem[0],zero,zero,zero
	; SSE2-NEXT: xorps %xmm1, %xmm1
	; SSE2-NEXT: testb $2, %al			; SSE2-NEXT: testb $2, %al
	; SSE2-NEXT: je LBB10_4			; SSE2-NEXT: je LBB10_4
	; SSE2-NEXT: LBB10_3: ## %cond.load1			; SSE2-NEXT: LBB10_3: ## %cond.load1
	; SSE2-NEXT: movd {{.*#+}} xmm2 = mem[0],zero,zero,zero			; SSE2-NEXT: movd {{.*#+}} xmm2 = mem[0],zero,zero,zero
	; SSE2-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm0[0]			; SSE2-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm0[0]
	; SSE2-NEXT: shufps {{.*#+}} xmm2 = xmm2[2,0],xmm0[2,3]			; SSE2-NEXT: shufps {{.*#+}} xmm2 = xmm2[2,0],xmm0[2,3]
	; SSE2-NEXT: movaps %xmm2, %xmm0			; SSE2-NEXT: movaps %xmm2, %xmm0
	; SSE2-NEXT: testb $4, %al			; SSE2-NEXT: testb $4, %al
	[63 lines not shown]
	; SSE42-NEXT: jne LBB10_13			; SSE42-NEXT: jne LBB10_13
	; SSE42-NEXT: LBB10_14: ## %else17			; SSE42-NEXT: LBB10_14: ## %else17
	; SSE42-NEXT: testb $-128, %al			; SSE42-NEXT: testb $-128, %al
	; SSE42-NEXT: jne LBB10_15			; SSE42-NEXT: jne LBB10_15
	; SSE42-NEXT: LBB10_16: ## %else20			; SSE42-NEXT: LBB10_16: ## %else20
	; SSE42-NEXT: retq			; SSE42-NEXT: retq
	; SSE42-NEXT: LBB10_1: ## %cond.load			; SSE42-NEXT: LBB10_1: ## %cond.load
	; SSE42-NEXT: movd {{.*#+}} xmm0 = mem[0],zero,zero,zero			; SSE42-NEXT: movd {{.*#+}} xmm0 = mem[0],zero,zero,zero
	; SSE42-NEXT: xorps %xmm1, %xmm1
	; SSE42-NEXT: testb $2, %al			; SSE42-NEXT: testb $2, %al
	; SSE42-NEXT: je LBB10_4			; SSE42-NEXT: je LBB10_4
	; SSE42-NEXT: LBB10_3: ## %cond.load1			; SSE42-NEXT: LBB10_3: ## %cond.load1
	; SSE42-NEXT: insertps {{.*#+}} xmm0 = xmm0[0],mem[0],xmm0[2,3]			; SSE42-NEXT: insertps {{.*#+}} xmm0 = xmm0[0],mem[0],xmm0[2,3]
	; SSE42-NEXT: testb $4, %al			; SSE42-NEXT: testb $4, %al
	; SSE42-NEXT: je LBB10_6			; SSE42-NEXT: je LBB10_6
	; SSE42-NEXT: LBB10_5: ## %cond.load4			; SSE42-NEXT: LBB10_5: ## %cond.load4
	; SSE42-NEXT: insertps {{.*#+}} xmm0 = xmm0[0,1],mem[0],xmm0[3]			; SSE42-NEXT: insertps {{.*#+}} xmm0 = xmm0[0,1],mem[0],xmm0[3]
	[1,424 lines not shown]
	; SSE2-NEXT: jne LBB20_13			; SSE2-NEXT: jne LBB20_13
	; SSE2-NEXT: LBB20_14: ## %else17			; SSE2-NEXT: LBB20_14: ## %else17
	; SSE2-NEXT: testb $-128, %al			; SSE2-NEXT: testb $-128, %al
	; SSE2-NEXT: jne LBB20_15			; SSE2-NEXT: jne LBB20_15
	; SSE2-NEXT: LBB20_16: ## %else20			; SSE2-NEXT: LBB20_16: ## %else20
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	; SSE2-NEXT: LBB20_1: ## %cond.load			; SSE2-NEXT: LBB20_1: ## %cond.load
	; SSE2-NEXT: movd {{.*#+}} xmm0 = mem[0],zero,zero,zero			; SSE2-NEXT: movd {{.*#+}} xmm0 = mem[0],zero,zero,zero
	; SSE2-NEXT: xorps %xmm1, %xmm1
	; SSE2-NEXT: testb $2, %al			; SSE2-NEXT: testb $2, %al
	; SSE2-NEXT: je LBB20_4			; SSE2-NEXT: je LBB20_4
	; SSE2-NEXT: LBB20_3: ## %cond.load1			; SSE2-NEXT: LBB20_3: ## %cond.load1
	; SSE2-NEXT: movd {{.*#+}} xmm2 = mem[0],zero,zero,zero			; SSE2-NEXT: movd {{.*#+}} xmm2 = mem[0],zero,zero,zero
	; SSE2-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm0[0]			; SSE2-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm0[0]
	; SSE2-NEXT: shufps {{.*#+}} xmm2 = xmm2[2,0],xmm0[2,3]			; SSE2-NEXT: shufps {{.*#+}} xmm2 = xmm2[2,0],xmm0[2,3]
	; SSE2-NEXT: movaps %xmm2, %xmm0			; SSE2-NEXT: movaps %xmm2, %xmm0
	; SSE2-NEXT: testb $4, %al			; SSE2-NEXT: testb $4, %al
	[63 lines not shown]
	; SSE42-NEXT: jne LBB20_13			; SSE42-NEXT: jne LBB20_13
	; SSE42-NEXT: LBB20_14: ## %else17			; SSE42-NEXT: LBB20_14: ## %else17
	; SSE42-NEXT: testb $-128, %al			; SSE42-NEXT: testb $-128, %al
	; SSE42-NEXT: jne LBB20_15			; SSE42-NEXT: jne LBB20_15
	; SSE42-NEXT: LBB20_16: ## %else20			; SSE42-NEXT: LBB20_16: ## %else20
	; SSE42-NEXT: retq			; SSE42-NEXT: retq
	; SSE42-NEXT: LBB20_1: ## %cond.load			; SSE42-NEXT: LBB20_1: ## %cond.load
	; SSE42-NEXT: movd {{.*#+}} xmm0 = mem[0],zero,zero,zero			; SSE42-NEXT: movd {{.*#+}} xmm0 = mem[0],zero,zero,zero
	; SSE42-NEXT: pxor %xmm1, %xmm1
	; SSE42-NEXT: testb $2, %al			; SSE42-NEXT: testb $2, %al
	; SSE42-NEXT: je LBB20_4			; SSE42-NEXT: je LBB20_4
	; SSE42-NEXT: LBB20_3: ## %cond.load1			; SSE42-NEXT: LBB20_3: ## %cond.load1
	; SSE42-NEXT: pinsrd $1, 4(%rdi), %xmm0			; SSE42-NEXT: pinsrd $1, 4(%rdi), %xmm0
	; SSE42-NEXT: testb $4, %al			; SSE42-NEXT: testb $4, %al
	; SSE42-NEXT: je LBB20_6			; SSE42-NEXT: je LBB20_6
	; SSE42-NEXT: LBB20_5: ## %cond.load4			; SSE42-NEXT: LBB20_5: ## %cond.load4
	; SSE42-NEXT: pinsrd $2, 8(%rdi), %xmm0			; SSE42-NEXT: pinsrd $2, 8(%rdi), %xmm0
	[4,612 lines not shown]

llvm/test/CodeGen/X86/oddshuffles.ll

	[2,225 lines not shown]
	; SSE42-LABEL: splat_v3i32:			; SSE42-LABEL: splat_v3i32:
	; SSE42: # %bb.0:			; SSE42: # %bb.0:
	; SSE42-NEXT: movq {{.*#+}} xmm0 = mem[0],zero			; SSE42-NEXT: movq {{.*#+}} xmm0 = mem[0],zero
	; SSE42-NEXT: pxor %xmm1, %xmm1			; SSE42-NEXT: pxor %xmm1, %xmm1
	; SSE42-NEXT: pxor %xmm2, %xmm2			; SSE42-NEXT: pxor %xmm2, %xmm2
	; SSE42-NEXT: pblendw {{.*#+}} xmm2 = xmm0[0,1],xmm2[2,3,4,5,6,7]			; SSE42-NEXT: pblendw {{.*#+}} xmm2 = xmm0[0,1],xmm2[2,3,4,5,6,7]
	; SSE42-NEXT: pblendw {{.*#+}} xmm0 = xmm1[0,1],xmm0[2,3],xmm1[4,5,6,7]			; SSE42-NEXT: pblendw {{.*#+}} xmm0 = xmm1[0,1],xmm0[2,3],xmm1[4,5,6,7]
	; SSE42-NEXT: pshufd {{.*#+}} xmm2 = xmm2[1,1,0,1]			; SSE42-NEXT: pshufd {{.*#+}} xmm2 = xmm2[1,1,0,1]
	; SSE42-NEXT: pxor %xmm1, %xmm1
	; SSE42-NEXT: xorps %xmm3, %xmm3			; SSE42-NEXT: xorps %xmm3, %xmm3
	; SSE42-NEXT: retq			; SSE42-NEXT: retq
	;			;
	; AVX1-LABEL: splat_v3i32:			; AVX1-LABEL: splat_v3i32:
	; AVX1: # %bb.0:			; AVX1: # %bb.0:
	; AVX1-NEXT: vmovddup {{.*#+}} xmm1 = mem[0,0]			; AVX1-NEXT: vmovddup {{.*#+}} xmm1 = mem[0,0]
	; AVX1-NEXT: vxorps %xmm2, %xmm2, %xmm2			; AVX1-NEXT: vxorps %xmm2, %xmm2, %xmm2
	; AVX1-NEXT: vblendps {{.*#+}} ymm0 = ymm2[0],ymm1[1],ymm2[2,3,4,5,6,7]			; AVX1-NEXT: vblendps {{.*#+}} ymm0 = ymm2[0],ymm1[1],ymm2[2,3,4,5,6,7]
	[277 lines not shown]

llvm/test/CodeGen/X86/opt-pipeline.ll

	[164 lines not shown]
	; CHECK-NEXT: Fixup Statepoint Caller Saved			; CHECK-NEXT: Fixup Statepoint Caller Saved
	; CHECK-NEXT: PostRA Machine Sink			; CHECK-NEXT: PostRA Machine Sink
	; CHECK-NEXT: Machine Block Frequency Analysis			; CHECK-NEXT: Machine Block Frequency Analysis
	; CHECK-NEXT: MachinePostDominator Tree Construction			; CHECK-NEXT: MachinePostDominator Tree Construction
	; CHECK-NEXT: Lazy Machine Block Frequency Analysis			; CHECK-NEXT: Lazy Machine Block Frequency Analysis
	; CHECK-NEXT: Machine Optimization Remark Emitter			; CHECK-NEXT: Machine Optimization Remark Emitter
	; CHECK-NEXT: Shrink Wrapping analysis			; CHECK-NEXT: Shrink Wrapping analysis
	; CHECK-NEXT: Prologue/Epilogue Insertion & Frame Finalization			; CHECK-NEXT: Prologue/Epilogue Insertion & Frame Finalization
				; CHECK-NEXT: Machine Late Instructions Cleanup Pass
	; CHECK-NEXT: Control Flow Optimizer			; CHECK-NEXT: Control Flow Optimizer
	; CHECK-NEXT: Lazy Machine Block Frequency Analysis			; CHECK-NEXT: Lazy Machine Block Frequency Analysis
	; CHECK-NEXT: Tail Duplication			; CHECK-NEXT: Tail Duplication
	; CHECK-NEXT: Machine Copy Propagation Pass			; CHECK-NEXT: Machine Copy Propagation Pass
	; CHECK-NEXT: Post-RA pseudo instruction expansion pass			; CHECK-NEXT: Post-RA pseudo instruction expansion pass
	; CHECK-NEXT: X86 pseudo instruction expansion pass			; CHECK-NEXT: X86 pseudo instruction expansion pass
	; CHECK-NEXT: Insert KCFI indirect call checks			; CHECK-NEXT: Insert KCFI indirect call checks
	; CHECK-NEXT: MachineDominator Tree Construction			; CHECK-NEXT: MachineDominator Tree Construction
	[50 lines not shown]

llvm/test/CodeGen/X86/sdiv_fix_sat.ll

	[1,234 lines not shown]
	; X86-NEXT: movl $0, %esi			; X86-NEXT: movl $0, %esi
	; X86-NEXT: cmovel %esi, %edi			; X86-NEXT: cmovel %esi, %edi
	; X86-NEXT: cmpl $-1, %edi			; X86-NEXT: cmpl $-1, %edi
	; X86-NEXT: movl $0, %edx			; X86-NEXT: movl $0, %edx
	; X86-NEXT: cmovel %eax, %edx			; X86-NEXT: cmovel %eax, %edx
	; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Reload			; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Reload
	; X86-NEXT: testl %ecx, %ecx			; X86-NEXT: testl %ecx, %ecx
	; X86-NEXT: cmovsl %esi, %eax			; X86-NEXT: cmovsl %esi, %eax
	; X86-NEXT: movl $0, %esi
	; X86-NEXT: movl $-1, %ebx			; X86-NEXT: movl $-1, %ebx
	; X86-NEXT: cmovsl %ebx, %edi			; X86-NEXT: cmovsl %ebx, %edi
	; X86-NEXT: andl {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Folded Reload			; X86-NEXT: andl {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Folded Reload
	; X86-NEXT: cmpl $-1, %ecx			; X86-NEXT: cmpl $-1, %ecx
	; X86-NEXT: cmovel %edx, %eax			; X86-NEXT: cmovel %edx, %eax
	; X86-NEXT: cmovnel %edi, %ecx			; X86-NEXT: cmovnel %edi, %ecx
	; X86-NEXT: shldl $31, %eax, %ecx			; X86-NEXT: shldl $31, %eax, %ecx
	; X86-NEXT: movl %ecx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill			; X86-NEXT: movl %ecx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
	[65 lines not shown]

llvm/test/CodeGen/X86/shift-i128.ll

	[527 lines not shown]
	; i686-NEXT: shrdl %cl, %ebp, %esi			; i686-NEXT: shrdl %cl, %ebp, %esi
	; i686-NEXT: testb $32, %dl			; i686-NEXT: testb $32, %dl
	; i686-NEXT: jne .LBB6_9			; i686-NEXT: jne .LBB6_9
	; i686-NEXT: # %bb.8: # %entry			; i686-NEXT: # %bb.8: # %entry
	; i686-NEXT: movl %esi, %ebx			; i686-NEXT: movl %esi, %ebx
	; i686-NEXT: .LBB6_9: # %entry			; i686-NEXT: .LBB6_9: # %entry
	; i686-NEXT: movl %edi, %esi			; i686-NEXT: movl %edi, %esi
	; i686-NEXT: movl %ebx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill			; i686-NEXT: movl %ebx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
	; i686-NEXT: movl {{[0-9]+}}(%esp), %ebp
	; i686-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Reload			; i686-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Reload
	; i686-NEXT: shrl %cl, %ebp			; i686-NEXT: shrl %cl, %ebp
	; i686-NEXT: testb $32, %cl			; i686-NEXT: testb $32, %cl
	; i686-NEXT: movl $0, %ecx			; i686-NEXT: movl $0, %ecx
	; i686-NEXT: jne .LBB6_11			; i686-NEXT: jne .LBB6_11
	; i686-NEXT: # %bb.10: # %entry			; i686-NEXT: # %bb.10: # %entry
	; i686-NEXT: movl %ebp, %ecx			; i686-NEXT: movl %ebp, %ecx
	; i686-NEXT: .LBB6_11: # %entry			; i686-NEXT: .LBB6_11: # %entry
	[296 lines not shown]
	; i686-NEXT: # %bb.10: # %entry			; i686-NEXT: # %bb.10: # %entry
	; i686-NEXT: movl %esi, %ecx			; i686-NEXT: movl %esi, %ecx
	; i686-NEXT: .LBB7_11: # %entry			; i686-NEXT: .LBB7_11: # %entry
	; i686-NEXT: movl %ecx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill			; i686-NEXT: movl %ecx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
	; i686-NEXT: movl %esi, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill			; i686-NEXT: movl %esi, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
	; i686-NEXT: movl {{[0-9]+}}(%esp), %esi			; i686-NEXT: movl {{[0-9]+}}(%esp), %esi
	; i686-NEXT: movb $64, %cl			; i686-NEXT: movb $64, %cl
	; i686-NEXT: subb %dl, %cl			; i686-NEXT: subb %dl, %cl
	; i686-NEXT: movl {{[0-9]+}}(%esp), %ebp
	; i686-NEXT: movl {{[0-9]+}}(%esp), %ebx			; i686-NEXT: movl {{[0-9]+}}(%esp), %ebx
	; i686-NEXT: shldl %cl, %ebx, %ebp			; i686-NEXT: shldl %cl, %ebx, %ebp
	; i686-NEXT: movl %ebp, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill			; i686-NEXT: movl %ebp, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
	; i686-NEXT: movl %ebx, %ebp			; i686-NEXT: movl %ebx, %ebp
	; i686-NEXT: shll %cl, %ebp			; i686-NEXT: shll %cl, %ebp
	; i686-NEXT: testb $32, %cl			; i686-NEXT: testb $32, %cl
	; i686-NEXT: movb $64, %bl			; i686-NEXT: movb $64, %bl
	; i686-NEXT: movl %edi, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill			; i686-NEXT: movl %edi, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
	[759 lines not shown]

llvm/test/CodeGen/X86/ushl_sat_vec.ll

	[348 lines not shown]
	; X86-NEXT: movl %edx, %esi			; X86-NEXT: movl %edx, %esi
	; X86-NEXT: shll %cl, %esi			; X86-NEXT: shll %cl, %esi
	; X86-NEXT: movzwl %si, %ebx			; X86-NEXT: movzwl %si, %ebx
	; X86-NEXT: movl %ebx, %esi			; X86-NEXT: movl %ebx, %esi
	; X86-NEXT: shrl %cl, %esi			; X86-NEXT: shrl %cl, %esi
	; X86-NEXT: cmpw %si, %dx			; X86-NEXT: cmpw %si, %dx
	; X86-NEXT: movl {{[0-9]+}}(%esp), %edx			; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
	; X86-NEXT: cmovnel %eax, %ebx			; X86-NEXT: cmovnel %eax, %ebx
	; X86-NEXT: movl $65535, %eax # imm = 0xFFFF
	; X86-NEXT: movzbl {{[0-9]+}}(%esp), %ecx			; X86-NEXT: movzbl {{[0-9]+}}(%esp), %ecx
	; X86-NEXT: movl %edx, %esi			; X86-NEXT: movl %edx, %esi
	; X86-NEXT: shll %cl, %esi			; X86-NEXT: shll %cl, %esi
	; X86-NEXT: movzwl %si, %edi			; X86-NEXT: movzwl %si, %edi
	; X86-NEXT: movl %edi, %esi			; X86-NEXT: movl %edi, %esi
	; X86-NEXT: shrl %cl, %esi			; X86-NEXT: shrl %cl, %esi
	; X86-NEXT: cmpw %si, %dx			; X86-NEXT: cmpw %si, %dx
	; X86-NEXT: cmovnel %eax, %edi			; X86-NEXT: cmovnel %eax, %edi
	[314 lines not shown]

llvm/test/CodeGen/X86/vec_extract.ll

	[105 lines not shown]

	; OSS-Fuzz #15662			; OSS-Fuzz #15662
	; https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=15662			; https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=15662
	define <4 x i32> @ossfuzz15662(ptr %in) {			define <4 x i32> @ossfuzz15662(ptr %in) {
	; X32-LABEL: ossfuzz15662:			; X32-LABEL: ossfuzz15662:
	; X32: # %bb.0:			; X32: # %bb.0:
	; X32-NEXT: xorps %xmm0, %xmm0			; X32-NEXT: xorps %xmm0, %xmm0
	; X32-NEXT: movaps %xmm0, (%eax)			; X32-NEXT: movaps %xmm0, (%eax)
	; X32-NEXT: xorps %xmm0, %xmm0
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: ossfuzz15662:			; X64-LABEL: ossfuzz15662:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: xorps %xmm0, %xmm0			; X64-NEXT: xorps %xmm0, %xmm0
	; X64-NEXT: movaps %xmm0, (%rax)			; X64-NEXT: movaps %xmm0, (%rax)
	; X64-NEXT: xorps %xmm0, %xmm0
	; X64-NEXT: retq			; X64-NEXT: retq
	%C10 = icmp ule i1 false, false			%C10 = icmp ule i1 false, false
	%C3 = icmp ule i1 true, undef			%C3 = icmp ule i1 true, undef
	%B = sdiv i1 %C10, %C3			%B = sdiv i1 %C10, %C3
	%I = insertelement <4 x i32> zeroinitializer, i32 0, i1 %B			%I = insertelement <4 x i32> zeroinitializer, i32 0, i1 %B
	store <4 x i32> %I, ptr undef			store <4 x i32> %I, ptr undef
	ret <4 x i32> zeroinitializer			ret <4 x i32> zeroinitializer
	}			}

llvm/test/CodeGen/X86/vec_shift5.ll

	[172 lines not shown]
	; Make sure we fold fully undef input vectors. We previously folded only when			; Make sure we fold fully undef input vectors. We previously folded only when
	; undef had a single use so use 2 undefs.			; undef had a single use so use 2 undefs.
	define <4 x i32> @test17(<4 x i32> %a0, ptr %dummy) {			define <4 x i32> @test17(<4 x i32> %a0, ptr %dummy) {
	; X86-LABEL: test17:			; X86-LABEL: test17:
	; X86: # %bb.0:			; X86: # %bb.0:
	; X86-NEXT: movl {{[0-9]+}}(%esp), %eax			; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: xorps %xmm0, %xmm0			; X86-NEXT: xorps %xmm0, %xmm0
	; X86-NEXT: movaps %xmm0, (%eax)			; X86-NEXT: movaps %xmm0, (%eax)
	; X86-NEXT: xorps %xmm0, %xmm0
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-LABEL: test17:			; X64-LABEL: test17:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: xorps %xmm0, %xmm0			; X64-NEXT: xorps %xmm0, %xmm0
	; X64-NEXT: movaps %xmm0, (%rdi)			; X64-NEXT: movaps %xmm0, (%rdi)
	; X64-NEXT: xorps %xmm0, %xmm0
	; X64-NEXT: retq			; X64-NEXT: retq
	%a = call <4 x i32> @llvm.x86.sse2.pslli.d(<4 x i32> undef, i32 6)			%a = call <4 x i32> @llvm.x86.sse2.pslli.d(<4 x i32> undef, i32 6)
	store <4 x i32> %a, ptr %dummy			store <4 x i32> %a, ptr %dummy
	%res = call <4 x i32> @llvm.x86.sse2.pslli.d(<4 x i32> undef, i32 7)			%res = call <4 x i32> @llvm.x86.sse2.pslli.d(<4 x i32> undef, i32 7)
	ret <4 x i32> %res			ret <4 x i32> %res
	}			}

	define <4 x i32> @test18(<4 x i32> %a0, ptr %dummy) {			define <4 x i32> @test18(<4 x i32> %a0, ptr %dummy) {
	; X86-LABEL: test18:			; X86-LABEL: test18:
	; X86: # %bb.0:			; X86: # %bb.0:
	; X86-NEXT: movl {{[0-9]+}}(%esp), %eax			; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X86-NEXT: xorps %xmm0, %xmm0			; X86-NEXT: xorps %xmm0, %xmm0
	; X86-NEXT: movaps %xmm0, (%eax)			; X86-NEXT: movaps %xmm0, (%eax)
	; X86-NEXT: xorps %xmm0, %xmm0
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-LABEL: test18:			; X64-LABEL: test18:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: xorps %xmm0, %xmm0			; X64-NEXT: xorps %xmm0, %xmm0
	; X64-NEXT: movaps %xmm0, (%rdi)			; X64-NEXT: movaps %xmm0, (%rdi)
	; X64-NEXT: xorps %xmm0, %xmm0
	; X64-NEXT: retq			; X64-NEXT: retq
	%a = call <4 x i32> @llvm.x86.sse2.pslli.d(<4 x i32> undef, i32 3)			%a = call <4 x i32> @llvm.x86.sse2.pslli.d(<4 x i32> undef, i32 3)
	store <4 x i32> %a, ptr %dummy			store <4 x i32> %a, ptr %dummy
	%res = call <4 x i32> @llvm.x86.sse2.pslli.d(<4 x i32> undef, i32 1)			%res = call <4 x i32> @llvm.x86.sse2.pslli.d(<4 x i32> undef, i32 1)
	ret <4 x i32> %res			ret <4 x i32> %res
	}			}

	; PR39482			; PR39482
	[101 lines not shown]

llvm/test/CodeGen/XCore/scavenging.ll

	[81 lines not shown]
	; scavenge r5 using SR spill slot			; scavenge r5 using SR spill slot
	; CHECK: stw r5, sp[0]			; CHECK: stw r5, sp[0]
	; CHECK: ldw r5, cp[[[INDEX0]]]			; CHECK: ldw r5, cp[[[INDEX0]]]
	; r4 & r5 used by InsertSPConstInst() to emit STW_l3r instruction.			; r4 & r5 used by InsertSPConstInst() to emit STW_l3r instruction.
	; CHECK: stw r0, r4[r5]			; CHECK: stw r0, r4[r5]
	; CHECK: ldaw r0, sp[0]			; CHECK: ldaw r0, sp[0]
	; CHECK: ldw r5, cp[[[INDEX1]]]			; CHECK: ldw r5, cp[[[INDEX1]]]
	; CHECK: stw r1, r0[r5]			; CHECK: stw r1, r0[r5]
	; CHECK: ldaw r0, sp[0]
	; CHECK: ldw r1, cp[[[INDEX2]]]			; CHECK: ldw r1, cp[[[INDEX2]]]
	; CHECK: stw r2, r0[r1]			; CHECK: stw r2, r0[r1]
	; CHECK: ldaw r0, sp[0]
	; CHECK: ldw r1, cp[[[INDEX3]]]			; CHECK: ldw r1, cp[[[INDEX3]]]
	; CHECK: stw r3, r0[r1]			; CHECK: stw r3, r0[r1]
	; CHECK: ldaw r0, sp[0]
	; CHECK: ldw r1, cp[[[INDEX4]]]			; CHECK: ldw r1, cp[[[INDEX4]]]
	; CHECK: stw r11, r0[r1]			; CHECK: stw r11, r0[r1]
	; CHECK: ldaw sp, sp[65535]			; CHECK: ldaw sp, sp[65535]
	; CHECK: ldw r4, sp[1]			; CHECK: ldw r4, sp[1]
	; CHECK: ldw r5, sp[0]			; CHECK: ldw r5, sp[0]
	; CHECK: retsp 34467			; CHECK: retsp 34467
	define void @ScavengeSlots(i32 %r0, i32 %r1, i32 %r2, i32 %r3, i32 %r4) nounwind {			define void @ScavengeSlots(i32 %r0, i32 %r1, i32 %r2, i32 %r3, i32 %r4) nounwind {
	entry:			entry:
	[13 lines not shown]

Diff 480166
