This is an archive of the discontinued LLVM Phabricator instance.


Differential D88081

[AMDGPU] Move WQM Pass after MI Scheduler
ClosedPublic

Authored by critson on Sep 22 2020, 3:04 AM.

Details

Reviewers

nhaehnle
foad
arsenm

Commits

rG7a880ab38892: [AMDGPU] Move WQM Pass after MI Scheduler

Summary

Exec mask manipulation inserted by SIWholeQuadMode acts as a barrier to
instruction scheduling. Move the entire pass after the machine
instruction scheduler and make changes so the pass is correct for
non-SSA operation. These changes should leave the pass still
usable pre-scheduler, although tests have been updated to reflect
post-scheduler results.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

critson created this revision. Sep 22 2020, 3:04 AM

Herald added a project: Restricted Project. Sep 22 2020, 3:04 AM

Herald added subscribers: llvm-commits, wenlei, kerbowa and 7 others.

critson requested review of this revision. Sep 22 2020, 3:04 AM

Herald added a subscriber: wdng. Sep 22 2020, 3:04 AM

This passes VulkanCTS as much as stock LLVM does for graphics.
I still need to do some porting work so I can test performance impact.

foad added inline comments. Sep 22 2020, 3:41 AM

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
993	Why set VerifyAfter = false? Also, a nit, I think insertPass(&MachineSchedulerID, &SIPreAllocateWWMRegsID, false) would be slightly easier to understand, once you know that insertPass(A,B) just appends B to the list of passes to be inserted after A.
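
The ordering rule described in the comment above can be modeled with a small standalone sketch. `ToyPassConfig`, its string pass names, and the base pipeline are hypothetical stand-ins for illustration, not the real `TargetPassConfig` API; the sketch only assumes that repeated insertions after the same target run in registration order.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model of a pass pipeline (hypothetical; not the LLVM class).
struct ToyPassConfig {
  // For each target pass, the passes registered to run right after it.
  std::map<std::string, std::vector<std::string>> InsertedAfter;

  // Append Inserted to the list of passes that follow Target.
  void insertPass(const std::string &Target, const std::string &Inserted) {
    InsertedAfter[Target].push_back(Inserted);
  }

  // Expand a base pipeline: emit each pass, then everything registered
  // after it, in registration order.
  std::vector<std::string> build(const std::vector<std::string> &Base) const {
    std::vector<std::string> Out;
    for (const std::string &Pass : Base) {
      Out.push_back(Pass);
      auto It = InsertedAfter.find(Pass);
      if (It == InsertedAfter.end())
        continue;
      for (const std::string &Extra : It->second)
        Out.push_back(Extra);
    }
    return Out;
  }
};

// Mirrors the two insertPass calls from addOptimizedRegAlloc in this
// revision; the three base pass names are illustrative only.
std::vector<std::string> demoPipeline() {
  ToyPassConfig PC;
  PC.insertPass("MachineScheduler", "SIWholeQuadMode");
  PC.insertPass("MachineScheduler", "SIPreAllocateWWMRegs");
  std::vector<std::string> Order =
      PC.build({"RegisterCoalescer", "MachineScheduler", "RegisterAllocator"});
  assert(Order.size() == 5);
  return Order;
}
```

In this model SIWholeQuadMode lands directly after MachineScheduler and SIPreAllocateWWMRegs directly after that, which is the reading of insertPass(A,B) the comment suggests.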

Harbormaster completed remote builds in B72495: Diff 293394. Sep 22 2020, 3:51 AM

critson mentioned this in D67767: [AMDGPU] Add llvm.amdgcn.wqm.demote intrinsic and live mask tracking. Oct 1 2020, 7:54 AM

Address comments about pass insertion.
Fix bug in removal of trivial SGPR copies from WWM.

critson marked an inline comment as done. Oct 6 2020, 5:35 AM

Harbormaster completed remote builds in B74121: Diff 296422. Oct 6 2020, 6:20 AM

arsenm added inline comments. Oct 6 2020, 7:31 AM

llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
362–371	Isn't this pass required to be post-SSA if it's after the scheduler?

Fix assumptions about SCC live intervals which are not valid late in compilation.

Harbormaster completed remote builds in B74391: Diff 296867. Oct 7 2020, 11:19 PM

critson added inline comments. Oct 8 2020, 7:12 PM

llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
362–371	I am currently retaining the ability to run in both SSA and non-SSA modes.

nhaehnle added inline comments. Oct 12 2020, 11:19 AM

llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
315–332	I don't understand the logic here. Why the special treatment of operand 0?
349–363	I don't understand this logic. A use is a use -- why should implicitness or tiedness make a difference? This seems pretty wrong.
835–851	This looks suspicious. Can you please explain what is happening here?
979	This is incorrect, there could be other users of the value. Just keep the simpler case below.

critson added inline comments. Oct 12 2020, 9:34 PM

llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
315–332	Are you saying I should iterate MI->defs() instead? The code here is intended to mark all instructions defining parts of the specified register. If this is a partial register write then we need to follow the input values to mark the other instructions.
349–363	Agreed, this should go away. It was an early hack before I wrote a working markDefs.
835–851	This is analogous to the code above that modifies the SI_ELSE to make it respect the EXEC mask -- it is a very special case match and fix up based on current code generation so I do not like it either. If we make SI_ELSE always respect modifications to the EXEC mask then this can go away. Then perhaps we add a late peephole to clear up some of the unnecessary instructions when they are not required. Do you have an opinion on this?
979	There should not be other users of the value, it is a kill? I am not going to fight to keep this, but we would benefit from more late clean up of unnecessary copies. I guess this ties into some of the things I am touching on in D89187, so the follow up to that might solve this.

nhaehnle added inline comments. Oct 14 2020, 4:04 PM

llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
315–332	"Are you saying I should iterate MI->defs() instead?" Well, there's the question of whether you need to follow implicit defs as well. I just don't see why you're treating operand 0 differently from others.
835–851	We really shouldn't rely on such details of current code generation here. I think you're on to something. Digging into this more... SI_ELSE is currently lowered as:

    s_or_saveexec_bNN dst, src
    ...
    s_xor_bNN_term exec, exec, dst

What if we instead lowered it as:

    s_or_saveexec_bNN tmp, src
    ...
    s_and_bNN_term dst, exec, tmp
    s_xor_bNN_term exec, exec, dst

One of the OptimizeExecMasking passes can then just remove the s_and_bNN_term if there is no modification of exec in the middle. I think I'd be happy about that solution. In practice, the scan backwards from s_and_bNN isn't that expensive and I believe it's required anyway. Thoughts?
979	I appreciate the desire to remove some unnecessary copies, but let's first figure out whether this one is correct. Specifically, I thought LRQ.isKill() only means that the use of SrcReg in MI is the _last_ use. There could be other uses of the same definition of SrcReg that come earlier, right? So maybe you could still eliminate the copy here if you updated those other uses as well. I would still ask you to keep things simpler here for this change and see if you can find a good place to eliminate this kind of copy separately in a dedicated pass. This code is quite difficult to follow even without this.
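
To see why the extra s_and matters, the two SI_ELSE lowerings discussed for lines 835–851 can be modeled on plain 64-bit lane masks. This is a minimal sketch, not compiler code: the function names and the `LiveMask` parameter (standing in for any exec modification in the `...` region) are hypothetical, and it assumes s_or_saveexec copies the old exec to its destination before OR-ing the source into exec.

```cpp
#include <cassert>
#include <cstdint>

// Lane masks modeled as 64-bit integers, one bit per lane.
struct ElseResult {
  uint64_t Exec; // exec after the terminator
  uint64_t Dst;  // mask handed on to the end of the control-flow region
};

// Current lowering: s_or_saveexec dst, src ... s_xor_term exec, exec, dst.
ElseResult lowerElseXorOnly(uint64_t Exec, uint64_t Src, uint64_t LiveMask) {
  assert((Exec & Src) == 0 && "then-lanes and else-lanes are disjoint");
  uint64_t Dst = Exec; // s_or_saveexec: save the old exec ...
  Exec |= Src;         // ... and OR the source mask into exec
  Exec &= LiveMask;    // exec modified in the middle (modeled as an AND)
  Exec ^= Dst;         // s_xor_term exec, exec, dst
  return {Exec, Dst};
}

// Proposed lowering: the saved mask is first intersected with the current
// exec, so the xor can only clear lanes that are still enabled.
ElseResult lowerElseWithAnd(uint64_t Exec, uint64_t Src, uint64_t LiveMask) {
  assert((Exec & Src) == 0 && "then-lanes and else-lanes are disjoint");
  uint64_t Tmp = Exec;       // s_or_saveexec tmp, src
  Exec |= Src;
  Exec &= LiveMask;          // exec modified in the middle
  uint64_t Dst = Exec & Tmp; // s_and_term dst, exec, tmp
  Exec ^= Dst;               // s_xor_term exec, exec, dst
  return {Exec, Dst};
}
```

With then-lanes 0x3 active and else-lanes 0xC pending, both variants leave exec at 0xC when exec is untouched in the middle. If the middle narrows exec to 0x6, the xor-only form yields 0x5 and re-enables lane 0, while the s_and form yields 0x4.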

critson marked 5 inline comments as done. Oct 16 2020, 5:05 AM

critson added inline comments.

llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
315–332	I don't think I am intentionally treating operand 0 differently; this was just written to inspect the definition of the instruction and should be based on defs(). The point of this loop is to follow the chain of all definitions of a register (or parts of it), and mark each instruction involved. The idea is it stops when the entire register has been defined. (Or rather it needs to keep going if the definitions are only partial.) For that reason we should also look at implicit defs, as the first whole register implicit def should be a valid stopping point.
835–851	Yes, your solution is what I was trying to suggest. Your "instead" case is what happens when we do MI.getOperand(3).setImm(1). I will put it in as a separate Phabricator review shortly to simplify SI_ELSE lowering and optimise out the unnecessary s_and in OptimizeExecMasking.
979	OK, yep I see that there /could/ be other users, although in practice I had not encountered them. I will work on cleaning this up in a later pass.
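
The traversal described for lines 315–332 (follow the chain of definitions, continuing past partial defs until the whole register is covered) can be sketched on a toy data model. `ToyValue` and `collectDefs` are hypothetical simplifications of live-range value numbers, not LLVM types; instruction and value ids are plain integers chosen for the example.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical, simplified stand-in for a live-range value number.
struct ToyValue {
  bool IsPhi = false;
  std::vector<int> PhiInputs; // value ids flowing into a phi-def
  int DefInstr = -1;          // id of the defining instruction (non-phi)
  bool PartialDef = false;    // the def writes only part of the register
  int ValueIn = -1;           // value live into a partial def, or -1
};

// Collect the ids of all instructions that contribute to value Start.
std::set<int> collectDefs(const std::vector<ToyValue> &Values, int Start) {
  std::set<int> Marked;
  std::set<int> Visited;
  std::vector<int> ToProcess{Start};
  while (!ToProcess.empty()) {
    int Id = ToProcess.back();
    ToProcess.pop_back();
    assert(Id >= 0 && Id < static_cast<int>(Values.size()));
    if (!Visited.insert(Id).second)
      continue; // already handled
    const ToyValue &V = Values[Id];
    if (V.IsPhi) {
      // A phi-def has no instruction; follow every incoming value.
      for (int In : V.PhiInputs)
        ToProcess.push_back(In);
    } else {
      Marked.insert(V.DefInstr);
      // A partial def leaves the rest of the register to an earlier value.
      if (V.PartialDef && V.ValueIn >= 0)
        ToProcess.push_back(V.ValueIn);
    }
  }
  return Marked;
}

// Example: value 3 is a phi of value 1 (a partial def layered on the full
// def in value 0) and value 2 (a full def). Value 4 is unrelated.
std::vector<ToyValue> demoValues() {
  std::vector<ToyValue> V(5);
  V[0].DefInstr = 10;
  V[1].DefInstr = 11;
  V[1].PartialDef = true;
  V[1].ValueIn = 0;
  V[2].DefInstr = 12;
  V[3].IsPhi = true;
  V[3].PhiInputs = {1, 2};
  V[4].DefInstr = 13;
  return V;
}
```

Starting from the phi, the walk marks instructions 10, 11 and 12: the partial def forces it to continue to the earlier full def, and a full def is a stopping point.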

nhaehnle added inline comments. Oct 19 2020, 9:57 AM

llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
315–332	Well the code treats operand 0 specially, because there's literally a `getOperand(0)` in there, with a hard-coded `0`. If the intention is to treat all defs in the same way (which I think it should be), then why not have a single homogenous loop over operands? I do understand the point about partial defs, that makes sense and it's not what I'm worried about.
835–851	Sounds good!
979	What do you mean by cleaning this up in a later pass? The goal should be to keep the MachineIR an accurate representation of the program at all times.

Rebase
Fix markDefs to iterate all operands of MI
Remove fix up for SI_ELSE as this is no longer required
Remove elimination of trivial SGPR to SGPR WWM copies (this adds cruft in atomic optimizer tests)

Herald added a subscriber: jfb. Oct 20 2020, 11:08 PM

critson marked 5 inline comments as done. Oct 20 2020, 11:15 PM

critson added inline comments.

llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
315–332	I have fixed this to iterate all operands looking for appropriate defs to follow.
979	MachineIR is accurate. My point is because the pass now runs later there is nothing to optimise away trivial copies it introduces when lowering WWM operations. See cruft this adds in atomic tests, e.g. atomic_optimizations_buffer.ll

Harbormaster completed remote builds in B75817: Diff 299556. Oct 20 2020, 11:41 PM

LGTM

llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
315–332	Thanks!
979	Ah, I see. Maybe that cleanup could be done as a follow-up change.

This revision is now accepted and ready to land. Oct 22 2020, 8:41 AM

This revision was landed with ongoing or failed builds. Oct 26 2020, 6:26 PM

Closed by commit rG7a880ab38892: [AMDGPU] Move WQM Pass after MI Scheduler (authored by critson).

This revision was automatically updated to reflect the committed changes.

critson marked an inline comment as done.

critson added a commit: rG7a880ab38892: [AMDGPU] Move WQM Pass after MI Scheduler.

mceier added a subscriber: mceier. Oct 30 2020, 1:18 PM

mceier added inline comments.

llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp

653–657

Is this really correct? CS:GO crashes with the git version of LLVM at line 657 (on a Radeon 5700XT):

#0  0x00007fad33ab57b0 in llvm::IndexListEntry::getIndex (this=0x0) at /var/tmp/portage/sys-devel/llvm-12.0.0.9999-r1/work/llvm/include/llvm/CodeGen/SlotIndexes.h:58
#1  0x00007fad33ab581d in llvm::SlotIndex::getIndex (this=0x7fad2ffa2960) at /var/tmp/portage/sys-devel/llvm-12.0.0.9999-r1/work/llvm/include/llvm/CodeGen/SlotIndexes.h:125
#2  0x00007fad33ab591d in llvm::SlotIndex::operator> (this=0x7fad2ffa2960, other=...) at /var/tmp/portage/sys-devel/llvm-12.0.0.9999-r1/work/llvm/include/llvm/CodeGen/SlotIndexes.h:187
#3  0x00007fad362e842c in (anonymous namespace)::SIWholeQuadMode::prepareInsertion (this=0x1460000, MBB=..., First=..., Last=..., PreferLast=false, SaveSCC=true)
    at /var/tmp/portage/sys-devel/llvm-12.0.0.9999-r1/work/llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp:657
#4  0x00007fad362e923c in (anonymous namespace)::SIWholeQuadMode::processBlock (this=0x1460000, MBB=..., LiveMaskReg=2147484277, isEntry=true)
    at /var/tmp/portage/sys-devel/llvm-12.0.0.9999-r1/work/llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp:859
#5  0x00007fad362ea140 in (anonymous namespace)::SIWholeQuadMode::runOnMachineFunction (this=0x1460000, MF=...)
    at /var/tmp/portage/sys-devel/llvm-12.0.0.9999-r1/work/llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp:1055
#6  0x00007fad33d15048 in llvm::MachineFunctionPass::runOnFunction (this=0x1460000, F=...) at /var/tmp/portage/sys-devel/llvm-12.0.0.9999-r1/work/llvm/lib/CodeGen/MachineFunctionPass.cpp:73
#7  0x00007fad3391406a in llvm::FPPassManager::runOnFunction (this=0x148ba80, F=...) at /var/tmp/portage/sys-devel/llvm-12.0.0.9999-r1/work/llvm/lib/IR/LegacyPassManager.cpp:1519
#8  0x00007fad352ab96b in (anonymous namespace)::CGPassManager::RunPassOnSCC (this=0x148bc40, P=0x148ba80, CurSCC=..., CG=..., CallGraphUpToDate=@0x7fad2ffa2d8d: true, DevirtualizedCall=@0x7fad2ffa2e30: false)
    at /var/tmp/portage/sys-devel/llvm-12.0.0.9999-r1/work/llvm/lib/Analysis/CallGraphSCCPass.cpp:178
#9  0x00007fad352ac4ed in (anonymous namespace)::CGPassManager::RunAllPassesOnSCC (this=0x148bc40, CurSCC=..., CG=..., DevirtualizedCall=@0x7fad2ffa2e30: false)
    at /var/tmp/portage/sys-devel/llvm-12.0.0.9999-r1/work/llvm/lib/Analysis/CallGraphSCCPass.cpp:476
#10 0x00007fad352ac7e4 in (anonymous namespace)::CGPassManager::runOnModule (this=0x148bc40, M=...) at /var/tmp/portage/sys-devel/llvm-12.0.0.9999-r1/work/llvm/lib/Analysis/CallGraphSCCPass.cpp:541
#11 0x00007fad339147f6 in (anonymous namespace)::MPPassManager::runOnModule (this=0x1412800, M=...) at /var/tmp/portage/sys-devel/llvm-12.0.0.9999-r1/work/llvm/lib/IR/LegacyPassManager.cpp:1634
#12 0x00007fad3390f6ce in llvm::legacy::PassManagerImpl::run (this=0x1430200, M=...) at /var/tmp/portage/sys-devel/llvm-12.0.0.9999-r1/work/llvm/lib/IR/LegacyPassManager.cpp:615
#13 0x00007fad33915087 in llvm::legacy::PassManager::run (this=0x1427bb8, M=...) at /var/tmp/portage/sys-devel/llvm-12.0.0.9999-r1/work/llvm/lib/IR/LegacyPassManager.cpp:1761
#14 0x00007fad3abc7145 in ac_compile_module_to_elf (p=0x1427b60, module=0x35af9300, pelf_buffer=0x111a5ad0, pelf_size=0x111a5ad8) at ../mesa-9999/src/amd/llvm/ac_llvm_helper.cpp:259
#15 0x00007fad3aad8501 in si_compile_llvm (sscreen=0xc23400, binary=0x111a5ad0, conf=0x111a5ae8, compiler=0xc23cb0, ac=0x7fad2ffa3490, debug=0x2577a030, stage=MESA_SHADER_FRAGMENT,
    name=0x7fad3ad7582a "Pixel Shader", less_optimized=false) at ../mesa-9999/src/gallium/drivers/radeonsi/si_shader_llvm.c:104
#16 0x00007fad3aad61d2 in si_llvm_compile_shader (sscreen=0xc23400, compiler=0xc23cb0, shader=0x111a5a00, debug=0x2577a030, nir=0xc5a6450, free_nir=false)
    at ../mesa-9999/src/gallium/drivers/radeonsi/si_shader.c:1891
#17 0x00007fad3aad634d in si_compile_shader (sscreen=0xc23400, compiler=0xc23cb0, shader=0x111a5a00, debug=0x2577a030) at ../mesa-9999/src/gallium/drivers/radeonsi/si_shader.c:1927
#18 0x00007fad3ab109c3 in si_init_shader_selector_async (job=0x2577a000, thread_index=0) at ../mesa-9999/src/gallium/drivers/radeonsi/si_state_shaders.c:2492
#19 0x00007fad3a73196b in util_queue_thread_func (input=0xc34ed0) at ../mesa-9999/src/util/u_queue.c:304
#20 0x00007fad3a730acb in impl_thrd_routine (p=0xc34ec0) at ../mesa-9999/include/c11/threads_posix.h:87
#21 0x00007fad3fbcbf9e in start_thread (arg=0x7fad2ffa5640) at pthread_create.c:463
#22 0x00007fad3ff4865f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

critson added inline comments. Nov 5 2020, 4:10 AM

llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
653–657	Thanks for bringing this to my attention. Sorry for the slow response. I am currently investigating. Do you happen to have any more details for reproduction?

mceier added inline comments. Nov 5 2020, 6:02 AM

llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
653–657	The crash happens at CS:GO startup and unfortunately besides this and the stacktrace I don't have more details about it; I don't know how to extract the shader that triggers this crash (not even sure if it will help with anything). So far only CS:GO triggers it. I wonder if the move of "insertPass" from line 1000 to 993 might be the culprit? If so I could rebuild LLVM (I didn't do it yet because it takes a lot of time) and verify this.

mceier added inline comments. Nov 5 2020, 11:49 AM

llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
653–657	I tried changing the order of passes and CS:GO still crashed. Also I was able to dump LLVM IR of the shader (by setting AMD_DEBUG=ps,vs): shader4.txt.gz (6 KB). It's always shader 387 that crashes. Hopefully it will help you debug this.

mceier added inline comments. Nov 5 2020, 11:53 AM

llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
653–657	After commenting out the top 2 lines, llc produces the same stacktrace as CS:GO.

critson added inline comments. Nov 6 2020, 5:14 AM

llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
653–657	Thanks for this shader. Can you share your llc command line options, as I was unable to get a crash feeding the shader to: llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs

mceier added inline comments. Nov 6 2020, 5:25 AM

llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp

653–657

Well, I just renamed shader4.txt to shader4.ll, commented out 2 top lines and ran llc shader4.ll and llc crashed. I didn't use any llc options.

"llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs shader4.ll" also doesn't crash here.

LLVM commit is f738aee0bbf39d11b9f0104e094c7893ffca040c

llc --version shows:

LLVM (http://llvm.org/):

LLVM version 12.0.0git
Optimized build.
Default target: x86_64-pc-linux-gnu
Host CPU: skylake

Registered Targets:
  aarch64    - AArch64 (little endian)
  aarch64_32 - AArch64 (little endian ILP32)
  aarch64_be - AArch64 (big endian)
  amdgcn     - AMD GCN GPUs
  arm        - ARM
  arm64      - ARM64 (little endian)
  arm64_32   - ARM64 (little endian ILP32)
  armeb      - ARM (big endian)
  bpf        - BPF (host endian)
  bpfeb      - BPF (big endian)
  bpfel      - BPF (little endian)
  nvptx      - NVIDIA PTX 32-bit
  nvptx64    - NVIDIA PTX 64-bit
  r600       - AMD GPUs HD2XXX-HD6XXX
  riscv32    - 32-bit RISC-V
  riscv64    - 64-bit RISC-V
  thumb      - Thumb
  thumbeb    - Thumb (big endian)
  wasm32     - WebAssembly 32-bit
  wasm64     - WebAssembly 64-bit
  x86        - 32-bit X86: Pentium-Pro and above
  x86-64     - 64-bit X86: EM64T and AMD64

mceier added inline comments. Nov 6 2020, 6:07 AM

llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
653–657	llc crashes only for -mcpu=generic and -mcpu=generic-hsa btw. shader4.txt.gz on phabricator seems to be gzipped twice (I uploaded gzipped file and I didn't know phabricator will gzip it again) and has to be ungzipped twice.

critson added inline comments. Nov 6 2020, 11:19 PM

llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp
653–657	Thanks. I have created D90997 which hopefully should address the issue.

mceier mentioned this in D90997: [AMDGPU] SIWholeQuadMode fix mode insertion when SCC always defined. Nov 7 2020, 1:05 AM

Revision Contents

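Path	Size
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp	10 lines
llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp	160 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.gather4.a16.dim.ll	292 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.gather4.dim.ll	78 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.gather4.o.dim.ll	42 lines
llvm/test/CodeGen/AMDGPU/atomic_optimizations_buffer.ll	16 lines
llvm/test/CodeGen/AMDGPU/atomic_optimizations_global_pointer.ll	16 lines
llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll	1631 lines
llvm/test/CodeGen/AMDGPU/atomic_optimizations_pixelshader.ll	137 lines
llvm/test/CodeGen/AMDGPU/atomic_optimizations_raw_buffer.ll	16 lines
llvm/test/CodeGen/AMDGPU/atomic_optimizations_struct_buffer.ll	16 lines
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.a16.dim.ll	20 lines
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.a16.dim.ll	40 lines
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.d16.dim.ll	36 lines
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.dim.ll	124 lines
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ps.live.ll	4 lines
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.softwqm.ll	10 lines
llvm/test/CodeGen/AMDGPU/wqm.ll	11 lines
llvm/test/CodeGen/AMDGPU/wwm-reserved.ll	15 lines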

Diff 300853

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

[962 lines hidden] bool GCNPassConfig::addGlobalInstructionSelect() {
   addPass(new InstructionSelect());
   return false;
 }

 void GCNPassConfig::addPreRegAlloc() {
   if (LateCFGStructurize) {
     addPass(createAMDGPUMachineCFGStructurizerPass());
   }
-  addPass(createSIWholeQuadModePass());
 }

 void GCNPassConfig::addFastRegAlloc() {
+  addPass(createSIWholeQuadModePass());
   // FIXME: We have to disable the verifier here because of PHIElimination +
   // TwoAddressInstructions disabling it.

   // This must be run immediately after phi elimination and before
   // TwoAddressInstructions, otherwise the processing of the tied operand of
   // SI_ELSE will introduce a copy of the tied operand source after the else.
   insertPass(&PHIEliminationID, &SILowerControlFlowID, false);

   // This must be run just after RegisterCoalescing.
   insertPass(&RegisterCoalescerID, &SIPreAllocateWWMRegsID, false);

   TargetPassConfig::addFastRegAlloc();
 }

 void GCNPassConfig::addOptimizedRegAlloc() {
+  // Allow the scheduler to run before SIWholeQuadMode inserts exec manipulation
+  // instructions that cause scheduling barriers.
+  insertPass(&MachineSchedulerID, &SIWholeQuadModeID);
+  insertPass(&MachineSchedulerID, &SIPreAllocateWWMRegsID);

   if (OptExecMaskPreRA)
     insertPass(&MachineSchedulerID, &SIOptimizeExecMaskingPreRAID);
   insertPass(&MachineSchedulerID, &SIFormMemoryClausesID);

   // This must be run immediately after phi elimination and before
   // TwoAddressInstructions, otherwise the processing of the tied operand of
   // SI_ELSE will introduce a copy of the tied operand source after the else.
   insertPass(&PHIEliminationID, &SILowerControlFlowID, false);

-  // This must be run just after RegisterCoalescing.
-  insertPass(&RegisterCoalescerID, &SIPreAllocateWWMRegsID, false);

   if (EnableDCEInRA)
     insertPass(&DetectDeadLanesID, &DeadMachineInstructionElimID);

   TargetPassConfig::addOptimizedRegAlloc();
 }

 bool GCNPassConfig::addPreRewrite() {
   if (EnableRegReassign) {
[202 lines hidden]

llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp

[148 lines hidden]
 private:
   CallingConv::ID CallingConv;
   const SIInstrInfo *TII;
   const SIRegisterInfo *TRI;
   const GCNSubtarget *ST;
   MachineRegisterInfo *MRI;
   LiveIntervals *LIS;

+  unsigned AndOpc;
+  unsigned XorTermrOpc;
+  unsigned OrSaveExecOpc;
+  unsigned Exec;

   DenseMap<const MachineInstr *, InstrInfo> Instructions;
   MapVector<MachineBasicBlock *, BlockInfo> Blocks;
   SmallVector<MachineInstr *, 1> LiveMaskQueries;
   SmallVector<MachineInstr *, 4> LowerToMovInstrs;
   SmallVector<MachineInstr *, 4> LowerToCopyInstrs;

   void printInfo();

   void markInstruction(MachineInstr &MI, char Flag,
                        std::vector<WorkItem> &Worklist);
+  void markDefs(const MachineInstr &UseMI, LiveRange &LR, Register Reg,
+                unsigned SubReg, char Flag, std::vector<WorkItem> &Worklist);
   void markInstructionUses(const MachineInstr &MI, char Flag,
                            std::vector<WorkItem> &Worklist);
   char scanInstructions(MachineFunction &MF, std::vector<WorkItem> &Worklist);
   void propagateInstruction(MachineInstr &MI, std::vector<WorkItem> &Worklist);
   void propagateBlock(MachineBasicBlock &MBB, std::vector<WorkItem> &Worklist);
   char analyzeFunction(MachineFunction &MF);

   MachineBasicBlock::iterator saveSCC(MachineBasicBlock &MBB,
[72 lines hidden]
 #endif

 void SIWholeQuadMode::markInstruction(MachineInstr &MI, char Flag,
                                       std::vector<WorkItem> &Worklist) {
   InstrInfo &II = Instructions[&MI];

   assert(!(Flag & StateExact) && Flag != 0);

+  LLVM_DEBUG(dbgs() << "markInstruction " << PrintState(Flag) << ": " << MI);

   // Remove any disabled states from the flag. The user that required it gets
   // an undefined value in the helper lanes. For example, this can happen if
   // the result of an atomic is used by instruction that requires WQM, where
   // ignoring the request for WQM is correct as per the relevant specs.
   Flag &= ~II.Disabled;

   // Ignore if the flag is already encompassed by the existing needs, or we
   // just disabled everything.
   if ((II.Needs & Flag) == Flag)
     return;

   II.Needs |= Flag;
   Worklist.push_back(&MI);
 }

+/// Mark all relevant definitions of register \p Reg in usage \p UseMI.
+void SIWholeQuadMode::markDefs(const MachineInstr &UseMI, LiveRange &LR,
+                               Register Reg, unsigned SubReg, char Flag,
+                               std::vector<WorkItem> &Worklist) {
+  assert(!MRI->isSSA());
+
+  LLVM_DEBUG(dbgs() << "markDefs " << PrintState(Flag) << ": " << UseMI);
+
+  LiveQueryResult UseLRQ = LR.Query(LIS->getInstructionIndex(UseMI));
+  if (!UseLRQ.valueIn())
+    return;
+
+  SmallPtrSet<const VNInfo *, 4> Visited;
+  SmallVector<const VNInfo *, 4> ToProcess;
+  ToProcess.push_back(UseLRQ.valueIn());
+  do {
+    const VNInfo *Value = ToProcess.pop_back_val();
+    Visited.insert(Value);
+
+    if (Value->isPHIDef()) {
+      // Need to mark all defs used in the PHI node
+      const MachineBasicBlock *MBB = LIS->getMBBFromIndex(Value->def);
+      assert(MBB && "Phi-def has no defining MBB");
+      for (MachineBasicBlock::const_pred_iterator PI = MBB->pred_begin(),
+                                                  PE = MBB->pred_end();
+           PI != PE; ++PI) {
+        if (const VNInfo *VN = LR.getVNInfoBefore(LIS->getMBBEndIdx(*PI))) {
+          if (!Visited.count(VN))
+            ToProcess.push_back(VN);
+        }
+      }
+    } else {
+      MachineInstr *MI = LIS->getInstructionFromIndex(Value->def);
+      assert(MI && "Def has no defining instruction");
+      markInstruction(*MI, Flag, Worklist);
+
+      // Iterate over all operands to find relevant definitions
+      for (const MachineOperand &Op : MI->operands()) {
+        if (!(Op.isReg() && Op.getReg() == Reg))
+          continue;
+
+        // Does this def cover whole register?
+        bool DefinesFullReg =
+            Op.isUndef() || !Op.getSubReg() || Op.getSubReg() == SubReg;
+        if (!DefinesFullReg) {
+          // Partial definition; need to follow and mark input value
+          LiveQueryResult LRQ = LR.Query(LIS->getInstructionIndex(*MI));
+          if (const VNInfo *VN = LRQ.valueIn()) {
+            if (!Visited.count(VN))
+              ToProcess.push_back(VN);
+          }
+        }
+      }
+    }
+  } while (!ToProcess.empty());
+}

/// Mark all instructions defining the uses in \p MI with \p Flag.		/// Mark all instructions defining the uses in \p MI with \p Flag.
void SIWholeQuadMode::markInstructionUses(const MachineInstr &MI, char Flag,		void SIWholeQuadMode::markInstructionUses(const MachineInstr &MI, char Flag,
std::vector<WorkItem> &Worklist) {		std::vector<WorkItem> &Worklist) {

		LLVM_DEBUG(dbgs() << "markInstructionUses " << PrintState(Flag) << ": "
		<< MI);

for (const MachineOperand &Use : MI.uses()) {		for (const MachineOperand &Use : MI.uses()) {
if (!Use.isReg() \|\| !Use.isUse())		if (!Use.isReg() \|\| !Use.isUse())
continue;		continue;

Register Reg = Use.getReg();		Register Reg = Use.getReg();

// Handle physical registers that we need to track; this is mostly relevant		// Handle physical registers that we need to track; this is mostly relevant
// for VCC, which can appear as the (implicit) input of a uniform branch,		// for VCC, which can appear as the (implicit) input of a uniform branch,
// e.g. when a loop counter is stored in a VGPR.		// e.g. when a loop counter is stored in a VGPR.
if (!Reg.isVirtual()) {		if (!Reg.isVirtual()) {
if (Reg == AMDGPU::EXEC || Reg == AMDGPU::EXEC_LO)		if (Reg == AMDGPU::EXEC || Reg == AMDGPU::EXEC_LO)
continue;		continue;

for (MCRegUnitIterator RegUnit(Reg, TRI); RegUnit.isValid(); ++RegUnit) {		for (MCRegUnitIterator RegUnit(Reg, TRI); RegUnit.isValid(); ++RegUnit) {
LiveRange &LR = LIS->getRegUnit(*RegUnit);		LiveRange &LR = LIS->getRegUnit(*RegUnit);
const VNInfo *Value = LR.Query(LIS->getInstructionIndex(MI)).valueIn();		const VNInfo *Value = LR.Query(LIS->getInstructionIndex(MI)).valueIn();
if (!Value)		if (!Value)
continue;		continue;

		if (MRI->isSSA()) {
// Since we're in machine SSA, we do not need to track physical		// Since we're in machine SSA, we do not need to track physical
		nhaehnle: I don't understand this logic. A use is a use -- why should implicitness or tiedness make a difference? This seems pretty wrong.
		critson (author): Agreed, this should go away. It was an early hack before I wrote a working markDefs.
// registers across basic blocks.		// registers across basic blocks.
if (Value->isPHIDef())		if (Value->isPHIDef())
continue;		continue;

markInstruction(*LIS->getInstructionFromIndex(Value->def), Flag,		markInstruction(*LIS->getInstructionFromIndex(Value->def), Flag,
Worklist);		Worklist);
		} else {
		markDefs(MI, LR, *RegUnit, AMDGPU::NoSubRegister, Flag, Worklist);
		}
		arsenm: Isn't this pass required to be post-SSA if it's after the scheduler?
		critson (author): I am currently retaining the ability to run in both SSA and non-SSA modes.
}		}

continue;		continue;
}		}

		if (MRI->isSSA()) {
for (MachineInstr &DefMI : MRI->def_instructions(Use.getReg()))		for (MachineInstr &DefMI : MRI->def_instructions(Use.getReg()))
markInstruction(DefMI, Flag, Worklist);		markInstruction(DefMI, Flag, Worklist);
		} else {
		LiveRange &LR = LIS->getInterval(Reg);
		markDefs(MI, LR, Reg, Use.getSubReg(), Flag, Worklist);
		}
}		}
}		}

// Scan instructions to determine which ones require an Exact execmask and		// Scan instructions to determine which ones require an Exact execmask and
// which ones seed WQM requirements.		// which ones seed WQM requirements.
char SIWholeQuadMode::scanInstructions(MachineFunction &MF,		char SIWholeQuadMode::scanInstructions(MachineFunction &MF,
std::vector<WorkItem> &Worklist) {		std::vector<WorkItem> &Worklist) {
char GlobalFlags = 0;		char GlobalFlags = 0;
[... 253 lines not shown ...]	if (!S)
break;		break;

if (PreferLast) {		if (PreferLast) {
SlotIndex Next = S->start.getBaseIndex();		SlotIndex Next = S->start.getBaseIndex();
if (Next < FirstIdx)		if (Next < FirstIdx)
break;		break;
Idx = Next;		Idx = Next;
} else {		} else {
SlotIndex Next = S->end.getNextIndex().getBaseIndex();		MachineInstr *EndMI = LIS->getInstructionFromIndex(S->end.getBaseIndex());
		assert(EndMI && "Segment does not end on valid instruction");
		auto NextI = std::next(EndMI->getIterator());
		SlotIndex Next = LIS->getInstructionIndex(*NextI);
if (Next > LastIdx)		if (Next > LastIdx)
		mceier: Is this really correct? CS:GO crashes with the git version of LLVM at line 657 (on a Radeon 5700XT):
		#0 0x00007fad33ab57b0 in llvm::IndexListEntry::getIndex (this=0x0) at /var/tmp/portage/sys-devel/llvm-12.0.0.9999-r1/work/llvm/include/llvm/CodeGen/SlotIndexes.h:58
		#1 0x00007fad33ab581d in llvm::SlotIndex::getIndex (this=0x7fad2ffa2960) at /var/tmp/portage/sys-devel/llvm-12.0.0.9999-r1/work/llvm/include/llvm/CodeGen/SlotIndexes.h:125
		#2 0x00007fad33ab591d in llvm::SlotIndex::operator> (this=0x7fad2ffa2960, other=...) at /var/tmp/portage/sys-devel/llvm-12.0.0.9999-r1/work/llvm/include/llvm/CodeGen/SlotIndexes.h:187
		#3 0x00007fad362e842c in (anonymous namespace)::SIWholeQuadMode::prepareInsertion (this=0x1460000, MBB=..., First=..., Last=..., PreferLast=false, SaveSCC=true) at /var/tmp/portage/sys-devel/llvm-12.0.0.9999-r1/work/llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp:657
		#4 0x00007fad362e923c in (anonymous namespace)::SIWholeQuadMode::processBlock (this=0x1460000, MBB=..., LiveMaskReg=2147484277, isEntry=true) at /var/tmp/portage/sys-devel/llvm-12.0.0.9999-r1/work/llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp:859
		#5 0x00007fad362ea140 in (anonymous namespace)::SIWholeQuadMode::runOnMachineFunction (this=0x1460000, MF=...) at /var/tmp/portage/sys-devel/llvm-12.0.0.9999-r1/work/llvm/lib/Target/AMDGPU/SIWholeQuadMode.cpp:1055
		#6 0x00007fad33d15048 in llvm::MachineFunctionPass::runOnFunction (this=0x1460000, F=...) at /var/tmp/portage/sys-devel/llvm-12.0.0.9999-r1/work/llvm/lib/CodeGen/MachineFunctionPass.cpp:73
		#7 0x00007fad3391406a in llvm::FPPassManager::runOnFunction (this=0x148ba80, F=...) at /var/tmp/portage/sys-devel/llvm-12.0.0.9999-r1/work/llvm/lib/IR/LegacyPassManager.cpp:1519
		#8 0x00007fad352ab96b in (anonymous namespace)::CGPassManager::RunPassOnSCC (this=0x148bc40, P=0x148ba80, CurSCC=..., CG=..., CallGraphUpToDate=@0x7fad2ffa2d8d: true, DevirtualizedCall=@0x7fad2ffa2e30: false) at /var/tmp/portage/sys-devel/llvm-12.0.0.9999-r1/work/llvm/lib/Analysis/CallGraphSCCPass.cpp:178
		#9 0x00007fad352ac4ed in (anonymous namespace)::CGPassManager::RunAllPassesOnSCC (this=0x148bc40, CurSCC=..., CG=..., DevirtualizedCall=@0x7fad2ffa2e30: false) at /var/tmp/portage/sys-devel/llvm-12.0.0.9999-r1/work/llvm/lib/Analysis/CallGraphSCCPass.cpp:476
		#10 0x00007fad352ac7e4 in (anonymous namespace)::CGPassManager::runOnModule (this=0x148bc40, M=...) at /var/tmp/portage/sys-devel/llvm-12.0.0.9999-r1/work/llvm/lib/Analysis/CallGraphSCCPass.cpp:541
		#11 0x00007fad339147f6 in (anonymous namespace)::MPPassManager::runOnModule (this=0x1412800, M=...) at /var/tmp/portage/sys-devel/llvm-12.0.0.9999-r1/work/llvm/lib/IR/LegacyPassManager.cpp:1634
		#12 0x00007fad3390f6ce in llvm::legacy::PassManagerImpl::run (this=0x1430200, M=...) at /var/tmp/portage/sys-devel/llvm-12.0.0.9999-r1/work/llvm/lib/IR/LegacyPassManager.cpp:615
		#13 0x00007fad33915087 in llvm::legacy::PassManager::run (this=0x1427bb8, M=...) at /var/tmp/portage/sys-devel/llvm-12.0.0.9999-r1/work/llvm/lib/IR/LegacyPassManager.cpp:1761
		#14 0x00007fad3abc7145 in ac_compile_module_to_elf (p=0x1427b60, module=0x35af9300, pelf_buffer=0x111a5ad0, pelf_size=0x111a5ad8) at ../mesa-9999/src/amd/llvm/ac_llvm_helper.cpp:259
		#15 0x00007fad3aad8501 in si_compile_llvm (sscreen=0xc23400, binary=0x111a5ad0, conf=0x111a5ae8, compiler=0xc23cb0, ac=0x7fad2ffa3490, debug=0x2577a030, stage=MESA_SHADER_FRAGMENT, name=0x7fad3ad7582a "Pixel Shader", less_optimized=false) at ../mesa-9999/src/gallium/drivers/radeonsi/si_shader_llvm.c:104
		#16 0x00007fad3aad61d2 in si_llvm_compile_shader (sscreen=0xc23400, compiler=0xc23cb0, shader=0x111a5a00, debug=0x2577a030, nir=0xc5a6450, free_nir=false) at ../mesa-9999/src/gallium/drivers/radeonsi/si_shader.c:1891
		#17 0x00007fad3aad634d in si_compile_shader (sscreen=0xc23400, compiler=0xc23cb0, shader=0x111a5a00, debug=0x2577a030) at ../mesa-9999/src/gallium/drivers/radeonsi/si_shader.c:1927
		#18 0x00007fad3ab109c3 in si_init_shader_selector_async (job=0x2577a000, thread_index=0) at ../mesa-9999/src/gallium/drivers/radeonsi/si_state_shaders.c:2492
		#19 0x00007fad3a73196b in util_queue_thread_func (input=0xc34ed0) at ../mesa-9999/src/util/u_queue.c:304
		#20 0x00007fad3a730acb in impl_thrd_routine (p=0xc34ec0) at ../mesa-9999/include/c11/threads_posix.h:87
		#21 0x00007fad3fbcbf9e in start_thread (arg=0x7fad2ffa5640) at pthread_create.c:463
		#22 0x00007fad3ff4865f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
		critson (author): Thanks for bringing this to my attention, and sorry for the slow response. I am currently investigating. Do you happen to have any more details for reproduction?
		mceier: The crash happens at CS:GO startup and unfortunately, besides this and the stack trace, I don't have more details about it; I don't know how to extract the shader that triggers this crash (not even sure if it will help with anything). So far only CS:GO triggers it. I wonder if the move of "insertPass" from line 1000 to 993 might be the culprit? If so, I could rebuild LLVM (I haven't done it yet because it takes a lot of time) and verify this.
		mceier: I tried changing the order of passes and CS:GO still crashed. I was also able to dump the LLVM IR of the shader (by setting AMD_DEBUG=ps,vs): shader4.txt.gz (6 KB). It's always shader 387 that crashes. Hopefully it will help you debug this.
		mceier: After commenting out the top 2 lines, llc produces the same stack trace as CS:GO.
		critson (author): Thanks for this shader. Can you share your llc command line options? I was unable to get a crash feeding the shader to: llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs
		mceier: Well, I just renamed shader4.txt to shader4.ll, commented out the 2 top lines and ran llc shader4.ll, and llc crashed. I didn't use any llc options. "llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs shader4.ll" also doesn't crash here. The LLVM commit is f738aee0bbf39d11b9f0104e094c7893ffca040c. llc --version shows:
		LLVM (http://llvm.org/):
		  LLVM version 12.0.0git
		  Optimized build.
		  Default target: x86_64-pc-linux-gnu
		  Host CPU: skylake
		  Registered Targets:
		    aarch64 - AArch64 (little endian)
		    aarch64_32 - AArch64 (little endian ILP32)
		    aarch64_be - AArch64 (big endian)
		    amdgcn - AMD GCN GPUs
		    arm - ARM
		    arm64 - ARM64 (little endian)
		    arm64_32 - ARM64 (little endian ILP32)
		    armeb - ARM (big endian)
		    bpf - BPF (host endian)
		    bpfeb - BPF (big endian)
		    bpfel - BPF (little endian)
		    nvptx - NVIDIA PTX 32-bit
		    nvptx64 - NVIDIA PTX 64-bit
		    r600 - AMD GPUs HD2XXX-HD6XXX
		    riscv32 - 32-bit RISC-V
		    riscv64 - 64-bit RISC-V
		    thumb - Thumb
		    thumbeb - Thumb (big endian)
		    wasm32 - WebAssembly 32-bit
		    wasm64 - WebAssembly 64-bit
		    x86 - 32-bit X86: Pentium-Pro and above
		    x86-64 - 64-bit X86: EM64T and AMD64
		mceier: llc crashes only for -mcpu=generic and -mcpu=generic-hsa, by the way. shader4.txt.gz on Phabricator seems to be gzipped twice (I uploaded a gzipped file and didn't know Phabricator would gzip it again), so it has to be ungzipped twice.
		critson (author): Thanks. I have created D90997, which should hopefully address the issue.
break;		break;
Idx = Next;		Idx = Next;
}		}
}		}

MachineBasicBlock::iterator MBBI;		MachineBasicBlock::iterator MBBI;

if (MachineInstr *MI = LIS->getInstructionFromIndex(Idx))		if (MachineInstr *MI = LIS->getInstructionFromIndex(Idx))
MBBI = MI;		MBBI = MI;
else {		else {
assert(Idx == LIS->getMBBEndIdx(&MBB));		assert(Idx == LIS->getMBBEndIdx(&MBB));
MBBI = MBB.end();		MBBI = MBB.end();
}		}

		// Move insertion point past any operations modifying EXEC.
		// This assumes that the value of SCC defined by any of these operations
		// does not need to be preserved.
		while (MBBI != Last) {
		bool IsExecDef = false;
		for (const MachineOperand &MO : MBBI->operands()) {
		if (MO.isReg() && MO.isDef()) {
		IsExecDef |=
		MO.getReg() == AMDGPU::EXEC_LO || MO.getReg() == AMDGPU::EXEC;
		}
		}
		if (!IsExecDef)
		break;
		MBBI++;
		S = nullptr;
		}

if (S)		if (S)
MBBI = saveSCC(MBB, MBBI);		MBBI = saveSCC(MBB, MBBI);

return MBBI;		return MBBI;
}		}

void SIWholeQuadMode::toExact(MachineBasicBlock &MBB,		void SIWholeQuadMode::toExact(MachineBasicBlock &MBB,
MachineBasicBlock::iterator Before,		MachineBasicBlock::iterator Before,
[... 78 lines not shown ...]	void SIWholeQuadMode::processBlock(MachineBasicBlock &MBB, unsigned LiveMaskReg,
unsigned SavedWQMReg = 0;		unsigned SavedWQMReg = 0;
unsigned SavedNonWWMReg = 0;		unsigned SavedNonWWMReg = 0;
bool WQMFromExec = isEntry;		bool WQMFromExec = isEntry;
char State = (isEntry || !(BI.InNeeds & StateWQM)) ? StateExact : StateWQM;		char State = (isEntry || !(BI.InNeeds & StateWQM)) ? StateExact : StateWQM;
char NonWWMState = 0;		char NonWWMState = 0;
const TargetRegisterClass *BoolRC = TRI->getBoolRC();		const TargetRegisterClass *BoolRC = TRI->getBoolRC();

auto II = MBB.getFirstNonPHI(), IE = MBB.end();		auto II = MBB.getFirstNonPHI(), IE = MBB.end();
if (isEntry)		if (isEntry) {
++II; // Skip the instruction that saves LiveMask		// Skip the instruction that saves LiveMask
		if (II != IE && II->getOpcode() == AMDGPU::COPY)
		++II;
		}

// This stores the first instruction where it's safe to switch from WQM to		// This stores the first instruction where it's safe to switch from WQM to
// Exact or vice versa.		// Exact or vice versa.
MachineBasicBlock::iterator FirstWQM = IE;		MachineBasicBlock::iterator FirstWQM = IE;

// This stores the first instruction where it's safe to switch from WWM to		// This stores the first instruction where it's safe to switch from WWM to
// Exact/WQM or to switch to WWM. It must always be the same as, or after,		// Exact/WQM or to switch to WWM. It must always be the same as, or after,
// FirstWQM since if it's safe to switch to/from WWM, it must be safe to		// FirstWQM since if it's safe to switch to/from WWM, it must be safe to
// switch to/from WQM as well.		// switch to/from WQM as well.
MachineBasicBlock::iterator FirstWWM = IE;		MachineBasicBlock::iterator FirstWWM = IE;

for (;;) {		for (;;) {
MachineBasicBlock::iterator Next = II;		MachineBasicBlock::iterator Next = II;
char Needs = StateExact | StateWQM; // WWM is disabled by default		char Needs = StateExact | StateWQM; // WWM is disabled by default
char OutNeeds = 0;		char OutNeeds = 0;

if (FirstWQM == IE)		if (FirstWQM == IE)
FirstWQM = II;		FirstWQM = II;

[... 20 lines not shown ...]	if (II != IE) {
// If the instruction doesn't actually need a correct EXEC, then we can		// If the instruction doesn't actually need a correct EXEC, then we can
// safely leave WWM enabled.		// safely leave WWM enabled.
Needs = StateExact | StateWQM | StateWWM;		Needs = StateExact | StateWQM | StateWWM;
}		}

if (MI.isTerminator() && OutNeeds == StateExact)		if (MI.isTerminator() && OutNeeds == StateExact)
Needs = StateExact;		Needs = StateExact;

++Next;		++Next;
} else {		} else {
// End of basic block		// End of basic block
if (BI.OutNeeds & StateWQM)		if (BI.OutNeeds & StateWQM)
Needs = StateWQM;		Needs = StateWQM;
else if (BI.OutNeeds == StateExact)		else if (BI.OutNeeds == StateExact)
Needs = StateExact;		Needs = StateExact;
else		else
Needs = StateWQM | StateExact;		Needs = StateWQM | StateExact;
}		}

// Now, transition if necessary.		// Now, transition if necessary.
if (!(Needs & State)) {		if (!(Needs & State)) {
MachineBasicBlock::iterator First;		MachineBasicBlock::iterator First;
if (State == StateWWM || Needs == StateWWM) {		if (State == StateWWM || Needs == StateWWM) {
// We must switch to or from WWM		// We must switch to or from WWM
First = FirstWWM;		First = FirstWWM;
		nhaehnle: This looks suspicious. Can you please explain what is happening here?
		critson (author): This is analogous to the code above that modifies the SI_ELSE to make it respect the EXEC mask -- it is a very special-case match and fix-up based on current code generation, so I do not like it either. If we make SI_ELSE always respect modifications to the EXEC mask, then this can go away. Then perhaps we add a late peephole to clear up some of the unnecessary instructions when they are not required. Do you have an opinion on this?
		nhaehnle: We really shouldn't rely on such details of current code generation here. I think you're on to something. Digging into this more... SI_ELSE is currently lowered as:
		  s_or_saveexec_bNN dst, src
		  ...
		  s_xor_bNN_term exec, exec, dst
		What if we instead lowered it as:
		  s_or_saveexec_bNN tmp, src
		  ...
		  s_and_bNN_term dst, exec, tmp
		  s_xor_bNN_term exec, exec, dst
		One of the OptimizeExecMasking passes can then just remove the s_and_bNN_term if there is no modification of exec in the middle. I think I'd be happy about that solution. In practice, the scan backwards from s_and_bNN isn't that expensive and I believe it's required anyway. Thoughts?
		critson (author): Yes, your solution is what I was trying to suggest. Your "instead" case is what happens when we do MI.getOperand(3).setImm(1). I will put it in as a separate Phabricator review shortly to simplify SI_ELSE lowering and optimise out the unnecessary s_and in OptimizeExecMasking.
		nhaehnle: Sounds good!
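		The difference between the two SI_ELSE lowerings in this thread can be checked with plain bit arithmetic. The following is a rough sketch, not code from the patch: `Mask`, `ElseResult`, `lowerElseCurrent`, and `lowerElseProposed` are hypothetical names, EXEC is modeled as an integer, and the modification of EXEC "in the middle" is assumed to be an AND with some mask `Keep` (for example a switch from WQM to Exact). It relies on `s_or_saveexec` saving the old EXEC to its destination and then ORing the source into EXEC.

```cpp
#include <cassert>
#include <cstdint>

using Mask = std::uint64_t; // one bit per lane

// Hypothetical result of lowering one SI_ELSE.
struct ElseResult {
  Mask Dst;  // value left in dst by the sequence
  Mask Exec; // EXEC after the sequence, i.e. the lanes running the else block
};

// Current lowering:
//   s_or_saveexec dst, src ; ... ; s_xor_term exec, exec, dst
ElseResult lowerElseCurrent(Mask ExecAtElse, Mask Src, Mask Keep) {
  Mask Dst = ExecAtElse;        // s_or_saveexec: dst = exec
  Mask Exec = ExecAtElse | Src; //                exec = src | exec
  Exec &= Keep;                 // "...": EXEC modified in the middle (assumed AND)
  Exec ^= Dst;                  // s_xor_term exec, exec, dst
  return {Dst, Exec};
}

// Proposed lowering:
//   s_or_saveexec tmp, src ; ... ; s_and_term dst, exec, tmp ;
//   s_xor_term exec, exec, dst
ElseResult lowerElseProposed(Mask ExecAtElse, Mask Src, Mask Keep) {
  Mask Tmp = ExecAtElse;        // s_or_saveexec: tmp = exec
  Mask Exec = ExecAtElse | Src; //                exec = src | exec
  Exec &= Keep;                 // "...": EXEC modified in the middle (assumed AND)
  Mask Dst = Exec & Tmp;        // s_and_term dst, exec, tmp
  assert((Dst & ~Exec) == 0 && "dst is a subset of exec before the xor");
  Exec ^= Dst;                  // s_xor_term: can only clear lanes now
  return {Dst, Exec};
}
```

		With four lanes, then-lanes 0x3, else-lanes 0xC, and `Keep` = 0xA, the current sequence leaves EXEC = 0x9, which re-enables lane 0 even though the middle modification disabled it, while the proposed sequence leaves EXEC = 0x8. When `Keep` is all ones, both sequences give the same EXEC, which is the case where the extra `s_and` is redundant and could be removed.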
} else {		} else {
// We only need to switch to/from WQM, so we can use FirstWQM		// We only need to switch to/from WQM, so we can use FirstWQM
First = FirstWQM;		First = FirstWQM;
}		}

MachineBasicBlock::iterator Before =		MachineBasicBlock::iterator Before =
prepareInsertion(MBB, First, II, Needs == StateWQM,		prepareInsertion(MBB, First, II, Needs == StateWQM,
Needs == StateExact || WQMFromExec);		Needs == StateExact || WQMFromExec);
[... 43 lines not shown ...]	for (;;) {
if (Needs != (StateExact | StateWQM | StateWWM)) {		if (Needs != (StateExact | StateWQM | StateWWM)) {
if (Needs != (StateExact | StateWQM))		if (Needs != (StateExact | StateWQM))
FirstWQM = IE;		FirstWQM = IE;
FirstWWM = IE;		FirstWWM = IE;
}		}

if (II == IE)		if (II == IE)
break;		break;

II = Next;		II = Next;
}		}
assert(!SavedWQMReg);		assert(!SavedWQMReg);
assert(!SavedNonWWMReg);		assert(!SavedNonWWMReg);
}		}

void SIWholeQuadMode::lowerLiveMaskQueries(unsigned LiveMaskReg) {		void SIWholeQuadMode::lowerLiveMaskQueries(unsigned LiveMaskReg) {
for (MachineInstr *MI : LiveMaskQueries) {		for (MachineInstr *MI : LiveMaskQueries) {
const DebugLoc &DL = MI->getDebugLoc();		const DebugLoc &DL = MI->getDebugLoc();
Register Dest = MI->getOperand(0).getReg();		Register Dest = MI->getOperand(0).getReg();

MachineInstr *Copy =		MachineInstr *Copy =
BuildMI(*MI->getParent(), MI, DL, TII->get(AMDGPU::COPY), Dest)		BuildMI(*MI->getParent(), MI, DL, TII->get(AMDGPU::COPY), Dest)
.addReg(LiveMaskReg);		.addReg(LiveMaskReg);

LIS->ReplaceMachineInstrInMaps(MI, Copy);		LIS->ReplaceMachineInstrInMaps(MI, Copy);
MI->eraseFromParent();		MI->eraseFromParent();
}		}
}		}

void SIWholeQuadMode::lowerCopyInstrs() {		void SIWholeQuadMode::lowerCopyInstrs() {
for (MachineInstr *MI : LowerToMovInstrs) {		for (MachineInstr *MI : LowerToMovInstrs) {
assert(MI->getNumExplicitOperands() == 2);		assert(MI->getNumExplicitOperands() == 2);

const Register Reg = MI->getOperand(0).getReg();		const Register Reg = MI->getOperand(0).getReg();
		const unsigned SubReg = MI->getOperand(0).getSubReg();

if (TRI->isVGPR(*MRI, Reg)) {		if (TRI->isVGPR(*MRI, Reg)) {
const TargetRegisterClass *regClass =		const TargetRegisterClass *regClass =
Reg.isVirtual() ? MRI->getRegClass(Reg) : TRI->getPhysRegClass(Reg);		Reg.isVirtual() ? MRI->getRegClass(Reg) : TRI->getPhysRegClass(Reg);
		if (SubReg)
		regClass = TRI->getSubRegClass(regClass, SubReg);

const unsigned MovOp = TII->getMovOpcode(regClass);		const unsigned MovOp = TII->getMovOpcode(regClass);
MI->setDesc(TII->get(MovOp));		MI->setDesc(TII->get(MovOp));

// And make it implicitly depend on exec (like all VALU movs should do).		// And make it implicitly depend on exec (like all VALU movs should do).
MI->addOperand(MachineOperand::CreateReg(AMDGPU::EXEC, false, true));		MI->addOperand(MachineOperand::CreateReg(AMDGPU::EXEC, false, true));
} else {		} else if (!MRI->isSSA()) {
		// Remove early-clobber and exec dependency from simple SGPR copies.
		// This allows some to be eliminated during/post RA.
		LLVM_DEBUG(dbgs() << "simplify SGPR copy: " << *MI);
		if (MI->getOperand(0).isEarlyClobber()) {
		LIS->removeInterval(Reg);
		MI->getOperand(0).setIsEarlyClobber(false);
		LIS->createAndComputeVirtRegInterval(Reg);
		}
		int Index = MI->findRegisterUseOperandIdx(AMDGPU::EXEC);
		while (Index >= 0) {
		MI->RemoveOperand(Index);
		Index = MI->findRegisterUseOperandIdx(AMDGPU::EXEC);
		}
MI->setDesc(TII->get(AMDGPU::COPY));		MI->setDesc(TII->get(AMDGPU::COPY));
		LLVM_DEBUG(dbgs() << " -> " << *MI);
}		}
}		}
for (MachineInstr *MI : LowerToCopyInstrs) {		for (MachineInstr *MI : LowerToCopyInstrs) {
if (MI->getOpcode() == AMDGPU::V_SET_INACTIVE_B32 ||		if (MI->getOpcode() == AMDGPU::V_SET_INACTIVE_B32 ||
MI->getOpcode() == AMDGPU::V_SET_INACTIVE_B64) {		MI->getOpcode() == AMDGPU::V_SET_INACTIVE_B64) {
assert(MI->getNumExplicitOperands() == 3);		assert(MI->getNumExplicitOperands() == 3);
// the only reason we should be here is V_SET_INACTIVE has		// the only reason we should be here is V_SET_INACTIVE has
// an undef input so it is being replaced by a simple copy.		// an undef input so it is being replaced by a simple copy.
// There should be a second undef source that we should remove.		// There should be a second undef source that we should remove.
assert(MI->getOperand(2).isUndef());		assert(MI->getOperand(2).isUndef());
MI->RemoveOperand(2);		MI->RemoveOperand(2);
MI->untieRegOperand(1);		MI->untieRegOperand(1);
} else {		} else {
assert(MI->getNumExplicitOperands() == 2);		assert(MI->getNumExplicitOperands() == 2);
		nhaehnle: This is incorrect; there could be other users of the value. Just keep the simpler case below.
		critson (author): There should not be other users of the value; it is a kill? I am not going to fight to keep this, but we would benefit from more late clean-up of unnecessary copies. I guess this ties into some of the things I am touching on in D89187, so the follow-up to that might solve this.
		nhaehnle: I appreciate the desire to remove some unnecessary copies, but let's first figure out whether this one is correct. Specifically, I thought LRQ.isKill() only means that the use of SrcReg in MI is the _last_ use. There could be other uses of the same definition of SrcReg that come earlier, right? So maybe you could still eliminate the copy here if you updated those other uses as well. I would still ask you to keep things simpler here for this change and see if you can find a good place to eliminate this kind of copy separately in a dedicated pass. This code is quite difficult to follow even without this.
		critson (author): OK, yep, I see that there /could/ be other users, although in practice I had not encountered them. I will work on cleaning this up in a later pass.
		nhaehnle: What do you mean by cleaning this up in a later pass? The goal should be to keep the MachineIR an accurate representation of the program at all times.
		critson (author): The MachineIR is accurate. My point is that because the pass now runs later, there is nothing to optimise away the trivial copies it introduces when lowering WWM operations. See the cruft this adds in the atomic tests, e.g. atomic_optimizations_buffer.ll.
		nhaehnle: Ah, I see. Maybe that cleanup could be done as a follow-up change.
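		The distinction raised in this thread, that a kill marks the last use of a value but not necessarily its only use, can be shown with a toy model. This is only a sketch under assumptions: `Use`, `copyIsKill`, and `copyIsOnlyUser` are hypothetical names and do not correspond to LLVM APIs; the uses of one definition are assumed to be listed in program order.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of one use of a definition, in program order.
struct Use {
  int InstrId; // identifies the using instruction
  bool IsKill; // set only on the last use of the value
};

// True if the copy instruction is flagged as the last use of the value.
bool copyIsKill(const std::vector<Use> &UsesOfDef, int CopyInstrId) {
  for (const Use &U : UsesOfDef)
    if (U.InstrId == CopyInstrId)
      return U.IsKill;
  return false;
}

// True only if no other instruction reads the same definition. This is the
// stronger property needed before the copy can be folded into its source.
bool copyIsOnlyUser(const std::vector<Use> &UsesOfDef, int CopyInstrId) {
  assert(!UsesOfDef.empty() && "the copy itself must be a use");
  return UsesOfDef.size() == 1 && UsesOfDef.front().InstrId == CopyInstrId;
}
```

		For a definition read by instruction 5 and then copied (and killed) by instruction 9, `copyIsKill` holds but `copyIsOnlyUser` does not, so rewriting the definition to target the copy's destination would leave instruction 5 reading the wrong register unless it were updated as well.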
}		}

MI->setDesc(TII->get(AMDGPU::COPY));		MI->setDesc(TII->get(AMDGPU::COPY));
}		}
}		}

bool SIWholeQuadMode::runOnMachineFunction(MachineFunction &MF) {		bool SIWholeQuadMode::runOnMachineFunction(MachineFunction &MF) {
Instructions.clear();		Instructions.clear();
Blocks.clear();		Blocks.clear();
LiveMaskQueries.clear();		LiveMaskQueries.clear();
LowerToCopyInstrs.clear();		LowerToCopyInstrs.clear();
LowerToMovInstrs.clear();		LowerToMovInstrs.clear();
CallingConv = MF.getFunction().getCallingConv();		CallingConv = MF.getFunction().getCallingConv();

ST = &MF.getSubtarget<GCNSubtarget>();		ST = &MF.getSubtarget<GCNSubtarget>();

TII = ST->getInstrInfo();		TII = ST->getInstrInfo();
TRI = &TII->getRegisterInfo();		TRI = &TII->getRegisterInfo();
MRI = &MF.getRegInfo();		MRI = &MF.getRegInfo();
LIS = &getAnalysis<LiveIntervals>();		LIS = &getAnalysis<LiveIntervals>();

		if (ST->isWave32()) {
		AndOpc = AMDGPU::S_AND_B32;
		XorTermrOpc = AMDGPU::S_XOR_B32_term;
		OrSaveExecOpc = AMDGPU::S_OR_SAVEEXEC_B32;
		Exec = AMDGPU::EXEC_LO;
		} else {
		AndOpc = AMDGPU::S_AND_B64;
		XorTermrOpc = AMDGPU::S_XOR_B64_term;
		OrSaveExecOpc = AMDGPU::S_OR_SAVEEXEC_B64;
		Exec = AMDGPU::EXEC;
		}

char GlobalFlags = analyzeFunction(MF);		char GlobalFlags = analyzeFunction(MF);
unsigned LiveMaskReg = 0;		unsigned LiveMaskReg = 0;
unsigned Exec = ST->isWave32() ? AMDGPU::EXEC_LO : AMDGPU::EXEC;
if (!(GlobalFlags & StateWQM)) {		if (!(GlobalFlags & StateWQM)) {
lowerLiveMaskQueries(Exec);		lowerLiveMaskQueries(Exec);
if (!(GlobalFlags & StateWWM) && LowerToCopyInstrs.empty() && LowerToMovInstrs.empty())		if (!(GlobalFlags & StateWWM) && LowerToCopyInstrs.empty() && LowerToMovInstrs.empty())
return !LiveMaskQueries.empty();		return !LiveMaskQueries.empty();
} else {		} else {
// Store a copy of the original live mask when required		// Store a copy of the original live mask when required
MachineBasicBlock &Entry = MF.front();		MachineBasicBlock &Entry = MF.front();
MachineBasicBlock::iterator EntryMI = Entry.getFirstNonPHI();		MachineBasicBlock::iterator EntryMI = Entry.getFirstNonPHI();
[... 44 lines not shown ...]

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.gather4.a16.dim.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx900 -o - %s | FileCheck -check-prefix=GFX9 %s			; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx900 -o - %s | FileCheck -check-prefix=GFX9 %s
	; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1010 -o - %s | FileCheck -check-prefix=GFX10NSA %s			; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1010 -o - %s | FileCheck -check-prefix=GFX10NSA %s

	define amdgpu_ps <4 x float> @gather4_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t) {			define amdgpu_ps <4 x float> @gather4_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t) {
	; GFX9-LABEL: gather4_2d:			; GFX9-LABEL: gather4_2d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: s_mov_b64 s[14:15], exec
	; GFX9-NEXT: s_mov_b32 s0, s2			; GFX9-NEXT: s_mov_b32 s0, s2
				; GFX9-NEXT: s_wqm_b64 exec, exec
				; GFX9-NEXT: v_mov_b32_e32 v2, 0xffff
				; GFX9-NEXT: v_lshlrev_b32_e32 v1, 16, v1
	; GFX9-NEXT: s_mov_b32 s1, s3			; GFX9-NEXT: s_mov_b32 s1, s3
	; GFX9-NEXT: s_mov_b32 s2, s4			; GFX9-NEXT: s_mov_b32 s2, s4
	; GFX9-NEXT: s_mov_b32 s3, s5			; GFX9-NEXT: s_mov_b32 s3, s5
	; GFX9-NEXT: s_mov_b32 s4, s6			; GFX9-NEXT: s_mov_b32 s4, s6
	; GFX9-NEXT: s_mov_b32 s5, s7			; GFX9-NEXT: s_mov_b32 s5, s7
	; GFX9-NEXT: s_mov_b32 s6, s8			; GFX9-NEXT: s_mov_b32 s6, s8
	; GFX9-NEXT: s_mov_b32 s7, s9			; GFX9-NEXT: s_mov_b32 s7, s9
	; GFX9-NEXT: s_mov_b32 s8, s10			; GFX9-NEXT: s_mov_b32 s8, s10
	; GFX9-NEXT: s_mov_b32 s9, s11			; GFX9-NEXT: s_mov_b32 s9, s11
	; GFX9-NEXT: s_mov_b64 s[14:15], exec
	; GFX9-NEXT: s_mov_b32 s10, s12			; GFX9-NEXT: s_mov_b32 s10, s12
	; GFX9-NEXT: s_mov_b32 s11, s13			; GFX9-NEXT: s_mov_b32 s11, s13
	; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_mov_b32_e32 v2, 0xffff
	; GFX9-NEXT: v_lshlrev_b32_e32 v1, 16, v1
	; GFX9-NEXT: v_and_or_b32 v0, v0, v2, v1			; GFX9-NEXT: v_and_or_b32 v0, v0, v2, v1
	; GFX9-NEXT: s_and_b64 exec, exec, s[14:15]			; GFX9-NEXT: s_and_b64 exec, exec, s[14:15]
	; GFX9-NEXT: image_gather4 v[0:3], v0, s[0:7], s[8:11] dmask:0x1 a16			; GFX9-NEXT: image_gather4 v[0:3], v0, s[0:7], s[8:11] dmask:0x1 a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10NSA-LABEL: gather4_2d:			; GFX10NSA-LABEL: gather4_2d:
	; GFX10NSA: ; %bb.0: ; %main_body			; GFX10NSA: ; %bb.0: ; %main_body
				; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s0, s2			; GFX10NSA-NEXT: s_mov_b32 s0, s2
				; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10NSA-NEXT: v_lshlrev_b32_e32 v1, 16, v1
	; GFX10NSA-NEXT: s_mov_b32 s1, s3			; GFX10NSA-NEXT: s_mov_b32 s1, s3
	; GFX10NSA-NEXT: s_mov_b32 s2, s4			; GFX10NSA-NEXT: s_mov_b32 s2, s4
	; GFX10NSA-NEXT: s_mov_b32 s3, s5			; GFX10NSA-NEXT: s_mov_b32 s3, s5
	; GFX10NSA-NEXT: s_mov_b32 s4, s6			; GFX10NSA-NEXT: s_mov_b32 s4, s6
	; GFX10NSA-NEXT: s_mov_b32 s5, s7			; GFX10NSA-NEXT: s_mov_b32 s5, s7
	; GFX10NSA-NEXT: s_mov_b32 s6, s8			; GFX10NSA-NEXT: s_mov_b32 s6, s8
	; GFX10NSA-NEXT: s_mov_b32 s7, s9			; GFX10NSA-NEXT: s_mov_b32 s7, s9
	; GFX10NSA-NEXT: s_mov_b32 s8, s10			; GFX10NSA-NEXT: s_mov_b32 s8, s10
	; GFX10NSA-NEXT: s_mov_b32 s9, s11			; GFX10NSA-NEXT: s_mov_b32 s9, s11
	; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s10, s12			; GFX10NSA-NEXT: s_mov_b32 s10, s12
	; GFX10NSA-NEXT: s_mov_b32 s11, s13			; GFX10NSA-NEXT: s_mov_b32 s11, s13
	; GFX10NSA-NEXT: ; implicit-def: $vcc_hi
	; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10NSA-NEXT: v_lshlrev_b32_e32 v1, 16, v1
	; GFX10NSA-NEXT: v_and_or_b32 v0, v0, 0xffff, v1			; GFX10NSA-NEXT: v_and_or_b32 v0, v0, 0xffff, v1
	; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14			; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14
	; GFX10NSA-NEXT: image_gather4 v[0:3], v0, s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16			; GFX10NSA-NEXT: image_gather4 v[0:3], v0, s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
				; GFX10NSA-NEXT: ; implicit-def: $vcc_hi
	; GFX10NSA-NEXT: s_waitcnt vmcnt(0)			; GFX10NSA-NEXT: s_waitcnt vmcnt(0)
	; GFX10NSA-NEXT: ; return to shader part epilog			; GFX10NSA-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.2d.v4f32.f16(i32 1, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.2d.v4f32.f16(i32 1, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_cube(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %face) {			define amdgpu_ps <4 x float> @gather4_cube(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %face) {
	; GFX9-LABEL: gather4_cube:			; GFX9-LABEL: gather4_cube:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: s_mov_b64 s[14:15], exec
	; GFX9-NEXT: s_mov_b32 s0, s2			; GFX9-NEXT: s_mov_b32 s0, s2
	; GFX9-NEXT: s_mov_b32 s1, s3			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: s_mov_b32 s2, s4			; GFX9-NEXT: s_mov_b32 s2, s4
	; GFX9-NEXT: s_mov_b32 s3, s5
	; GFX9-NEXT: s_mov_b32 s4, s6			; GFX9-NEXT: s_mov_b32 s4, s6
	; GFX9-NEXT: s_mov_b32 s5, s7
	; GFX9-NEXT: s_mov_b32 s6, s8			; GFX9-NEXT: s_mov_b32 s6, s8
	; GFX9-NEXT: s_mov_b32 s7, s9
	; GFX9-NEXT: s_mov_b32 s8, s10			; GFX9-NEXT: s_mov_b32 s8, s10
	; GFX9-NEXT: s_mov_b32 s9, s11
	; GFX9-NEXT: s_mov_b64 s[14:15], exec
	; GFX9-NEXT: s_mov_b32 s10, s12			; GFX9-NEXT: s_mov_b32 s10, s12
	; GFX9-NEXT: s_mov_b32 s11, s13
	; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_mov_b32_e32 v3, 0xffff			; GFX9-NEXT: v_mov_b32_e32 v3, 0xffff
	; GFX9-NEXT: v_lshlrev_b32_e32 v1, 16, v1			; GFX9-NEXT: v_lshlrev_b32_e32 v1, 16, v1
	; GFX9-NEXT: s_lshl_b32 s12, s0, 16			; GFX9-NEXT: s_lshl_b32 s12, s0, 16
				; GFX9-NEXT: s_mov_b32 s1, s3
				; GFX9-NEXT: s_mov_b32 s3, s5
				; GFX9-NEXT: s_mov_b32 s5, s7
				; GFX9-NEXT: s_mov_b32 s7, s9
				; GFX9-NEXT: s_mov_b32 s9, s11
	; GFX9-NEXT: v_and_or_b32 v0, v0, v3, v1			; GFX9-NEXT: v_and_or_b32 v0, v0, v3, v1
				; GFX9-NEXT: s_mov_b32 s11, s13
	; GFX9-NEXT: v_and_or_b32 v1, v2, v3, s12			; GFX9-NEXT: v_and_or_b32 v1, v2, v3, s12
	; GFX9-NEXT: s_and_b64 exec, exec, s[14:15]			; GFX9-NEXT: s_and_b64 exec, exec, s[14:15]
	; GFX9-NEXT: image_gather4 v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 a16 da			; GFX9-NEXT: image_gather4 v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 a16 da
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10NSA-LABEL: gather4_cube:			; GFX10NSA-LABEL: gather4_cube:
	; GFX10NSA: ; %bb.0: ; %main_body			; GFX10NSA: ; %bb.0: ; %main_body
				; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s0, s2			; GFX10NSA-NEXT: s_mov_b32 s0, s2
	; GFX10NSA-NEXT: s_mov_b32 s1, s3			; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10NSA-NEXT: v_mov_b32_e32 v3, 0xffff
				; GFX10NSA-NEXT: v_lshlrev_b32_e32 v1, 16, v1
	; GFX10NSA-NEXT: s_mov_b32 s2, s4			; GFX10NSA-NEXT: s_mov_b32 s2, s4
	; GFX10NSA-NEXT: s_mov_b32 s3, s5
	; GFX10NSA-NEXT: s_mov_b32 s4, s6			; GFX10NSA-NEXT: s_mov_b32 s4, s6
	; GFX10NSA-NEXT: s_mov_b32 s5, s7
	; GFX10NSA-NEXT: s_mov_b32 s6, s8			; GFX10NSA-NEXT: s_mov_b32 s6, s8
	; GFX10NSA-NEXT: s_mov_b32 s7, s9
	; GFX10NSA-NEXT: s_mov_b32 s8, s10			; GFX10NSA-NEXT: s_mov_b32 s8, s10
	; GFX10NSA-NEXT: s_mov_b32 s9, s11
	; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s10, s12			; GFX10NSA-NEXT: s_mov_b32 s10, s12
	; GFX10NSA-NEXT: s_mov_b32 s11, s13
	; GFX10NSA-NEXT: ; implicit-def: $vcc_hi
	; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10NSA-NEXT: v_mov_b32_e32 v3, 0xffff
	; GFX10NSA-NEXT: v_lshlrev_b32_e32 v1, 16, v1
	; GFX10NSA-NEXT: s_lshl_b32 s12, s0, 16			; GFX10NSA-NEXT: s_lshl_b32 s12, s0, 16
				; GFX10NSA-NEXT: s_mov_b32 s1, s3
				; GFX10NSA-NEXT: s_mov_b32 s3, s5
				; GFX10NSA-NEXT: s_mov_b32 s5, s7
				; GFX10NSA-NEXT: s_mov_b32 s7, s9
				; GFX10NSA-NEXT: s_mov_b32 s9, s11
	; GFX10NSA-NEXT: v_and_or_b32 v0, v0, v3, v1			; GFX10NSA-NEXT: v_and_or_b32 v0, v0, v3, v1
				; GFX10NSA-NEXT: s_mov_b32 s11, s13
	; GFX10NSA-NEXT: v_and_or_b32 v1, v2, v3, s12			; GFX10NSA-NEXT: v_and_or_b32 v1, v2, v3, s12
	; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14			; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14
	; GFX10NSA-NEXT: image_gather4 v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_CUBE a16			; GFX10NSA-NEXT: image_gather4 v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_CUBE a16
				; GFX10NSA-NEXT: ; implicit-def: $vcc_hi
	; GFX10NSA-NEXT: s_waitcnt vmcnt(0)			; GFX10NSA-NEXT: s_waitcnt vmcnt(0)
	; GFX10NSA-NEXT: ; return to shader part epilog			; GFX10NSA-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.cube.v4f32.f16(i32 1, half %s, half %t, half %face, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.cube.v4f32.f16(i32 1, half %s, half %t, half %face, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_2darray(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %slice) {			define amdgpu_ps <4 x float> @gather4_2darray(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %slice) {
	; GFX9-LABEL: gather4_2darray:			; GFX9-LABEL: gather4_2darray:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: s_mov_b64 s[14:15], exec
	; GFX9-NEXT: s_mov_b32 s0, s2			; GFX9-NEXT: s_mov_b32 s0, s2
	; GFX9-NEXT: s_mov_b32 s1, s3			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: s_mov_b32 s2, s4			; GFX9-NEXT: s_mov_b32 s2, s4
	; GFX9-NEXT: s_mov_b32 s3, s5
	; GFX9-NEXT: s_mov_b32 s4, s6			; GFX9-NEXT: s_mov_b32 s4, s6
	; GFX9-NEXT: s_mov_b32 s5, s7
	; GFX9-NEXT: s_mov_b32 s6, s8			; GFX9-NEXT: s_mov_b32 s6, s8
	; GFX9-NEXT: s_mov_b32 s7, s9
	; GFX9-NEXT: s_mov_b32 s8, s10			; GFX9-NEXT: s_mov_b32 s8, s10
	; GFX9-NEXT: s_mov_b32 s9, s11
	; GFX9-NEXT: s_mov_b64 s[14:15], exec
	; GFX9-NEXT: s_mov_b32 s10, s12			; GFX9-NEXT: s_mov_b32 s10, s12
	; GFX9-NEXT: s_mov_b32 s11, s13
	; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_mov_b32_e32 v3, 0xffff			; GFX9-NEXT: v_mov_b32_e32 v3, 0xffff
	; GFX9-NEXT: v_lshlrev_b32_e32 v1, 16, v1			; GFX9-NEXT: v_lshlrev_b32_e32 v1, 16, v1
	; GFX9-NEXT: s_lshl_b32 s12, s0, 16			; GFX9-NEXT: s_lshl_b32 s12, s0, 16
				; GFX9-NEXT: s_mov_b32 s1, s3
				; GFX9-NEXT: s_mov_b32 s3, s5
				; GFX9-NEXT: s_mov_b32 s5, s7
				; GFX9-NEXT: s_mov_b32 s7, s9
				; GFX9-NEXT: s_mov_b32 s9, s11
	; GFX9-NEXT: v_and_or_b32 v0, v0, v3, v1			; GFX9-NEXT: v_and_or_b32 v0, v0, v3, v1
				; GFX9-NEXT: s_mov_b32 s11, s13
	; GFX9-NEXT: v_and_or_b32 v1, v2, v3, s12			; GFX9-NEXT: v_and_or_b32 v1, v2, v3, s12
	; GFX9-NEXT: s_and_b64 exec, exec, s[14:15]			; GFX9-NEXT: s_and_b64 exec, exec, s[14:15]
	; GFX9-NEXT: image_gather4 v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 a16 da			; GFX9-NEXT: image_gather4 v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 a16 da
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10NSA-LABEL: gather4_2darray:			; GFX10NSA-LABEL: gather4_2darray:
	; GFX10NSA: ; %bb.0: ; %main_body			; GFX10NSA: ; %bb.0: ; %main_body
				; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s0, s2			; GFX10NSA-NEXT: s_mov_b32 s0, s2
	; GFX10NSA-NEXT: s_mov_b32 s1, s3			; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10NSA-NEXT: v_mov_b32_e32 v3, 0xffff
				; GFX10NSA-NEXT: v_lshlrev_b32_e32 v1, 16, v1
	; GFX10NSA-NEXT: s_mov_b32 s2, s4			; GFX10NSA-NEXT: s_mov_b32 s2, s4
	; GFX10NSA-NEXT: s_mov_b32 s3, s5
	; GFX10NSA-NEXT: s_mov_b32 s4, s6			; GFX10NSA-NEXT: s_mov_b32 s4, s6
	; GFX10NSA-NEXT: s_mov_b32 s5, s7
	; GFX10NSA-NEXT: s_mov_b32 s6, s8			; GFX10NSA-NEXT: s_mov_b32 s6, s8
	; GFX10NSA-NEXT: s_mov_b32 s7, s9
	; GFX10NSA-NEXT: s_mov_b32 s8, s10			; GFX10NSA-NEXT: s_mov_b32 s8, s10
	; GFX10NSA-NEXT: s_mov_b32 s9, s11
	; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s10, s12			; GFX10NSA-NEXT: s_mov_b32 s10, s12
	; GFX10NSA-NEXT: s_mov_b32 s11, s13
	; GFX10NSA-NEXT: ; implicit-def: $vcc_hi
	; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10NSA-NEXT: v_mov_b32_e32 v3, 0xffff
	; GFX10NSA-NEXT: v_lshlrev_b32_e32 v1, 16, v1
	; GFX10NSA-NEXT: s_lshl_b32 s12, s0, 16			; GFX10NSA-NEXT: s_lshl_b32 s12, s0, 16
				; GFX10NSA-NEXT: s_mov_b32 s1, s3
				; GFX10NSA-NEXT: s_mov_b32 s3, s5
				; GFX10NSA-NEXT: s_mov_b32 s5, s7
				; GFX10NSA-NEXT: s_mov_b32 s7, s9
				; GFX10NSA-NEXT: s_mov_b32 s9, s11
	; GFX10NSA-NEXT: v_and_or_b32 v0, v0, v3, v1			; GFX10NSA-NEXT: v_and_or_b32 v0, v0, v3, v1
				; GFX10NSA-NEXT: s_mov_b32 s11, s13
	; GFX10NSA-NEXT: v_and_or_b32 v1, v2, v3, s12			; GFX10NSA-NEXT: v_and_or_b32 v1, v2, v3, s12
	; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14			; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14
	; GFX10NSA-NEXT: image_gather4 v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D_ARRAY a16			; GFX10NSA-NEXT: image_gather4 v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D_ARRAY a16
				; GFX10NSA-NEXT: ; implicit-def: $vcc_hi
	; GFX10NSA-NEXT: s_waitcnt vmcnt(0)			; GFX10NSA-NEXT: s_waitcnt vmcnt(0)
	; GFX10NSA-NEXT: ; return to shader part epilog			; GFX10NSA-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.2darray.v4f32.f16(i32 1, half %s, half %t, half %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.2darray.v4f32.f16(i32 1, half %s, half %t, half %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_c_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t) {			define amdgpu_ps <4 x float> @gather4_c_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t) {
	; GFX9-LABEL: gather4_c_2d:			; GFX9-LABEL: gather4_c_2d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: s_mov_b64 s[14:15], exec
	; GFX9-NEXT: s_mov_b32 s0, s2			; GFX9-NEXT: s_mov_b32 s0, s2
				; GFX9-NEXT: s_wqm_b64 exec, exec
				; GFX9-NEXT: v_mov_b32_e32 v3, 0xffff
				; GFX9-NEXT: v_lshlrev_b32_e32 v2, 16, v2
	; GFX9-NEXT: s_mov_b32 s1, s3			; GFX9-NEXT: s_mov_b32 s1, s3
	; GFX9-NEXT: s_mov_b32 s2, s4			; GFX9-NEXT: s_mov_b32 s2, s4
	; GFX9-NEXT: s_mov_b32 s3, s5			; GFX9-NEXT: s_mov_b32 s3, s5
	; GFX9-NEXT: s_mov_b32 s4, s6			; GFX9-NEXT: s_mov_b32 s4, s6
	; GFX9-NEXT: s_mov_b32 s5, s7			; GFX9-NEXT: s_mov_b32 s5, s7
	; GFX9-NEXT: s_mov_b32 s6, s8			; GFX9-NEXT: s_mov_b32 s6, s8
	; GFX9-NEXT: s_mov_b32 s7, s9			; GFX9-NEXT: s_mov_b32 s7, s9
	; GFX9-NEXT: s_mov_b32 s8, s10			; GFX9-NEXT: s_mov_b32 s8, s10
	; GFX9-NEXT: s_mov_b32 s9, s11			; GFX9-NEXT: s_mov_b32 s9, s11
	; GFX9-NEXT: s_mov_b64 s[14:15], exec
	; GFX9-NEXT: s_mov_b32 s10, s12			; GFX9-NEXT: s_mov_b32 s10, s12
	; GFX9-NEXT: s_mov_b32 s11, s13			; GFX9-NEXT: s_mov_b32 s11, s13
	; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_mov_b32_e32 v3, 0xffff
	; GFX9-NEXT: v_lshlrev_b32_e32 v2, 16, v2
	; GFX9-NEXT: v_and_or_b32 v1, v1, v3, v2			; GFX9-NEXT: v_and_or_b32 v1, v1, v3, v2
	; GFX9-NEXT: s_and_b64 exec, exec, s[14:15]			; GFX9-NEXT: s_and_b64 exec, exec, s[14:15]
	; GFX9-NEXT: image_gather4_c v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 a16			; GFX9-NEXT: image_gather4_c v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10NSA-LABEL: gather4_c_2d:			; GFX10NSA-LABEL: gather4_c_2d:
	; GFX10NSA: ; %bb.0: ; %main_body			; GFX10NSA: ; %bb.0: ; %main_body
				; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s0, s2			; GFX10NSA-NEXT: s_mov_b32 s0, s2
				; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10NSA-NEXT: v_lshlrev_b32_e32 v2, 16, v2
	; GFX10NSA-NEXT: s_mov_b32 s1, s3			; GFX10NSA-NEXT: s_mov_b32 s1, s3
	; GFX10NSA-NEXT: s_mov_b32 s2, s4			; GFX10NSA-NEXT: s_mov_b32 s2, s4
	; GFX10NSA-NEXT: s_mov_b32 s3, s5			; GFX10NSA-NEXT: s_mov_b32 s3, s5
	; GFX10NSA-NEXT: s_mov_b32 s4, s6			; GFX10NSA-NEXT: s_mov_b32 s4, s6
	; GFX10NSA-NEXT: s_mov_b32 s5, s7			; GFX10NSA-NEXT: s_mov_b32 s5, s7
	; GFX10NSA-NEXT: s_mov_b32 s6, s8			; GFX10NSA-NEXT: s_mov_b32 s6, s8
	; GFX10NSA-NEXT: s_mov_b32 s7, s9			; GFX10NSA-NEXT: s_mov_b32 s7, s9
	; GFX10NSA-NEXT: s_mov_b32 s8, s10			; GFX10NSA-NEXT: s_mov_b32 s8, s10
	; GFX10NSA-NEXT: s_mov_b32 s9, s11			; GFX10NSA-NEXT: s_mov_b32 s9, s11
	; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s10, s12			; GFX10NSA-NEXT: s_mov_b32 s10, s12
	; GFX10NSA-NEXT: s_mov_b32 s11, s13			; GFX10NSA-NEXT: s_mov_b32 s11, s13
	; GFX10NSA-NEXT: ; implicit-def: $vcc_hi
	; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10NSA-NEXT: v_lshlrev_b32_e32 v2, 16, v2
	; GFX10NSA-NEXT: v_and_or_b32 v1, v1, 0xffff, v2			; GFX10NSA-NEXT: v_and_or_b32 v1, v1, 0xffff, v2
	; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14			; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14
	; GFX10NSA-NEXT: image_gather4_c v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16			; GFX10NSA-NEXT: image_gather4_c v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
				; GFX10NSA-NEXT: ; implicit-def: $vcc_hi
	; GFX10NSA-NEXT: s_waitcnt vmcnt(0)			; GFX10NSA-NEXT: s_waitcnt vmcnt(0)
	; GFX10NSA-NEXT: ; return to shader part epilog			; GFX10NSA-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.c.2d.v4f32.f16(i32 1, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.c.2d.v4f32.f16(i32 1, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %clamp) {			define amdgpu_ps <4 x float> @gather4_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %clamp) {
	; GFX9-LABEL: gather4_cl_2d:			; GFX9-LABEL: gather4_cl_2d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: s_mov_b64 s[14:15], exec
	; GFX9-NEXT: s_mov_b32 s0, s2			; GFX9-NEXT: s_mov_b32 s0, s2
	; GFX9-NEXT: s_mov_b32 s1, s3			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: s_mov_b32 s2, s4			; GFX9-NEXT: s_mov_b32 s2, s4
	; GFX9-NEXT: s_mov_b32 s3, s5
	; GFX9-NEXT: s_mov_b32 s4, s6			; GFX9-NEXT: s_mov_b32 s4, s6
	; GFX9-NEXT: s_mov_b32 s5, s7
	; GFX9-NEXT: s_mov_b32 s6, s8			; GFX9-NEXT: s_mov_b32 s6, s8
	; GFX9-NEXT: s_mov_b32 s7, s9
	; GFX9-NEXT: s_mov_b32 s8, s10			; GFX9-NEXT: s_mov_b32 s8, s10
	; GFX9-NEXT: s_mov_b32 s9, s11
	; GFX9-NEXT: s_mov_b64 s[14:15], exec
	; GFX9-NEXT: s_mov_b32 s10, s12			; GFX9-NEXT: s_mov_b32 s10, s12
	; GFX9-NEXT: s_mov_b32 s11, s13
	; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_mov_b32_e32 v3, 0xffff			; GFX9-NEXT: v_mov_b32_e32 v3, 0xffff
	; GFX9-NEXT: v_lshlrev_b32_e32 v1, 16, v1			; GFX9-NEXT: v_lshlrev_b32_e32 v1, 16, v1
	; GFX9-NEXT: s_lshl_b32 s12, s0, 16			; GFX9-NEXT: s_lshl_b32 s12, s0, 16
				; GFX9-NEXT: s_mov_b32 s1, s3
				; GFX9-NEXT: s_mov_b32 s3, s5
				; GFX9-NEXT: s_mov_b32 s5, s7
				; GFX9-NEXT: s_mov_b32 s7, s9
				; GFX9-NEXT: s_mov_b32 s9, s11
	; GFX9-NEXT: v_and_or_b32 v0, v0, v3, v1			; GFX9-NEXT: v_and_or_b32 v0, v0, v3, v1
				; GFX9-NEXT: s_mov_b32 s11, s13
	; GFX9-NEXT: v_and_or_b32 v1, v2, v3, s12			; GFX9-NEXT: v_and_or_b32 v1, v2, v3, s12
	; GFX9-NEXT: s_and_b64 exec, exec, s[14:15]			; GFX9-NEXT: s_and_b64 exec, exec, s[14:15]
	; GFX9-NEXT: image_gather4_cl v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 a16			; GFX9-NEXT: image_gather4_cl v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10NSA-LABEL: gather4_cl_2d:			; GFX10NSA-LABEL: gather4_cl_2d:
	; GFX10NSA: ; %bb.0: ; %main_body			; GFX10NSA: ; %bb.0: ; %main_body
				; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s0, s2			; GFX10NSA-NEXT: s_mov_b32 s0, s2
	; GFX10NSA-NEXT: s_mov_b32 s1, s3			; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10NSA-NEXT: v_mov_b32_e32 v3, 0xffff
				; GFX10NSA-NEXT: v_lshlrev_b32_e32 v1, 16, v1
	; GFX10NSA-NEXT: s_mov_b32 s2, s4			; GFX10NSA-NEXT: s_mov_b32 s2, s4
	; GFX10NSA-NEXT: s_mov_b32 s3, s5
	; GFX10NSA-NEXT: s_mov_b32 s4, s6			; GFX10NSA-NEXT: s_mov_b32 s4, s6
	; GFX10NSA-NEXT: s_mov_b32 s5, s7
	; GFX10NSA-NEXT: s_mov_b32 s6, s8			; GFX10NSA-NEXT: s_mov_b32 s6, s8
	; GFX10NSA-NEXT: s_mov_b32 s7, s9
	; GFX10NSA-NEXT: s_mov_b32 s8, s10			; GFX10NSA-NEXT: s_mov_b32 s8, s10
	; GFX10NSA-NEXT: s_mov_b32 s9, s11
	; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s10, s12			; GFX10NSA-NEXT: s_mov_b32 s10, s12
	; GFX10NSA-NEXT: s_mov_b32 s11, s13
	; GFX10NSA-NEXT: ; implicit-def: $vcc_hi
	; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10NSA-NEXT: v_mov_b32_e32 v3, 0xffff
	; GFX10NSA-NEXT: v_lshlrev_b32_e32 v1, 16, v1
	; GFX10NSA-NEXT: s_lshl_b32 s12, s0, 16			; GFX10NSA-NEXT: s_lshl_b32 s12, s0, 16
				; GFX10NSA-NEXT: s_mov_b32 s1, s3
				; GFX10NSA-NEXT: s_mov_b32 s3, s5
				; GFX10NSA-NEXT: s_mov_b32 s5, s7
				; GFX10NSA-NEXT: s_mov_b32 s7, s9
				; GFX10NSA-NEXT: s_mov_b32 s9, s11
	; GFX10NSA-NEXT: v_and_or_b32 v0, v0, v3, v1			; GFX10NSA-NEXT: v_and_or_b32 v0, v0, v3, v1
				; GFX10NSA-NEXT: s_mov_b32 s11, s13
	; GFX10NSA-NEXT: v_and_or_b32 v1, v2, v3, s12			; GFX10NSA-NEXT: v_and_or_b32 v1, v2, v3, s12
	; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14			; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14
	; GFX10NSA-NEXT: image_gather4_cl v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16			; GFX10NSA-NEXT: image_gather4_cl v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
				; GFX10NSA-NEXT: ; implicit-def: $vcc_hi
	; GFX10NSA-NEXT: s_waitcnt vmcnt(0)			; GFX10NSA-NEXT: s_waitcnt vmcnt(0)
	; GFX10NSA-NEXT: ; return to shader part epilog			; GFX10NSA-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.cl.2d.v4f32.f16(i32 1, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.cl.2d.v4f32.f16(i32 1, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_c_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t, half %clamp) {			define amdgpu_ps <4 x float> @gather4_c_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t, half %clamp) {
	; GFX9-LABEL: gather4_c_cl_2d:			; GFX9-LABEL: gather4_c_cl_2d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: s_mov_b64 s[14:15], exec
	; GFX9-NEXT: s_mov_b32 s0, s2			; GFX9-NEXT: s_mov_b32 s0, s2
	; GFX9-NEXT: s_mov_b32 s1, s3			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: s_mov_b32 s2, s4			; GFX9-NEXT: s_mov_b32 s2, s4
	; GFX9-NEXT: s_mov_b32 s3, s5
	; GFX9-NEXT: s_mov_b32 s4, s6			; GFX9-NEXT: s_mov_b32 s4, s6
	; GFX9-NEXT: s_mov_b32 s5, s7
	; GFX9-NEXT: s_mov_b32 s6, s8			; GFX9-NEXT: s_mov_b32 s6, s8
	; GFX9-NEXT: s_mov_b32 s7, s9
	; GFX9-NEXT: s_mov_b32 s8, s10			; GFX9-NEXT: s_mov_b32 s8, s10
	; GFX9-NEXT: s_mov_b32 s9, s11
	; GFX9-NEXT: s_mov_b64 s[14:15], exec
	; GFX9-NEXT: s_mov_b32 s10, s12			; GFX9-NEXT: s_mov_b32 s10, s12
	; GFX9-NEXT: s_mov_b32 s11, s13
	; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_mov_b32_e32 v4, 0xffff			; GFX9-NEXT: v_mov_b32_e32 v4, 0xffff
	; GFX9-NEXT: v_lshlrev_b32_e32 v2, 16, v2			; GFX9-NEXT: v_lshlrev_b32_e32 v2, 16, v2
	; GFX9-NEXT: s_lshl_b32 s12, s0, 16			; GFX9-NEXT: s_lshl_b32 s12, s0, 16
				; GFX9-NEXT: s_mov_b32 s1, s3
				; GFX9-NEXT: s_mov_b32 s3, s5
				; GFX9-NEXT: s_mov_b32 s5, s7
				; GFX9-NEXT: s_mov_b32 s7, s9
				; GFX9-NEXT: s_mov_b32 s9, s11
	; GFX9-NEXT: v_and_or_b32 v1, v1, v4, v2			; GFX9-NEXT: v_and_or_b32 v1, v1, v4, v2
				; GFX9-NEXT: s_mov_b32 s11, s13
	; GFX9-NEXT: v_and_or_b32 v2, v3, v4, s12			; GFX9-NEXT: v_and_or_b32 v2, v3, v4, s12
	; GFX9-NEXT: s_and_b64 exec, exec, s[14:15]			; GFX9-NEXT: s_and_b64 exec, exec, s[14:15]
	; GFX9-NEXT: image_gather4_c_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 a16			; GFX9-NEXT: image_gather4_c_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10NSA-LABEL: gather4_c_cl_2d:			; GFX10NSA-LABEL: gather4_c_cl_2d:
	; GFX10NSA: ; %bb.0: ; %main_body			; GFX10NSA: ; %bb.0: ; %main_body
				; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s0, s2			; GFX10NSA-NEXT: s_mov_b32 s0, s2
	; GFX10NSA-NEXT: s_mov_b32 s1, s3			; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10NSA-NEXT: v_mov_b32_e32 v4, 0xffff
				; GFX10NSA-NEXT: v_lshlrev_b32_e32 v2, 16, v2
	; GFX10NSA-NEXT: s_mov_b32 s2, s4			; GFX10NSA-NEXT: s_mov_b32 s2, s4
	; GFX10NSA-NEXT: s_mov_b32 s3, s5
	; GFX10NSA-NEXT: s_mov_b32 s4, s6			; GFX10NSA-NEXT: s_mov_b32 s4, s6
	; GFX10NSA-NEXT: s_mov_b32 s5, s7
	; GFX10NSA-NEXT: s_mov_b32 s6, s8			; GFX10NSA-NEXT: s_mov_b32 s6, s8
	; GFX10NSA-NEXT: s_mov_b32 s7, s9
	; GFX10NSA-NEXT: s_mov_b32 s8, s10			; GFX10NSA-NEXT: s_mov_b32 s8, s10
	; GFX10NSA-NEXT: s_mov_b32 s9, s11
	; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s10, s12			; GFX10NSA-NEXT: s_mov_b32 s10, s12
	; GFX10NSA-NEXT: s_mov_b32 s11, s13
	; GFX10NSA-NEXT: ; implicit-def: $vcc_hi
	; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10NSA-NEXT: v_mov_b32_e32 v4, 0xffff
	; GFX10NSA-NEXT: v_lshlrev_b32_e32 v2, 16, v2
	; GFX10NSA-NEXT: s_lshl_b32 s12, s0, 16			; GFX10NSA-NEXT: s_lshl_b32 s12, s0, 16
				; GFX10NSA-NEXT: s_mov_b32 s1, s3
				; GFX10NSA-NEXT: s_mov_b32 s3, s5
				; GFX10NSA-NEXT: s_mov_b32 s5, s7
				; GFX10NSA-NEXT: s_mov_b32 s7, s9
				; GFX10NSA-NEXT: s_mov_b32 s9, s11
	; GFX10NSA-NEXT: v_and_or_b32 v1, v1, v4, v2			; GFX10NSA-NEXT: v_and_or_b32 v1, v1, v4, v2
				; GFX10NSA-NEXT: s_mov_b32 s11, s13
	; GFX10NSA-NEXT: v_and_or_b32 v2, v3, v4, s12			; GFX10NSA-NEXT: v_and_or_b32 v2, v3, v4, s12
	; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14			; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14
	; GFX10NSA-NEXT: image_gather4_c_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16			; GFX10NSA-NEXT: image_gather4_c_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
				; GFX10NSA-NEXT: ; implicit-def: $vcc_hi
	; GFX10NSA-NEXT: s_waitcnt vmcnt(0)			; GFX10NSA-NEXT: s_waitcnt vmcnt(0)
	; GFX10NSA-NEXT: ; return to shader part epilog			; GFX10NSA-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.c.cl.2d.v4f32.f16(i32 1, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.c.cl.2d.v4f32.f16(i32 1, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %t) {			define amdgpu_ps <4 x float> @gather4_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %t) {
	; GFX9-LABEL: gather4_b_2d:			; GFX9-LABEL: gather4_b_2d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: s_mov_b64 s[14:15], exec
	; GFX9-NEXT: s_mov_b32 s0, s2			; GFX9-NEXT: s_mov_b32 s0, s2
				; GFX9-NEXT: s_wqm_b64 exec, exec
				; GFX9-NEXT: v_mov_b32_e32 v3, 0xffff
				; GFX9-NEXT: v_lshlrev_b32_e32 v2, 16, v2
	; GFX9-NEXT: s_mov_b32 s1, s3			; GFX9-NEXT: s_mov_b32 s1, s3
	; GFX9-NEXT: s_mov_b32 s2, s4			; GFX9-NEXT: s_mov_b32 s2, s4
	; GFX9-NEXT: s_mov_b32 s3, s5			; GFX9-NEXT: s_mov_b32 s3, s5
	; GFX9-NEXT: s_mov_b32 s4, s6			; GFX9-NEXT: s_mov_b32 s4, s6
	; GFX9-NEXT: s_mov_b32 s5, s7			; GFX9-NEXT: s_mov_b32 s5, s7
	; GFX9-NEXT: s_mov_b32 s6, s8			; GFX9-NEXT: s_mov_b32 s6, s8
	; GFX9-NEXT: s_mov_b32 s7, s9			; GFX9-NEXT: s_mov_b32 s7, s9
	; GFX9-NEXT: s_mov_b32 s8, s10			; GFX9-NEXT: s_mov_b32 s8, s10
	; GFX9-NEXT: s_mov_b32 s9, s11			; GFX9-NEXT: s_mov_b32 s9, s11
	; GFX9-NEXT: s_mov_b64 s[14:15], exec
	; GFX9-NEXT: s_mov_b32 s10, s12			; GFX9-NEXT: s_mov_b32 s10, s12
	; GFX9-NEXT: s_mov_b32 s11, s13			; GFX9-NEXT: s_mov_b32 s11, s13
	; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_mov_b32_e32 v3, 0xffff
	; GFX9-NEXT: v_lshlrev_b32_e32 v2, 16, v2
	; GFX9-NEXT: v_and_or_b32 v1, v1, v3, v2			; GFX9-NEXT: v_and_or_b32 v1, v1, v3, v2
	; GFX9-NEXT: s_and_b64 exec, exec, s[14:15]			; GFX9-NEXT: s_and_b64 exec, exec, s[14:15]
	; GFX9-NEXT: image_gather4_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 a16			; GFX9-NEXT: image_gather4_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10NSA-LABEL: gather4_b_2d:			; GFX10NSA-LABEL: gather4_b_2d:
	; GFX10NSA: ; %bb.0: ; %main_body			; GFX10NSA: ; %bb.0: ; %main_body
				; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s0, s2			; GFX10NSA-NEXT: s_mov_b32 s0, s2
				; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10NSA-NEXT: v_lshlrev_b32_e32 v2, 16, v2
	; GFX10NSA-NEXT: s_mov_b32 s1, s3			; GFX10NSA-NEXT: s_mov_b32 s1, s3
	; GFX10NSA-NEXT: s_mov_b32 s2, s4			; GFX10NSA-NEXT: s_mov_b32 s2, s4
	; GFX10NSA-NEXT: s_mov_b32 s3, s5			; GFX10NSA-NEXT: s_mov_b32 s3, s5
	; GFX10NSA-NEXT: s_mov_b32 s4, s6			; GFX10NSA-NEXT: s_mov_b32 s4, s6
	; GFX10NSA-NEXT: s_mov_b32 s5, s7			; GFX10NSA-NEXT: s_mov_b32 s5, s7
	; GFX10NSA-NEXT: s_mov_b32 s6, s8			; GFX10NSA-NEXT: s_mov_b32 s6, s8
	; GFX10NSA-NEXT: s_mov_b32 s7, s9			; GFX10NSA-NEXT: s_mov_b32 s7, s9
	; GFX10NSA-NEXT: s_mov_b32 s8, s10			; GFX10NSA-NEXT: s_mov_b32 s8, s10
	; GFX10NSA-NEXT: s_mov_b32 s9, s11			; GFX10NSA-NEXT: s_mov_b32 s9, s11
	; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s10, s12			; GFX10NSA-NEXT: s_mov_b32 s10, s12
	; GFX10NSA-NEXT: s_mov_b32 s11, s13			; GFX10NSA-NEXT: s_mov_b32 s11, s13
	; GFX10NSA-NEXT: ; implicit-def: $vcc_hi
	; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10NSA-NEXT: v_lshlrev_b32_e32 v2, 16, v2
	; GFX10NSA-NEXT: v_and_or_b32 v1, v1, 0xffff, v2			; GFX10NSA-NEXT: v_and_or_b32 v1, v1, 0xffff, v2
	; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14			; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14
	; GFX10NSA-NEXT: image_gather4_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16			; GFX10NSA-NEXT: image_gather4_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
				; GFX10NSA-NEXT: ; implicit-def: $vcc_hi
	; GFX10NSA-NEXT: s_waitcnt vmcnt(0)			; GFX10NSA-NEXT: s_waitcnt vmcnt(0)
	; GFX10NSA-NEXT: ; return to shader part epilog			; GFX10NSA-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.b.2d.v4f32.f32.f16(i32 1, float %bias, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.b.2d.v4f32.f32.f16(i32 1, float %bias, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_c_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %t) {			define amdgpu_ps <4 x float> @gather4_c_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %t) {
	; GFX9-LABEL: gather4_c_b_2d:			; GFX9-LABEL: gather4_c_b_2d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: s_mov_b64 s[14:15], exec
	; GFX9-NEXT: s_mov_b32 s0, s2			; GFX9-NEXT: s_mov_b32 s0, s2
				; GFX9-NEXT: s_wqm_b64 exec, exec
				; GFX9-NEXT: v_mov_b32_e32 v4, 0xffff
				; GFX9-NEXT: v_lshlrev_b32_e32 v3, 16, v3
	; GFX9-NEXT: s_mov_b32 s1, s3			; GFX9-NEXT: s_mov_b32 s1, s3
	; GFX9-NEXT: s_mov_b32 s2, s4			; GFX9-NEXT: s_mov_b32 s2, s4
	; GFX9-NEXT: s_mov_b32 s3, s5			; GFX9-NEXT: s_mov_b32 s3, s5
	; GFX9-NEXT: s_mov_b32 s4, s6			; GFX9-NEXT: s_mov_b32 s4, s6
	; GFX9-NEXT: s_mov_b32 s5, s7			; GFX9-NEXT: s_mov_b32 s5, s7
	; GFX9-NEXT: s_mov_b32 s6, s8			; GFX9-NEXT: s_mov_b32 s6, s8
	; GFX9-NEXT: s_mov_b32 s7, s9			; GFX9-NEXT: s_mov_b32 s7, s9
	; GFX9-NEXT: s_mov_b32 s8, s10			; GFX9-NEXT: s_mov_b32 s8, s10
	; GFX9-NEXT: s_mov_b32 s9, s11			; GFX9-NEXT: s_mov_b32 s9, s11
	; GFX9-NEXT: s_mov_b64 s[14:15], exec
	; GFX9-NEXT: s_mov_b32 s10, s12			; GFX9-NEXT: s_mov_b32 s10, s12
	; GFX9-NEXT: s_mov_b32 s11, s13			; GFX9-NEXT: s_mov_b32 s11, s13
	; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_mov_b32_e32 v4, 0xffff
	; GFX9-NEXT: v_lshlrev_b32_e32 v3, 16, v3
	; GFX9-NEXT: v_and_or_b32 v2, v2, v4, v3			; GFX9-NEXT: v_and_or_b32 v2, v2, v4, v3
	; GFX9-NEXT: s_and_b64 exec, exec, s[14:15]			; GFX9-NEXT: s_and_b64 exec, exec, s[14:15]
	; GFX9-NEXT: image_gather4_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 a16			; GFX9-NEXT: image_gather4_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10NSA-LABEL: gather4_c_b_2d:			; GFX10NSA-LABEL: gather4_c_b_2d:
	; GFX10NSA: ; %bb.0: ; %main_body			; GFX10NSA: ; %bb.0: ; %main_body
				; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s0, s2			; GFX10NSA-NEXT: s_mov_b32 s0, s2
				; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10NSA-NEXT: v_lshlrev_b32_e32 v3, 16, v3
	; GFX10NSA-NEXT: s_mov_b32 s1, s3			; GFX10NSA-NEXT: s_mov_b32 s1, s3
	; GFX10NSA-NEXT: s_mov_b32 s2, s4			; GFX10NSA-NEXT: s_mov_b32 s2, s4
	; GFX10NSA-NEXT: s_mov_b32 s3, s5			; GFX10NSA-NEXT: s_mov_b32 s3, s5
	; GFX10NSA-NEXT: s_mov_b32 s4, s6			; GFX10NSA-NEXT: s_mov_b32 s4, s6
	; GFX10NSA-NEXT: s_mov_b32 s5, s7			; GFX10NSA-NEXT: s_mov_b32 s5, s7
	; GFX10NSA-NEXT: s_mov_b32 s6, s8			; GFX10NSA-NEXT: s_mov_b32 s6, s8
	; GFX10NSA-NEXT: s_mov_b32 s7, s9			; GFX10NSA-NEXT: s_mov_b32 s7, s9
	; GFX10NSA-NEXT: s_mov_b32 s8, s10			; GFX10NSA-NEXT: s_mov_b32 s8, s10
	; GFX10NSA-NEXT: s_mov_b32 s9, s11			; GFX10NSA-NEXT: s_mov_b32 s9, s11
	; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s10, s12			; GFX10NSA-NEXT: s_mov_b32 s10, s12
	; GFX10NSA-NEXT: s_mov_b32 s11, s13			; GFX10NSA-NEXT: s_mov_b32 s11, s13
	; GFX10NSA-NEXT: ; implicit-def: $vcc_hi
	; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10NSA-NEXT: v_lshlrev_b32_e32 v3, 16, v3
	; GFX10NSA-NEXT: v_and_or_b32 v2, v2, 0xffff, v3			; GFX10NSA-NEXT: v_and_or_b32 v2, v2, 0xffff, v3
	; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14			; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14
	; GFX10NSA-NEXT: image_gather4_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16			; GFX10NSA-NEXT: image_gather4_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
				; GFX10NSA-NEXT: ; implicit-def: $vcc_hi
	; GFX10NSA-NEXT: s_waitcnt vmcnt(0)			; GFX10NSA-NEXT: s_waitcnt vmcnt(0)
	; GFX10NSA-NEXT: ; return to shader part epilog			; GFX10NSA-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.2d.v4f32.f32.f16(i32 1, float %bias, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.2d.v4f32.f32.f16(i32 1, float %bias, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %t, half %clamp) {			define amdgpu_ps <4 x float> @gather4_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %t, half %clamp) {
	; GFX9-LABEL: gather4_b_cl_2d:			; GFX9-LABEL: gather4_b_cl_2d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: s_mov_b64 s[14:15], exec
	; GFX9-NEXT: s_mov_b32 s0, s2			; GFX9-NEXT: s_mov_b32 s0, s2
	; GFX9-NEXT: s_mov_b32 s1, s3			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: s_mov_b32 s2, s4			; GFX9-NEXT: s_mov_b32 s2, s4
	; GFX9-NEXT: s_mov_b32 s3, s5
	; GFX9-NEXT: s_mov_b32 s4, s6			; GFX9-NEXT: s_mov_b32 s4, s6
	; GFX9-NEXT: s_mov_b32 s5, s7
	; GFX9-NEXT: s_mov_b32 s6, s8			; GFX9-NEXT: s_mov_b32 s6, s8
	; GFX9-NEXT: s_mov_b32 s7, s9
	; GFX9-NEXT: s_mov_b32 s8, s10			; GFX9-NEXT: s_mov_b32 s8, s10
	; GFX9-NEXT: s_mov_b32 s9, s11
	; GFX9-NEXT: s_mov_b64 s[14:15], exec
	; GFX9-NEXT: s_mov_b32 s10, s12			; GFX9-NEXT: s_mov_b32 s10, s12
	; GFX9-NEXT: s_mov_b32 s11, s13
	; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_mov_b32_e32 v4, 0xffff			; GFX9-NEXT: v_mov_b32_e32 v4, 0xffff
	; GFX9-NEXT: v_lshlrev_b32_e32 v2, 16, v2			; GFX9-NEXT: v_lshlrev_b32_e32 v2, 16, v2
	; GFX9-NEXT: s_lshl_b32 s12, s0, 16			; GFX9-NEXT: s_lshl_b32 s12, s0, 16
				; GFX9-NEXT: s_mov_b32 s1, s3
				; GFX9-NEXT: s_mov_b32 s3, s5
				; GFX9-NEXT: s_mov_b32 s5, s7
				; GFX9-NEXT: s_mov_b32 s7, s9
				; GFX9-NEXT: s_mov_b32 s9, s11
	; GFX9-NEXT: v_and_or_b32 v1, v1, v4, v2			; GFX9-NEXT: v_and_or_b32 v1, v1, v4, v2
				; GFX9-NEXT: s_mov_b32 s11, s13
	; GFX9-NEXT: v_and_or_b32 v2, v3, v4, s12			; GFX9-NEXT: v_and_or_b32 v2, v3, v4, s12
	; GFX9-NEXT: s_and_b64 exec, exec, s[14:15]			; GFX9-NEXT: s_and_b64 exec, exec, s[14:15]
	; GFX9-NEXT: image_gather4_b_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 a16			; GFX9-NEXT: image_gather4_b_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10NSA-LABEL: gather4_b_cl_2d:			; GFX10NSA-LABEL: gather4_b_cl_2d:
	; GFX10NSA: ; %bb.0: ; %main_body			; GFX10NSA: ; %bb.0: ; %main_body
				; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s0, s2			; GFX10NSA-NEXT: s_mov_b32 s0, s2
	; GFX10NSA-NEXT: s_mov_b32 s1, s3			; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10NSA-NEXT: v_mov_b32_e32 v4, 0xffff
				; GFX10NSA-NEXT: v_lshlrev_b32_e32 v2, 16, v2
	; GFX10NSA-NEXT: s_mov_b32 s2, s4			; GFX10NSA-NEXT: s_mov_b32 s2, s4
	; GFX10NSA-NEXT: s_mov_b32 s3, s5
	; GFX10NSA-NEXT: s_mov_b32 s4, s6			; GFX10NSA-NEXT: s_mov_b32 s4, s6
	; GFX10NSA-NEXT: s_mov_b32 s5, s7
	; GFX10NSA-NEXT: s_mov_b32 s6, s8			; GFX10NSA-NEXT: s_mov_b32 s6, s8
	; GFX10NSA-NEXT: s_mov_b32 s7, s9
	; GFX10NSA-NEXT: s_mov_b32 s8, s10			; GFX10NSA-NEXT: s_mov_b32 s8, s10
	; GFX10NSA-NEXT: s_mov_b32 s9, s11
	; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s10, s12			; GFX10NSA-NEXT: s_mov_b32 s10, s12
	; GFX10NSA-NEXT: s_mov_b32 s11, s13
	; GFX10NSA-NEXT: ; implicit-def: $vcc_hi
	; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10NSA-NEXT: v_mov_b32_e32 v4, 0xffff
	; GFX10NSA-NEXT: v_lshlrev_b32_e32 v2, 16, v2
	; GFX10NSA-NEXT: s_lshl_b32 s12, s0, 16			; GFX10NSA-NEXT: s_lshl_b32 s12, s0, 16
				; GFX10NSA-NEXT: s_mov_b32 s1, s3
				; GFX10NSA-NEXT: s_mov_b32 s3, s5
				; GFX10NSA-NEXT: s_mov_b32 s5, s7
				; GFX10NSA-NEXT: s_mov_b32 s7, s9
				; GFX10NSA-NEXT: s_mov_b32 s9, s11
	; GFX10NSA-NEXT: v_and_or_b32 v1, v1, v4, v2			; GFX10NSA-NEXT: v_and_or_b32 v1, v1, v4, v2
				; GFX10NSA-NEXT: s_mov_b32 s11, s13
	; GFX10NSA-NEXT: v_and_or_b32 v2, v3, v4, s12			; GFX10NSA-NEXT: v_and_or_b32 v2, v3, v4, s12
	; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14			; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14
	; GFX10NSA-NEXT: image_gather4_b_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16			; GFX10NSA-NEXT: image_gather4_b_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
				; GFX10NSA-NEXT: ; implicit-def: $vcc_hi
	; GFX10NSA-NEXT: s_waitcnt vmcnt(0)			; GFX10NSA-NEXT: s_waitcnt vmcnt(0)
	; GFX10NSA-NEXT: ; return to shader part epilog			; GFX10NSA-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.b.cl.2d.v4f32.f32.f16(i32 1, float %bias, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.b.cl.2d.v4f32.f32.f16(i32 1, float %bias, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_c_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %t, half %clamp) {			define amdgpu_ps <4 x float> @gather4_c_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %t, half %clamp) {
	; GFX9-LABEL: gather4_c_b_cl_2d:			; GFX9-LABEL: gather4_c_b_cl_2d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
				; GFX9-NEXT: s_mov_b64 s[14:15], exec
	; GFX9-NEXT: s_mov_b32 s0, s2			; GFX9-NEXT: s_mov_b32 s0, s2
	; GFX9-NEXT: s_mov_b32 s1, s3			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: s_mov_b32 s2, s4			; GFX9-NEXT: s_mov_b32 s2, s4
	; GFX9-NEXT: s_mov_b32 s3, s5
	; GFX9-NEXT: s_mov_b32 s4, s6			; GFX9-NEXT: s_mov_b32 s4, s6
	; GFX9-NEXT: s_mov_b32 s5, s7
	; GFX9-NEXT: s_mov_b32 s6, s8			; GFX9-NEXT: s_mov_b32 s6, s8
	; GFX9-NEXT: s_mov_b32 s7, s9
	; GFX9-NEXT: s_mov_b32 s8, s10			; GFX9-NEXT: s_mov_b32 s8, s10
	; GFX9-NEXT: s_mov_b32 s9, s11
	; GFX9-NEXT: s_mov_b64 s[14:15], exec
	; GFX9-NEXT: s_mov_b32 s10, s12			; GFX9-NEXT: s_mov_b32 s10, s12
	; GFX9-NEXT: s_mov_b32 s11, s13
	; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_mov_b32_e32 v5, 0xffff			; GFX9-NEXT: v_mov_b32_e32 v5, 0xffff
	; GFX9-NEXT: v_lshlrev_b32_e32 v3, 16, v3			; GFX9-NEXT: v_lshlrev_b32_e32 v3, 16, v3
	; GFX9-NEXT: s_lshl_b32 s12, s0, 16			; GFX9-NEXT: s_lshl_b32 s12, s0, 16
				; GFX9-NEXT: s_mov_b32 s1, s3
				; GFX9-NEXT: s_mov_b32 s3, s5
				; GFX9-NEXT: s_mov_b32 s5, s7
				; GFX9-NEXT: s_mov_b32 s7, s9
				; GFX9-NEXT: s_mov_b32 s9, s11
	; GFX9-NEXT: v_and_or_b32 v2, v2, v5, v3			; GFX9-NEXT: v_and_or_b32 v2, v2, v5, v3
				; GFX9-NEXT: s_mov_b32 s11, s13
	; GFX9-NEXT: v_and_or_b32 v3, v4, v5, s12			; GFX9-NEXT: v_and_or_b32 v3, v4, v5, s12
	; GFX9-NEXT: s_and_b64 exec, exec, s[14:15]			; GFX9-NEXT: s_and_b64 exec, exec, s[14:15]
	; GFX9-NEXT: image_gather4_c_b_cl v[0:3], v[0:3], s[0:7], s[8:11] dmask:0x1 a16			; GFX9-NEXT: image_gather4_c_b_cl v[0:3], v[0:3], s[0:7], s[8:11] dmask:0x1 a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10NSA-LABEL: gather4_c_b_cl_2d:			; GFX10NSA-LABEL: gather4_c_b_cl_2d:
	; GFX10NSA: ; %bb.0: ; %main_body			; GFX10NSA: ; %bb.0: ; %main_body
				; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s0, s2			; GFX10NSA-NEXT: s_mov_b32 s0, s2
	; GFX10NSA-NEXT: s_mov_b32 s1, s3			; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10NSA-NEXT: v_mov_b32_e32 v5, 0xffff
				; GFX10NSA-NEXT: v_lshlrev_b32_e32 v3, 16, v3
	; GFX10NSA-NEXT: s_mov_b32 s2, s4			; GFX10NSA-NEXT: s_mov_b32 s2, s4
	; GFX10NSA-NEXT: s_mov_b32 s3, s5
	; GFX10NSA-NEXT: s_mov_b32 s4, s6			; GFX10NSA-NEXT: s_mov_b32 s4, s6
	; GFX10NSA-NEXT: s_mov_b32 s5, s7
	; GFX10NSA-NEXT: s_mov_b32 s6, s8			; GFX10NSA-NEXT: s_mov_b32 s6, s8
	; GFX10NSA-NEXT: s_mov_b32 s7, s9
	; GFX10NSA-NEXT: s_mov_b32 s8, s10			; GFX10NSA-NEXT: s_mov_b32 s8, s10
	; GFX10NSA-NEXT: s_mov_b32 s9, s11
	; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s10, s12			; GFX10NSA-NEXT: s_mov_b32 s10, s12
	; GFX10NSA-NEXT: s_mov_b32 s11, s13
	; GFX10NSA-NEXT: ; implicit-def: $vcc_hi
	; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10NSA-NEXT: v_mov_b32_e32 v5, 0xffff
	; GFX10NSA-NEXT: v_lshlrev_b32_e32 v3, 16, v3
	; GFX10NSA-NEXT: s_lshl_b32 s12, s0, 16			; GFX10NSA-NEXT: s_lshl_b32 s12, s0, 16
				; GFX10NSA-NEXT: s_mov_b32 s1, s3
				; GFX10NSA-NEXT: s_mov_b32 s3, s5
				; GFX10NSA-NEXT: s_mov_b32 s5, s7
				; GFX10NSA-NEXT: s_mov_b32 s7, s9
				; GFX10NSA-NEXT: s_mov_b32 s9, s11
	; GFX10NSA-NEXT: v_and_or_b32 v2, v2, v5, v3			; GFX10NSA-NEXT: v_and_or_b32 v2, v2, v5, v3
				; GFX10NSA-NEXT: s_mov_b32 s11, s13
	; GFX10NSA-NEXT: v_and_or_b32 v3, v4, v5, s12			; GFX10NSA-NEXT: v_and_or_b32 v3, v4, v5, s12
	; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14			; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14
	; GFX10NSA-NEXT: image_gather4_c_b_cl v[0:3], v[0:3], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16			; GFX10NSA-NEXT: image_gather4_c_b_cl v[0:3], v[0:3], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
				; GFX10NSA-NEXT: ; implicit-def: $vcc_hi
	; GFX10NSA-NEXT: s_waitcnt vmcnt(0)			; GFX10NSA-NEXT: s_waitcnt vmcnt(0)
	; GFX10NSA-NEXT: ; return to shader part epilog			; GFX10NSA-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.2d.v4f32.f32.f16(i32 1, float %bias, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.2d.v4f32.f32.f16(i32 1, float %bias, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_l_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %lod) {			define amdgpu_ps <4 x float> @gather4_l_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %lod) {

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.gather4.dim.ll

	; GFX6-NEXT: s_wqm_b64 exec, exec			; GFX6-NEXT: s_wqm_b64 exec, exec
	; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]			; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]
	; GFX6-NEXT: image_gather4 v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1			; GFX6-NEXT: image_gather4 v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: ; return to shader part epilog			; GFX6-NEXT: ; return to shader part epilog
	;			;
	; GFX10NSA-LABEL: gather4_2d:			; GFX10NSA-LABEL: gather4_2d:
	; GFX10NSA: ; %bb.0: ; %main_body			; GFX10NSA: ; %bb.0: ; %main_body
				; GFX10NSA-NEXT: s_mov_b32 s1, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s0, s2			; GFX10NSA-NEXT: s_mov_b32 s0, s2
				; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s1
	; GFX10NSA-NEXT: s_mov_b32 s1, s3			; GFX10NSA-NEXT: s_mov_b32 s1, s3
	; GFX10NSA-NEXT: s_mov_b32 s2, s4			; GFX10NSA-NEXT: s_mov_b32 s2, s4
	; GFX10NSA-NEXT: s_mov_b32 s3, s5			; GFX10NSA-NEXT: s_mov_b32 s3, s5
	; GFX10NSA-NEXT: s_mov_b32 s4, s6			; GFX10NSA-NEXT: s_mov_b32 s4, s6
	; GFX10NSA-NEXT: s_mov_b32 s5, s7			; GFX10NSA-NEXT: s_mov_b32 s5, s7
	; GFX10NSA-NEXT: s_mov_b32 s6, s8			; GFX10NSA-NEXT: s_mov_b32 s6, s8
	; GFX10NSA-NEXT: s_mov_b32 s7, s9			; GFX10NSA-NEXT: s_mov_b32 s7, s9
	; GFX10NSA-NEXT: s_mov_b32 s8, s10			; GFX10NSA-NEXT: s_mov_b32 s8, s10
	; GFX10NSA-NEXT: s_mov_b32 s9, s11			; GFX10NSA-NEXT: s_mov_b32 s9, s11
	; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s10, s12			; GFX10NSA-NEXT: s_mov_b32 s10, s12
	; GFX10NSA-NEXT: s_mov_b32 s11, s13			; GFX10NSA-NEXT: s_mov_b32 s11, s13
	; GFX10NSA-NEXT: ; implicit-def: $vcc_hi			; GFX10NSA-NEXT: ; implicit-def: $vcc_hi
	; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14
	; GFX10NSA-NEXT: image_gather4 v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D			; GFX10NSA-NEXT: image_gather4 v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D
	; GFX10NSA-NEXT: s_waitcnt vmcnt(0)			; GFX10NSA-NEXT: s_waitcnt vmcnt(0)
	; GFX10NSA-NEXT: ; return to shader part epilog			; GFX10NSA-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.2d.v4f32.f32(i32 1, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.2d.v4f32.f32(i32 1, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GFX6-NEXT: s_wqm_b64 exec, exec			; GFX6-NEXT: s_wqm_b64 exec, exec
	; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]			; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]
	; GFX6-NEXT: image_gather4 v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 da			; GFX6-NEXT: image_gather4 v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 da
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: ; return to shader part epilog			; GFX6-NEXT: ; return to shader part epilog
	;			;
	; GFX10NSA-LABEL: gather4_cube:			; GFX10NSA-LABEL: gather4_cube:
	; GFX10NSA: ; %bb.0: ; %main_body			; GFX10NSA: ; %bb.0: ; %main_body
				; GFX10NSA-NEXT: s_mov_b32 s1, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s0, s2			; GFX10NSA-NEXT: s_mov_b32 s0, s2
				; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s1
	; GFX10NSA-NEXT: s_mov_b32 s1, s3			; GFX10NSA-NEXT: s_mov_b32 s1, s3
	; GFX10NSA-NEXT: s_mov_b32 s2, s4			; GFX10NSA-NEXT: s_mov_b32 s2, s4
	; GFX10NSA-NEXT: s_mov_b32 s3, s5			; GFX10NSA-NEXT: s_mov_b32 s3, s5
	; GFX10NSA-NEXT: s_mov_b32 s4, s6			; GFX10NSA-NEXT: s_mov_b32 s4, s6
	; GFX10NSA-NEXT: s_mov_b32 s5, s7			; GFX10NSA-NEXT: s_mov_b32 s5, s7
	; GFX10NSA-NEXT: s_mov_b32 s6, s8			; GFX10NSA-NEXT: s_mov_b32 s6, s8
	; GFX10NSA-NEXT: s_mov_b32 s7, s9			; GFX10NSA-NEXT: s_mov_b32 s7, s9
	; GFX10NSA-NEXT: s_mov_b32 s8, s10			; GFX10NSA-NEXT: s_mov_b32 s8, s10
	; GFX10NSA-NEXT: s_mov_b32 s9, s11			; GFX10NSA-NEXT: s_mov_b32 s9, s11
	; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s10, s12			; GFX10NSA-NEXT: s_mov_b32 s10, s12
	; GFX10NSA-NEXT: s_mov_b32 s11, s13			; GFX10NSA-NEXT: s_mov_b32 s11, s13
	; GFX10NSA-NEXT: ; implicit-def: $vcc_hi			; GFX10NSA-NEXT: ; implicit-def: $vcc_hi
	; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14
	; GFX10NSA-NEXT: image_gather4 v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_CUBE			; GFX10NSA-NEXT: image_gather4 v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_CUBE
	; GFX10NSA-NEXT: s_waitcnt vmcnt(0)			; GFX10NSA-NEXT: s_waitcnt vmcnt(0)
	; GFX10NSA-NEXT: ; return to shader part epilog			; GFX10NSA-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.cube.v4f32.f32(i32 1, float %s, float %t, float %face, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.cube.v4f32.f32(i32 1, float %s, float %t, float %face, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GFX6-NEXT: s_wqm_b64 exec, exec			; GFX6-NEXT: s_wqm_b64 exec, exec
	; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]			; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]
	; GFX6-NEXT: image_gather4 v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 da			; GFX6-NEXT: image_gather4 v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 da
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: ; return to shader part epilog			; GFX6-NEXT: ; return to shader part epilog
	;			;
	; GFX10NSA-LABEL: gather4_2darray:			; GFX10NSA-LABEL: gather4_2darray:
	; GFX10NSA: ; %bb.0: ; %main_body			; GFX10NSA: ; %bb.0: ; %main_body
				; GFX10NSA-NEXT: s_mov_b32 s1, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s0, s2			; GFX10NSA-NEXT: s_mov_b32 s0, s2
				; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s1
	; GFX10NSA-NEXT: s_mov_b32 s1, s3			; GFX10NSA-NEXT: s_mov_b32 s1, s3
	; GFX10NSA-NEXT: s_mov_b32 s2, s4			; GFX10NSA-NEXT: s_mov_b32 s2, s4
	; GFX10NSA-NEXT: s_mov_b32 s3, s5			; GFX10NSA-NEXT: s_mov_b32 s3, s5
	; GFX10NSA-NEXT: s_mov_b32 s4, s6			; GFX10NSA-NEXT: s_mov_b32 s4, s6
	; GFX10NSA-NEXT: s_mov_b32 s5, s7			; GFX10NSA-NEXT: s_mov_b32 s5, s7
	; GFX10NSA-NEXT: s_mov_b32 s6, s8			; GFX10NSA-NEXT: s_mov_b32 s6, s8
	; GFX10NSA-NEXT: s_mov_b32 s7, s9			; GFX10NSA-NEXT: s_mov_b32 s7, s9
	; GFX10NSA-NEXT: s_mov_b32 s8, s10			; GFX10NSA-NEXT: s_mov_b32 s8, s10
	; GFX10NSA-NEXT: s_mov_b32 s9, s11			; GFX10NSA-NEXT: s_mov_b32 s9, s11
	; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s10, s12			; GFX10NSA-NEXT: s_mov_b32 s10, s12
	; GFX10NSA-NEXT: s_mov_b32 s11, s13			; GFX10NSA-NEXT: s_mov_b32 s11, s13
	; GFX10NSA-NEXT: ; implicit-def: $vcc_hi			; GFX10NSA-NEXT: ; implicit-def: $vcc_hi
	; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14
	; GFX10NSA-NEXT: image_gather4 v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D_ARRAY			; GFX10NSA-NEXT: image_gather4 v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D_ARRAY
	; GFX10NSA-NEXT: s_waitcnt vmcnt(0)			; GFX10NSA-NEXT: s_waitcnt vmcnt(0)
	; GFX10NSA-NEXT: ; return to shader part epilog			; GFX10NSA-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.2darray.v4f32.f32(i32 1, float %s, float %t, float %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.2darray.v4f32.f32(i32 1, float %s, float %t, float %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GFX6-NEXT: s_wqm_b64 exec, exec			; GFX6-NEXT: s_wqm_b64 exec, exec
	; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]			; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]
	; GFX6-NEXT: image_gather4_c v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1			; GFX6-NEXT: image_gather4_c v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: ; return to shader part epilog			; GFX6-NEXT: ; return to shader part epilog
	;			;
	; GFX10NSA-LABEL: gather4_c_2d:			; GFX10NSA-LABEL: gather4_c_2d:
	; GFX10NSA: ; %bb.0: ; %main_body			; GFX10NSA: ; %bb.0: ; %main_body
				; GFX10NSA-NEXT: s_mov_b32 s1, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s0, s2			; GFX10NSA-NEXT: s_mov_b32 s0, s2
				; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s1
	; GFX10NSA-NEXT: s_mov_b32 s1, s3			; GFX10NSA-NEXT: s_mov_b32 s1, s3
	; GFX10NSA-NEXT: s_mov_b32 s2, s4			; GFX10NSA-NEXT: s_mov_b32 s2, s4
	; GFX10NSA-NEXT: s_mov_b32 s3, s5			; GFX10NSA-NEXT: s_mov_b32 s3, s5
	; GFX10NSA-NEXT: s_mov_b32 s4, s6			; GFX10NSA-NEXT: s_mov_b32 s4, s6
	; GFX10NSA-NEXT: s_mov_b32 s5, s7			; GFX10NSA-NEXT: s_mov_b32 s5, s7
	; GFX10NSA-NEXT: s_mov_b32 s6, s8			; GFX10NSA-NEXT: s_mov_b32 s6, s8
	; GFX10NSA-NEXT: s_mov_b32 s7, s9			; GFX10NSA-NEXT: s_mov_b32 s7, s9
	; GFX10NSA-NEXT: s_mov_b32 s8, s10			; GFX10NSA-NEXT: s_mov_b32 s8, s10
	; GFX10NSA-NEXT: s_mov_b32 s9, s11			; GFX10NSA-NEXT: s_mov_b32 s9, s11
	; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s10, s12			; GFX10NSA-NEXT: s_mov_b32 s10, s12
	; GFX10NSA-NEXT: s_mov_b32 s11, s13			; GFX10NSA-NEXT: s_mov_b32 s11, s13
	; GFX10NSA-NEXT: ; implicit-def: $vcc_hi			; GFX10NSA-NEXT: ; implicit-def: $vcc_hi
	; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14
	; GFX10NSA-NEXT: image_gather4_c v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D			; GFX10NSA-NEXT: image_gather4_c v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D
	; GFX10NSA-NEXT: s_waitcnt vmcnt(0)			; GFX10NSA-NEXT: s_waitcnt vmcnt(0)
	; GFX10NSA-NEXT: ; return to shader part epilog			; GFX10NSA-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.c.2d.v4f32.f32(i32 1, float %zcompare, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.c.2d.v4f32.f32(i32 1, float %zcompare, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GFX6-NEXT: s_wqm_b64 exec, exec			; GFX6-NEXT: s_wqm_b64 exec, exec
	; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]			; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]
	; GFX6-NEXT: image_gather4_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1			; GFX6-NEXT: image_gather4_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: ; return to shader part epilog			; GFX6-NEXT: ; return to shader part epilog
	;			;
	; GFX10NSA-LABEL: gather4_cl_2d:			; GFX10NSA-LABEL: gather4_cl_2d:
	; GFX10NSA: ; %bb.0: ; %main_body			; GFX10NSA: ; %bb.0: ; %main_body
				; GFX10NSA-NEXT: s_mov_b32 s1, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s0, s2			; GFX10NSA-NEXT: s_mov_b32 s0, s2
				; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s1
	; GFX10NSA-NEXT: s_mov_b32 s1, s3			; GFX10NSA-NEXT: s_mov_b32 s1, s3
	; GFX10NSA-NEXT: s_mov_b32 s2, s4			; GFX10NSA-NEXT: s_mov_b32 s2, s4
	; GFX10NSA-NEXT: s_mov_b32 s3, s5			; GFX10NSA-NEXT: s_mov_b32 s3, s5
	; GFX10NSA-NEXT: s_mov_b32 s4, s6			; GFX10NSA-NEXT: s_mov_b32 s4, s6
	; GFX10NSA-NEXT: s_mov_b32 s5, s7			; GFX10NSA-NEXT: s_mov_b32 s5, s7
	; GFX10NSA-NEXT: s_mov_b32 s6, s8			; GFX10NSA-NEXT: s_mov_b32 s6, s8
	; GFX10NSA-NEXT: s_mov_b32 s7, s9			; GFX10NSA-NEXT: s_mov_b32 s7, s9
	; GFX10NSA-NEXT: s_mov_b32 s8, s10			; GFX10NSA-NEXT: s_mov_b32 s8, s10
	; GFX10NSA-NEXT: s_mov_b32 s9, s11			; GFX10NSA-NEXT: s_mov_b32 s9, s11
	; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s10, s12			; GFX10NSA-NEXT: s_mov_b32 s10, s12
	; GFX10NSA-NEXT: s_mov_b32 s11, s13			; GFX10NSA-NEXT: s_mov_b32 s11, s13
	; GFX10NSA-NEXT: ; implicit-def: $vcc_hi			; GFX10NSA-NEXT: ; implicit-def: $vcc_hi
	; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14
	; GFX10NSA-NEXT: image_gather4_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D			; GFX10NSA-NEXT: image_gather4_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D
	; GFX10NSA-NEXT: s_waitcnt vmcnt(0)			; GFX10NSA-NEXT: s_waitcnt vmcnt(0)
	; GFX10NSA-NEXT: ; return to shader part epilog			; GFX10NSA-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.cl.2d.v4f32.f32(i32 1, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.cl.2d.v4f32.f32(i32 1, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GFX6-NEXT: s_wqm_b64 exec, exec			; GFX6-NEXT: s_wqm_b64 exec, exec
	; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]			; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]
	; GFX6-NEXT: image_gather4_c_cl v[0:3], v[0:3], s[0:7], s[8:11] dmask:0x1			; GFX6-NEXT: image_gather4_c_cl v[0:3], v[0:3], s[0:7], s[8:11] dmask:0x1
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: ; return to shader part epilog			; GFX6-NEXT: ; return to shader part epilog
	;			;
	; GFX10NSA-LABEL: gather4_c_cl_2d:			; GFX10NSA-LABEL: gather4_c_cl_2d:
	; GFX10NSA: ; %bb.0: ; %main_body			; GFX10NSA: ; %bb.0: ; %main_body
				; GFX10NSA-NEXT: s_mov_b32 s1, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s0, s2			; GFX10NSA-NEXT: s_mov_b32 s0, s2
				; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s1
	; GFX10NSA-NEXT: s_mov_b32 s1, s3			; GFX10NSA-NEXT: s_mov_b32 s1, s3
	; GFX10NSA-NEXT: s_mov_b32 s2, s4			; GFX10NSA-NEXT: s_mov_b32 s2, s4
	; GFX10NSA-NEXT: s_mov_b32 s3, s5			; GFX10NSA-NEXT: s_mov_b32 s3, s5
	; GFX10NSA-NEXT: s_mov_b32 s4, s6			; GFX10NSA-NEXT: s_mov_b32 s4, s6
	; GFX10NSA-NEXT: s_mov_b32 s5, s7			; GFX10NSA-NEXT: s_mov_b32 s5, s7
	; GFX10NSA-NEXT: s_mov_b32 s6, s8			; GFX10NSA-NEXT: s_mov_b32 s6, s8
	; GFX10NSA-NEXT: s_mov_b32 s7, s9			; GFX10NSA-NEXT: s_mov_b32 s7, s9
	; GFX10NSA-NEXT: s_mov_b32 s8, s10			; GFX10NSA-NEXT: s_mov_b32 s8, s10
	; GFX10NSA-NEXT: s_mov_b32 s9, s11			; GFX10NSA-NEXT: s_mov_b32 s9, s11
	; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s10, s12			; GFX10NSA-NEXT: s_mov_b32 s10, s12
	; GFX10NSA-NEXT: s_mov_b32 s11, s13			; GFX10NSA-NEXT: s_mov_b32 s11, s13
	; GFX10NSA-NEXT: ; implicit-def: $vcc_hi			; GFX10NSA-NEXT: ; implicit-def: $vcc_hi
	; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14
	; GFX10NSA-NEXT: image_gather4_c_cl v[0:3], v[0:3], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D			; GFX10NSA-NEXT: image_gather4_c_cl v[0:3], v[0:3], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D
	; GFX10NSA-NEXT: s_waitcnt vmcnt(0)			; GFX10NSA-NEXT: s_waitcnt vmcnt(0)
	; GFX10NSA-NEXT: ; return to shader part epilog			; GFX10NSA-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.c.cl.2d.v4f32.f32(i32 1, float %zcompare, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.c.cl.2d.v4f32.f32(i32 1, float %zcompare, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GFX6-NEXT: s_wqm_b64 exec, exec			; GFX6-NEXT: s_wqm_b64 exec, exec
	; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]			; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]
	; GFX6-NEXT: image_gather4_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1			; GFX6-NEXT: image_gather4_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: ; return to shader part epilog			; GFX6-NEXT: ; return to shader part epilog
	;			;
	; GFX10NSA-LABEL: gather4_b_2d:			; GFX10NSA-LABEL: gather4_b_2d:
	; GFX10NSA: ; %bb.0: ; %main_body			; GFX10NSA: ; %bb.0: ; %main_body
				; GFX10NSA-NEXT: s_mov_b32 s1, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s0, s2			; GFX10NSA-NEXT: s_mov_b32 s0, s2
				; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s1
	; GFX10NSA-NEXT: s_mov_b32 s1, s3			; GFX10NSA-NEXT: s_mov_b32 s1, s3
	; GFX10NSA-NEXT: s_mov_b32 s2, s4			; GFX10NSA-NEXT: s_mov_b32 s2, s4
	; GFX10NSA-NEXT: s_mov_b32 s3, s5			; GFX10NSA-NEXT: s_mov_b32 s3, s5
	; GFX10NSA-NEXT: s_mov_b32 s4, s6			; GFX10NSA-NEXT: s_mov_b32 s4, s6
	; GFX10NSA-NEXT: s_mov_b32 s5, s7			; GFX10NSA-NEXT: s_mov_b32 s5, s7
	; GFX10NSA-NEXT: s_mov_b32 s6, s8			; GFX10NSA-NEXT: s_mov_b32 s6, s8
	; GFX10NSA-NEXT: s_mov_b32 s7, s9			; GFX10NSA-NEXT: s_mov_b32 s7, s9
	; GFX10NSA-NEXT: s_mov_b32 s8, s10			; GFX10NSA-NEXT: s_mov_b32 s8, s10
	; GFX10NSA-NEXT: s_mov_b32 s9, s11			; GFX10NSA-NEXT: s_mov_b32 s9, s11
	; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s10, s12			; GFX10NSA-NEXT: s_mov_b32 s10, s12
	; GFX10NSA-NEXT: s_mov_b32 s11, s13			; GFX10NSA-NEXT: s_mov_b32 s11, s13
	; GFX10NSA-NEXT: ; implicit-def: $vcc_hi			; GFX10NSA-NEXT: ; implicit-def: $vcc_hi
	; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14
	; GFX10NSA-NEXT: image_gather4_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D			; GFX10NSA-NEXT: image_gather4_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D
	; GFX10NSA-NEXT: s_waitcnt vmcnt(0)			; GFX10NSA-NEXT: s_waitcnt vmcnt(0)
	; GFX10NSA-NEXT: ; return to shader part epilog			; GFX10NSA-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.b.2d.v4f32.f32.f32(i32 1, float %bias, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.b.2d.v4f32.f32.f32(i32 1, float %bias, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GFX6-NEXT: s_wqm_b64 exec, exec			; GFX6-NEXT: s_wqm_b64 exec, exec
	; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]			; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]
	; GFX6-NEXT: image_gather4_c_b v[0:3], v[0:3], s[0:7], s[8:11] dmask:0x1			; GFX6-NEXT: image_gather4_c_b v[0:3], v[0:3], s[0:7], s[8:11] dmask:0x1
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: ; return to shader part epilog			; GFX6-NEXT: ; return to shader part epilog
	;			;
	; GFX10NSA-LABEL: gather4_c_b_2d:			; GFX10NSA-LABEL: gather4_c_b_2d:
	; GFX10NSA: ; %bb.0: ; %main_body			; GFX10NSA: ; %bb.0: ; %main_body
				; GFX10NSA-NEXT: s_mov_b32 s1, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s0, s2			; GFX10NSA-NEXT: s_mov_b32 s0, s2
				; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s1
	; GFX10NSA-NEXT: s_mov_b32 s1, s3			; GFX10NSA-NEXT: s_mov_b32 s1, s3
	; GFX10NSA-NEXT: s_mov_b32 s2, s4			; GFX10NSA-NEXT: s_mov_b32 s2, s4
	; GFX10NSA-NEXT: s_mov_b32 s3, s5			; GFX10NSA-NEXT: s_mov_b32 s3, s5
	; GFX10NSA-NEXT: s_mov_b32 s4, s6			; GFX10NSA-NEXT: s_mov_b32 s4, s6
	; GFX10NSA-NEXT: s_mov_b32 s5, s7			; GFX10NSA-NEXT: s_mov_b32 s5, s7
	; GFX10NSA-NEXT: s_mov_b32 s6, s8			; GFX10NSA-NEXT: s_mov_b32 s6, s8
	; GFX10NSA-NEXT: s_mov_b32 s7, s9			; GFX10NSA-NEXT: s_mov_b32 s7, s9
	; GFX10NSA-NEXT: s_mov_b32 s8, s10			; GFX10NSA-NEXT: s_mov_b32 s8, s10
	; GFX10NSA-NEXT: s_mov_b32 s9, s11			; GFX10NSA-NEXT: s_mov_b32 s9, s11
	; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s10, s12			; GFX10NSA-NEXT: s_mov_b32 s10, s12
	; GFX10NSA-NEXT: s_mov_b32 s11, s13			; GFX10NSA-NEXT: s_mov_b32 s11, s13
	; GFX10NSA-NEXT: ; implicit-def: $vcc_hi			; GFX10NSA-NEXT: ; implicit-def: $vcc_hi
	; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14
	; GFX10NSA-NEXT: image_gather4_c_b v[0:3], v[0:3], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D			; GFX10NSA-NEXT: image_gather4_c_b v[0:3], v[0:3], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D
	; GFX10NSA-NEXT: s_waitcnt vmcnt(0)			; GFX10NSA-NEXT: s_waitcnt vmcnt(0)
	; GFX10NSA-NEXT: ; return to shader part epilog			; GFX10NSA-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.2d.v4f32.f32.f32(i32 1, float %bias, float %zcompare, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.2d.v4f32.f32.f32(i32 1, float %bias, float %zcompare, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GFX6-NEXT: s_wqm_b64 exec, exec			; GFX6-NEXT: s_wqm_b64 exec, exec
	; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]			; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]
	; GFX6-NEXT: image_gather4_b_cl v[0:3], v[0:3], s[0:7], s[8:11] dmask:0x1			; GFX6-NEXT: image_gather4_b_cl v[0:3], v[0:3], s[0:7], s[8:11] dmask:0x1
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: ; return to shader part epilog			; GFX6-NEXT: ; return to shader part epilog
	;			;
	; GFX10NSA-LABEL: gather4_b_cl_2d:			; GFX10NSA-LABEL: gather4_b_cl_2d:
	; GFX10NSA: ; %bb.0: ; %main_body			; GFX10NSA: ; %bb.0: ; %main_body
				; GFX10NSA-NEXT: s_mov_b32 s1, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s0, s2			; GFX10NSA-NEXT: s_mov_b32 s0, s2
				; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s1
	; GFX10NSA-NEXT: s_mov_b32 s1, s3			; GFX10NSA-NEXT: s_mov_b32 s1, s3
	; GFX10NSA-NEXT: s_mov_b32 s2, s4			; GFX10NSA-NEXT: s_mov_b32 s2, s4
	; GFX10NSA-NEXT: s_mov_b32 s3, s5			; GFX10NSA-NEXT: s_mov_b32 s3, s5
	; GFX10NSA-NEXT: s_mov_b32 s4, s6			; GFX10NSA-NEXT: s_mov_b32 s4, s6
	; GFX10NSA-NEXT: s_mov_b32 s5, s7			; GFX10NSA-NEXT: s_mov_b32 s5, s7
	; GFX10NSA-NEXT: s_mov_b32 s6, s8			; GFX10NSA-NEXT: s_mov_b32 s6, s8
	; GFX10NSA-NEXT: s_mov_b32 s7, s9			; GFX10NSA-NEXT: s_mov_b32 s7, s9
	; GFX10NSA-NEXT: s_mov_b32 s8, s10			; GFX10NSA-NEXT: s_mov_b32 s8, s10
	; GFX10NSA-NEXT: s_mov_b32 s9, s11			; GFX10NSA-NEXT: s_mov_b32 s9, s11
	; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s10, s12			; GFX10NSA-NEXT: s_mov_b32 s10, s12
	; GFX10NSA-NEXT: s_mov_b32 s11, s13			; GFX10NSA-NEXT: s_mov_b32 s11, s13
	; GFX10NSA-NEXT: ; implicit-def: $vcc_hi			; GFX10NSA-NEXT: ; implicit-def: $vcc_hi
	; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14
	; GFX10NSA-NEXT: image_gather4_b_cl v[0:3], v[0:3], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D			; GFX10NSA-NEXT: image_gather4_b_cl v[0:3], v[0:3], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D
	; GFX10NSA-NEXT: s_waitcnt vmcnt(0)			; GFX10NSA-NEXT: s_waitcnt vmcnt(0)
	; GFX10NSA-NEXT: ; return to shader part epilog			; GFX10NSA-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.b.cl.2d.v4f32.f32.f32(i32 1, float %bias, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.b.cl.2d.v4f32.f32.f32(i32 1, float %bias, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GFX6-NEXT: s_wqm_b64 exec, exec			; GFX6-NEXT: s_wqm_b64 exec, exec
	; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]			; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]
	; GFX6-NEXT: image_gather4_c_b_cl v[0:3], v[0:7], s[0:7], s[8:11] dmask:0x1			; GFX6-NEXT: image_gather4_c_b_cl v[0:3], v[0:7], s[0:7], s[8:11] dmask:0x1
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: ; return to shader part epilog			; GFX6-NEXT: ; return to shader part epilog
	;			;
	; GFX10NSA-LABEL: gather4_c_b_cl_2d:			; GFX10NSA-LABEL: gather4_c_b_cl_2d:
	; GFX10NSA: ; %bb.0: ; %main_body			; GFX10NSA: ; %bb.0: ; %main_body
				; GFX10NSA-NEXT: s_mov_b32 s1, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s0, s2			; GFX10NSA-NEXT: s_mov_b32 s0, s2
				; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s1
	; GFX10NSA-NEXT: s_mov_b32 s1, s3			; GFX10NSA-NEXT: s_mov_b32 s1, s3
	; GFX10NSA-NEXT: s_mov_b32 s2, s4			; GFX10NSA-NEXT: s_mov_b32 s2, s4
	; GFX10NSA-NEXT: s_mov_b32 s3, s5			; GFX10NSA-NEXT: s_mov_b32 s3, s5
	; GFX10NSA-NEXT: s_mov_b32 s4, s6			; GFX10NSA-NEXT: s_mov_b32 s4, s6
	; GFX10NSA-NEXT: s_mov_b32 s5, s7			; GFX10NSA-NEXT: s_mov_b32 s5, s7
	; GFX10NSA-NEXT: s_mov_b32 s6, s8			; GFX10NSA-NEXT: s_mov_b32 s6, s8
	; GFX10NSA-NEXT: s_mov_b32 s7, s9			; GFX10NSA-NEXT: s_mov_b32 s7, s9
	; GFX10NSA-NEXT: s_mov_b32 s8, s10			; GFX10NSA-NEXT: s_mov_b32 s8, s10
	; GFX10NSA-NEXT: s_mov_b32 s9, s11			; GFX10NSA-NEXT: s_mov_b32 s9, s11
	; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s10, s12			; GFX10NSA-NEXT: s_mov_b32 s10, s12
	; GFX10NSA-NEXT: s_mov_b32 s11, s13			; GFX10NSA-NEXT: s_mov_b32 s11, s13
	; GFX10NSA-NEXT: ; implicit-def: $vcc_hi			; GFX10NSA-NEXT: ; implicit-def: $vcc_hi
	; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14
	; GFX10NSA-NEXT: image_gather4_c_b_cl v[0:3], v[0:7], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D			; GFX10NSA-NEXT: image_gather4_c_b_cl v[0:3], v[0:7], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D
	; GFX10NSA-NEXT: s_waitcnt vmcnt(0)			; GFX10NSA-NEXT: s_waitcnt vmcnt(0)
	; GFX10NSA-NEXT: ; return to shader part epilog			; GFX10NSA-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.2d.v4f32.f32.f32(i32 1, float %bias, float %zcompare, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.2d.v4f32.f32.f32(i32 1, float %bias, float %zcompare, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GFX6-NEXT: s_wqm_b64 exec, exec			; GFX6-NEXT: s_wqm_b64 exec, exec
	; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]			; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]
	; GFX6-NEXT: image_gather4 v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x2			; GFX6-NEXT: image_gather4 v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x2
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: ; return to shader part epilog			; GFX6-NEXT: ; return to shader part epilog
	;			;
	; GFX10NSA-LABEL: gather4_2d_dmask_2:			; GFX10NSA-LABEL: gather4_2d_dmask_2:
	; GFX10NSA: ; %bb.0: ; %main_body			; GFX10NSA: ; %bb.0: ; %main_body
				; GFX10NSA-NEXT: s_mov_b32 s1, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s0, s2			; GFX10NSA-NEXT: s_mov_b32 s0, s2
				; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s1
	; GFX10NSA-NEXT: s_mov_b32 s1, s3			; GFX10NSA-NEXT: s_mov_b32 s1, s3
	; GFX10NSA-NEXT: s_mov_b32 s2, s4			; GFX10NSA-NEXT: s_mov_b32 s2, s4
	; GFX10NSA-NEXT: s_mov_b32 s3, s5			; GFX10NSA-NEXT: s_mov_b32 s3, s5
	; GFX10NSA-NEXT: s_mov_b32 s4, s6			; GFX10NSA-NEXT: s_mov_b32 s4, s6
	; GFX10NSA-NEXT: s_mov_b32 s5, s7			; GFX10NSA-NEXT: s_mov_b32 s5, s7
	; GFX10NSA-NEXT: s_mov_b32 s6, s8			; GFX10NSA-NEXT: s_mov_b32 s6, s8
	; GFX10NSA-NEXT: s_mov_b32 s7, s9			; GFX10NSA-NEXT: s_mov_b32 s7, s9
	; GFX10NSA-NEXT: s_mov_b32 s8, s10			; GFX10NSA-NEXT: s_mov_b32 s8, s10
	; GFX10NSA-NEXT: s_mov_b32 s9, s11			; GFX10NSA-NEXT: s_mov_b32 s9, s11
	; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s10, s12			; GFX10NSA-NEXT: s_mov_b32 s10, s12
	; GFX10NSA-NEXT: s_mov_b32 s11, s13			; GFX10NSA-NEXT: s_mov_b32 s11, s13
	; GFX10NSA-NEXT: ; implicit-def: $vcc_hi			; GFX10NSA-NEXT: ; implicit-def: $vcc_hi
	; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14
	; GFX10NSA-NEXT: image_gather4 v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x2 dim:SQ_RSRC_IMG_2D			; GFX10NSA-NEXT: image_gather4 v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x2 dim:SQ_RSRC_IMG_2D
	; GFX10NSA-NEXT: s_waitcnt vmcnt(0)			; GFX10NSA-NEXT: s_waitcnt vmcnt(0)
	; GFX10NSA-NEXT: ; return to shader part epilog			; GFX10NSA-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.2d.v4f32.f32(i32 2, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.2d.v4f32.f32(i32 2, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GFX6-NEXT: s_wqm_b64 exec, exec			; GFX6-NEXT: s_wqm_b64 exec, exec
	; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]			; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]
	; GFX6-NEXT: image_gather4 v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x4			; GFX6-NEXT: image_gather4 v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x4
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: ; return to shader part epilog			; GFX6-NEXT: ; return to shader part epilog
	;			;
	; GFX10NSA-LABEL: gather4_2d_dmask_4:			; GFX10NSA-LABEL: gather4_2d_dmask_4:
	; GFX10NSA: ; %bb.0: ; %main_body			; GFX10NSA: ; %bb.0: ; %main_body
				; GFX10NSA-NEXT: s_mov_b32 s1, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s0, s2			; GFX10NSA-NEXT: s_mov_b32 s0, s2
				; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s1
	; GFX10NSA-NEXT: s_mov_b32 s1, s3			; GFX10NSA-NEXT: s_mov_b32 s1, s3
	; GFX10NSA-NEXT: s_mov_b32 s2, s4			; GFX10NSA-NEXT: s_mov_b32 s2, s4
	; GFX10NSA-NEXT: s_mov_b32 s3, s5			; GFX10NSA-NEXT: s_mov_b32 s3, s5
	; GFX10NSA-NEXT: s_mov_b32 s4, s6			; GFX10NSA-NEXT: s_mov_b32 s4, s6
	; GFX10NSA-NEXT: s_mov_b32 s5, s7			; GFX10NSA-NEXT: s_mov_b32 s5, s7
	; GFX10NSA-NEXT: s_mov_b32 s6, s8			; GFX10NSA-NEXT: s_mov_b32 s6, s8
	; GFX10NSA-NEXT: s_mov_b32 s7, s9			; GFX10NSA-NEXT: s_mov_b32 s7, s9
	; GFX10NSA-NEXT: s_mov_b32 s8, s10			; GFX10NSA-NEXT: s_mov_b32 s8, s10
	; GFX10NSA-NEXT: s_mov_b32 s9, s11			; GFX10NSA-NEXT: s_mov_b32 s9, s11
	; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s10, s12			; GFX10NSA-NEXT: s_mov_b32 s10, s12
	; GFX10NSA-NEXT: s_mov_b32 s11, s13			; GFX10NSA-NEXT: s_mov_b32 s11, s13
	; GFX10NSA-NEXT: ; implicit-def: $vcc_hi			; GFX10NSA-NEXT: ; implicit-def: $vcc_hi
	; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14
	; GFX10NSA-NEXT: image_gather4 v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x4 dim:SQ_RSRC_IMG_2D			; GFX10NSA-NEXT: image_gather4 v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x4 dim:SQ_RSRC_IMG_2D
	; GFX10NSA-NEXT: s_waitcnt vmcnt(0)			; GFX10NSA-NEXT: s_waitcnt vmcnt(0)
	; GFX10NSA-NEXT: ; return to shader part epilog			; GFX10NSA-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.2d.v4f32.f32(i32 4, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.2d.v4f32.f32(i32 4, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GFX6-NEXT: s_wqm_b64 exec, exec			; GFX6-NEXT: s_wqm_b64 exec, exec
	; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]			; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]
	; GFX6-NEXT: image_gather4 v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x8			; GFX6-NEXT: image_gather4 v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x8
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: ; return to shader part epilog			; GFX6-NEXT: ; return to shader part epilog
	;			;
	; GFX10NSA-LABEL: gather4_2d_dmask_8:			; GFX10NSA-LABEL: gather4_2d_dmask_8:
	; GFX10NSA: ; %bb.0: ; %main_body			; GFX10NSA: ; %bb.0: ; %main_body
				; GFX10NSA-NEXT: s_mov_b32 s1, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s0, s2			; GFX10NSA-NEXT: s_mov_b32 s0, s2
				; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s1
	; GFX10NSA-NEXT: s_mov_b32 s1, s3			; GFX10NSA-NEXT: s_mov_b32 s1, s3
	; GFX10NSA-NEXT: s_mov_b32 s2, s4			; GFX10NSA-NEXT: s_mov_b32 s2, s4
	; GFX10NSA-NEXT: s_mov_b32 s3, s5			; GFX10NSA-NEXT: s_mov_b32 s3, s5
	; GFX10NSA-NEXT: s_mov_b32 s4, s6			; GFX10NSA-NEXT: s_mov_b32 s4, s6
	; GFX10NSA-NEXT: s_mov_b32 s5, s7			; GFX10NSA-NEXT: s_mov_b32 s5, s7
	; GFX10NSA-NEXT: s_mov_b32 s6, s8			; GFX10NSA-NEXT: s_mov_b32 s6, s8
	; GFX10NSA-NEXT: s_mov_b32 s7, s9			; GFX10NSA-NEXT: s_mov_b32 s7, s9
	; GFX10NSA-NEXT: s_mov_b32 s8, s10			; GFX10NSA-NEXT: s_mov_b32 s8, s10
	; GFX10NSA-NEXT: s_mov_b32 s9, s11			; GFX10NSA-NEXT: s_mov_b32 s9, s11
	; GFX10NSA-NEXT: s_mov_b32 s14, exec_lo
	; GFX10NSA-NEXT: s_mov_b32 s10, s12			; GFX10NSA-NEXT: s_mov_b32 s10, s12
	; GFX10NSA-NEXT: s_mov_b32 s11, s13			; GFX10NSA-NEXT: s_mov_b32 s11, s13
	; GFX10NSA-NEXT: ; implicit-def: $vcc_hi			; GFX10NSA-NEXT: ; implicit-def: $vcc_hi
	; GFX10NSA-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10NSA-NEXT: s_and_b32 exec_lo, exec_lo, s14
	; GFX10NSA-NEXT: image_gather4 v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x8 dim:SQ_RSRC_IMG_2D			; GFX10NSA-NEXT: image_gather4 v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x8 dim:SQ_RSRC_IMG_2D
	; GFX10NSA-NEXT: s_waitcnt vmcnt(0)			; GFX10NSA-NEXT: s_waitcnt vmcnt(0)
	; GFX10NSA-NEXT: ; return to shader part epilog			; GFX10NSA-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.2d.v4f32.f32(i32 8, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.2d.v4f32.f32(i32 8, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}


llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.gather4.o.dim.ll

	; GFX6-NEXT: s_wqm_b64 exec, exec			; GFX6-NEXT: s_wqm_b64 exec, exec
	; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]			; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]
	; GFX6-NEXT: image_gather4_o v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1			; GFX6-NEXT: image_gather4_o v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: ; return to shader part epilog			; GFX6-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: gather4_o_2d:			; GFX10-LABEL: gather4_o_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s1, exec_lo
	; GFX10-NEXT: s_mov_b32 s0, s2			; GFX10-NEXT: s_mov_b32 s0, s2
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s1
	; GFX10-NEXT: s_mov_b32 s1, s3			; GFX10-NEXT: s_mov_b32 s1, s3
	; GFX10-NEXT: s_mov_b32 s2, s4			; GFX10-NEXT: s_mov_b32 s2, s4
	; GFX10-NEXT: s_mov_b32 s3, s5			; GFX10-NEXT: s_mov_b32 s3, s5
	; GFX10-NEXT: s_mov_b32 s4, s6			; GFX10-NEXT: s_mov_b32 s4, s6
	; GFX10-NEXT: s_mov_b32 s5, s7			; GFX10-NEXT: s_mov_b32 s5, s7
	; GFX10-NEXT: s_mov_b32 s6, s8			; GFX10-NEXT: s_mov_b32 s6, s8
	; GFX10-NEXT: s_mov_b32 s7, s9			; GFX10-NEXT: s_mov_b32 s7, s9
	; GFX10-NEXT: s_mov_b32 s8, s10			; GFX10-NEXT: s_mov_b32 s8, s10
	; GFX10-NEXT: s_mov_b32 s9, s11			; GFX10-NEXT: s_mov_b32 s9, s11
	; GFX10-NEXT: s_mov_b32 s14, exec_lo
	; GFX10-NEXT: s_mov_b32 s10, s12			; GFX10-NEXT: s_mov_b32 s10, s12
	; GFX10-NEXT: s_mov_b32 s11, s13			; GFX10-NEXT: s_mov_b32 s11, s13
	; GFX10-NEXT: ; implicit-def: $vcc_hi			; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s14
	; GFX10-NEXT: image_gather4_o v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D			; GFX10-NEXT: image_gather4_o v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.o.2d.v4f32.f32(i32 1, i32 %offset, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.o.2d.v4f32.f32(i32 1, i32 %offset, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GFX6-NEXT: s_wqm_b64 exec, exec			; GFX6-NEXT: s_wqm_b64 exec, exec
	; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]			; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]
	; GFX6-NEXT: image_gather4_c_o v[0:3], v[0:3], s[0:7], s[8:11] dmask:0x1			; GFX6-NEXT: image_gather4_c_o v[0:3], v[0:3], s[0:7], s[8:11] dmask:0x1
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: ; return to shader part epilog			; GFX6-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: gather4_c_o_2d:			; GFX10-LABEL: gather4_c_o_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s1, exec_lo
	; GFX10-NEXT: s_mov_b32 s0, s2			; GFX10-NEXT: s_mov_b32 s0, s2
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s1
	; GFX10-NEXT: s_mov_b32 s1, s3			; GFX10-NEXT: s_mov_b32 s1, s3
	; GFX10-NEXT: s_mov_b32 s2, s4			; GFX10-NEXT: s_mov_b32 s2, s4
	; GFX10-NEXT: s_mov_b32 s3, s5			; GFX10-NEXT: s_mov_b32 s3, s5
	; GFX10-NEXT: s_mov_b32 s4, s6			; GFX10-NEXT: s_mov_b32 s4, s6
	; GFX10-NEXT: s_mov_b32 s5, s7			; GFX10-NEXT: s_mov_b32 s5, s7
	; GFX10-NEXT: s_mov_b32 s6, s8			; GFX10-NEXT: s_mov_b32 s6, s8
	; GFX10-NEXT: s_mov_b32 s7, s9			; GFX10-NEXT: s_mov_b32 s7, s9
	; GFX10-NEXT: s_mov_b32 s8, s10			; GFX10-NEXT: s_mov_b32 s8, s10
	; GFX10-NEXT: s_mov_b32 s9, s11			; GFX10-NEXT: s_mov_b32 s9, s11
	; GFX10-NEXT: s_mov_b32 s14, exec_lo
	; GFX10-NEXT: s_mov_b32 s10, s12			; GFX10-NEXT: s_mov_b32 s10, s12
	; GFX10-NEXT: s_mov_b32 s11, s13			; GFX10-NEXT: s_mov_b32 s11, s13
	; GFX10-NEXT: ; implicit-def: $vcc_hi			; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s14
	; GFX10-NEXT: image_gather4_c_o v[0:3], v[0:3], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D			; GFX10-NEXT: image_gather4_c_o v[0:3], v[0:3], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.c.o.2d.v4f32.f32(i32 1, i32 %offset, float %zcompare, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.c.o.2d.v4f32.f32(i32 1, i32 %offset, float %zcompare, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GFX6-NEXT: s_wqm_b64 exec, exec			; GFX6-NEXT: s_wqm_b64 exec, exec
	; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]			; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]
	; GFX6-NEXT: image_gather4_cl_o v[0:3], v[0:3], s[0:7], s[8:11] dmask:0x1			; GFX6-NEXT: image_gather4_cl_o v[0:3], v[0:3], s[0:7], s[8:11] dmask:0x1
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: ; return to shader part epilog			; GFX6-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: gather4_cl_o_2d:			; GFX10-LABEL: gather4_cl_o_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s1, exec_lo
	; GFX10-NEXT: s_mov_b32 s0, s2			; GFX10-NEXT: s_mov_b32 s0, s2
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s1
	; GFX10-NEXT: s_mov_b32 s1, s3			; GFX10-NEXT: s_mov_b32 s1, s3
	; GFX10-NEXT: s_mov_b32 s2, s4			; GFX10-NEXT: s_mov_b32 s2, s4
	; GFX10-NEXT: s_mov_b32 s3, s5			; GFX10-NEXT: s_mov_b32 s3, s5
	; GFX10-NEXT: s_mov_b32 s4, s6			; GFX10-NEXT: s_mov_b32 s4, s6
	; GFX10-NEXT: s_mov_b32 s5, s7			; GFX10-NEXT: s_mov_b32 s5, s7
	; GFX10-NEXT: s_mov_b32 s6, s8			; GFX10-NEXT: s_mov_b32 s6, s8
	; GFX10-NEXT: s_mov_b32 s7, s9			; GFX10-NEXT: s_mov_b32 s7, s9
	; GFX10-NEXT: s_mov_b32 s8, s10			; GFX10-NEXT: s_mov_b32 s8, s10
	; GFX10-NEXT: s_mov_b32 s9, s11			; GFX10-NEXT: s_mov_b32 s9, s11
	; GFX10-NEXT: s_mov_b32 s14, exec_lo
	; GFX10-NEXT: s_mov_b32 s10, s12			; GFX10-NEXT: s_mov_b32 s10, s12
	; GFX10-NEXT: s_mov_b32 s11, s13			; GFX10-NEXT: s_mov_b32 s11, s13
	; GFX10-NEXT: ; implicit-def: $vcc_hi			; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s14
	; GFX10-NEXT: image_gather4_cl_o v[0:3], v[0:3], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D			; GFX10-NEXT: image_gather4_cl_o v[0:3], v[0:3], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.cl.o.2d.v4f32.f32(i32 1, i32 %offset, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.cl.o.2d.v4f32.f32(i32 1, i32 %offset, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GFX6-NEXT: s_wqm_b64 exec, exec			; GFX6-NEXT: s_wqm_b64 exec, exec
	; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]			; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]
	; GFX6-NEXT: image_gather4_c_cl_o v[0:3], v[0:7], s[0:7], s[8:11] dmask:0x1			; GFX6-NEXT: image_gather4_c_cl_o v[0:3], v[0:7], s[0:7], s[8:11] dmask:0x1
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: ; return to shader part epilog			; GFX6-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: gather4_c_cl_o_2d:			; GFX10-LABEL: gather4_c_cl_o_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s1, exec_lo
	; GFX10-NEXT: s_mov_b32 s0, s2			; GFX10-NEXT: s_mov_b32 s0, s2
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s1
	; GFX10-NEXT: s_mov_b32 s1, s3			; GFX10-NEXT: s_mov_b32 s1, s3
	; GFX10-NEXT: s_mov_b32 s2, s4			; GFX10-NEXT: s_mov_b32 s2, s4
	; GFX10-NEXT: s_mov_b32 s3, s5			; GFX10-NEXT: s_mov_b32 s3, s5
	; GFX10-NEXT: s_mov_b32 s4, s6			; GFX10-NEXT: s_mov_b32 s4, s6
	; GFX10-NEXT: s_mov_b32 s5, s7			; GFX10-NEXT: s_mov_b32 s5, s7
	; GFX10-NEXT: s_mov_b32 s6, s8			; GFX10-NEXT: s_mov_b32 s6, s8
	; GFX10-NEXT: s_mov_b32 s7, s9			; GFX10-NEXT: s_mov_b32 s7, s9
	; GFX10-NEXT: s_mov_b32 s8, s10			; GFX10-NEXT: s_mov_b32 s8, s10
	; GFX10-NEXT: s_mov_b32 s9, s11			; GFX10-NEXT: s_mov_b32 s9, s11
	; GFX10-NEXT: s_mov_b32 s14, exec_lo
	; GFX10-NEXT: s_mov_b32 s10, s12			; GFX10-NEXT: s_mov_b32 s10, s12
	; GFX10-NEXT: s_mov_b32 s11, s13			; GFX10-NEXT: s_mov_b32 s11, s13
	; GFX10-NEXT: ; implicit-def: $vcc_hi			; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s14
	; GFX10-NEXT: image_gather4_c_cl_o v[0:3], v[0:7], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D			; GFX10-NEXT: image_gather4_c_cl_o v[0:3], v[0:7], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.c.cl.o.2d.v4f32.f32(i32 1, i32 %offset, float %zcompare, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.c.cl.o.2d.v4f32.f32(i32 1, i32 %offset, float %zcompare, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GFX6-NEXT: s_wqm_b64 exec, exec			; GFX6-NEXT: s_wqm_b64 exec, exec
	; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]			; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]
	; GFX6-NEXT: image_gather4_b_o v[0:3], v[0:3], s[0:7], s[8:11] dmask:0x1			; GFX6-NEXT: image_gather4_b_o v[0:3], v[0:3], s[0:7], s[8:11] dmask:0x1
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: ; return to shader part epilog			; GFX6-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: gather4_b_o_2d:			; GFX10-LABEL: gather4_b_o_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s1, exec_lo
	; GFX10-NEXT: s_mov_b32 s0, s2			; GFX10-NEXT: s_mov_b32 s0, s2
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s1
	; GFX10-NEXT: s_mov_b32 s1, s3			; GFX10-NEXT: s_mov_b32 s1, s3
	; GFX10-NEXT: s_mov_b32 s2, s4			; GFX10-NEXT: s_mov_b32 s2, s4
	; GFX10-NEXT: s_mov_b32 s3, s5			; GFX10-NEXT: s_mov_b32 s3, s5
	; GFX10-NEXT: s_mov_b32 s4, s6			; GFX10-NEXT: s_mov_b32 s4, s6
	; GFX10-NEXT: s_mov_b32 s5, s7			; GFX10-NEXT: s_mov_b32 s5, s7
	; GFX10-NEXT: s_mov_b32 s6, s8			; GFX10-NEXT: s_mov_b32 s6, s8
	; GFX10-NEXT: s_mov_b32 s7, s9			; GFX10-NEXT: s_mov_b32 s7, s9
	; GFX10-NEXT: s_mov_b32 s8, s10			; GFX10-NEXT: s_mov_b32 s8, s10
	; GFX10-NEXT: s_mov_b32 s9, s11			; GFX10-NEXT: s_mov_b32 s9, s11
	; GFX10-NEXT: s_mov_b32 s14, exec_lo
	; GFX10-NEXT: s_mov_b32 s10, s12			; GFX10-NEXT: s_mov_b32 s10, s12
	; GFX10-NEXT: s_mov_b32 s11, s13			; GFX10-NEXT: s_mov_b32 s11, s13
	; GFX10-NEXT: ; implicit-def: $vcc_hi			; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s14
	; GFX10-NEXT: image_gather4_b_o v[0:3], v[0:3], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D			; GFX10-NEXT: image_gather4_b_o v[0:3], v[0:3], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.b.o.2d.v4f32.f32.f32(i32 1, i32 %offset, float %bias, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.b.o.2d.v4f32.f32.f32(i32 1, i32 %offset, float %bias, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GFX6-NEXT: s_wqm_b64 exec, exec			; GFX6-NEXT: s_wqm_b64 exec, exec
	; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]			; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]
	; GFX6-NEXT: image_gather4_c_b_o v[0:3], v[0:7], s[0:7], s[8:11] dmask:0x1			; GFX6-NEXT: image_gather4_c_b_o v[0:3], v[0:7], s[0:7], s[8:11] dmask:0x1
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: ; return to shader part epilog			; GFX6-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: gather4_c_b_o_2d:			; GFX10-LABEL: gather4_c_b_o_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s1, exec_lo
	; GFX10-NEXT: s_mov_b32 s0, s2			; GFX10-NEXT: s_mov_b32 s0, s2
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s1
	; GFX10-NEXT: s_mov_b32 s1, s3			; GFX10-NEXT: s_mov_b32 s1, s3
	; GFX10-NEXT: s_mov_b32 s2, s4			; GFX10-NEXT: s_mov_b32 s2, s4
	; GFX10-NEXT: s_mov_b32 s3, s5			; GFX10-NEXT: s_mov_b32 s3, s5
	; GFX10-NEXT: s_mov_b32 s4, s6			; GFX10-NEXT: s_mov_b32 s4, s6
	; GFX10-NEXT: s_mov_b32 s5, s7			; GFX10-NEXT: s_mov_b32 s5, s7
	; GFX10-NEXT: s_mov_b32 s6, s8			; GFX10-NEXT: s_mov_b32 s6, s8
	; GFX10-NEXT: s_mov_b32 s7, s9			; GFX10-NEXT: s_mov_b32 s7, s9
	; GFX10-NEXT: s_mov_b32 s8, s10			; GFX10-NEXT: s_mov_b32 s8, s10
	; GFX10-NEXT: s_mov_b32 s9, s11			; GFX10-NEXT: s_mov_b32 s9, s11
	; GFX10-NEXT: s_mov_b32 s14, exec_lo
	; GFX10-NEXT: s_mov_b32 s10, s12			; GFX10-NEXT: s_mov_b32 s10, s12
	; GFX10-NEXT: s_mov_b32 s11, s13			; GFX10-NEXT: s_mov_b32 s11, s13
	; GFX10-NEXT: ; implicit-def: $vcc_hi			; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s14
	; GFX10-NEXT: image_gather4_c_b_o v[0:3], v[0:7], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D			; GFX10-NEXT: image_gather4_c_b_o v[0:3], v[0:7], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.o.2d.v4f32.f32.f32(i32 1, i32 %offset, float %bias, float %zcompare, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.o.2d.v4f32.f32.f32(i32 1, i32 %offset, float %bias, float %zcompare, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	; GFX6-NEXT: s_wqm_b64 exec, exec			; GFX6-NEXT: s_wqm_b64 exec, exec
	; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]			; GFX6-NEXT: s_and_b64 exec, exec, s[14:15]
	; GFX6-NEXT: image_gather4_c_b_cl_o v[0:3], v[0:7], s[0:7], s[8:11] dmask:0x1			; GFX6-NEXT: image_gather4_c_b_cl_o v[0:3], v[0:7], s[0:7], s[8:11] dmask:0x1
	; GFX6-NEXT: s_waitcnt vmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0)
	; GFX6-NEXT: ; return to shader part epilog			; GFX6-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: gather4_c_b_cl_o_2d:			; GFX10-LABEL: gather4_c_b_cl_o_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
				; GFX10-NEXT: s_mov_b32 s1, exec_lo
	; GFX10-NEXT: s_mov_b32 s0, s2			; GFX10-NEXT: s_mov_b32 s0, s2
				; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
				; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s1
	; GFX10-NEXT: s_mov_b32 s1, s3			; GFX10-NEXT: s_mov_b32 s1, s3
	; GFX10-NEXT: s_mov_b32 s2, s4			; GFX10-NEXT: s_mov_b32 s2, s4
	; GFX10-NEXT: s_mov_b32 s3, s5			; GFX10-NEXT: s_mov_b32 s3, s5
	; GFX10-NEXT: s_mov_b32 s4, s6			; GFX10-NEXT: s_mov_b32 s4, s6
	; GFX10-NEXT: s_mov_b32 s5, s7			; GFX10-NEXT: s_mov_b32 s5, s7
	; GFX10-NEXT: s_mov_b32 s6, s8			; GFX10-NEXT: s_mov_b32 s6, s8
	; GFX10-NEXT: s_mov_b32 s7, s9			; GFX10-NEXT: s_mov_b32 s7, s9
	; GFX10-NEXT: s_mov_b32 s8, s10			; GFX10-NEXT: s_mov_b32 s8, s10
	; GFX10-NEXT: s_mov_b32 s9, s11			; GFX10-NEXT: s_mov_b32 s9, s11
	; GFX10-NEXT: s_mov_b32 s14, exec_lo
	; GFX10-NEXT: s_mov_b32 s10, s12			; GFX10-NEXT: s_mov_b32 s10, s12
	; GFX10-NEXT: s_mov_b32 s11, s13			; GFX10-NEXT: s_mov_b32 s11, s13
	; GFX10-NEXT: ; implicit-def: $vcc_hi			; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s14
	; GFX10-NEXT: image_gather4_c_b_cl_o v[0:3], v[0:7], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D			; GFX10-NEXT: image_gather4_c_b_cl_o v[0:3], v[0:7], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.o.2d.v4f32.f32.f32(i32 1, i32 %offset, float %bias, float %zcompare, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.o.2d.v4f32.f32.f32(i32 1, i32 %offset, float %bias, float %zcompare, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}


llvm/test/CodeGen/AMDGPU/atomic_optimizations_buffer.ll

	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX7LESS %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX7LESS %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=tonga -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -amdgpu-dpp-combine=false -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,GFX89 %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=tonga -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -amdgpu-dpp-combine=false -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,GFX89 %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx900 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -amdgpu-dpp-combine=false -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,GFX89 %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx900 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -amdgpu-dpp-combine=false -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,GFX89 %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=-wavefrontsize32,+wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -amdgpu-dpp-combine=false -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64 %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=-wavefrontsize32,+wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -amdgpu-dpp-combine=false -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,GFX10 %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -amdgpu-dpp-combine=false -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN32,GFX8MORE,GFX8MORE32 %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -amdgpu-dpp-combine=false -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN32,GFX8MORE,GFX8MORE32,GFX10 %s

	declare i32 @llvm.amdgcn.workitem.id.x()			declare i32 @llvm.amdgcn.workitem.id.x()
	declare i32 @llvm.amdgcn.raw.buffer.atomic.add(i32, <4 x i32>, i32, i32, i32 immarg)			declare i32 @llvm.amdgcn.raw.buffer.atomic.add(i32, <4 x i32>, i32, i32, i32 immarg)
	declare i32 @llvm.amdgcn.struct.buffer.atomic.add(i32, <4 x i32>, i32, i32, i32, i32 immarg)			declare i32 @llvm.amdgcn.struct.buffer.atomic.add(i32, <4 x i32>, i32, i32, i32, i32 immarg)
	declare i32 @llvm.amdgcn.raw.buffer.atomic.sub(i32, <4 x i32>, i32, i32, i32 immarg)			declare i32 @llvm.amdgcn.raw.buffer.atomic.sub(i32, <4 x i32>, i32, i32, i32 immarg)

	; Show what the atomic optimization pass will do for raw buffers.			; Show what the atomic optimization pass will do for raw buffers.

	[... 43 lines hidden ...]
	; GFX8MORE: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} row_shr:2 row_mask:0xf bank_mask:0xf			; GFX8MORE: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} row_shr:2 row_mask:0xf bank_mask:0xf
	; GFX8MORE: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} row_shr:4 row_mask:0xf bank_mask:0xf			; GFX8MORE: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} row_shr:4 row_mask:0xf bank_mask:0xf
	; GFX8MORE: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} row_shr:8 row_mask:0xf bank_mask:0xf			; GFX8MORE: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} row_shr:8 row_mask:0xf bank_mask:0xf
	; GFX89: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} row_bcast:15 row_mask:0xa bank_mask:0xf			; GFX89: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} row_bcast:15 row_mask:0xa bank_mask:0xf
	; GFX89: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} row_bcast:31 row_mask:0xc bank_mask:0xf			; GFX89: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} row_bcast:31 row_mask:0xc bank_mask:0xf
	; GFX8MORE32: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 31			; GFX8MORE32: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 31
	; GFX8MORE64: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 63			; GFX8MORE64: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 63
	; GFX89: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} wave_shr:1 row_mask:0xf bank_mask:0xf			; GFX89: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} wave_shr:1 row_mask:0xf bank_mask:0xf
	; GFX8MORE: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]			; GFX89: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]
				; GFX10: s_mov_b32 s[[copy_value:[0-9]+]], s[[scalar_value]]
				; GFX10: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[copy_value]]
	; GFX8MORE: buffer_atomic_add v[[value]]			; GFX8MORE: buffer_atomic_add v[[value]]
	define amdgpu_kernel void @add_i32_varying_vdata(i32 addrspace(1)* %out, <4 x i32> %inout) {			define amdgpu_kernel void @add_i32_varying_vdata(i32 addrspace(1)* %out, <4 x i32> %inout) {
	entry:			entry:
	%lane = call i32 @llvm.amdgcn.workitem.id.x()			%lane = call i32 @llvm.amdgcn.workitem.id.x()
	%old = call i32 @llvm.amdgcn.raw.buffer.atomic.add(i32 %lane, <4 x i32> %inout, i32 0, i32 0, i32 0)			%old = call i32 @llvm.amdgcn.raw.buffer.atomic.add(i32 %lane, <4 x i32> %inout, i32 0, i32 0, i32 0)
	store i32 %old, i32 addrspace(1)* %out			store i32 %old, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; GCN-LABEL: struct_add_i32_varying_vdata:			; GCN-LABEL: struct_add_i32_varying_vdata:
	; GFX7LESS-NOT: v_mbcnt_lo_u32_b32			; GFX7LESS-NOT: v_mbcnt_lo_u32_b32
	; GFX7LESS-NOT: v_mbcnt_hi_u32_b32			; GFX7LESS-NOT: v_mbcnt_hi_u32_b32
	; GFX7LESS-NOT: s_bcnt1_i32_b64			; GFX7LESS-NOT: s_bcnt1_i32_b64
	; GFX7LESS: buffer_atomic_add v{{[0-9]+}}			; GFX7LESS: buffer_atomic_add v{{[0-9]+}}
	; GFX8MORE: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} row_shr:1 row_mask:0xf bank_mask:0xf			; GFX8MORE: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX8MORE: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} row_shr:2 row_mask:0xf bank_mask:0xf			; GFX8MORE: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} row_shr:2 row_mask:0xf bank_mask:0xf
	; GFX8MORE: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} row_shr:4 row_mask:0xf bank_mask:0xf			; GFX8MORE: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} row_shr:4 row_mask:0xf bank_mask:0xf
	; GFX8MORE: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} row_shr:8 row_mask:0xf bank_mask:0xf			; GFX8MORE: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} row_shr:8 row_mask:0xf bank_mask:0xf
	; GFX89: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} row_bcast:15 row_mask:0xa bank_mask:0xf			; GFX89: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} row_bcast:15 row_mask:0xa bank_mask:0xf
	; GFX89: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} row_bcast:31 row_mask:0xc bank_mask:0xf			; GFX89: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} row_bcast:31 row_mask:0xc bank_mask:0xf
	; GFX8MORE32: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 31			; GFX8MORE32: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 31
	; GFX8MORE64: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 63			; GFX8MORE64: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 63
	; GFX89: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} wave_shr:1 row_mask:0xf bank_mask:0xf			; GFX89: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} wave_shr:1 row_mask:0xf bank_mask:0xf
	; GFX8MORE: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]			; GFX89: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]
				; GFX10: s_mov_b32 s[[copy_value:[0-9]+]], s[[scalar_value]]
				; GFX10: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[copy_value]]
	; GFX8MORE: buffer_atomic_add v[[value]]			; GFX8MORE: buffer_atomic_add v[[value]]
	define amdgpu_kernel void @struct_add_i32_varying_vdata(i32 addrspace(1)* %out, <4 x i32> %inout, i32 %vindex) {			define amdgpu_kernel void @struct_add_i32_varying_vdata(i32 addrspace(1)* %out, <4 x i32> %inout, i32 %vindex) {
	entry:			entry:
	%lane = call i32 @llvm.amdgcn.workitem.id.x()			%lane = call i32 @llvm.amdgcn.workitem.id.x()
	%old = call i32 @llvm.amdgcn.struct.buffer.atomic.add(i32 %lane, <4 x i32> %inout, i32 %vindex, i32 0, i32 0, i32 0)			%old = call i32 @llvm.amdgcn.struct.buffer.atomic.add(i32 %lane, <4 x i32> %inout, i32 %vindex, i32 0, i32 0, i32 0)
	store i32 %old, i32 addrspace(1)* %out			store i32 %old, i32 addrspace(1)* %out
	ret void			ret void
	}			}
	[... 57 lines hidden ...]
	; GFX8MORE: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} row_shr:2 row_mask:0xf bank_mask:0xf			; GFX8MORE: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} row_shr:2 row_mask:0xf bank_mask:0xf
	; GFX8MORE: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} row_shr:4 row_mask:0xf bank_mask:0xf			; GFX8MORE: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} row_shr:4 row_mask:0xf bank_mask:0xf
	; GFX8MORE: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} row_shr:8 row_mask:0xf bank_mask:0xf			; GFX8MORE: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} row_shr:8 row_mask:0xf bank_mask:0xf
	; GFX89: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} row_bcast:15 row_mask:0xa bank_mask:0xf			; GFX89: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} row_bcast:15 row_mask:0xa bank_mask:0xf
	; GFX89: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} row_bcast:31 row_mask:0xc bank_mask:0xf			; GFX89: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} row_bcast:31 row_mask:0xc bank_mask:0xf
	; GFX8MORE32: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 31			; GFX8MORE32: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 31
	; GFX8MORE64: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 63			; GFX8MORE64: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 63
	; GFX89: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} wave_shr:1 row_mask:0xf bank_mask:0xf			; GFX89: v_mov_b32_dpp v{{[0-9]+}}, v{{[0-9]+}} wave_shr:1 row_mask:0xf bank_mask:0xf
	; GFX8MORE: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]			; GFX89: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]
				; GFX10: s_mov_b32 s[[copy_value:[0-9]+]], s[[scalar_value]]
				; GFX10: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[copy_value]]
	; GFX8MORE: buffer_atomic_sub v[[value]]			; GFX8MORE: buffer_atomic_sub v[[value]]
	define amdgpu_kernel void @sub_i32_varying_vdata(i32 addrspace(1)* %out, <4 x i32> %inout) {			define amdgpu_kernel void @sub_i32_varying_vdata(i32 addrspace(1)* %out, <4 x i32> %inout) {
	entry:			entry:
	%lane = call i32 @llvm.amdgcn.workitem.id.x()			%lane = call i32 @llvm.amdgcn.workitem.id.x()
	%old = call i32 @llvm.amdgcn.raw.buffer.atomic.sub(i32 %lane, <4 x i32> %inout, i32 0, i32 0, i32 0)			%old = call i32 @llvm.amdgcn.raw.buffer.atomic.sub(i32 %lane, <4 x i32> %inout, i32 0, i32 0, i32 0)
	store i32 %old, i32 addrspace(1)* %out			store i32 %old, i32 addrspace(1)* %out
	ret void			ret void
	}			}
	[... 13 lines hidden ...]

llvm/test/CodeGen/AMDGPU/atomic_optimizations_global_pointer.ll

	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX7LESS %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX7LESS %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=tonga -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,DPPCOMB %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=tonga -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,GFX89,DPPCOMB %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx900 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,DPPCOMB %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx900 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,GFX89,DPPCOMB %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=-wavefrontsize32,+wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64 %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=-wavefrontsize32,+wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,GFX10 %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN32,GFX8MORE,GFX8MORE32 %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN32,GFX8MORE,GFX8MORE32,GFX10 %s

	declare i32 @llvm.amdgcn.workitem.id.x()			declare i32 @llvm.amdgcn.workitem.id.x()

	; Show what the atomic optimization pass will do for global pointers.			; Show what the atomic optimization pass will do for global pointers.

	; GCN-LABEL: add_i32_constant:			; GCN-LABEL: add_i32_constant:
	; GCN32: s_mov_b32 s[[exec_lo:[0-9]+]], exec_lo			; GCN32: s_mov_b32 s[[exec_lo:[0-9]+]], exec_lo
	; GCN64: s_mov_b64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, exec			; GCN64: s_mov_b64 s{{\[}}[[exec_lo:[0-9]+]]:[[exec_hi:[0-9]+]]{{\]}}, exec
	[... 35 lines hidden ...]
	; GFX7LESS-NOT: v_mbcnt_lo_u32_b32			; GFX7LESS-NOT: v_mbcnt_lo_u32_b32
	; GFX7LESS-NOT: v_mbcnt_hi_u32_b32			; GFX7LESS-NOT: v_mbcnt_hi_u32_b32
	; GFX7LESS-NOT: s_bcnt1_i32_b64			; GFX7LESS-NOT: s_bcnt1_i32_b64
	; GFX7LESS: buffer_atomic_add v{{[0-9]+}}			; GFX7LESS: buffer_atomic_add v{{[0-9]+}}
	; DPPCOMB: v_add_u32_dpp			; DPPCOMB: v_add_u32_dpp
	; DPPCOMB: v_add_u32_dpp			; DPPCOMB: v_add_u32_dpp
	; GFX8MORE32: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 31			; GFX8MORE32: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 31
	; GFX8MORE64: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 63			; GFX8MORE64: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 63
	; GFX8MORE: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]			; GFX89: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]
				; GFX10: s_mov_b32 s[[copy_value:[0-9]+]], s[[scalar_value]]
				; GFX10: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[copy_value]]
	; GFX8MORE: buffer_atomic_add v[[value]]			; GFX8MORE: buffer_atomic_add v[[value]]
	define amdgpu_kernel void @add_i32_varying(i32 addrspace(1)* %out, i32 addrspace(1)* %inout) {			define amdgpu_kernel void @add_i32_varying(i32 addrspace(1)* %out, i32 addrspace(1)* %inout) {
	entry:			entry:
	%lane = call i32 @llvm.amdgcn.workitem.id.x()			%lane = call i32 @llvm.amdgcn.workitem.id.x()
	%old = atomicrmw add i32 addrspace(1)* %inout, i32 %lane acq_rel			%old = atomicrmw add i32 addrspace(1)* %inout, i32 %lane acq_rel
	store i32 %old, i32 addrspace(1)* %out			store i32 %old, i32 addrspace(1)* %out
	ret void			ret void
	}			}
	[... 89 lines hidden ...]
	; GFX7LESS-NOT: v_mbcnt_lo_u32_b32			; GFX7LESS-NOT: v_mbcnt_lo_u32_b32
	; GFX7LESS-NOT: v_mbcnt_hi_u32_b32			; GFX7LESS-NOT: v_mbcnt_hi_u32_b32
	; GFX7LESS-NOT: s_bcnt1_i32_b64			; GFX7LESS-NOT: s_bcnt1_i32_b64
	; GFX7LESS: buffer_atomic_sub v{{[0-9]+}}			; GFX7LESS: buffer_atomic_sub v{{[0-9]+}}
	; DPPCOMB: v_add_u32_dpp			; DPPCOMB: v_add_u32_dpp
	; DPPCOMB: v_add_u32_dpp			; DPPCOMB: v_add_u32_dpp
	; GFX8MORE32: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 31			; GFX8MORE32: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 31
	; GFX8MORE64: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 63			; GFX8MORE64: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 63
	; GFX8MORE: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]			; GFX89: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]
				; GFX10: s_mov_b32 s[[copy_value:[0-9]+]], s[[scalar_value]]
				; GFX10: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[copy_value]]
	; GFX8MORE: buffer_atomic_sub v[[value]]			; GFX8MORE: buffer_atomic_sub v[[value]]
	define amdgpu_kernel void @sub_i32_varying(i32 addrspace(1)* %out, i32 addrspace(1)* %inout) {			define amdgpu_kernel void @sub_i32_varying(i32 addrspace(1)* %out, i32 addrspace(1)* %inout) {
	entry:			entry:
	%lane = call i32 @llvm.amdgcn.workitem.id.x()			%lane = call i32 @llvm.amdgcn.workitem.id.x()
	%old = atomicrmw sub i32 addrspace(1)* %inout, i32 %lane acq_rel			%old = atomicrmw sub i32 addrspace(1)* %inout, i32 %lane acq_rel
	store i32 %old, i32 addrspace(1)* %out			store i32 %old, i32 addrspace(1)* %out
	ret void			ret void
	}			}
	[... 50 lines hidden ...]

llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll

	[... 364 lines hidden ...]
	; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000			; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000
	; GFX7LESS-NEXT: s_mov_b32 s2, -1			; GFX7LESS-NEXT: s_mov_b32 s2, -1
	; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX7LESS-NEXT: s_endpgm			; GFX7LESS-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: add_i32_varying:			; GFX8-LABEL: add_i32_varying:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX8-NEXT: s_mov_b64 s[2:3], exec
	; GFX8-NEXT: v_mov_b32_e32 v2, v0			; GFX8-NEXT: v_mov_b32_e32 v2, v0
	; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX8-NEXT: v_mov_b32_e32 v1, 0			; GFX8-NEXT: v_mov_b32_e32 v1, 0
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[2:3]
	; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0			; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0			; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX8-NEXT: s_not_b64 exec, exec			; GFX8-NEXT: s_not_b64 exec, exec
	; GFX8-NEXT: v_mov_b32_e32 v2, 0			; GFX8-NEXT: v_mov_b32_e32 v2, 0
	; GFX8-NEXT: s_not_b64 exec, exec			; GFX8-NEXT: s_not_b64 exec, exec
	; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf			; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf			; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf
	; GFX8-NEXT: v_readlane_b32 s2, v2, 63			; GFX8-NEXT: v_readlane_b32 s4, v2, 63
	; GFX8-NEXT: s_nop 0			; GFX8-NEXT: s_nop 0
	; GFX8-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf			; GFX8-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[2:3]
	; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX8-NEXT: ; implicit-def: $vgpr0			; GFX8-NEXT: ; implicit-def: $vgpr0
	; GFX8-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX8-NEXT: s_and_saveexec_b64 s[2:3], vcc
	; GFX8-NEXT: s_cbranch_execz BB2_2			; GFX8-NEXT: s_cbranch_execz BB2_2
	; GFX8-NEXT: ; %bb.1:			; GFX8-NEXT: ; %bb.1:
	; GFX8-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo			; GFX8-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo
	; GFX8-NEXT: v_mov_b32_e32 v3, s2			; GFX8-NEXT: v_mov_b32_e32 v3, s4
	; GFX8-NEXT: s_mov_b32 m0, -1			; GFX8-NEXT: s_mov_b32 m0, -1
	; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX8-NEXT: ds_add_rtn_u32 v0, v0, v3			; GFX8-NEXT: ds_add_rtn_u32 v0, v0, v3
	; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX8-NEXT: buffer_wbinvl1_vol			; GFX8-NEXT: buffer_wbinvl1_vol
	; GFX8-NEXT: BB2_2:			; GFX8-NEXT: BB2_2:
	; GFX8-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX8-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX8-NEXT: v_readfirstlane_b32 s2, v0			; GFX8-NEXT: v_readfirstlane_b32 s2, v0
	; GFX8-NEXT: v_mov_b32_e32 v0, v1			; GFX8-NEXT: v_mov_b32_e32 v0, v1
	; GFX8-NEXT: v_add_u32_e32 v0, vcc, s2, v0			; GFX8-NEXT: v_add_u32_e32 v0, vcc, s2, v0
	; GFX8-NEXT: s_mov_b32 s3, 0xf000			; GFX8-NEXT: s_mov_b32 s3, 0xf000
	; GFX8-NEXT: s_mov_b32 s2, -1			; GFX8-NEXT: s_mov_b32 s2, -1
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX9-LABEL: add_i32_varying:			; GFX9-LABEL: add_i32_varying:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX9-NEXT: s_mov_b64 s[2:3], exec
	; GFX9-NEXT: v_mov_b32_e32 v2, v0			; GFX9-NEXT: v_mov_b32_e32 v2, v0
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX9-NEXT: v_mov_b32_e32 v1, 0			; GFX9-NEXT: v_mov_b32_e32 v1, 0
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[2:3]
	; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0			; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0			; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX9-NEXT: s_not_b64 exec, exec			; GFX9-NEXT: s_not_b64 exec, exec
	; GFX9-NEXT: v_mov_b32_e32 v2, 0			; GFX9-NEXT: v_mov_b32_e32 v2, 0
	; GFX9-NEXT: s_not_b64 exec, exec			; GFX9-NEXT: s_not_b64 exec, exec
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf			; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf			; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf
	; GFX9-NEXT: v_readlane_b32 s2, v2, 63			; GFX9-NEXT: v_readlane_b32 s4, v2, 63
	; GFX9-NEXT: s_nop 0			; GFX9-NEXT: s_nop 0
	; GFX9-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf			; GFX9-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[2:3]
	; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX9-NEXT: ; implicit-def: $vgpr0			; GFX9-NEXT: ; implicit-def: $vgpr0
	; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX9-NEXT: s_and_saveexec_b64 s[2:3], vcc
	; GFX9-NEXT: s_cbranch_execz BB2_2			; GFX9-NEXT: s_cbranch_execz BB2_2
	; GFX9-NEXT: ; %bb.1:			; GFX9-NEXT: ; %bb.1:
	; GFX9-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo			; GFX9-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo
	; GFX9-NEXT: v_mov_b32_e32 v3, s2			; GFX9-NEXT: v_mov_b32_e32 v3, s4
	; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX9-NEXT: ds_add_rtn_u32 v0, v0, v3			; GFX9-NEXT: ds_add_rtn_u32 v0, v0, v3
	; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX9-NEXT: buffer_wbinvl1_vol			; GFX9-NEXT: buffer_wbinvl1_vol
	; GFX9-NEXT: BB2_2:			; GFX9-NEXT: BB2_2:
	; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX9-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX9-NEXT: v_readfirstlane_b32 s2, v0			; GFX9-NEXT: v_readfirstlane_b32 s2, v0
	; GFX9-NEXT: v_mov_b32_e32 v0, v1			; GFX9-NEXT: v_mov_b32_e32 v0, v1
	; GFX9-NEXT: v_add_u32_e32 v0, s2, v0			; GFX9-NEXT: v_add_u32_e32 v0, s2, v0
	; GFX9-NEXT: s_mov_b32 s3, 0xf000			; GFX9-NEXT: s_mov_b32 s3, 0xf000
	; GFX9-NEXT: s_mov_b32 s2, -1			; GFX9-NEXT: s_mov_b32 s2, -1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX1064-LABEL: add_i32_varying:			; GFX1064-LABEL: add_i32_varying:
	; GFX1064: ; %bb.0: ; %entry			; GFX1064: ; %bb.0: ; %entry
	; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX1064-NEXT: v_mov_b32_e32 v1, v0
	; GFX1064-NEXT: s_mov_b64 s[2:3], exec
	; GFX1064-NEXT: v_mov_b32_e32 v2, v0
	; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX1064-NEXT: v_mov_b32_e32 v1, 0
	; GFX1064-NEXT: s_mov_b64 exec, s[4:5]
	; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
	; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0
	; GFX1064-NEXT: s_not_b64 exec, exec			; GFX1064-NEXT: s_not_b64 exec, exec
	; GFX1064-NEXT: v_mov_b32_e32 v2, 0			; GFX1064-NEXT: v_mov_b32_e32 v1, 0
	; GFX1064-NEXT: s_not_b64 exec, exec			; GFX1064-NEXT: s_not_b64 exec, exec
				; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1064-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX1064-NEXT: v_mov_b32_e32 v3, 0
				; GFX1064-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX1064-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX1064-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX1064-NEXT: v_mov_b32_e32 v2, v1
				; GFX1064-NEXT: v_permlanex16_b32 v2, v2, -1, -1
				; GFX1064-NEXT: v_add_nc_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
				; GFX1064-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1064-NEXT: v_mov_b32_e32 v2, s4
				; GFX1064-NEXT: v_add_nc_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
				; GFX1064-NEXT: v_readlane_b32 s4, v1, 15
				; GFX1064-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
				; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
				; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1064-NEXT: v_readlane_b32 s5, v1, 31
				; GFX1064-NEXT: v_writelane_b32 v3, s4, 16
				; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
				; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1064-NEXT: v_readlane_b32 s7, v1, 63
				; GFX1064-NEXT: v_readlane_b32 s6, v1, 47
				; GFX1064-NEXT: v_writelane_b32 v3, s5, 32
				; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, exec_hi, v0
	; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1064-NEXT: v_writelane_b32 v3, s6, 48
	; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1064-NEXT: v_mov_b32_e32 v3, v2
	; GFX1064-NEXT: v_permlanex16_b32 v3, v3, -1, -1
	; GFX1064-NEXT: v_add_nc_u32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1064-NEXT: v_readlane_b32 s2, v2, 31
	; GFX1064-NEXT: v_mov_b32_e32 v3, s2
	; GFX1064-NEXT: v_add_nc_u32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
	; GFX1064-NEXT: v_mov_b32_dpp v1, v2 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1064-NEXT: v_readlane_b32 s2, v2, 15
	; GFX1064-NEXT: v_readlane_b32 s3, v2, 31
	; GFX1064-NEXT: v_readlane_b32 s6, v2, 47
	; GFX1064-NEXT: v_writelane_b32 v1, s2, 16
	; GFX1064-NEXT: s_mov_b32 s2, -1
	; GFX1064-NEXT: v_writelane_b32 v1, s3, 32
	; GFX1064-NEXT: v_readlane_b32 s3, v2, 63
	; GFX1064-NEXT: v_writelane_b32 v1, s6, 48
	; GFX1064-NEXT: s_mov_b64 exec, s[4:5]			; GFX1064-NEXT: s_mov_b64 exec, s[4:5]
	; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
				; GFX1064-NEXT: s_mov_b32 s2, -1
	; GFX1064-NEXT: ; implicit-def: $vgpr0			; GFX1064-NEXT: ; implicit-def: $vgpr0
	; GFX1064-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX1064-NEXT: s_and_saveexec_b64 s[4:5], vcc
	; GFX1064-NEXT: s_cbranch_execz BB2_2			; GFX1064-NEXT: s_cbranch_execz BB2_2
	; GFX1064-NEXT: ; %bb.1:			; GFX1064-NEXT: ; %bb.1:
	; GFX1064-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo			; GFX1064-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo
	; GFX1064-NEXT: v_mov_b32_e32 v4, s3			; GFX1064-NEXT: v_mov_b32_e32 v4, s7
				; GFX1064-NEXT: s_mov_b32 s3, s7
	; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1064-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1064-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1064-NEXT: ds_add_rtn_u32 v0, v7, v4			; GFX1064-NEXT: ds_add_rtn_u32 v0, v7, v4
	; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1064-NEXT: buffer_gl0_inv			; GFX1064-NEXT: buffer_gl0_inv
	; GFX1064-NEXT: buffer_gl1_inv			; GFX1064-NEXT: buffer_gl1_inv
	; GFX1064-NEXT: BB2_2:			; GFX1064-NEXT: BB2_2:
	; GFX1064-NEXT: s_waitcnt_depctr 0xffe3			; GFX1064-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1064-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX1064-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX1064-NEXT: v_readfirstlane_b32 s3, v0			; GFX1064-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1064-NEXT: v_mov_b32_e32 v0, v1			; GFX1064-NEXT: v_mov_b32_e32 v0, v3
	; GFX1064-NEXT: v_add_nc_u32_e32 v0, s3, v0			; GFX1064-NEXT: v_add_nc_u32_e32 v0, s3, v0
	; GFX1064-NEXT: s_mov_b32 s3, 0x31016000			; GFX1064-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1064-NEXT: s_waitcnt lgkmcnt(0)			; GFX1064-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1064-NEXT: s_nop 0			; GFX1064-NEXT: s_nop 0
	; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX1064-NEXT: s_endpgm			; GFX1064-NEXT: s_endpgm
	;			;
	; GFX1032-LABEL: add_i32_varying:			; GFX1032-LABEL: add_i32_varying:
	; GFX1032: ; %bb.0: ; %entry			; GFX1032: ; %bb.0: ; %entry
	; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX1032-NEXT: v_mov_b32_e32 v1, v0
	; GFX1032-NEXT: s_mov_b32 s2, exec_lo
	; GFX1032-NEXT: ; implicit-def: $vcc_hi
	; GFX1032-NEXT: v_mov_b32_e32 v2, v0
	; GFX1032-NEXT: s_or_saveexec_b32 s3, -1
	; GFX1032-NEXT: v_mov_b32_e32 v1, 0
	; GFX1032-NEXT: s_mov_b32 exec_lo, s3
	; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
	; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1032-NEXT: v_mov_b32_e32 v2, 0			; GFX1032-NEXT: v_mov_b32_e32 v1, 0
	; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1032-NEXT: s_or_saveexec_b32 s4, -1			; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1032-NEXT: s_mov_b32 s2, -1			; GFX1032-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_mov_b32_e32 v2, v1
	; GFX1032-NEXT: v_mov_b32_e32 v3, v2			; GFX1032-NEXT: v_permlanex16_b32 v2, v2, -1, -1
	; GFX1032-NEXT: v_permlanex16_b32 v3, v3, -1, -1			; GFX1032-NEXT: s_mov_b32 exec_lo, s2
	; GFX1032-NEXT: v_add_nc_u32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf			; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX1032-NEXT: v_readlane_b32 s3, v2, 31			; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1032-NEXT: v_mov_b32_dpp v1, v2 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1032-NEXT: v_add_nc_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1032-NEXT: v_readlane_b32 s5, v2, 15			; GFX1032-NEXT: v_mov_b32_e32 v3, 0
	; GFX1032-NEXT: v_writelane_b32 v1, s5, 16			; GFX1032-NEXT: v_readlane_b32 s3, v1, 15
	; GFX1032-NEXT: s_mov_b32 exec_lo, s4			; GFX1032-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1032-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
				; GFX1032-NEXT: s_mov_b32 exec_lo, s2
				; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
				; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
				; GFX1032-NEXT: v_writelane_b32 v3, s3, 16
				; GFX1032-NEXT: s_mov_b32 exec_lo, s2
	; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0			; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
				; GFX1032-NEXT: s_mov_b32 s2, -1
	; GFX1032-NEXT: ; implicit-def: $vgpr0			; GFX1032-NEXT: ; implicit-def: $vgpr0
	; GFX1032-NEXT: s_and_saveexec_b32 s4, vcc_lo			; GFX1032-NEXT: ; implicit-def: $vcc_hi
				; GFX1032-NEXT: s_and_saveexec_b32 s3, vcc_lo
	; GFX1032-NEXT: s_cbranch_execz BB2_2			; GFX1032-NEXT: s_cbranch_execz BB2_2
	; GFX1032-NEXT: ; %bb.1:			; GFX1032-NEXT: ; %bb.1:
	; GFX1032-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo			; GFX1032-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo
	; GFX1032-NEXT: v_mov_b32_e32 v4, s3			; GFX1032-NEXT: v_mov_b32_e32 v4, s4
	; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1032-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1032-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1032-NEXT: ds_add_rtn_u32 v0, v7, v4			; GFX1032-NEXT: ds_add_rtn_u32 v0, v7, v4
	; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1032-NEXT: buffer_gl0_inv			; GFX1032-NEXT: buffer_gl0_inv
	; GFX1032-NEXT: buffer_gl1_inv			; GFX1032-NEXT: buffer_gl1_inv
	; GFX1032-NEXT: BB2_2:			; GFX1032-NEXT: BB2_2:
	; GFX1032-NEXT: s_waitcnt_depctr 0xffe3			; GFX1032-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1032-NEXT: s_or_b32 exec_lo, exec_lo, s4			; GFX1032-NEXT: s_or_b32 exec_lo, exec_lo, s3
	; GFX1032-NEXT: v_readfirstlane_b32 s3, v0			; GFX1032-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1032-NEXT: v_mov_b32_e32 v0, v1			; GFX1032-NEXT: v_mov_b32_e32 v0, v3
	; GFX1032-NEXT: v_add_nc_u32_e32 v0, s3, v0			; GFX1032-NEXT: v_add_nc_u32_e32 v0, s3, v0
	; GFX1032-NEXT: s_mov_b32 s3, 0x31016000			; GFX1032-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1032-NEXT: s_waitcnt lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1032-NEXT: s_nop 0			; GFX1032-NEXT: s_nop 0
	; GFX1032-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX1032-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX1032-NEXT: s_endpgm			; GFX1032-NEXT: s_endpgm
	entry:			entry:
	%lane = call i32 @llvm.amdgcn.workitem.id.x()			%lane = call i32 @llvm.amdgcn.workitem.id.x()
	[17 lines not shown]
	; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000			; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000
	; GFX7LESS-NEXT: s_mov_b32 s2, -1			; GFX7LESS-NEXT: s_mov_b32 s2, -1
	; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX7LESS-NEXT: s_endpgm			; GFX7LESS-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: add_i32_varying_gfx1032:			; GFX8-LABEL: add_i32_varying_gfx1032:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX8-NEXT: s_mov_b64 s[2:3], exec
	; GFX8-NEXT: v_mov_b32_e32 v2, v0			; GFX8-NEXT: v_mov_b32_e32 v2, v0
	; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX8-NEXT: v_mov_b32_e32 v1, 0			; GFX8-NEXT: v_mov_b32_e32 v1, 0
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[2:3]
	; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0			; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0			; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX8-NEXT: s_not_b64 exec, exec			; GFX8-NEXT: s_not_b64 exec, exec
	; GFX8-NEXT: v_mov_b32_e32 v2, 0			; GFX8-NEXT: v_mov_b32_e32 v2, 0
	; GFX8-NEXT: s_not_b64 exec, exec			; GFX8-NEXT: s_not_b64 exec, exec
	; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf			; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf			; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf
	; GFX8-NEXT: v_readlane_b32 s2, v2, 63			; GFX8-NEXT: v_readlane_b32 s4, v2, 63
	; GFX8-NEXT: s_nop 0			; GFX8-NEXT: s_nop 0
	; GFX8-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf			; GFX8-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[2:3]
	; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX8-NEXT: ; implicit-def: $vgpr0			; GFX8-NEXT: ; implicit-def: $vgpr0
	; GFX8-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX8-NEXT: s_and_saveexec_b64 s[2:3], vcc
	; GFX8-NEXT: s_cbranch_execz BB3_2			; GFX8-NEXT: s_cbranch_execz BB3_2
	; GFX8-NEXT: ; %bb.1:			; GFX8-NEXT: ; %bb.1:
	; GFX8-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo			; GFX8-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo
	; GFX8-NEXT: v_mov_b32_e32 v3, s2			; GFX8-NEXT: v_mov_b32_e32 v3, s4
	; GFX8-NEXT: s_mov_b32 m0, -1			; GFX8-NEXT: s_mov_b32 m0, -1
	; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX8-NEXT: ds_add_rtn_u32 v0, v0, v3			; GFX8-NEXT: ds_add_rtn_u32 v0, v0, v3
	; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX8-NEXT: buffer_wbinvl1_vol			; GFX8-NEXT: buffer_wbinvl1_vol
	; GFX8-NEXT: BB3_2:			; GFX8-NEXT: BB3_2:
	; GFX8-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX8-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX8-NEXT: v_readfirstlane_b32 s2, v0			; GFX8-NEXT: v_readfirstlane_b32 s2, v0
	; GFX8-NEXT: v_mov_b32_e32 v0, v1			; GFX8-NEXT: v_mov_b32_e32 v0, v1
	; GFX8-NEXT: v_add_u32_e32 v0, vcc, s2, v0			; GFX8-NEXT: v_add_u32_e32 v0, vcc, s2, v0
	; GFX8-NEXT: s_mov_b32 s3, 0xf000			; GFX8-NEXT: s_mov_b32 s3, 0xf000
	; GFX8-NEXT: s_mov_b32 s2, -1			; GFX8-NEXT: s_mov_b32 s2, -1
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX9-LABEL: add_i32_varying_gfx1032:			; GFX9-LABEL: add_i32_varying_gfx1032:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX9-NEXT: s_mov_b64 s[2:3], exec
	; GFX9-NEXT: v_mov_b32_e32 v2, v0			; GFX9-NEXT: v_mov_b32_e32 v2, v0
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX9-NEXT: v_mov_b32_e32 v1, 0			; GFX9-NEXT: v_mov_b32_e32 v1, 0
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[2:3]
	; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0			; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0			; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX9-NEXT: s_not_b64 exec, exec			; GFX9-NEXT: s_not_b64 exec, exec
	; GFX9-NEXT: v_mov_b32_e32 v2, 0			; GFX9-NEXT: v_mov_b32_e32 v2, 0
	; GFX9-NEXT: s_not_b64 exec, exec			; GFX9-NEXT: s_not_b64 exec, exec
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf			; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf			; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf
	; GFX9-NEXT: v_readlane_b32 s2, v2, 63			; GFX9-NEXT: v_readlane_b32 s4, v2, 63
	; GFX9-NEXT: s_nop 0			; GFX9-NEXT: s_nop 0
	; GFX9-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf			; GFX9-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[2:3]
	; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX9-NEXT: ; implicit-def: $vgpr0			; GFX9-NEXT: ; implicit-def: $vgpr0
	; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX9-NEXT: s_and_saveexec_b64 s[2:3], vcc
	; GFX9-NEXT: s_cbranch_execz BB3_2			; GFX9-NEXT: s_cbranch_execz BB3_2
	; GFX9-NEXT: ; %bb.1:			; GFX9-NEXT: ; %bb.1:
	; GFX9-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo			; GFX9-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo
	; GFX9-NEXT: v_mov_b32_e32 v3, s2			; GFX9-NEXT: v_mov_b32_e32 v3, s4
	; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX9-NEXT: ds_add_rtn_u32 v0, v0, v3			; GFX9-NEXT: ds_add_rtn_u32 v0, v0, v3
	; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX9-NEXT: buffer_wbinvl1_vol			; GFX9-NEXT: buffer_wbinvl1_vol
	; GFX9-NEXT: BB3_2:			; GFX9-NEXT: BB3_2:
	; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX9-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX9-NEXT: v_readfirstlane_b32 s2, v0			; GFX9-NEXT: v_readfirstlane_b32 s2, v0
	; GFX9-NEXT: v_mov_b32_e32 v0, v1			; GFX9-NEXT: v_mov_b32_e32 v0, v1
	; GFX9-NEXT: v_add_u32_e32 v0, s2, v0			; GFX9-NEXT: v_add_u32_e32 v0, s2, v0
	; GFX9-NEXT: s_mov_b32 s3, 0xf000			; GFX9-NEXT: s_mov_b32 s3, 0xf000
	; GFX9-NEXT: s_mov_b32 s2, -1			; GFX9-NEXT: s_mov_b32 s2, -1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX1064-LABEL: add_i32_varying_gfx1032:			; GFX1064-LABEL: add_i32_varying_gfx1032:
	; GFX1064: ; %bb.0: ; %entry			; GFX1064: ; %bb.0: ; %entry
	; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX1064-NEXT: v_mov_b32_e32 v1, v0
	; GFX1064-NEXT: s_mov_b64 s[2:3], exec
	; GFX1064-NEXT: v_mov_b32_e32 v2, v0
	; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX1064-NEXT: v_mov_b32_e32 v1, 0
	; GFX1064-NEXT: s_mov_b64 exec, s[4:5]
	; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
	; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0
	; GFX1064-NEXT: s_not_b64 exec, exec			; GFX1064-NEXT: s_not_b64 exec, exec
	; GFX1064-NEXT: v_mov_b32_e32 v2, 0			; GFX1064-NEXT: v_mov_b32_e32 v1, 0
	; GFX1064-NEXT: s_not_b64 exec, exec			; GFX1064-NEXT: s_not_b64 exec, exec
				; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1064-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX1064-NEXT: v_mov_b32_e32 v3, 0
				; GFX1064-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX1064-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX1064-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX1064-NEXT: v_mov_b32_e32 v2, v1
				; GFX1064-NEXT: v_permlanex16_b32 v2, v2, -1, -1
				; GFX1064-NEXT: v_add_nc_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
				; GFX1064-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1064-NEXT: v_mov_b32_e32 v2, s4
				; GFX1064-NEXT: v_add_nc_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
				; GFX1064-NEXT: v_readlane_b32 s4, v1, 15
				; GFX1064-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
				; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
				; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1064-NEXT: v_readlane_b32 s5, v1, 31
				; GFX1064-NEXT: v_writelane_b32 v3, s4, 16
				; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
				; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1064-NEXT: v_readlane_b32 s7, v1, 63
				; GFX1064-NEXT: v_readlane_b32 s6, v1, 47
				; GFX1064-NEXT: v_writelane_b32 v3, s5, 32
				; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, exec_hi, v0
	; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1064-NEXT: v_writelane_b32 v3, s6, 48
	; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1064-NEXT: v_mov_b32_e32 v3, v2
	; GFX1064-NEXT: v_permlanex16_b32 v3, v3, -1, -1
	; GFX1064-NEXT: v_add_nc_u32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1064-NEXT: v_readlane_b32 s2, v2, 31
	; GFX1064-NEXT: v_mov_b32_e32 v3, s2
	; GFX1064-NEXT: v_add_nc_u32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
	; GFX1064-NEXT: v_mov_b32_dpp v1, v2 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1064-NEXT: v_readlane_b32 s2, v2, 15
	; GFX1064-NEXT: v_readlane_b32 s3, v2, 31
	; GFX1064-NEXT: v_readlane_b32 s6, v2, 47
	; GFX1064-NEXT: v_writelane_b32 v1, s2, 16
	; GFX1064-NEXT: s_mov_b32 s2, -1
	; GFX1064-NEXT: v_writelane_b32 v1, s3, 32
	; GFX1064-NEXT: v_readlane_b32 s3, v2, 63
	; GFX1064-NEXT: v_writelane_b32 v1, s6, 48
	; GFX1064-NEXT: s_mov_b64 exec, s[4:5]			; GFX1064-NEXT: s_mov_b64 exec, s[4:5]
	; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
				; GFX1064-NEXT: s_mov_b32 s2, -1
	; GFX1064-NEXT: ; implicit-def: $vgpr0			; GFX1064-NEXT: ; implicit-def: $vgpr0
	; GFX1064-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX1064-NEXT: s_and_saveexec_b64 s[4:5], vcc
	; GFX1064-NEXT: s_cbranch_execz BB3_2			; GFX1064-NEXT: s_cbranch_execz BB3_2
	; GFX1064-NEXT: ; %bb.1:			; GFX1064-NEXT: ; %bb.1:
	; GFX1064-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo			; GFX1064-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo
	; GFX1064-NEXT: v_mov_b32_e32 v4, s3			; GFX1064-NEXT: v_mov_b32_e32 v4, s7
				; GFX1064-NEXT: s_mov_b32 s3, s7
	; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1064-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1064-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1064-NEXT: ds_add_rtn_u32 v0, v7, v4			; GFX1064-NEXT: ds_add_rtn_u32 v0, v7, v4
	; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1064-NEXT: buffer_gl0_inv			; GFX1064-NEXT: buffer_gl0_inv
	; GFX1064-NEXT: buffer_gl1_inv			; GFX1064-NEXT: buffer_gl1_inv
	; GFX1064-NEXT: BB3_2:			; GFX1064-NEXT: BB3_2:
	; GFX1064-NEXT: s_waitcnt_depctr 0xffe3			; GFX1064-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1064-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX1064-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX1064-NEXT: v_readfirstlane_b32 s3, v0			; GFX1064-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1064-NEXT: v_mov_b32_e32 v0, v1			; GFX1064-NEXT: v_mov_b32_e32 v0, v3
	; GFX1064-NEXT: v_add_nc_u32_e32 v0, s3, v0			; GFX1064-NEXT: v_add_nc_u32_e32 v0, s3, v0
	; GFX1064-NEXT: s_mov_b32 s3, 0x31016000			; GFX1064-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1064-NEXT: s_waitcnt lgkmcnt(0)			; GFX1064-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1064-NEXT: s_nop 0			; GFX1064-NEXT: s_nop 0
	; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX1064-NEXT: s_endpgm			; GFX1064-NEXT: s_endpgm
	;			;
	; GFX1032-LABEL: add_i32_varying_gfx1032:			; GFX1032-LABEL: add_i32_varying_gfx1032:
	; GFX1032: ; %bb.0: ; %entry			; GFX1032: ; %bb.0: ; %entry
	; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX1032-NEXT: v_mov_b32_e32 v1, v0
	; GFX1032-NEXT: s_mov_b32 s2, exec_lo
	; GFX1032-NEXT: ; implicit-def: $vcc_hi
	; GFX1032-NEXT: v_mov_b32_e32 v2, v0
	; GFX1032-NEXT: s_or_saveexec_b32 s3, -1
	; GFX1032-NEXT: v_mov_b32_e32 v1, 0
	; GFX1032-NEXT: s_mov_b32 exec_lo, s3
	; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
	; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1032-NEXT: v_mov_b32_e32 v2, 0			; GFX1032-NEXT: v_mov_b32_e32 v1, 0
	; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1032-NEXT: s_or_saveexec_b32 s4, -1			; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1032-NEXT: s_mov_b32 s2, -1			; GFX1032-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_mov_b32_e32 v2, v1
	; GFX1032-NEXT: v_mov_b32_e32 v3, v2			; GFX1032-NEXT: v_permlanex16_b32 v2, v2, -1, -1
	; GFX1032-NEXT: v_permlanex16_b32 v3, v3, -1, -1			; GFX1032-NEXT: s_mov_b32 exec_lo, s2
	; GFX1032-NEXT: v_add_nc_u32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf			; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX1032-NEXT: v_readlane_b32 s3, v2, 31			; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1032-NEXT: v_mov_b32_dpp v1, v2 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1032-NEXT: v_add_nc_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1032-NEXT: v_readlane_b32 s5, v2, 15			; GFX1032-NEXT: v_mov_b32_e32 v3, 0
	; GFX1032-NEXT: v_writelane_b32 v1, s5, 16			; GFX1032-NEXT: v_readlane_b32 s3, v1, 15
	; GFX1032-NEXT: s_mov_b32 exec_lo, s4			; GFX1032-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1032-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
				; GFX1032-NEXT: s_mov_b32 exec_lo, s2
				; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
				; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
				; GFX1032-NEXT: v_writelane_b32 v3, s3, 16
				; GFX1032-NEXT: s_mov_b32 exec_lo, s2
	; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0			; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
				; GFX1032-NEXT: s_mov_b32 s2, -1
	; GFX1032-NEXT: ; implicit-def: $vgpr0			; GFX1032-NEXT: ; implicit-def: $vgpr0
	; GFX1032-NEXT: s_and_saveexec_b32 s4, vcc_lo			; GFX1032-NEXT: ; implicit-def: $vcc_hi
				; GFX1032-NEXT: s_and_saveexec_b32 s3, vcc_lo
	; GFX1032-NEXT: s_cbranch_execz BB3_2			; GFX1032-NEXT: s_cbranch_execz BB3_2
	; GFX1032-NEXT: ; %bb.1:			; GFX1032-NEXT: ; %bb.1:
	; GFX1032-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo			; GFX1032-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo
	; GFX1032-NEXT: v_mov_b32_e32 v4, s3			; GFX1032-NEXT: v_mov_b32_e32 v4, s4
	; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1032-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1032-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1032-NEXT: ds_add_rtn_u32 v0, v7, v4			; GFX1032-NEXT: ds_add_rtn_u32 v0, v7, v4
	; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1032-NEXT: buffer_gl0_inv			; GFX1032-NEXT: buffer_gl0_inv
	; GFX1032-NEXT: buffer_gl1_inv			; GFX1032-NEXT: buffer_gl1_inv
	; GFX1032-NEXT: BB3_2:			; GFX1032-NEXT: BB3_2:
	; GFX1032-NEXT: s_waitcnt_depctr 0xffe3			; GFX1032-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1032-NEXT: s_or_b32 exec_lo, exec_lo, s4			; GFX1032-NEXT: s_or_b32 exec_lo, exec_lo, s3
	; GFX1032-NEXT: v_readfirstlane_b32 s3, v0			; GFX1032-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1032-NEXT: v_mov_b32_e32 v0, v1			; GFX1032-NEXT: v_mov_b32_e32 v0, v3
	; GFX1032-NEXT: v_add_nc_u32_e32 v0, s3, v0			; GFX1032-NEXT: v_add_nc_u32_e32 v0, s3, v0
	; GFX1032-NEXT: s_mov_b32 s3, 0x31016000			; GFX1032-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1032-NEXT: s_waitcnt lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1032-NEXT: s_nop 0			; GFX1032-NEXT: s_nop 0
	; GFX1032-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX1032-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX1032-NEXT: s_endpgm			; GFX1032-NEXT: s_endpgm
	entry:			entry:
	%lane = call i32 @llvm.amdgcn.workitem.id.x()			%lane = call i32 @llvm.amdgcn.workitem.id.x()
	[17 lines not shown]
	; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000			; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000
	; GFX7LESS-NEXT: s_mov_b32 s2, -1			; GFX7LESS-NEXT: s_mov_b32 s2, -1
	; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX7LESS-NEXT: s_endpgm			; GFX7LESS-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: add_i32_varying_gfx1064:			; GFX8-LABEL: add_i32_varying_gfx1064:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX8-NEXT: s_mov_b64 s[2:3], exec
	; GFX8-NEXT: v_mov_b32_e32 v2, v0			; GFX8-NEXT: v_mov_b32_e32 v2, v0
	; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX8-NEXT: v_mov_b32_e32 v1, 0			; GFX8-NEXT: v_mov_b32_e32 v1, 0
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[2:3]
	; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0			; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0			; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX8-NEXT: s_not_b64 exec, exec			; GFX8-NEXT: s_not_b64 exec, exec
	; GFX8-NEXT: v_mov_b32_e32 v2, 0			; GFX8-NEXT: v_mov_b32_e32 v2, 0
	; GFX8-NEXT: s_not_b64 exec, exec			; GFX8-NEXT: s_not_b64 exec, exec
	; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf			; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf			; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf
	; GFX8-NEXT: v_readlane_b32 s2, v2, 63			; GFX8-NEXT: v_readlane_b32 s4, v2, 63
	; GFX8-NEXT: s_nop 0			; GFX8-NEXT: s_nop 0
	; GFX8-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf			; GFX8-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[2:3]
	; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX8-NEXT: ; implicit-def: $vgpr0			; GFX8-NEXT: ; implicit-def: $vgpr0
	; GFX8-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX8-NEXT: s_and_saveexec_b64 s[2:3], vcc
	; GFX8-NEXT: s_cbranch_execz BB4_2			; GFX8-NEXT: s_cbranch_execz BB4_2
	; GFX8-NEXT: ; %bb.1:			; GFX8-NEXT: ; %bb.1:
	; GFX8-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo			; GFX8-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo
	; GFX8-NEXT: v_mov_b32_e32 v3, s2			; GFX8-NEXT: v_mov_b32_e32 v3, s4
	; GFX8-NEXT: s_mov_b32 m0, -1			; GFX8-NEXT: s_mov_b32 m0, -1
	; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX8-NEXT: ds_add_rtn_u32 v0, v0, v3			; GFX8-NEXT: ds_add_rtn_u32 v0, v0, v3
	; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX8-NEXT: buffer_wbinvl1_vol			; GFX8-NEXT: buffer_wbinvl1_vol
	; GFX8-NEXT: BB4_2:			; GFX8-NEXT: BB4_2:
	; GFX8-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX8-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX8-NEXT: v_readfirstlane_b32 s2, v0			; GFX8-NEXT: v_readfirstlane_b32 s2, v0
	; GFX8-NEXT: v_mov_b32_e32 v0, v1			; GFX8-NEXT: v_mov_b32_e32 v0, v1
	; GFX8-NEXT: v_add_u32_e32 v0, vcc, s2, v0			; GFX8-NEXT: v_add_u32_e32 v0, vcc, s2, v0
	; GFX8-NEXT: s_mov_b32 s3, 0xf000			; GFX8-NEXT: s_mov_b32 s3, 0xf000
	; GFX8-NEXT: s_mov_b32 s2, -1			; GFX8-NEXT: s_mov_b32 s2, -1
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX9-LABEL: add_i32_varying_gfx1064:			; GFX9-LABEL: add_i32_varying_gfx1064:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX9-NEXT: s_mov_b64 s[2:3], exec
	; GFX9-NEXT: v_mov_b32_e32 v2, v0			; GFX9-NEXT: v_mov_b32_e32 v2, v0
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX9-NEXT: v_mov_b32_e32 v1, 0			; GFX9-NEXT: v_mov_b32_e32 v1, 0
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[2:3]
	; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0			; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0			; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX9-NEXT: s_not_b64 exec, exec			; GFX9-NEXT: s_not_b64 exec, exec
	; GFX9-NEXT: v_mov_b32_e32 v2, 0			; GFX9-NEXT: v_mov_b32_e32 v2, 0
	; GFX9-NEXT: s_not_b64 exec, exec			; GFX9-NEXT: s_not_b64 exec, exec
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf			; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf			; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf
	; GFX9-NEXT: v_readlane_b32 s2, v2, 63			; GFX9-NEXT: v_readlane_b32 s4, v2, 63
	; GFX9-NEXT: s_nop 0			; GFX9-NEXT: s_nop 0
	; GFX9-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf			; GFX9-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[2:3]
	; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX9-NEXT: ; implicit-def: $vgpr0			; GFX9-NEXT: ; implicit-def: $vgpr0
	; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX9-NEXT: s_and_saveexec_b64 s[2:3], vcc
	; GFX9-NEXT: s_cbranch_execz BB4_2			; GFX9-NEXT: s_cbranch_execz BB4_2
	; GFX9-NEXT: ; %bb.1:			; GFX9-NEXT: ; %bb.1:
	; GFX9-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo			; GFX9-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo
	; GFX9-NEXT: v_mov_b32_e32 v3, s2			; GFX9-NEXT: v_mov_b32_e32 v3, s4
	; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX9-NEXT: ds_add_rtn_u32 v0, v0, v3			; GFX9-NEXT: ds_add_rtn_u32 v0, v0, v3
	; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX9-NEXT: buffer_wbinvl1_vol			; GFX9-NEXT: buffer_wbinvl1_vol
	; GFX9-NEXT: BB4_2:			; GFX9-NEXT: BB4_2:
	; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX9-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX9-NEXT: v_readfirstlane_b32 s2, v0			; GFX9-NEXT: v_readfirstlane_b32 s2, v0
	; GFX9-NEXT: v_mov_b32_e32 v0, v1			; GFX9-NEXT: v_mov_b32_e32 v0, v1
	; GFX9-NEXT: v_add_u32_e32 v0, s2, v0			; GFX9-NEXT: v_add_u32_e32 v0, s2, v0
	; GFX9-NEXT: s_mov_b32 s3, 0xf000			; GFX9-NEXT: s_mov_b32 s3, 0xf000
	; GFX9-NEXT: s_mov_b32 s2, -1			; GFX9-NEXT: s_mov_b32 s2, -1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX1064-LABEL: add_i32_varying_gfx1064:			; GFX1064-LABEL: add_i32_varying_gfx1064:
	; GFX1064: ; %bb.0: ; %entry			; GFX1064: ; %bb.0: ; %entry
	; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX1064-NEXT: v_mov_b32_e32 v1, v0
	; GFX1064-NEXT: s_mov_b64 s[2:3], exec
	; GFX1064-NEXT: v_mov_b32_e32 v2, v0
	; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX1064-NEXT: v_mov_b32_e32 v1, 0
	; GFX1064-NEXT: s_mov_b64 exec, s[4:5]
	; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
	; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0
	; GFX1064-NEXT: s_not_b64 exec, exec			; GFX1064-NEXT: s_not_b64 exec, exec
	; GFX1064-NEXT: v_mov_b32_e32 v2, 0			; GFX1064-NEXT: v_mov_b32_e32 v1, 0
	; GFX1064-NEXT: s_not_b64 exec, exec			; GFX1064-NEXT: s_not_b64 exec, exec
				; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1064-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX1064-NEXT: v_mov_b32_e32 v3, 0
				; GFX1064-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX1064-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX1064-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX1064-NEXT: v_mov_b32_e32 v2, v1
				; GFX1064-NEXT: v_permlanex16_b32 v2, v2, -1, -1
				; GFX1064-NEXT: v_add_nc_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
				; GFX1064-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1064-NEXT: v_mov_b32_e32 v2, s4
				; GFX1064-NEXT: v_add_nc_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
				; GFX1064-NEXT: v_readlane_b32 s4, v1, 15
				; GFX1064-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
				; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
				; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1064-NEXT: v_readlane_b32 s5, v1, 31
				; GFX1064-NEXT: v_writelane_b32 v3, s4, 16
				; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
				; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1064-NEXT: v_readlane_b32 s7, v1, 63
				; GFX1064-NEXT: v_readlane_b32 s6, v1, 47
				; GFX1064-NEXT: v_writelane_b32 v3, s5, 32
				; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, exec_hi, v0
	; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1064-NEXT: v_writelane_b32 v3, s6, 48
	; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1064-NEXT: v_mov_b32_e32 v3, v2
	; GFX1064-NEXT: v_permlanex16_b32 v3, v3, -1, -1
	; GFX1064-NEXT: v_add_nc_u32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1064-NEXT: v_readlane_b32 s2, v2, 31
	; GFX1064-NEXT: v_mov_b32_e32 v3, s2
	; GFX1064-NEXT: v_add_nc_u32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
	; GFX1064-NEXT: v_mov_b32_dpp v1, v2 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1064-NEXT: v_readlane_b32 s2, v2, 15
	; GFX1064-NEXT: v_readlane_b32 s3, v2, 31
	; GFX1064-NEXT: v_readlane_b32 s6, v2, 47
	; GFX1064-NEXT: v_writelane_b32 v1, s2, 16
	; GFX1064-NEXT: s_mov_b32 s2, -1
	; GFX1064-NEXT: v_writelane_b32 v1, s3, 32
	; GFX1064-NEXT: v_readlane_b32 s3, v2, 63
	; GFX1064-NEXT: v_writelane_b32 v1, s6, 48
	; GFX1064-NEXT: s_mov_b64 exec, s[4:5]			; GFX1064-NEXT: s_mov_b64 exec, s[4:5]
	; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
				; GFX1064-NEXT: s_mov_b32 s2, -1
	; GFX1064-NEXT: ; implicit-def: $vgpr0			; GFX1064-NEXT: ; implicit-def: $vgpr0
	; GFX1064-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX1064-NEXT: s_and_saveexec_b64 s[4:5], vcc
	; GFX1064-NEXT: s_cbranch_execz BB4_2			; GFX1064-NEXT: s_cbranch_execz BB4_2
	; GFX1064-NEXT: ; %bb.1:			; GFX1064-NEXT: ; %bb.1:
	; GFX1064-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo			; GFX1064-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo
	; GFX1064-NEXT: v_mov_b32_e32 v4, s3			; GFX1064-NEXT: v_mov_b32_e32 v4, s7
				; GFX1064-NEXT: s_mov_b32 s3, s7
	; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1064-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1064-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1064-NEXT: ds_add_rtn_u32 v0, v7, v4			; GFX1064-NEXT: ds_add_rtn_u32 v0, v7, v4
	; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1064-NEXT: buffer_gl0_inv			; GFX1064-NEXT: buffer_gl0_inv
	; GFX1064-NEXT: buffer_gl1_inv			; GFX1064-NEXT: buffer_gl1_inv
	; GFX1064-NEXT: BB4_2:			; GFX1064-NEXT: BB4_2:
	; GFX1064-NEXT: s_waitcnt_depctr 0xffe3			; GFX1064-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1064-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX1064-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX1064-NEXT: v_readfirstlane_b32 s3, v0			; GFX1064-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1064-NEXT: v_mov_b32_e32 v0, v1			; GFX1064-NEXT: v_mov_b32_e32 v0, v3
	; GFX1064-NEXT: v_add_nc_u32_e32 v0, s3, v0			; GFX1064-NEXT: v_add_nc_u32_e32 v0, s3, v0
	; GFX1064-NEXT: s_mov_b32 s3, 0x31016000			; GFX1064-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1064-NEXT: s_waitcnt lgkmcnt(0)			; GFX1064-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1064-NEXT: s_nop 0			; GFX1064-NEXT: s_nop 0
	; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX1064-NEXT: s_endpgm			; GFX1064-NEXT: s_endpgm
	;			;
	; GFX1032-LABEL: add_i32_varying_gfx1064:			; GFX1032-LABEL: add_i32_varying_gfx1064:
	; GFX1032: ; %bb.0: ; %entry			; GFX1032: ; %bb.0: ; %entry
	; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX1032-NEXT: v_mov_b32_e32 v1, v0
	; GFX1032-NEXT: s_mov_b32 s2, exec_lo
	; GFX1032-NEXT: ; implicit-def: $vcc_hi
	; GFX1032-NEXT: v_mov_b32_e32 v2, v0
	; GFX1032-NEXT: s_or_saveexec_b32 s3, -1
	; GFX1032-NEXT: v_mov_b32_e32 v1, 0
	; GFX1032-NEXT: s_mov_b32 exec_lo, s3
	; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
	; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1032-NEXT: v_mov_b32_e32 v2, 0			; GFX1032-NEXT: v_mov_b32_e32 v1, 0
	; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1032-NEXT: s_or_saveexec_b32 s4, -1			; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1032-NEXT: s_mov_b32 s2, -1			; GFX1032-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_mov_b32_e32 v2, v1
	; GFX1032-NEXT: v_mov_b32_e32 v3, v2			; GFX1032-NEXT: v_permlanex16_b32 v2, v2, -1, -1
	; GFX1032-NEXT: v_permlanex16_b32 v3, v3, -1, -1			; GFX1032-NEXT: s_mov_b32 exec_lo, s2
	; GFX1032-NEXT: v_add_nc_u32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf			; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX1032-NEXT: v_readlane_b32 s3, v2, 31			; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1032-NEXT: v_mov_b32_dpp v1, v2 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1032-NEXT: v_add_nc_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1032-NEXT: v_readlane_b32 s5, v2, 15			; GFX1032-NEXT: v_mov_b32_e32 v3, 0
	; GFX1032-NEXT: v_writelane_b32 v1, s5, 16			; GFX1032-NEXT: v_readlane_b32 s3, v1, 15
	; GFX1032-NEXT: s_mov_b32 exec_lo, s4			; GFX1032-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1032-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
				; GFX1032-NEXT: s_mov_b32 exec_lo, s2
				; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
				; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
				; GFX1032-NEXT: v_writelane_b32 v3, s3, 16
				; GFX1032-NEXT: s_mov_b32 exec_lo, s2
	; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0			; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
				; GFX1032-NEXT: s_mov_b32 s2, -1
	; GFX1032-NEXT: ; implicit-def: $vgpr0			; GFX1032-NEXT: ; implicit-def: $vgpr0
	; GFX1032-NEXT: s_and_saveexec_b32 s4, vcc_lo			; GFX1032-NEXT: ; implicit-def: $vcc_hi
				; GFX1032-NEXT: s_and_saveexec_b32 s3, vcc_lo
	; GFX1032-NEXT: s_cbranch_execz BB4_2			; GFX1032-NEXT: s_cbranch_execz BB4_2
	; GFX1032-NEXT: ; %bb.1:			; GFX1032-NEXT: ; %bb.1:
	; GFX1032-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo			; GFX1032-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo
	; GFX1032-NEXT: v_mov_b32_e32 v4, s3			; GFX1032-NEXT: v_mov_b32_e32 v4, s4
	; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1032-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1032-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1032-NEXT: ds_add_rtn_u32 v0, v7, v4			; GFX1032-NEXT: ds_add_rtn_u32 v0, v7, v4
	; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1032-NEXT: buffer_gl0_inv			; GFX1032-NEXT: buffer_gl0_inv
	; GFX1032-NEXT: buffer_gl1_inv			; GFX1032-NEXT: buffer_gl1_inv
	; GFX1032-NEXT: BB4_2:			; GFX1032-NEXT: BB4_2:
	; GFX1032-NEXT: s_waitcnt_depctr 0xffe3			; GFX1032-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1032-NEXT: s_or_b32 exec_lo, exec_lo, s4			; GFX1032-NEXT: s_or_b32 exec_lo, exec_lo, s3
	; GFX1032-NEXT: v_readfirstlane_b32 s3, v0			; GFX1032-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1032-NEXT: v_mov_b32_e32 v0, v1			; GFX1032-NEXT: v_mov_b32_e32 v0, v3
	; GFX1032-NEXT: v_add_nc_u32_e32 v0, s3, v0			; GFX1032-NEXT: v_add_nc_u32_e32 v0, s3, v0
	; GFX1032-NEXT: s_mov_b32 s3, 0x31016000			; GFX1032-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1032-NEXT: s_waitcnt lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1032-NEXT: s_nop 0			; GFX1032-NEXT: s_nop 0
	; GFX1032-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX1032-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX1032-NEXT: s_endpgm			; GFX1032-NEXT: s_endpgm
	entry:			entry:
	%lane = call i32 @llvm.amdgcn.workitem.id.x()			%lane = call i32 @llvm.amdgcn.workitem.id.x()
	[849 lines not shown]
	; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000			; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000
	; GFX7LESS-NEXT: s_mov_b32 s2, -1			; GFX7LESS-NEXT: s_mov_b32 s2, -1
	; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX7LESS-NEXT: s_endpgm			; GFX7LESS-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: sub_i32_varying:			; GFX8-LABEL: sub_i32_varying:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX8-NEXT: s_mov_b64 s[2:3], exec
	; GFX8-NEXT: v_mov_b32_e32 v2, v0			; GFX8-NEXT: v_mov_b32_e32 v2, v0
	; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX8-NEXT: v_mov_b32_e32 v1, 0			; GFX8-NEXT: v_mov_b32_e32 v1, 0
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[2:3]
	; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0			; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0			; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX8-NEXT: s_not_b64 exec, exec			; GFX8-NEXT: s_not_b64 exec, exec
	; GFX8-NEXT: v_mov_b32_e32 v2, 0			; GFX8-NEXT: v_mov_b32_e32 v2, 0
	; GFX8-NEXT: s_not_b64 exec, exec			; GFX8-NEXT: s_not_b64 exec, exec
	; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf			; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf			; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf
	; GFX8-NEXT: v_readlane_b32 s2, v2, 63			; GFX8-NEXT: v_readlane_b32 s4, v2, 63
	; GFX8-NEXT: s_nop 0			; GFX8-NEXT: s_nop 0
	; GFX8-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf			; GFX8-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[2:3]
	; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX8-NEXT: ; implicit-def: $vgpr0			; GFX8-NEXT: ; implicit-def: $vgpr0
	; GFX8-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX8-NEXT: s_and_saveexec_b64 s[2:3], vcc
	; GFX8-NEXT: s_cbranch_execz BB10_2			; GFX8-NEXT: s_cbranch_execz BB10_2
	; GFX8-NEXT: ; %bb.1:			; GFX8-NEXT: ; %bb.1:
	; GFX8-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo			; GFX8-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo
	; GFX8-NEXT: v_mov_b32_e32 v3, s2			; GFX8-NEXT: v_mov_b32_e32 v3, s4
	; GFX8-NEXT: s_mov_b32 m0, -1			; GFX8-NEXT: s_mov_b32 m0, -1
	; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX8-NEXT: ds_sub_rtn_u32 v0, v0, v3			; GFX8-NEXT: ds_sub_rtn_u32 v0, v0, v3
	; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX8-NEXT: buffer_wbinvl1_vol			; GFX8-NEXT: buffer_wbinvl1_vol
	; GFX8-NEXT: BB10_2:			; GFX8-NEXT: BB10_2:
	; GFX8-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX8-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX8-NEXT: v_readfirstlane_b32 s2, v0			; GFX8-NEXT: v_readfirstlane_b32 s2, v0
	; GFX8-NEXT: v_mov_b32_e32 v0, v1			; GFX8-NEXT: v_mov_b32_e32 v0, v1
	; GFX8-NEXT: v_sub_u32_e32 v0, vcc, s2, v0			; GFX8-NEXT: v_sub_u32_e32 v0, vcc, s2, v0
	; GFX8-NEXT: s_mov_b32 s3, 0xf000			; GFX8-NEXT: s_mov_b32 s3, 0xf000
	; GFX8-NEXT: s_mov_b32 s2, -1			; GFX8-NEXT: s_mov_b32 s2, -1
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX9-LABEL: sub_i32_varying:			; GFX9-LABEL: sub_i32_varying:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX9-NEXT: s_mov_b64 s[2:3], exec
	; GFX9-NEXT: v_mov_b32_e32 v2, v0			; GFX9-NEXT: v_mov_b32_e32 v2, v0
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX9-NEXT: v_mov_b32_e32 v1, 0			; GFX9-NEXT: v_mov_b32_e32 v1, 0
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[2:3]
	; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0			; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0			; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX9-NEXT: s_not_b64 exec, exec			; GFX9-NEXT: s_not_b64 exec, exec
	; GFX9-NEXT: v_mov_b32_e32 v2, 0			; GFX9-NEXT: v_mov_b32_e32 v2, 0
	; GFX9-NEXT: s_not_b64 exec, exec			; GFX9-NEXT: s_not_b64 exec, exec
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf			; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf			; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf
	; GFX9-NEXT: v_readlane_b32 s2, v2, 63			; GFX9-NEXT: v_readlane_b32 s4, v2, 63
	; GFX9-NEXT: s_nop 0			; GFX9-NEXT: s_nop 0
	; GFX9-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf			; GFX9-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[2:3]
	; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX9-NEXT: ; implicit-def: $vgpr0			; GFX9-NEXT: ; implicit-def: $vgpr0
	; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX9-NEXT: s_and_saveexec_b64 s[2:3], vcc
	; GFX9-NEXT: s_cbranch_execz BB10_2			; GFX9-NEXT: s_cbranch_execz BB10_2
	; GFX9-NEXT: ; %bb.1:			; GFX9-NEXT: ; %bb.1:
	; GFX9-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo			; GFX9-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo
	; GFX9-NEXT: v_mov_b32_e32 v3, s2			; GFX9-NEXT: v_mov_b32_e32 v3, s4
	; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX9-NEXT: ds_sub_rtn_u32 v0, v0, v3			; GFX9-NEXT: ds_sub_rtn_u32 v0, v0, v3
	; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX9-NEXT: buffer_wbinvl1_vol			; GFX9-NEXT: buffer_wbinvl1_vol
	; GFX9-NEXT: BB10_2:			; GFX9-NEXT: BB10_2:
	; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX9-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX9-NEXT: v_readfirstlane_b32 s2, v0			; GFX9-NEXT: v_readfirstlane_b32 s2, v0
	; GFX9-NEXT: v_mov_b32_e32 v0, v1			; GFX9-NEXT: v_mov_b32_e32 v0, v1
	; GFX9-NEXT: v_sub_u32_e32 v0, s2, v0			; GFX9-NEXT: v_sub_u32_e32 v0, s2, v0
	; GFX9-NEXT: s_mov_b32 s3, 0xf000			; GFX9-NEXT: s_mov_b32 s3, 0xf000
	; GFX9-NEXT: s_mov_b32 s2, -1			; GFX9-NEXT: s_mov_b32 s2, -1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX1064-LABEL: sub_i32_varying:			; GFX1064-LABEL: sub_i32_varying:
	; GFX1064: ; %bb.0: ; %entry			; GFX1064: ; %bb.0: ; %entry
	; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX1064-NEXT: v_mov_b32_e32 v1, v0
	; GFX1064-NEXT: s_mov_b64 s[2:3], exec
	; GFX1064-NEXT: v_mov_b32_e32 v2, v0
	; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX1064-NEXT: v_mov_b32_e32 v1, 0
	; GFX1064-NEXT: s_mov_b64 exec, s[4:5]
	; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
	; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0
	; GFX1064-NEXT: s_not_b64 exec, exec			; GFX1064-NEXT: s_not_b64 exec, exec
	; GFX1064-NEXT: v_mov_b32_e32 v2, 0			; GFX1064-NEXT: v_mov_b32_e32 v1, 0
	; GFX1064-NEXT: s_not_b64 exec, exec			; GFX1064-NEXT: s_not_b64 exec, exec
				; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1064-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX1064-NEXT: v_mov_b32_e32 v3, 0
				; GFX1064-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX1064-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX1064-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX1064-NEXT: v_mov_b32_e32 v2, v1
				; GFX1064-NEXT: v_permlanex16_b32 v2, v2, -1, -1
				; GFX1064-NEXT: v_add_nc_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
				; GFX1064-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1064-NEXT: v_mov_b32_e32 v2, s4
				; GFX1064-NEXT: v_add_nc_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
				; GFX1064-NEXT: v_readlane_b32 s4, v1, 15
				; GFX1064-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
				; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
				; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1064-NEXT: v_readlane_b32 s5, v1, 31
				; GFX1064-NEXT: v_writelane_b32 v3, s4, 16
				; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
				; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1064-NEXT: v_readlane_b32 s7, v1, 63
				; GFX1064-NEXT: v_readlane_b32 s6, v1, 47
				; GFX1064-NEXT: v_writelane_b32 v3, s5, 32
				; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, exec_hi, v0
	; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1064-NEXT: v_writelane_b32 v3, s6, 48
	; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1064-NEXT: v_mov_b32_e32 v3, v2
	; GFX1064-NEXT: v_permlanex16_b32 v3, v3, -1, -1
	; GFX1064-NEXT: v_add_nc_u32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1064-NEXT: v_readlane_b32 s2, v2, 31
	; GFX1064-NEXT: v_mov_b32_e32 v3, s2
	; GFX1064-NEXT: v_add_nc_u32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
	; GFX1064-NEXT: v_mov_b32_dpp v1, v2 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1064-NEXT: v_readlane_b32 s2, v2, 15
	; GFX1064-NEXT: v_readlane_b32 s3, v2, 31
	; GFX1064-NEXT: v_readlane_b32 s6, v2, 47
	; GFX1064-NEXT: v_writelane_b32 v1, s2, 16
	; GFX1064-NEXT: s_mov_b32 s2, -1
	; GFX1064-NEXT: v_writelane_b32 v1, s3, 32
	; GFX1064-NEXT: v_readlane_b32 s3, v2, 63
	; GFX1064-NEXT: v_writelane_b32 v1, s6, 48
	; GFX1064-NEXT: s_mov_b64 exec, s[4:5]			; GFX1064-NEXT: s_mov_b64 exec, s[4:5]
	; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
				; GFX1064-NEXT: s_mov_b32 s2, -1
	; GFX1064-NEXT: ; implicit-def: $vgpr0			; GFX1064-NEXT: ; implicit-def: $vgpr0
	; GFX1064-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX1064-NEXT: s_and_saveexec_b64 s[4:5], vcc
	; GFX1064-NEXT: s_cbranch_execz BB10_2			; GFX1064-NEXT: s_cbranch_execz BB10_2
	; GFX1064-NEXT: ; %bb.1:			; GFX1064-NEXT: ; %bb.1:
	; GFX1064-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo			; GFX1064-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo
	; GFX1064-NEXT: v_mov_b32_e32 v4, s3			; GFX1064-NEXT: v_mov_b32_e32 v4, s7
				; GFX1064-NEXT: s_mov_b32 s3, s7
	; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1064-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1064-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1064-NEXT: ds_sub_rtn_u32 v0, v7, v4			; GFX1064-NEXT: ds_sub_rtn_u32 v0, v7, v4
	; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1064-NEXT: buffer_gl0_inv			; GFX1064-NEXT: buffer_gl0_inv
	; GFX1064-NEXT: buffer_gl1_inv			; GFX1064-NEXT: buffer_gl1_inv
	; GFX1064-NEXT: BB10_2:			; GFX1064-NEXT: BB10_2:
	; GFX1064-NEXT: s_waitcnt_depctr 0xffe3			; GFX1064-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1064-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX1064-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX1064-NEXT: v_readfirstlane_b32 s3, v0			; GFX1064-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1064-NEXT: v_mov_b32_e32 v0, v1			; GFX1064-NEXT: v_mov_b32_e32 v0, v3
	; GFX1064-NEXT: v_sub_nc_u32_e32 v0, s3, v0			; GFX1064-NEXT: v_sub_nc_u32_e32 v0, s3, v0
	; GFX1064-NEXT: s_mov_b32 s3, 0x31016000			; GFX1064-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1064-NEXT: s_waitcnt lgkmcnt(0)			; GFX1064-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1064-NEXT: s_nop 0			; GFX1064-NEXT: s_nop 0
	; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX1064-NEXT: s_endpgm			; GFX1064-NEXT: s_endpgm
	;			;
	; GFX1032-LABEL: sub_i32_varying:			; GFX1032-LABEL: sub_i32_varying:
	; GFX1032: ; %bb.0: ; %entry			; GFX1032: ; %bb.0: ; %entry
	; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX1032-NEXT: v_mov_b32_e32 v1, v0
	; GFX1032-NEXT: s_mov_b32 s2, exec_lo
	; GFX1032-NEXT: ; implicit-def: $vcc_hi
	; GFX1032-NEXT: v_mov_b32_e32 v2, v0
	; GFX1032-NEXT: s_or_saveexec_b32 s3, -1
	; GFX1032-NEXT: v_mov_b32_e32 v1, 0
	; GFX1032-NEXT: s_mov_b32 exec_lo, s3
	; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
	; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1032-NEXT: v_mov_b32_e32 v2, 0			; GFX1032-NEXT: v_mov_b32_e32 v1, 0
	; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1032-NEXT: s_or_saveexec_b32 s4, -1			; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1032-NEXT: s_mov_b32 s2, -1			; GFX1032-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_mov_b32_e32 v2, v1
	; GFX1032-NEXT: v_mov_b32_e32 v3, v2			; GFX1032-NEXT: v_permlanex16_b32 v2, v2, -1, -1
	; GFX1032-NEXT: v_permlanex16_b32 v3, v3, -1, -1			; GFX1032-NEXT: s_mov_b32 exec_lo, s2
	; GFX1032-NEXT: v_add_nc_u32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf			; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX1032-NEXT: v_readlane_b32 s3, v2, 31			; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1032-NEXT: v_mov_b32_dpp v1, v2 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1032-NEXT: v_add_nc_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1032-NEXT: v_readlane_b32 s5, v2, 15			; GFX1032-NEXT: v_mov_b32_e32 v3, 0
	; GFX1032-NEXT: v_writelane_b32 v1, s5, 16			; GFX1032-NEXT: v_readlane_b32 s3, v1, 15
	; GFX1032-NEXT: s_mov_b32 exec_lo, s4			; GFX1032-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1032-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
				; GFX1032-NEXT: s_mov_b32 exec_lo, s2
				; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
				; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
				; GFX1032-NEXT: v_writelane_b32 v3, s3, 16
				; GFX1032-NEXT: s_mov_b32 exec_lo, s2
	; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0			; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
				; GFX1032-NEXT: s_mov_b32 s2, -1
	; GFX1032-NEXT: ; implicit-def: $vgpr0			; GFX1032-NEXT: ; implicit-def: $vgpr0
	; GFX1032-NEXT: s_and_saveexec_b32 s4, vcc_lo			; GFX1032-NEXT: ; implicit-def: $vcc_hi
				; GFX1032-NEXT: s_and_saveexec_b32 s3, vcc_lo
	; GFX1032-NEXT: s_cbranch_execz BB10_2			; GFX1032-NEXT: s_cbranch_execz BB10_2
	; GFX1032-NEXT: ; %bb.1:			; GFX1032-NEXT: ; %bb.1:
	; GFX1032-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo			; GFX1032-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo
	; GFX1032-NEXT: v_mov_b32_e32 v4, s3			; GFX1032-NEXT: v_mov_b32_e32 v4, s4
	; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1032-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1032-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1032-NEXT: ds_sub_rtn_u32 v0, v7, v4			; GFX1032-NEXT: ds_sub_rtn_u32 v0, v7, v4
	; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1032-NEXT: buffer_gl0_inv			; GFX1032-NEXT: buffer_gl0_inv
	; GFX1032-NEXT: buffer_gl1_inv			; GFX1032-NEXT: buffer_gl1_inv
	; GFX1032-NEXT: BB10_2:			; GFX1032-NEXT: BB10_2:
	; GFX1032-NEXT: s_waitcnt_depctr 0xffe3			; GFX1032-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1032-NEXT: s_or_b32 exec_lo, exec_lo, s4			; GFX1032-NEXT: s_or_b32 exec_lo, exec_lo, s3
	; GFX1032-NEXT: v_readfirstlane_b32 s3, v0			; GFX1032-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1032-NEXT: v_mov_b32_e32 v0, v1			; GFX1032-NEXT: v_mov_b32_e32 v0, v3
	; GFX1032-NEXT: v_sub_nc_u32_e32 v0, s3, v0			; GFX1032-NEXT: v_sub_nc_u32_e32 v0, s3, v0
	; GFX1032-NEXT: s_mov_b32 s3, 0x31016000			; GFX1032-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1032-NEXT: s_waitcnt lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1032-NEXT: s_nop 0			; GFX1032-NEXT: s_nop 0
	; GFX1032-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX1032-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX1032-NEXT: s_endpgm			; GFX1032-NEXT: s_endpgm
	entry:			entry:
	%lane = call i32 @llvm.amdgcn.workitem.id.x()			%lane = call i32 @llvm.amdgcn.workitem.id.x()
	[516 lines not shown]
	; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000			; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000
	; GFX7LESS-NEXT: s_mov_b32 s2, -1			; GFX7LESS-NEXT: s_mov_b32 s2, -1
	; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX7LESS-NEXT: s_endpgm			; GFX7LESS-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: and_i32_varying:			; GFX8-LABEL: and_i32_varying:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX8-NEXT: v_mbcnt_lo_u32_b32 v3, exec_lo, 0
	; GFX8-NEXT: v_mbcnt_hi_u32_b32 v3, exec_hi, v3
	; GFX8-NEXT: v_mov_b32_e32 v2, v0			; GFX8-NEXT: v_mov_b32_e32 v2, v0
				; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
				; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX8-NEXT: v_mov_b32_e32 v1, -1			; GFX8-NEXT: v_mov_b32_e32 v1, -1
	; GFX8-NEXT: s_mov_b64 exec, s[2:3]			; GFX8-NEXT: s_mov_b64 exec, s[2:3]
	; GFX8-NEXT: s_not_b64 exec, exec			; GFX8-NEXT: s_not_b64 exec, exec
	; GFX8-NEXT: v_mov_b32_e32 v2, -1			; GFX8-NEXT: v_mov_b32_e32 v2, -1
	; GFX8-NEXT: s_not_b64 exec, exec			; GFX8-NEXT: s_not_b64 exec, exec
	; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX8-NEXT: v_and_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX8-NEXT: v_and_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_and_b32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf			; GFX8-NEXT: v_and_b32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_and_b32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf			; GFX8-NEXT: v_and_b32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_and_b32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf			; GFX8-NEXT: v_and_b32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_and_b32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf			; GFX8-NEXT: v_and_b32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_and_b32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf			; GFX8-NEXT: v_and_b32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf
	; GFX8-NEXT: v_readlane_b32 s2, v2, 63			; GFX8-NEXT: v_readlane_b32 s4, v2, 63
	; GFX8-NEXT: s_nop 0			; GFX8-NEXT: s_nop 0
	; GFX8-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf			; GFX8-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[2:3]
	; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v3			; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX8-NEXT: ; implicit-def: $vgpr0			; GFX8-NEXT: ; implicit-def: $vgpr0
	; GFX8-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX8-NEXT: s_and_saveexec_b64 s[2:3], vcc
	; GFX8-NEXT: s_cbranch_execz BB14_2			; GFX8-NEXT: s_cbranch_execz BB14_2
	; GFX8-NEXT: ; %bb.1:			; GFX8-NEXT: ; %bb.1:
	; GFX8-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo			; GFX8-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo
	; GFX8-NEXT: v_mov_b32_e32 v3, s2			; GFX8-NEXT: v_mov_b32_e32 v3, s4
	; GFX8-NEXT: s_mov_b32 m0, -1			; GFX8-NEXT: s_mov_b32 m0, -1
	; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX8-NEXT: ds_and_rtn_b32 v0, v0, v3			; GFX8-NEXT: ds_and_rtn_b32 v0, v0, v3
	; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX8-NEXT: buffer_wbinvl1_vol			; GFX8-NEXT: buffer_wbinvl1_vol
	; GFX8-NEXT: BB14_2:			; GFX8-NEXT: BB14_2:
	; GFX8-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX8-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX8-NEXT: v_readfirstlane_b32 s2, v0			; GFX8-NEXT: v_readfirstlane_b32 s2, v0
	; GFX8-NEXT: v_mov_b32_e32 v0, v1			; GFX8-NEXT: v_mov_b32_e32 v0, v1
	; GFX8-NEXT: v_and_b32_e32 v0, s2, v0			; GFX8-NEXT: v_and_b32_e32 v0, s2, v0
	; GFX8-NEXT: s_mov_b32 s3, 0xf000			; GFX8-NEXT: s_mov_b32 s3, 0xf000
	; GFX8-NEXT: s_mov_b32 s2, -1			; GFX8-NEXT: s_mov_b32 s2, -1
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX9-LABEL: and_i32_varying:			; GFX9-LABEL: and_i32_varying:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX9-NEXT: v_mbcnt_lo_u32_b32 v3, exec_lo, 0
	; GFX9-NEXT: v_mbcnt_hi_u32_b32 v3, exec_hi, v3
	; GFX9-NEXT: v_mov_b32_e32 v2, v0			; GFX9-NEXT: v_mov_b32_e32 v2, v0
				; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
				; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX9-NEXT: v_mov_b32_e32 v1, -1			; GFX9-NEXT: v_mov_b32_e32 v1, -1
	; GFX9-NEXT: s_mov_b64 exec, s[2:3]			; GFX9-NEXT: s_mov_b64 exec, s[2:3]
	; GFX9-NEXT: s_not_b64 exec, exec			; GFX9-NEXT: s_not_b64 exec, exec
	; GFX9-NEXT: v_mov_b32_e32 v2, -1			; GFX9-NEXT: v_mov_b32_e32 v2, -1
	; GFX9-NEXT: s_not_b64 exec, exec			; GFX9-NEXT: s_not_b64 exec, exec
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX9-NEXT: v_and_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX9-NEXT: v_and_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_and_b32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf			; GFX9-NEXT: v_and_b32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_and_b32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf			; GFX9-NEXT: v_and_b32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_and_b32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf			; GFX9-NEXT: v_and_b32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_and_b32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf			; GFX9-NEXT: v_and_b32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_and_b32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf			; GFX9-NEXT: v_and_b32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf
	; GFX9-NEXT: v_readlane_b32 s2, v2, 63			; GFX9-NEXT: v_readlane_b32 s4, v2, 63
	; GFX9-NEXT: s_nop 0			; GFX9-NEXT: s_nop 0
	; GFX9-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf			; GFX9-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[2:3]
	; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v3			; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX9-NEXT: ; implicit-def: $vgpr0			; GFX9-NEXT: ; implicit-def: $vgpr0
	; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX9-NEXT: s_and_saveexec_b64 s[2:3], vcc
	; GFX9-NEXT: s_cbranch_execz BB14_2			; GFX9-NEXT: s_cbranch_execz BB14_2
	; GFX9-NEXT: ; %bb.1:			; GFX9-NEXT: ; %bb.1:
	; GFX9-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo			; GFX9-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo
	; GFX9-NEXT: v_mov_b32_e32 v3, s2			; GFX9-NEXT: v_mov_b32_e32 v3, s4
	; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX9-NEXT: ds_and_rtn_b32 v0, v0, v3			; GFX9-NEXT: ds_and_rtn_b32 v0, v0, v3
	; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX9-NEXT: buffer_wbinvl1_vol			; GFX9-NEXT: buffer_wbinvl1_vol
	; GFX9-NEXT: BB14_2:			; GFX9-NEXT: BB14_2:
	; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX9-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX9-NEXT: v_readfirstlane_b32 s2, v0			; GFX9-NEXT: v_readfirstlane_b32 s2, v0
	; GFX9-NEXT: v_mov_b32_e32 v0, v1			; GFX9-NEXT: v_mov_b32_e32 v0, v1
	; GFX9-NEXT: v_and_b32_e32 v0, s2, v0			; GFX9-NEXT: v_and_b32_e32 v0, s2, v0
	; GFX9-NEXT: s_mov_b32 s3, 0xf000			; GFX9-NEXT: s_mov_b32 s3, 0xf000
	; GFX9-NEXT: s_mov_b32 s2, -1			; GFX9-NEXT: s_mov_b32 s2, -1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX1064-LABEL: and_i32_varying:			; GFX1064-LABEL: and_i32_varying:
	; GFX1064: ; %bb.0: ; %entry			; GFX1064: ; %bb.0: ; %entry
				; GFX1064-NEXT: v_mov_b32_e32 v1, v0
				; GFX1064-NEXT: s_not_b64 exec, exec
				; GFX1064-NEXT: v_mov_b32_e32 v1, -1
				; GFX1064-NEXT: s_not_b64 exec, exec
				; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1064-NEXT: v_and_b32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf
				; GFX1064-NEXT: v_mov_b32_e32 v3, -1
				; GFX1064-NEXT: v_and_b32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf
				; GFX1064-NEXT: v_and_b32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf
				; GFX1064-NEXT: v_and_b32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf
				; GFX1064-NEXT: v_mov_b32_e32 v2, v1
				; GFX1064-NEXT: v_permlanex16_b32 v2, v2, -1, -1
				; GFX1064-NEXT: v_and_b32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
				; GFX1064-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1064-NEXT: v_mov_b32_e32 v2, s4
				; GFX1064-NEXT: v_and_b32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
				; GFX1064-NEXT: v_readlane_b32 s4, v1, 15
				; GFX1064-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
				; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
	; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v4, exec_lo, 0
	; GFX1064-NEXT: v_mov_b32_e32 v2, v0
	; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v4, exec_hi, v4
	; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX1064-NEXT: v_mov_b32_e32 v1, -1			; GFX1064-NEXT: v_readlane_b32 s5, v1, 31
				; GFX1064-NEXT: v_writelane_b32 v3, s4, 16
	; GFX1064-NEXT: s_mov_b64 exec, s[2:3]			; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
	; GFX1064-NEXT: s_not_b64 exec, exec			; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
	; GFX1064-NEXT: v_mov_b32_e32 v2, -1			; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX1064-NEXT: s_not_b64 exec, exec			; GFX1064-NEXT: v_readlane_b32 s7, v1, 63
				; GFX1064-NEXT: v_readlane_b32 s6, v1, 47
				; GFX1064-NEXT: v_writelane_b32 v3, s5, 32
				; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, exec_hi, v0
	; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX1064-NEXT: v_and_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1064-NEXT: v_writelane_b32 v3, s6, 48
	; GFX1064-NEXT: v_and_b32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf
	; GFX1064-NEXT: v_and_b32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf
	; GFX1064-NEXT: v_and_b32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf
	; GFX1064-NEXT: v_mov_b32_e32 v3, v2
	; GFX1064-NEXT: v_permlanex16_b32 v3, v3, -1, -1
	; GFX1064-NEXT: v_and_b32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1064-NEXT: v_readlane_b32 s2, v2, 31
	; GFX1064-NEXT: v_mov_b32_e32 v3, s2
	; GFX1064-NEXT: v_and_b32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
	; GFX1064-NEXT: v_mov_b32_dpp v1, v2 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1064-NEXT: v_readlane_b32 s2, v2, 15
	; GFX1064-NEXT: v_readlane_b32 s3, v2, 31
	; GFX1064-NEXT: v_readlane_b32 s6, v2, 47
	; GFX1064-NEXT: v_writelane_b32 v1, s2, 16
	; GFX1064-NEXT: s_mov_b32 s2, -1
	; GFX1064-NEXT: v_writelane_b32 v1, s3, 32
	; GFX1064-NEXT: v_readlane_b32 s3, v2, 63
	; GFX1064-NEXT: v_writelane_b32 v1, s6, 48
	; GFX1064-NEXT: s_mov_b64 exec, s[4:5]			; GFX1064-NEXT: s_mov_b64 exec, s[4:5]
	; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4			; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
				; GFX1064-NEXT: s_mov_b32 s2, -1
	; GFX1064-NEXT: ; implicit-def: $vgpr0			; GFX1064-NEXT: ; implicit-def: $vgpr0
	; GFX1064-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX1064-NEXT: s_and_saveexec_b64 s[4:5], vcc
	; GFX1064-NEXT: s_cbranch_execz BB14_2			; GFX1064-NEXT: s_cbranch_execz BB14_2
	; GFX1064-NEXT: ; %bb.1:			; GFX1064-NEXT: ; %bb.1:
	; GFX1064-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo			; GFX1064-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo
	; GFX1064-NEXT: v_mov_b32_e32 v4, s3			; GFX1064-NEXT: v_mov_b32_e32 v4, s7
				; GFX1064-NEXT: s_mov_b32 s3, s7
	; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1064-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1064-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1064-NEXT: ds_and_rtn_b32 v0, v7, v4			; GFX1064-NEXT: ds_and_rtn_b32 v0, v7, v4
	; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1064-NEXT: buffer_gl0_inv			; GFX1064-NEXT: buffer_gl0_inv
	; GFX1064-NEXT: buffer_gl1_inv			; GFX1064-NEXT: buffer_gl1_inv
	; GFX1064-NEXT: BB14_2:			; GFX1064-NEXT: BB14_2:
	; GFX1064-NEXT: s_waitcnt_depctr 0xffe3			; GFX1064-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1064-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX1064-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX1064-NEXT: v_readfirstlane_b32 s3, v0			; GFX1064-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1064-NEXT: v_mov_b32_e32 v0, v1			; GFX1064-NEXT: v_mov_b32_e32 v0, v3
	; GFX1064-NEXT: v_and_b32_e32 v0, s3, v0			; GFX1064-NEXT: v_and_b32_e32 v0, s3, v0
	; GFX1064-NEXT: s_mov_b32 s3, 0x31016000			; GFX1064-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1064-NEXT: s_waitcnt lgkmcnt(0)			; GFX1064-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1064-NEXT: s_nop 0			; GFX1064-NEXT: s_nop 0
	; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX1064-NEXT: s_endpgm			; GFX1064-NEXT: s_endpgm
	;			;
	; GFX1032-LABEL: and_i32_varying:			; GFX1032-LABEL: and_i32_varying:
	; GFX1032: ; %bb.0: ; %entry			; GFX1032: ; %bb.0: ; %entry
				; GFX1032-NEXT: v_mov_b32_e32 v1, v0
				; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
				; GFX1032-NEXT: v_mov_b32_e32 v1, -1
				; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
				; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
				; GFX1032-NEXT: v_and_b32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf
				; GFX1032-NEXT: v_and_b32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf
				; GFX1032-NEXT: v_and_b32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf
				; GFX1032-NEXT: v_and_b32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf
				; GFX1032-NEXT: v_mov_b32_e32 v2, v1
				; GFX1032-NEXT: v_permlanex16_b32 v2, v2, -1, -1
				; GFX1032-NEXT: s_mov_b32 exec_lo, s2
	; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v4, exec_lo, 0
	; GFX1032-NEXT: ; implicit-def: $vcc_hi
	; GFX1032-NEXT: v_mov_b32_e32 v2, v0
	; GFX1032-NEXT: s_or_saveexec_b32 s2, -1			; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1032-NEXT: v_mov_b32_e32 v1, -1			; GFX1032-NEXT: v_and_b32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
				; GFX1032-NEXT: v_mov_b32_e32 v3, -1
				; GFX1032-NEXT: v_readlane_b32 s3, v1, 15
				; GFX1032-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1032-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1032-NEXT: s_mov_b32 exec_lo, s2			; GFX1032-NEXT: s_mov_b32 exec_lo, s2
	; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
	; GFX1032-NEXT: v_mov_b32_e32 v2, -1			; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1032-NEXT: v_writelane_b32 v3, s3, 16
	; GFX1032-NEXT: s_or_saveexec_b32 s4, -1			; GFX1032-NEXT: s_mov_b32 exec_lo, s2
	; GFX1032-NEXT: v_and_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
	; GFX1032-NEXT: s_mov_b32 s2, -1			; GFX1032-NEXT: s_mov_b32 s2, -1
	; GFX1032-NEXT: v_and_b32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf
	; GFX1032-NEXT: v_and_b32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf
	; GFX1032-NEXT: v_and_b32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf
	; GFX1032-NEXT: v_mov_b32_e32 v3, v2
	; GFX1032-NEXT: v_permlanex16_b32 v3, v3, -1, -1
	; GFX1032-NEXT: v_and_b32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1032-NEXT: v_readlane_b32 s3, v2, 31
	; GFX1032-NEXT: v_mov_b32_dpp v1, v2 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1032-NEXT: v_readlane_b32 s5, v2, 15
	; GFX1032-NEXT: v_writelane_b32 v1, s5, 16
	; GFX1032-NEXT: s_mov_b32 exec_lo, s4
	; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v4
	; GFX1032-NEXT: ; implicit-def: $vgpr0			; GFX1032-NEXT: ; implicit-def: $vgpr0
	; GFX1032-NEXT: s_and_saveexec_b32 s4, vcc_lo			; GFX1032-NEXT: ; implicit-def: $vcc_hi
				; GFX1032-NEXT: s_and_saveexec_b32 s3, vcc_lo
	; GFX1032-NEXT: s_cbranch_execz BB14_2			; GFX1032-NEXT: s_cbranch_execz BB14_2
	; GFX1032-NEXT: ; %bb.1:			; GFX1032-NEXT: ; %bb.1:
	; GFX1032-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo			; GFX1032-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo
	; GFX1032-NEXT: v_mov_b32_e32 v4, s3			; GFX1032-NEXT: v_mov_b32_e32 v4, s4
	; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1032-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1032-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1032-NEXT: ds_and_rtn_b32 v0, v7, v4			; GFX1032-NEXT: ds_and_rtn_b32 v0, v7, v4
	; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1032-NEXT: buffer_gl0_inv			; GFX1032-NEXT: buffer_gl0_inv
	; GFX1032-NEXT: buffer_gl1_inv			; GFX1032-NEXT: buffer_gl1_inv
	; GFX1032-NEXT: BB14_2:			; GFX1032-NEXT: BB14_2:
	; GFX1032-NEXT: s_waitcnt_depctr 0xffe3			; GFX1032-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1032-NEXT: s_or_b32 exec_lo, exec_lo, s4			; GFX1032-NEXT: s_or_b32 exec_lo, exec_lo, s3
	; GFX1032-NEXT: v_readfirstlane_b32 s3, v0			; GFX1032-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1032-NEXT: v_mov_b32_e32 v0, v1			; GFX1032-NEXT: v_mov_b32_e32 v0, v3
	; GFX1032-NEXT: v_and_b32_e32 v0, s3, v0			; GFX1032-NEXT: v_and_b32_e32 v0, s3, v0
	; GFX1032-NEXT: s_mov_b32 s3, 0x31016000			; GFX1032-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1032-NEXT: s_waitcnt lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1032-NEXT: s_nop 0			; GFX1032-NEXT: s_nop 0
	; GFX1032-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX1032-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX1032-NEXT: s_endpgm			; GFX1032-NEXT: s_endpgm
	entry:			entry:
	%lane = call i32 @llvm.amdgcn.workitem.id.x()			%lane = call i32 @llvm.amdgcn.workitem.id.x()
	[... 17 lines not shown ...]
	; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000			; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000
	; GFX7LESS-NEXT: s_mov_b32 s2, -1			; GFX7LESS-NEXT: s_mov_b32 s2, -1
	; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX7LESS-NEXT: s_endpgm			; GFX7LESS-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: or_i32_varying:			; GFX8-LABEL: or_i32_varying:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX8-NEXT: s_mov_b64 s[2:3], exec
	; GFX8-NEXT: v_mov_b32_e32 v2, v0			; GFX8-NEXT: v_mov_b32_e32 v2, v0
	; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX8-NEXT: v_mov_b32_e32 v1, 0			; GFX8-NEXT: v_mov_b32_e32 v1, 0
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[2:3]
	; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0			; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0			; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX8-NEXT: s_not_b64 exec, exec			; GFX8-NEXT: s_not_b64 exec, exec
	; GFX8-NEXT: v_mov_b32_e32 v2, 0			; GFX8-NEXT: v_mov_b32_e32 v2, 0
	; GFX8-NEXT: s_not_b64 exec, exec			; GFX8-NEXT: s_not_b64 exec, exec
	; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX8-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX8-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX8-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX8-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX8-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_or_b32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf			; GFX8-NEXT: v_or_b32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_or_b32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf			; GFX8-NEXT: v_or_b32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf
	; GFX8-NEXT: v_readlane_b32 s2, v2, 63			; GFX8-NEXT: v_readlane_b32 s4, v2, 63
	; GFX8-NEXT: s_nop 0			; GFX8-NEXT: s_nop 0
	; GFX8-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf			; GFX8-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[2:3]
	; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX8-NEXT: ; implicit-def: $vgpr0			; GFX8-NEXT: ; implicit-def: $vgpr0
	; GFX8-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX8-NEXT: s_and_saveexec_b64 s[2:3], vcc
	; GFX8-NEXT: s_cbranch_execz BB15_2			; GFX8-NEXT: s_cbranch_execz BB15_2
	; GFX8-NEXT: ; %bb.1:			; GFX8-NEXT: ; %bb.1:
	; GFX8-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo			; GFX8-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo
	; GFX8-NEXT: v_mov_b32_e32 v3, s2			; GFX8-NEXT: v_mov_b32_e32 v3, s4
	; GFX8-NEXT: s_mov_b32 m0, -1			; GFX8-NEXT: s_mov_b32 m0, -1
	; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX8-NEXT: ds_or_rtn_b32 v0, v0, v3			; GFX8-NEXT: ds_or_rtn_b32 v0, v0, v3
	; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX8-NEXT: buffer_wbinvl1_vol			; GFX8-NEXT: buffer_wbinvl1_vol
	; GFX8-NEXT: BB15_2:			; GFX8-NEXT: BB15_2:
	; GFX8-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX8-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX8-NEXT: v_readfirstlane_b32 s2, v0			; GFX8-NEXT: v_readfirstlane_b32 s2, v0
	; GFX8-NEXT: v_mov_b32_e32 v0, v1			; GFX8-NEXT: v_mov_b32_e32 v0, v1
	; GFX8-NEXT: v_or_b32_e32 v0, s2, v0			; GFX8-NEXT: v_or_b32_e32 v0, s2, v0
	; GFX8-NEXT: s_mov_b32 s3, 0xf000			; GFX8-NEXT: s_mov_b32 s3, 0xf000
	; GFX8-NEXT: s_mov_b32 s2, -1			; GFX8-NEXT: s_mov_b32 s2, -1
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX9-LABEL: or_i32_varying:			; GFX9-LABEL: or_i32_varying:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX9-NEXT: s_mov_b64 s[2:3], exec
	; GFX9-NEXT: v_mov_b32_e32 v2, v0			; GFX9-NEXT: v_mov_b32_e32 v2, v0
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX9-NEXT: v_mov_b32_e32 v1, 0			; GFX9-NEXT: v_mov_b32_e32 v1, 0
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[2:3]
	; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0			; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0			; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX9-NEXT: s_not_b64 exec, exec			; GFX9-NEXT: s_not_b64 exec, exec
	; GFX9-NEXT: v_mov_b32_e32 v2, 0			; GFX9-NEXT: v_mov_b32_e32 v2, 0
	; GFX9-NEXT: s_not_b64 exec, exec			; GFX9-NEXT: s_not_b64 exec, exec
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX9-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX9-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX9-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX9-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX9-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_or_b32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf			; GFX9-NEXT: v_or_b32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_or_b32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf			; GFX9-NEXT: v_or_b32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf
	; GFX9-NEXT: v_readlane_b32 s2, v2, 63			; GFX9-NEXT: v_readlane_b32 s4, v2, 63
	; GFX9-NEXT: s_nop 0			; GFX9-NEXT: s_nop 0
	; GFX9-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf			; GFX9-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[2:3]
	; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX9-NEXT: ; implicit-def: $vgpr0			; GFX9-NEXT: ; implicit-def: $vgpr0
	; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX9-NEXT: s_and_saveexec_b64 s[2:3], vcc
	; GFX9-NEXT: s_cbranch_execz BB15_2			; GFX9-NEXT: s_cbranch_execz BB15_2
	; GFX9-NEXT: ; %bb.1:			; GFX9-NEXT: ; %bb.1:
	; GFX9-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo			; GFX9-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo
	; GFX9-NEXT: v_mov_b32_e32 v3, s2			; GFX9-NEXT: v_mov_b32_e32 v3, s4
	; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX9-NEXT: ds_or_rtn_b32 v0, v0, v3			; GFX9-NEXT: ds_or_rtn_b32 v0, v0, v3
	; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX9-NEXT: buffer_wbinvl1_vol			; GFX9-NEXT: buffer_wbinvl1_vol
	; GFX9-NEXT: BB15_2:			; GFX9-NEXT: BB15_2:
	; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX9-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX9-NEXT: v_readfirstlane_b32 s2, v0			; GFX9-NEXT: v_readfirstlane_b32 s2, v0
	; GFX9-NEXT: v_mov_b32_e32 v0, v1			; GFX9-NEXT: v_mov_b32_e32 v0, v1
	; GFX9-NEXT: v_or_b32_e32 v0, s2, v0			; GFX9-NEXT: v_or_b32_e32 v0, s2, v0
	; GFX9-NEXT: s_mov_b32 s3, 0xf000			; GFX9-NEXT: s_mov_b32 s3, 0xf000
	; GFX9-NEXT: s_mov_b32 s2, -1			; GFX9-NEXT: s_mov_b32 s2, -1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX1064-LABEL: or_i32_varying:			; GFX1064-LABEL: or_i32_varying:
	; GFX1064: ; %bb.0: ; %entry			; GFX1064: ; %bb.0: ; %entry
	; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX1064-NEXT: v_mov_b32_e32 v1, v0
	; GFX1064-NEXT: s_mov_b64 s[2:3], exec
	; GFX1064-NEXT: v_mov_b32_e32 v2, v0
	; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX1064-NEXT: v_mov_b32_e32 v1, 0
	; GFX1064-NEXT: s_mov_b64 exec, s[4:5]
	; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
	; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0
	; GFX1064-NEXT: s_not_b64 exec, exec			; GFX1064-NEXT: s_not_b64 exec, exec
	; GFX1064-NEXT: v_mov_b32_e32 v2, 0			; GFX1064-NEXT: v_mov_b32_e32 v1, 0
	; GFX1064-NEXT: s_not_b64 exec, exec			; GFX1064-NEXT: s_not_b64 exec, exec
				; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1064-NEXT: v_or_b32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX1064-NEXT: v_mov_b32_e32 v3, 0
				; GFX1064-NEXT: v_or_b32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX1064-NEXT: v_or_b32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX1064-NEXT: v_or_b32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX1064-NEXT: v_mov_b32_e32 v2, v1
				; GFX1064-NEXT: v_permlanex16_b32 v2, v2, -1, -1
				; GFX1064-NEXT: v_or_b32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
				; GFX1064-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1064-NEXT: v_mov_b32_e32 v2, s4
				; GFX1064-NEXT: v_or_b32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
				; GFX1064-NEXT: v_readlane_b32 s4, v1, 15
				; GFX1064-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
				; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
				; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1064-NEXT: v_readlane_b32 s5, v1, 31
				; GFX1064-NEXT: v_writelane_b32 v3, s4, 16
				; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
				; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1064-NEXT: v_readlane_b32 s7, v1, 63
				; GFX1064-NEXT: v_readlane_b32 s6, v1, 47
				; GFX1064-NEXT: v_writelane_b32 v3, s5, 32
				; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, exec_hi, v0
	; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX1064-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1064-NEXT: v_writelane_b32 v3, s6, 48
	; GFX1064-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1064-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1064-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1064-NEXT: v_mov_b32_e32 v3, v2
	; GFX1064-NEXT: v_permlanex16_b32 v3, v3, -1, -1
	; GFX1064-NEXT: v_or_b32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1064-NEXT: v_readlane_b32 s2, v2, 31
	; GFX1064-NEXT: v_mov_b32_e32 v3, s2
	; GFX1064-NEXT: v_or_b32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
	; GFX1064-NEXT: v_mov_b32_dpp v1, v2 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1064-NEXT: v_readlane_b32 s2, v2, 15
	; GFX1064-NEXT: v_readlane_b32 s3, v2, 31
	; GFX1064-NEXT: v_readlane_b32 s6, v2, 47
	; GFX1064-NEXT: v_writelane_b32 v1, s2, 16
	; GFX1064-NEXT: s_mov_b32 s2, -1
	; GFX1064-NEXT: v_writelane_b32 v1, s3, 32
	; GFX1064-NEXT: v_readlane_b32 s3, v2, 63
	; GFX1064-NEXT: v_writelane_b32 v1, s6, 48
	; GFX1064-NEXT: s_mov_b64 exec, s[4:5]			; GFX1064-NEXT: s_mov_b64 exec, s[4:5]
	; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
				; GFX1064-NEXT: s_mov_b32 s2, -1
	; GFX1064-NEXT: ; implicit-def: $vgpr0			; GFX1064-NEXT: ; implicit-def: $vgpr0
	; GFX1064-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX1064-NEXT: s_and_saveexec_b64 s[4:5], vcc
	; GFX1064-NEXT: s_cbranch_execz BB15_2			; GFX1064-NEXT: s_cbranch_execz BB15_2
	; GFX1064-NEXT: ; %bb.1:			; GFX1064-NEXT: ; %bb.1:
	; GFX1064-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo			; GFX1064-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo
	; GFX1064-NEXT: v_mov_b32_e32 v4, s3			; GFX1064-NEXT: v_mov_b32_e32 v4, s7
				; GFX1064-NEXT: s_mov_b32 s3, s7
	; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1064-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1064-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1064-NEXT: ds_or_rtn_b32 v0, v7, v4			; GFX1064-NEXT: ds_or_rtn_b32 v0, v7, v4
	; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1064-NEXT: buffer_gl0_inv			; GFX1064-NEXT: buffer_gl0_inv
	; GFX1064-NEXT: buffer_gl1_inv			; GFX1064-NEXT: buffer_gl1_inv
	; GFX1064-NEXT: BB15_2:			; GFX1064-NEXT: BB15_2:
	; GFX1064-NEXT: s_waitcnt_depctr 0xffe3			; GFX1064-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1064-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX1064-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX1064-NEXT: v_readfirstlane_b32 s3, v0			; GFX1064-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1064-NEXT: v_mov_b32_e32 v0, v1			; GFX1064-NEXT: v_mov_b32_e32 v0, v3
	; GFX1064-NEXT: v_or_b32_e32 v0, s3, v0			; GFX1064-NEXT: v_or_b32_e32 v0, s3, v0
	; GFX1064-NEXT: s_mov_b32 s3, 0x31016000			; GFX1064-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1064-NEXT: s_waitcnt lgkmcnt(0)			; GFX1064-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1064-NEXT: s_nop 0			; GFX1064-NEXT: s_nop 0
	; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX1064-NEXT: s_endpgm			; GFX1064-NEXT: s_endpgm
	;			;
	; GFX1032-LABEL: or_i32_varying:			; GFX1032-LABEL: or_i32_varying:
	; GFX1032: ; %bb.0: ; %entry			; GFX1032: ; %bb.0: ; %entry
	; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX1032-NEXT: v_mov_b32_e32 v1, v0
	; GFX1032-NEXT: s_mov_b32 s2, exec_lo
	; GFX1032-NEXT: ; implicit-def: $vcc_hi
	; GFX1032-NEXT: v_mov_b32_e32 v2, v0
	; GFX1032-NEXT: s_or_saveexec_b32 s3, -1
	; GFX1032-NEXT: v_mov_b32_e32 v1, 0
	; GFX1032-NEXT: s_mov_b32 exec_lo, s3
	; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
	; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1032-NEXT: v_mov_b32_e32 v2, 0			; GFX1032-NEXT: v_mov_b32_e32 v1, 0
	; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1032-NEXT: s_or_saveexec_b32 s4, -1			; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1032-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_or_b32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1032-NEXT: s_mov_b32 s2, -1			; GFX1032-NEXT: v_or_b32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1032-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_or_b32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1032-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_or_b32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1032-NEXT: v_or_b32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_mov_b32_e32 v2, v1
	; GFX1032-NEXT: v_mov_b32_e32 v3, v2			; GFX1032-NEXT: v_permlanex16_b32 v2, v2, -1, -1
	; GFX1032-NEXT: v_permlanex16_b32 v3, v3, -1, -1			; GFX1032-NEXT: s_mov_b32 exec_lo, s2
	; GFX1032-NEXT: v_or_b32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf			; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX1032-NEXT: v_readlane_b32 s3, v2, 31			; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1032-NEXT: v_mov_b32_dpp v1, v2 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1032-NEXT: v_or_b32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1032-NEXT: v_readlane_b32 s5, v2, 15			; GFX1032-NEXT: v_mov_b32_e32 v3, 0
	; GFX1032-NEXT: v_writelane_b32 v1, s5, 16			; GFX1032-NEXT: v_readlane_b32 s3, v1, 15
	; GFX1032-NEXT: s_mov_b32 exec_lo, s4			; GFX1032-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1032-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
				; GFX1032-NEXT: s_mov_b32 exec_lo, s2
				; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
				; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
				; GFX1032-NEXT: v_writelane_b32 v3, s3, 16
				; GFX1032-NEXT: s_mov_b32 exec_lo, s2
	; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0			; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
				; GFX1032-NEXT: s_mov_b32 s2, -1
	; GFX1032-NEXT: ; implicit-def: $vgpr0			; GFX1032-NEXT: ; implicit-def: $vgpr0
	; GFX1032-NEXT: s_and_saveexec_b32 s4, vcc_lo			; GFX1032-NEXT: ; implicit-def: $vcc_hi
				; GFX1032-NEXT: s_and_saveexec_b32 s3, vcc_lo
	; GFX1032-NEXT: s_cbranch_execz BB15_2			; GFX1032-NEXT: s_cbranch_execz BB15_2
	; GFX1032-NEXT: ; %bb.1:			; GFX1032-NEXT: ; %bb.1:
	; GFX1032-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo			; GFX1032-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo
	; GFX1032-NEXT: v_mov_b32_e32 v4, s3			; GFX1032-NEXT: v_mov_b32_e32 v4, s4
	; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1032-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1032-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1032-NEXT: ds_or_rtn_b32 v0, v7, v4			; GFX1032-NEXT: ds_or_rtn_b32 v0, v7, v4
	; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1032-NEXT: buffer_gl0_inv			; GFX1032-NEXT: buffer_gl0_inv
	; GFX1032-NEXT: buffer_gl1_inv			; GFX1032-NEXT: buffer_gl1_inv
	; GFX1032-NEXT: BB15_2:			; GFX1032-NEXT: BB15_2:
	; GFX1032-NEXT: s_waitcnt_depctr 0xffe3			; GFX1032-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1032-NEXT: s_or_b32 exec_lo, exec_lo, s4			; GFX1032-NEXT: s_or_b32 exec_lo, exec_lo, s3
	; GFX1032-NEXT: v_readfirstlane_b32 s3, v0			; GFX1032-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1032-NEXT: v_mov_b32_e32 v0, v1			; GFX1032-NEXT: v_mov_b32_e32 v0, v3
	; GFX1032-NEXT: v_or_b32_e32 v0, s3, v0			; GFX1032-NEXT: v_or_b32_e32 v0, s3, v0
	; GFX1032-NEXT: s_mov_b32 s3, 0x31016000			; GFX1032-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1032-NEXT: s_waitcnt lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1032-NEXT: s_nop 0			; GFX1032-NEXT: s_nop 0
	; GFX1032-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX1032-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX1032-NEXT: s_endpgm			; GFX1032-NEXT: s_endpgm
	entry:			entry:
	%lane = call i32 @llvm.amdgcn.workitem.id.x()			%lane = call i32 @llvm.amdgcn.workitem.id.x()
	[17 lines of collapsed diff context not shown]
	; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000			; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000
	; GFX7LESS-NEXT: s_mov_b32 s2, -1			; GFX7LESS-NEXT: s_mov_b32 s2, -1
	; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX7LESS-NEXT: s_endpgm			; GFX7LESS-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: xor_i32_varying:			; GFX8-LABEL: xor_i32_varying:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX8-NEXT: s_mov_b64 s[2:3], exec
	; GFX8-NEXT: v_mov_b32_e32 v2, v0			; GFX8-NEXT: v_mov_b32_e32 v2, v0
	; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX8-NEXT: v_mov_b32_e32 v1, 0			; GFX8-NEXT: v_mov_b32_e32 v1, 0
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[2:3]
	; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0			; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0			; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX8-NEXT: s_not_b64 exec, exec			; GFX8-NEXT: s_not_b64 exec, exec
	; GFX8-NEXT: v_mov_b32_e32 v2, 0			; GFX8-NEXT: v_mov_b32_e32 v2, 0
	; GFX8-NEXT: s_not_b64 exec, exec			; GFX8-NEXT: s_not_b64 exec, exec
	; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX8-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX8-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX8-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX8-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX8-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_xor_b32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf			; GFX8-NEXT: v_xor_b32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_xor_b32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf			; GFX8-NEXT: v_xor_b32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf
	; GFX8-NEXT: v_readlane_b32 s2, v2, 63			; GFX8-NEXT: v_readlane_b32 s4, v2, 63
	; GFX8-NEXT: s_nop 0			; GFX8-NEXT: s_nop 0
	; GFX8-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf			; GFX8-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[2:3]
	; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX8-NEXT: ; implicit-def: $vgpr0			; GFX8-NEXT: ; implicit-def: $vgpr0
	; GFX8-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX8-NEXT: s_and_saveexec_b64 s[2:3], vcc
	; GFX8-NEXT: s_cbranch_execz BB16_2			; GFX8-NEXT: s_cbranch_execz BB16_2
	; GFX8-NEXT: ; %bb.1:			; GFX8-NEXT: ; %bb.1:
	; GFX8-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo			; GFX8-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo
	; GFX8-NEXT: v_mov_b32_e32 v3, s2			; GFX8-NEXT: v_mov_b32_e32 v3, s4
	; GFX8-NEXT: s_mov_b32 m0, -1			; GFX8-NEXT: s_mov_b32 m0, -1
	; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX8-NEXT: ds_xor_rtn_b32 v0, v0, v3			; GFX8-NEXT: ds_xor_rtn_b32 v0, v0, v3
	; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX8-NEXT: buffer_wbinvl1_vol			; GFX8-NEXT: buffer_wbinvl1_vol
	; GFX8-NEXT: BB16_2:			; GFX8-NEXT: BB16_2:
	; GFX8-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX8-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX8-NEXT: v_readfirstlane_b32 s2, v0			; GFX8-NEXT: v_readfirstlane_b32 s2, v0
	; GFX8-NEXT: v_mov_b32_e32 v0, v1			; GFX8-NEXT: v_mov_b32_e32 v0, v1
	; GFX8-NEXT: v_xor_b32_e32 v0, s2, v0			; GFX8-NEXT: v_xor_b32_e32 v0, s2, v0
	; GFX8-NEXT: s_mov_b32 s3, 0xf000			; GFX8-NEXT: s_mov_b32 s3, 0xf000
	; GFX8-NEXT: s_mov_b32 s2, -1			; GFX8-NEXT: s_mov_b32 s2, -1
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX9-LABEL: xor_i32_varying:			; GFX9-LABEL: xor_i32_varying:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX9-NEXT: s_mov_b64 s[2:3], exec
	; GFX9-NEXT: v_mov_b32_e32 v2, v0			; GFX9-NEXT: v_mov_b32_e32 v2, v0
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX9-NEXT: v_mov_b32_e32 v1, 0			; GFX9-NEXT: v_mov_b32_e32 v1, 0
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[2:3]
	; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0			; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0			; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX9-NEXT: s_not_b64 exec, exec			; GFX9-NEXT: s_not_b64 exec, exec
	; GFX9-NEXT: v_mov_b32_e32 v2, 0			; GFX9-NEXT: v_mov_b32_e32 v2, 0
	; GFX9-NEXT: s_not_b64 exec, exec			; GFX9-NEXT: s_not_b64 exec, exec
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX9-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX9-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX9-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX9-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX9-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_xor_b32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf			; GFX9-NEXT: v_xor_b32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_xor_b32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf			; GFX9-NEXT: v_xor_b32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf
	; GFX9-NEXT: v_readlane_b32 s2, v2, 63			; GFX9-NEXT: v_readlane_b32 s4, v2, 63
	; GFX9-NEXT: s_nop 0			; GFX9-NEXT: s_nop 0
	; GFX9-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf			; GFX9-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[2:3]
	; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX9-NEXT: ; implicit-def: $vgpr0			; GFX9-NEXT: ; implicit-def: $vgpr0
	; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX9-NEXT: s_and_saveexec_b64 s[2:3], vcc
	; GFX9-NEXT: s_cbranch_execz BB16_2			; GFX9-NEXT: s_cbranch_execz BB16_2
	; GFX9-NEXT: ; %bb.1:			; GFX9-NEXT: ; %bb.1:
	; GFX9-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo			; GFX9-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo
	; GFX9-NEXT: v_mov_b32_e32 v3, s2			; GFX9-NEXT: v_mov_b32_e32 v3, s4
	; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX9-NEXT: ds_xor_rtn_b32 v0, v0, v3			; GFX9-NEXT: ds_xor_rtn_b32 v0, v0, v3
	; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX9-NEXT: buffer_wbinvl1_vol			; GFX9-NEXT: buffer_wbinvl1_vol
	; GFX9-NEXT: BB16_2:			; GFX9-NEXT: BB16_2:
	; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX9-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX9-NEXT: v_readfirstlane_b32 s2, v0			; GFX9-NEXT: v_readfirstlane_b32 s2, v0
	; GFX9-NEXT: v_mov_b32_e32 v0, v1			; GFX9-NEXT: v_mov_b32_e32 v0, v1
	; GFX9-NEXT: v_xor_b32_e32 v0, s2, v0			; GFX9-NEXT: v_xor_b32_e32 v0, s2, v0
	; GFX9-NEXT: s_mov_b32 s3, 0xf000			; GFX9-NEXT: s_mov_b32 s3, 0xf000
	; GFX9-NEXT: s_mov_b32 s2, -1			; GFX9-NEXT: s_mov_b32 s2, -1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX1064-LABEL: xor_i32_varying:			; GFX1064-LABEL: xor_i32_varying:
	; GFX1064: ; %bb.0: ; %entry			; GFX1064: ; %bb.0: ; %entry
	; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX1064-NEXT: v_mov_b32_e32 v1, v0
	; GFX1064-NEXT: s_mov_b64 s[2:3], exec
	; GFX1064-NEXT: v_mov_b32_e32 v2, v0
	; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX1064-NEXT: v_mov_b32_e32 v1, 0
	; GFX1064-NEXT: s_mov_b64 exec, s[4:5]
	; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
	; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0
	; GFX1064-NEXT: s_not_b64 exec, exec			; GFX1064-NEXT: s_not_b64 exec, exec
	; GFX1064-NEXT: v_mov_b32_e32 v2, 0			; GFX1064-NEXT: v_mov_b32_e32 v1, 0
	; GFX1064-NEXT: s_not_b64 exec, exec			; GFX1064-NEXT: s_not_b64 exec, exec
				; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1064-NEXT: v_xor_b32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX1064-NEXT: v_mov_b32_e32 v3, 0
				; GFX1064-NEXT: v_xor_b32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX1064-NEXT: v_xor_b32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX1064-NEXT: v_xor_b32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX1064-NEXT: v_mov_b32_e32 v2, v1
				; GFX1064-NEXT: v_permlanex16_b32 v2, v2, -1, -1
				; GFX1064-NEXT: v_xor_b32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
				; GFX1064-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1064-NEXT: v_mov_b32_e32 v2, s4
				; GFX1064-NEXT: v_xor_b32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
				; GFX1064-NEXT: v_readlane_b32 s4, v1, 15
				; GFX1064-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
				; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
				; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1064-NEXT: v_readlane_b32 s5, v1, 31
				; GFX1064-NEXT: v_writelane_b32 v3, s4, 16
				; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
				; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1064-NEXT: v_readlane_b32 s7, v1, 63
				; GFX1064-NEXT: v_readlane_b32 s6, v1, 47
				; GFX1064-NEXT: v_writelane_b32 v3, s5, 32
				; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, exec_hi, v0
	; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX1064-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1064-NEXT: v_writelane_b32 v3, s6, 48
	; GFX1064-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1064-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1064-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1064-NEXT: v_mov_b32_e32 v3, v2
	; GFX1064-NEXT: v_permlanex16_b32 v3, v3, -1, -1
	; GFX1064-NEXT: v_xor_b32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1064-NEXT: v_readlane_b32 s2, v2, 31
	; GFX1064-NEXT: v_mov_b32_e32 v3, s2
	; GFX1064-NEXT: v_xor_b32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
	; GFX1064-NEXT: v_mov_b32_dpp v1, v2 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1064-NEXT: v_readlane_b32 s2, v2, 15
	; GFX1064-NEXT: v_readlane_b32 s3, v2, 31
	; GFX1064-NEXT: v_readlane_b32 s6, v2, 47
	; GFX1064-NEXT: v_writelane_b32 v1, s2, 16
	; GFX1064-NEXT: s_mov_b32 s2, -1
	; GFX1064-NEXT: v_writelane_b32 v1, s3, 32
	; GFX1064-NEXT: v_readlane_b32 s3, v2, 63
	; GFX1064-NEXT: v_writelane_b32 v1, s6, 48
	; GFX1064-NEXT: s_mov_b64 exec, s[4:5]			; GFX1064-NEXT: s_mov_b64 exec, s[4:5]
	; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
				; GFX1064-NEXT: s_mov_b32 s2, -1
	; GFX1064-NEXT: ; implicit-def: $vgpr0			; GFX1064-NEXT: ; implicit-def: $vgpr0
	; GFX1064-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX1064-NEXT: s_and_saveexec_b64 s[4:5], vcc
	; GFX1064-NEXT: s_cbranch_execz BB16_2			; GFX1064-NEXT: s_cbranch_execz BB16_2
	; GFX1064-NEXT: ; %bb.1:			; GFX1064-NEXT: ; %bb.1:
	; GFX1064-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo			; GFX1064-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo
	; GFX1064-NEXT: v_mov_b32_e32 v4, s3			; GFX1064-NEXT: v_mov_b32_e32 v4, s7
				; GFX1064-NEXT: s_mov_b32 s3, s7
	; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1064-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1064-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1064-NEXT: ds_xor_rtn_b32 v0, v7, v4			; GFX1064-NEXT: ds_xor_rtn_b32 v0, v7, v4
	; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1064-NEXT: buffer_gl0_inv			; GFX1064-NEXT: buffer_gl0_inv
	; GFX1064-NEXT: buffer_gl1_inv			; GFX1064-NEXT: buffer_gl1_inv
	; GFX1064-NEXT: BB16_2:			; GFX1064-NEXT: BB16_2:
	; GFX1064-NEXT: s_waitcnt_depctr 0xffe3			; GFX1064-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1064-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX1064-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX1064-NEXT: v_readfirstlane_b32 s3, v0			; GFX1064-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1064-NEXT: v_mov_b32_e32 v0, v1			; GFX1064-NEXT: v_mov_b32_e32 v0, v3
	; GFX1064-NEXT: v_xor_b32_e32 v0, s3, v0			; GFX1064-NEXT: v_xor_b32_e32 v0, s3, v0
	; GFX1064-NEXT: s_mov_b32 s3, 0x31016000			; GFX1064-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1064-NEXT: s_waitcnt lgkmcnt(0)			; GFX1064-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1064-NEXT: s_nop 0			; GFX1064-NEXT: s_nop 0
	; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX1064-NEXT: s_endpgm			; GFX1064-NEXT: s_endpgm
	;			;
	; GFX1032-LABEL: xor_i32_varying:			; GFX1032-LABEL: xor_i32_varying:
	; GFX1032: ; %bb.0: ; %entry			; GFX1032: ; %bb.0: ; %entry
	; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX1032-NEXT: v_mov_b32_e32 v1, v0
	; GFX1032-NEXT: s_mov_b32 s2, exec_lo
	; GFX1032-NEXT: ; implicit-def: $vcc_hi
	; GFX1032-NEXT: v_mov_b32_e32 v2, v0
	; GFX1032-NEXT: s_or_saveexec_b32 s3, -1
	; GFX1032-NEXT: v_mov_b32_e32 v1, 0
	; GFX1032-NEXT: s_mov_b32 exec_lo, s3
	; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
	; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1032-NEXT: v_mov_b32_e32 v2, 0			; GFX1032-NEXT: v_mov_b32_e32 v1, 0
	; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1032-NEXT: s_or_saveexec_b32 s4, -1			; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1032-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_xor_b32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1032-NEXT: s_mov_b32 s2, -1			; GFX1032-NEXT: v_xor_b32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1032-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_xor_b32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1032-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_xor_b32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1032-NEXT: v_xor_b32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_mov_b32_e32 v2, v1
	; GFX1032-NEXT: v_mov_b32_e32 v3, v2			; GFX1032-NEXT: v_permlanex16_b32 v2, v2, -1, -1
	; GFX1032-NEXT: v_permlanex16_b32 v3, v3, -1, -1			; GFX1032-NEXT: s_mov_b32 exec_lo, s2
	; GFX1032-NEXT: v_xor_b32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf			; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX1032-NEXT: v_readlane_b32 s3, v2, 31			; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1032-NEXT: v_mov_b32_dpp v1, v2 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1032-NEXT: v_xor_b32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1032-NEXT: v_readlane_b32 s5, v2, 15			; GFX1032-NEXT: v_mov_b32_e32 v3, 0
	; GFX1032-NEXT: v_writelane_b32 v1, s5, 16			; GFX1032-NEXT: v_readlane_b32 s3, v1, 15
	; GFX1032-NEXT: s_mov_b32 exec_lo, s4			; GFX1032-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1032-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
				; GFX1032-NEXT: s_mov_b32 exec_lo, s2
				; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
				; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
				; GFX1032-NEXT: v_writelane_b32 v3, s3, 16
				; GFX1032-NEXT: s_mov_b32 exec_lo, s2
	; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0			; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
				; GFX1032-NEXT: s_mov_b32 s2, -1
	; GFX1032-NEXT: ; implicit-def: $vgpr0			; GFX1032-NEXT: ; implicit-def: $vgpr0
	; GFX1032-NEXT: s_and_saveexec_b32 s4, vcc_lo			; GFX1032-NEXT: ; implicit-def: $vcc_hi
				; GFX1032-NEXT: s_and_saveexec_b32 s3, vcc_lo
	; GFX1032-NEXT: s_cbranch_execz BB16_2			; GFX1032-NEXT: s_cbranch_execz BB16_2
	; GFX1032-NEXT: ; %bb.1:			; GFX1032-NEXT: ; %bb.1:
	; GFX1032-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo			; GFX1032-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo
	; GFX1032-NEXT: v_mov_b32_e32 v4, s3			; GFX1032-NEXT: v_mov_b32_e32 v4, s4
	; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1032-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1032-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1032-NEXT: ds_xor_rtn_b32 v0, v7, v4			; GFX1032-NEXT: ds_xor_rtn_b32 v0, v7, v4
	; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1032-NEXT: buffer_gl0_inv			; GFX1032-NEXT: buffer_gl0_inv
	; GFX1032-NEXT: buffer_gl1_inv			; GFX1032-NEXT: buffer_gl1_inv
	; GFX1032-NEXT: BB16_2:			; GFX1032-NEXT: BB16_2:
	; GFX1032-NEXT: s_waitcnt_depctr 0xffe3			; GFX1032-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1032-NEXT: s_or_b32 exec_lo, exec_lo, s4			; GFX1032-NEXT: s_or_b32 exec_lo, exec_lo, s3
	; GFX1032-NEXT: v_readfirstlane_b32 s3, v0			; GFX1032-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1032-NEXT: v_mov_b32_e32 v0, v1			; GFX1032-NEXT: v_mov_b32_e32 v0, v3
	; GFX1032-NEXT: v_xor_b32_e32 v0, s3, v0			; GFX1032-NEXT: v_xor_b32_e32 v0, s3, v0
	; GFX1032-NEXT: s_mov_b32 s3, 0x31016000			; GFX1032-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1032-NEXT: s_waitcnt lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1032-NEXT: s_nop 0			; GFX1032-NEXT: s_nop 0
	; GFX1032-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX1032-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX1032-NEXT: s_endpgm			; GFX1032-NEXT: s_endpgm
	entry:			entry:
	%lane = call i32 @llvm.amdgcn.workitem.id.x()			%lane = call i32 @llvm.amdgcn.workitem.id.x()
	[17 lines of collapsed diff context not shown]
	; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000			; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000
	; GFX7LESS-NEXT: s_mov_b32 s2, -1			; GFX7LESS-NEXT: s_mov_b32 s2, -1
	; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX7LESS-NEXT: s_endpgm			; GFX7LESS-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: max_i32_varying:			; GFX8-LABEL: max_i32_varying:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX8-NEXT: v_mbcnt_lo_u32_b32 v3, exec_lo, 0
	; GFX8-NEXT: v_mbcnt_hi_u32_b32 v3, exec_hi, v3
	; GFX8-NEXT: v_mov_b32_e32 v2, v0			; GFX8-NEXT: v_mov_b32_e32 v2, v0
				; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
				; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX8-NEXT: v_bfrev_b32_e32 v1, 1			; GFX8-NEXT: v_bfrev_b32_e32 v1, 1
	; GFX8-NEXT: s_mov_b64 exec, s[2:3]			; GFX8-NEXT: s_mov_b64 exec, s[2:3]
	; GFX8-NEXT: s_not_b64 exec, exec			; GFX8-NEXT: s_not_b64 exec, exec
	; GFX8-NEXT: v_mov_b32_e32 v2, v1			; GFX8-NEXT: v_mov_b32_e32 v2, v1
	; GFX8-NEXT: s_not_b64 exec, exec			; GFX8-NEXT: s_not_b64 exec, exec
	; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX8-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX8-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf			; GFX8-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf			; GFX8-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf			; GFX8-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_max_i32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf			; GFX8-NEXT: v_max_i32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_max_i32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf			; GFX8-NEXT: v_max_i32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf
	; GFX8-NEXT: v_readlane_b32 s2, v2, 63			; GFX8-NEXT: v_readlane_b32 s4, v2, 63
	; GFX8-NEXT: s_nop 0			; GFX8-NEXT: s_nop 0
	; GFX8-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf			; GFX8-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[2:3]
	; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v3			; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX8-NEXT: ; implicit-def: $vgpr0			; GFX8-NEXT: ; implicit-def: $vgpr0
	; GFX8-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX8-NEXT: s_and_saveexec_b64 s[2:3], vcc
	; GFX8-NEXT: s_cbranch_execz BB17_2			; GFX8-NEXT: s_cbranch_execz BB17_2
	; GFX8-NEXT: ; %bb.1:			; GFX8-NEXT: ; %bb.1:
	; GFX8-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo			; GFX8-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo
	; GFX8-NEXT: v_mov_b32_e32 v3, s2			; GFX8-NEXT: v_mov_b32_e32 v3, s4
	; GFX8-NEXT: s_mov_b32 m0, -1			; GFX8-NEXT: s_mov_b32 m0, -1
	; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX8-NEXT: ds_max_rtn_i32 v0, v0, v3			; GFX8-NEXT: ds_max_rtn_i32 v0, v0, v3
	; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX8-NEXT: buffer_wbinvl1_vol			; GFX8-NEXT: buffer_wbinvl1_vol
	; GFX8-NEXT: BB17_2:			; GFX8-NEXT: BB17_2:
	; GFX8-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX8-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX8-NEXT: v_readfirstlane_b32 s2, v0			; GFX8-NEXT: v_readfirstlane_b32 s2, v0
	; GFX8-NEXT: v_mov_b32_e32 v0, v1			; GFX8-NEXT: v_mov_b32_e32 v0, v1
	; GFX8-NEXT: v_max_i32_e32 v0, s2, v0			; GFX8-NEXT: v_max_i32_e32 v0, s2, v0
	; GFX8-NEXT: s_mov_b32 s3, 0xf000			; GFX8-NEXT: s_mov_b32 s3, 0xf000
	; GFX8-NEXT: s_mov_b32 s2, -1			; GFX8-NEXT: s_mov_b32 s2, -1
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX9-LABEL: max_i32_varying:			; GFX9-LABEL: max_i32_varying:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX9-NEXT: v_mbcnt_lo_u32_b32 v3, exec_lo, 0
	; GFX9-NEXT: v_mbcnt_hi_u32_b32 v3, exec_hi, v3
	; GFX9-NEXT: v_mov_b32_e32 v2, v0			; GFX9-NEXT: v_mov_b32_e32 v2, v0
				; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
				; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX9-NEXT: v_bfrev_b32_e32 v1, 1			; GFX9-NEXT: v_bfrev_b32_e32 v1, 1
	; GFX9-NEXT: s_mov_b64 exec, s[2:3]			; GFX9-NEXT: s_mov_b64 exec, s[2:3]
	; GFX9-NEXT: s_not_b64 exec, exec			; GFX9-NEXT: s_not_b64 exec, exec
	; GFX9-NEXT: v_mov_b32_e32 v2, v1			; GFX9-NEXT: v_mov_b32_e32 v2, v1
	; GFX9-NEXT: s_not_b64 exec, exec			; GFX9-NEXT: s_not_b64 exec, exec
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX9-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX9-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf			; GFX9-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf			; GFX9-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf			; GFX9-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_max_i32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf			; GFX9-NEXT: v_max_i32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_max_i32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf			; GFX9-NEXT: v_max_i32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf
	; GFX9-NEXT: v_readlane_b32 s2, v2, 63			; GFX9-NEXT: v_readlane_b32 s4, v2, 63
	; GFX9-NEXT: s_nop 0			; GFX9-NEXT: s_nop 0
	; GFX9-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf			; GFX9-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[2:3]
	; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v3			; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX9-NEXT: ; implicit-def: $vgpr0			; GFX9-NEXT: ; implicit-def: $vgpr0
	; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX9-NEXT: s_and_saveexec_b64 s[2:3], vcc
	; GFX9-NEXT: s_cbranch_execz BB17_2			; GFX9-NEXT: s_cbranch_execz BB17_2
	; GFX9-NEXT: ; %bb.1:			; GFX9-NEXT: ; %bb.1:
	; GFX9-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo			; GFX9-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo
	; GFX9-NEXT: v_mov_b32_e32 v3, s2			; GFX9-NEXT: v_mov_b32_e32 v3, s4
	; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX9-NEXT: ds_max_rtn_i32 v0, v0, v3			; GFX9-NEXT: ds_max_rtn_i32 v0, v0, v3
	; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX9-NEXT: buffer_wbinvl1_vol			; GFX9-NEXT: buffer_wbinvl1_vol
	; GFX9-NEXT: BB17_2:			; GFX9-NEXT: BB17_2:
	; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX9-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX9-NEXT: v_readfirstlane_b32 s2, v0			; GFX9-NEXT: v_readfirstlane_b32 s2, v0
	; GFX9-NEXT: v_mov_b32_e32 v0, v1			; GFX9-NEXT: v_mov_b32_e32 v0, v1
	; GFX9-NEXT: v_max_i32_e32 v0, s2, v0			; GFX9-NEXT: v_max_i32_e32 v0, s2, v0
	; GFX9-NEXT: s_mov_b32 s3, 0xf000			; GFX9-NEXT: s_mov_b32 s3, 0xf000
	; GFX9-NEXT: s_mov_b32 s2, -1			; GFX9-NEXT: s_mov_b32 s2, -1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX1064-LABEL: max_i32_varying:			; GFX1064-LABEL: max_i32_varying:
	; GFX1064: ; %bb.0: ; %entry			; GFX1064: ; %bb.0: ; %entry
	; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v4, exec_lo, 0
	; GFX1064-NEXT: v_mov_b32_e32 v2, v0			; GFX1064-NEXT: v_mov_b32_e32 v2, v0
	; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v4, exec_hi, v4
	; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX1064-NEXT: v_bfrev_b32_e32 v1, 1			; GFX1064-NEXT: v_bfrev_b32_e32 v1, 1
	; GFX1064-NEXT: s_mov_b64 exec, s[2:3]			; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
	; GFX1064-NEXT: s_not_b64 exec, exec			; GFX1064-NEXT: s_not_b64 exec, exec
	; GFX1064-NEXT: v_mov_b32_e32 v2, v1			; GFX1064-NEXT: v_mov_b32_e32 v2, v1
	; GFX1064-NEXT: s_not_b64 exec, exec			; GFX1064-NEXT: s_not_b64 exec, exec
	; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX1064-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1064-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1064-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf			; GFX1064-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf
	; GFX1064-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf			; GFX1064-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf
	; GFX1064-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf			; GFX1064-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf
	; GFX1064-NEXT: v_mov_b32_e32 v3, v2			; GFX1064-NEXT: v_mov_b32_e32 v3, v2
	; GFX1064-NEXT: v_permlanex16_b32 v3, v3, -1, -1			; GFX1064-NEXT: v_permlanex16_b32 v3, v3, -1, -1
	; GFX1064-NEXT: v_max_i32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf			; GFX1064-NEXT: v_max_i32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1064-NEXT: v_readlane_b32 s2, v2, 31			; GFX1064-NEXT: v_readlane_b32 s4, v2, 31
	; GFX1064-NEXT: v_mov_b32_e32 v3, s2			; GFX1064-NEXT: v_mov_b32_e32 v3, s4
	; GFX1064-NEXT: v_max_i32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf			; GFX1064-NEXT: v_max_i32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
				; GFX1064-NEXT: v_readlane_b32 s4, v2, 15
	; GFX1064-NEXT: v_mov_b32_dpp v1, v2 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1064-NEXT: v_mov_b32_dpp v1, v2 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1064-NEXT: v_readlane_b32 s2, v2, 15			; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
	; GFX1064-NEXT: v_readlane_b32 s3, v2, 31			; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
				; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1064-NEXT: v_readlane_b32 s5, v2, 31
				; GFX1064-NEXT: v_writelane_b32 v1, s4, 16
				; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
				; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1064-NEXT: v_readlane_b32 s7, v2, 63
	; GFX1064-NEXT: v_readlane_b32 s6, v2, 47			; GFX1064-NEXT: v_readlane_b32 s6, v2, 47
	; GFX1064-NEXT: v_writelane_b32 v1, s2, 16			; GFX1064-NEXT: v_writelane_b32 v1, s5, 32
	; GFX1064-NEXT: s_mov_b32 s2, -1			; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
	; GFX1064-NEXT: v_writelane_b32 v1, s3, 32			; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, exec_hi, v0
	; GFX1064-NEXT: v_readlane_b32 s3, v2, 63			; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX1064-NEXT: v_writelane_b32 v1, s6, 48			; GFX1064-NEXT: v_writelane_b32 v1, s6, 48
	; GFX1064-NEXT: s_mov_b64 exec, s[4:5]			; GFX1064-NEXT: s_mov_b64 exec, s[4:5]
	; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4			; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
				; GFX1064-NEXT: s_mov_b32 s2, -1
	; GFX1064-NEXT: ; implicit-def: $vgpr0			; GFX1064-NEXT: ; implicit-def: $vgpr0
	; GFX1064-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX1064-NEXT: s_and_saveexec_b64 s[4:5], vcc
	; GFX1064-NEXT: s_cbranch_execz BB17_2			; GFX1064-NEXT: s_cbranch_execz BB17_2
	; GFX1064-NEXT: ; %bb.1:			; GFX1064-NEXT: ; %bb.1:
	; GFX1064-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo			; GFX1064-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo
	; GFX1064-NEXT: v_mov_b32_e32 v4, s3			; GFX1064-NEXT: v_mov_b32_e32 v4, s7
				; GFX1064-NEXT: s_mov_b32 s3, s7
	; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1064-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1064-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1064-NEXT: ds_max_rtn_i32 v0, v7, v4			; GFX1064-NEXT: ds_max_rtn_i32 v0, v7, v4
	; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1064-NEXT: buffer_gl0_inv			; GFX1064-NEXT: buffer_gl0_inv
	; GFX1064-NEXT: buffer_gl1_inv			; GFX1064-NEXT: buffer_gl1_inv
	; GFX1064-NEXT: BB17_2:			; GFX1064-NEXT: BB17_2:
	; GFX1064-NEXT: s_waitcnt_depctr 0xffe3			; GFX1064-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1064-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX1064-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX1064-NEXT: v_readfirstlane_b32 s3, v0			; GFX1064-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1064-NEXT: v_mov_b32_e32 v0, v1			; GFX1064-NEXT: v_mov_b32_e32 v0, v1
	; GFX1064-NEXT: v_max_i32_e32 v0, s3, v0			; GFX1064-NEXT: v_max_i32_e32 v0, s3, v0
	; GFX1064-NEXT: s_mov_b32 s3, 0x31016000			; GFX1064-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1064-NEXT: s_waitcnt lgkmcnt(0)			; GFX1064-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1064-NEXT: s_nop 0			; GFX1064-NEXT: s_nop 0
	; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX1064-NEXT: s_endpgm			; GFX1064-NEXT: s_endpgm
	;			;
	; GFX1032-LABEL: max_i32_varying:			; GFX1032-LABEL: max_i32_varying:
	; GFX1032: ; %bb.0: ; %entry			; GFX1032: ; %bb.0: ; %entry
	; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v4, exec_lo, 0
	; GFX1032-NEXT: ; implicit-def: $vcc_hi
	; GFX1032-NEXT: v_mov_b32_e32 v2, v0			; GFX1032-NEXT: v_mov_b32_e32 v2, v0
	; GFX1032-NEXT: s_or_saveexec_b32 s2, -1			; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1032-NEXT: v_bfrev_b32_e32 v1, 1			; GFX1032-NEXT: v_bfrev_b32_e32 v1, 1
	; GFX1032-NEXT: s_mov_b32 exec_lo, s2			; GFX1032-NEXT: s_mov_b32 exec_lo, s2
	; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1032-NEXT: v_mov_b32_e32 v2, v1			; GFX1032-NEXT: v_mov_b32_e32 v2, v1
	; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1032-NEXT: s_or_saveexec_b32 s4, -1			; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1032-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1032-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1032-NEXT: s_mov_b32 s2, -1
	; GFX1032-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf			; GFX1032-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf
	; GFX1032-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf			; GFX1032-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf
	; GFX1032-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf			; GFX1032-NEXT: v_max_i32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf
	; GFX1032-NEXT: v_mov_b32_e32 v3, v2			; GFX1032-NEXT: v_mov_b32_e32 v3, v2
	; GFX1032-NEXT: v_permlanex16_b32 v3, v3, -1, -1			; GFX1032-NEXT: v_permlanex16_b32 v3, v3, -1, -1
				; GFX1032-NEXT: s_mov_b32 exec_lo, s2
				; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
				; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1032-NEXT: v_max_i32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf			; GFX1032-NEXT: v_max_i32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1032-NEXT: v_readlane_b32 s3, v2, 31			; GFX1032-NEXT: v_readlane_b32 s3, v2, 15
				; GFX1032-NEXT: v_readlane_b32 s4, v2, 31
	; GFX1032-NEXT: v_mov_b32_dpp v1, v2 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1032-NEXT: v_mov_b32_dpp v1, v2 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1032-NEXT: v_readlane_b32 s5, v2, 15			; GFX1032-NEXT: s_mov_b32 exec_lo, s2
	; GFX1032-NEXT: v_writelane_b32 v1, s5, 16			; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
	; GFX1032-NEXT: s_mov_b32 exec_lo, s4			; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v4			; GFX1032-NEXT: v_writelane_b32 v1, s3, 16
				; GFX1032-NEXT: s_mov_b32 exec_lo, s2
				; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
				; GFX1032-NEXT: s_mov_b32 s2, -1
	; GFX1032-NEXT: ; implicit-def: $vgpr0			; GFX1032-NEXT: ; implicit-def: $vgpr0
	; GFX1032-NEXT: s_and_saveexec_b32 s4, vcc_lo			; GFX1032-NEXT: ; implicit-def: $vcc_hi
				; GFX1032-NEXT: s_and_saveexec_b32 s3, vcc_lo
	; GFX1032-NEXT: s_cbranch_execz BB17_2			; GFX1032-NEXT: s_cbranch_execz BB17_2
	; GFX1032-NEXT: ; %bb.1:			; GFX1032-NEXT: ; %bb.1:
	; GFX1032-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo			; GFX1032-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo
	; GFX1032-NEXT: v_mov_b32_e32 v4, s3			; GFX1032-NEXT: v_mov_b32_e32 v4, s4
	; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1032-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1032-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1032-NEXT: ds_max_rtn_i32 v0, v7, v4			; GFX1032-NEXT: ds_max_rtn_i32 v0, v7, v4
	; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1032-NEXT: buffer_gl0_inv			; GFX1032-NEXT: buffer_gl0_inv
	; GFX1032-NEXT: buffer_gl1_inv			; GFX1032-NEXT: buffer_gl1_inv
	; GFX1032-NEXT: BB17_2:			; GFX1032-NEXT: BB17_2:
	; GFX1032-NEXT: s_waitcnt_depctr 0xffe3			; GFX1032-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1032-NEXT: s_or_b32 exec_lo, exec_lo, s4			; GFX1032-NEXT: s_or_b32 exec_lo, exec_lo, s3
	; GFX1032-NEXT: v_readfirstlane_b32 s3, v0			; GFX1032-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1032-NEXT: v_mov_b32_e32 v0, v1			; GFX1032-NEXT: v_mov_b32_e32 v0, v1
	; GFX1032-NEXT: v_max_i32_e32 v0, s3, v0			; GFX1032-NEXT: v_max_i32_e32 v0, s3, v0
	; GFX1032-NEXT: s_mov_b32 s3, 0x31016000			; GFX1032-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1032-NEXT: s_waitcnt lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1032-NEXT: s_nop 0			; GFX1032-NEXT: s_nop 0
	; GFX1032-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX1032-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX1032-NEXT: s_endpgm			; GFX1032-NEXT: s_endpgm
	(204 lines not shown)
	; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000			; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000
	; GFX7LESS-NEXT: s_mov_b32 s2, -1			; GFX7LESS-NEXT: s_mov_b32 s2, -1
	; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX7LESS-NEXT: s_endpgm			; GFX7LESS-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: min_i32_varying:			; GFX8-LABEL: min_i32_varying:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX8-NEXT: v_mbcnt_lo_u32_b32 v3, exec_lo, 0
	; GFX8-NEXT: v_mbcnt_hi_u32_b32 v3, exec_hi, v3
	; GFX8-NEXT: v_mov_b32_e32 v2, v0			; GFX8-NEXT: v_mov_b32_e32 v2, v0
				; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
				; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX8-NEXT: v_bfrev_b32_e32 v1, -2			; GFX8-NEXT: v_bfrev_b32_e32 v1, -2
	; GFX8-NEXT: s_mov_b64 exec, s[2:3]			; GFX8-NEXT: s_mov_b64 exec, s[2:3]
	; GFX8-NEXT: s_not_b64 exec, exec			; GFX8-NEXT: s_not_b64 exec, exec
	; GFX8-NEXT: v_mov_b32_e32 v2, v1			; GFX8-NEXT: v_mov_b32_e32 v2, v1
	; GFX8-NEXT: s_not_b64 exec, exec			; GFX8-NEXT: s_not_b64 exec, exec
	; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX8-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX8-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf			; GFX8-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf			; GFX8-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf			; GFX8-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_min_i32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf			; GFX8-NEXT: v_min_i32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_min_i32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf			; GFX8-NEXT: v_min_i32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf
	; GFX8-NEXT: v_readlane_b32 s2, v2, 63			; GFX8-NEXT: v_readlane_b32 s4, v2, 63
	; GFX8-NEXT: s_nop 0			; GFX8-NEXT: s_nop 0
	; GFX8-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf			; GFX8-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[2:3]
	; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v3			; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX8-NEXT: ; implicit-def: $vgpr0			; GFX8-NEXT: ; implicit-def: $vgpr0
	; GFX8-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX8-NEXT: s_and_saveexec_b64 s[2:3], vcc
	; GFX8-NEXT: s_cbranch_execz BB19_2			; GFX8-NEXT: s_cbranch_execz BB19_2
	; GFX8-NEXT: ; %bb.1:			; GFX8-NEXT: ; %bb.1:
	; GFX8-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo			; GFX8-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo
	; GFX8-NEXT: v_mov_b32_e32 v3, s2			; GFX8-NEXT: v_mov_b32_e32 v3, s4
	; GFX8-NEXT: s_mov_b32 m0, -1			; GFX8-NEXT: s_mov_b32 m0, -1
	; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX8-NEXT: ds_min_rtn_i32 v0, v0, v3			; GFX8-NEXT: ds_min_rtn_i32 v0, v0, v3
	; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX8-NEXT: buffer_wbinvl1_vol			; GFX8-NEXT: buffer_wbinvl1_vol
	; GFX8-NEXT: BB19_2:			; GFX8-NEXT: BB19_2:
	; GFX8-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX8-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX8-NEXT: v_readfirstlane_b32 s2, v0			; GFX8-NEXT: v_readfirstlane_b32 s2, v0
	; GFX8-NEXT: v_mov_b32_e32 v0, v1			; GFX8-NEXT: v_mov_b32_e32 v0, v1
	; GFX8-NEXT: v_min_i32_e32 v0, s2, v0			; GFX8-NEXT: v_min_i32_e32 v0, s2, v0
	; GFX8-NEXT: s_mov_b32 s3, 0xf000			; GFX8-NEXT: s_mov_b32 s3, 0xf000
	; GFX8-NEXT: s_mov_b32 s2, -1			; GFX8-NEXT: s_mov_b32 s2, -1
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX9-LABEL: min_i32_varying:			; GFX9-LABEL: min_i32_varying:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX9-NEXT: v_mbcnt_lo_u32_b32 v3, exec_lo, 0
	; GFX9-NEXT: v_mbcnt_hi_u32_b32 v3, exec_hi, v3
	; GFX9-NEXT: v_mov_b32_e32 v2, v0			; GFX9-NEXT: v_mov_b32_e32 v2, v0
				; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
				; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX9-NEXT: v_bfrev_b32_e32 v1, -2			; GFX9-NEXT: v_bfrev_b32_e32 v1, -2
	; GFX9-NEXT: s_mov_b64 exec, s[2:3]			; GFX9-NEXT: s_mov_b64 exec, s[2:3]
	; GFX9-NEXT: s_not_b64 exec, exec			; GFX9-NEXT: s_not_b64 exec, exec
	; GFX9-NEXT: v_mov_b32_e32 v2, v1			; GFX9-NEXT: v_mov_b32_e32 v2, v1
	; GFX9-NEXT: s_not_b64 exec, exec			; GFX9-NEXT: s_not_b64 exec, exec
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX9-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX9-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf			; GFX9-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf			; GFX9-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf			; GFX9-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_min_i32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf			; GFX9-NEXT: v_min_i32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_min_i32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf			; GFX9-NEXT: v_min_i32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf
	; GFX9-NEXT: v_readlane_b32 s2, v2, 63			; GFX9-NEXT: v_readlane_b32 s4, v2, 63
	; GFX9-NEXT: s_nop 0			; GFX9-NEXT: s_nop 0
	; GFX9-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf			; GFX9-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[2:3]
	; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v3			; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX9-NEXT: ; implicit-def: $vgpr0			; GFX9-NEXT: ; implicit-def: $vgpr0
	; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX9-NEXT: s_and_saveexec_b64 s[2:3], vcc
	; GFX9-NEXT: s_cbranch_execz BB19_2			; GFX9-NEXT: s_cbranch_execz BB19_2
	; GFX9-NEXT: ; %bb.1:			; GFX9-NEXT: ; %bb.1:
	; GFX9-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo			; GFX9-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo
	; GFX9-NEXT: v_mov_b32_e32 v3, s2			; GFX9-NEXT: v_mov_b32_e32 v3, s4
	; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX9-NEXT: ds_min_rtn_i32 v0, v0, v3			; GFX9-NEXT: ds_min_rtn_i32 v0, v0, v3
	; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX9-NEXT: buffer_wbinvl1_vol			; GFX9-NEXT: buffer_wbinvl1_vol
	; GFX9-NEXT: BB19_2:			; GFX9-NEXT: BB19_2:
	; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX9-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX9-NEXT: v_readfirstlane_b32 s2, v0			; GFX9-NEXT: v_readfirstlane_b32 s2, v0
	; GFX9-NEXT: v_mov_b32_e32 v0, v1			; GFX9-NEXT: v_mov_b32_e32 v0, v1
	; GFX9-NEXT: v_min_i32_e32 v0, s2, v0			; GFX9-NEXT: v_min_i32_e32 v0, s2, v0
	; GFX9-NEXT: s_mov_b32 s3, 0xf000			; GFX9-NEXT: s_mov_b32 s3, 0xf000
	; GFX9-NEXT: s_mov_b32 s2, -1			; GFX9-NEXT: s_mov_b32 s2, -1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX1064-LABEL: min_i32_varying:			; GFX1064-LABEL: min_i32_varying:
	; GFX1064: ; %bb.0: ; %entry			; GFX1064: ; %bb.0: ; %entry
	; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v4, exec_lo, 0
	; GFX1064-NEXT: v_mov_b32_e32 v2, v0			; GFX1064-NEXT: v_mov_b32_e32 v2, v0
	; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v4, exec_hi, v4
	; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX1064-NEXT: v_bfrev_b32_e32 v1, -2			; GFX1064-NEXT: v_bfrev_b32_e32 v1, -2
	; GFX1064-NEXT: s_mov_b64 exec, s[2:3]			; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
	; GFX1064-NEXT: s_not_b64 exec, exec			; GFX1064-NEXT: s_not_b64 exec, exec
	; GFX1064-NEXT: v_mov_b32_e32 v2, v1			; GFX1064-NEXT: v_mov_b32_e32 v2, v1
	; GFX1064-NEXT: s_not_b64 exec, exec			; GFX1064-NEXT: s_not_b64 exec, exec
	; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX1064-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1064-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1064-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf			; GFX1064-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf
	; GFX1064-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf			; GFX1064-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf
	; GFX1064-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf			; GFX1064-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf
	; GFX1064-NEXT: v_mov_b32_e32 v3, v2			; GFX1064-NEXT: v_mov_b32_e32 v3, v2
	; GFX1064-NEXT: v_permlanex16_b32 v3, v3, -1, -1			; GFX1064-NEXT: v_permlanex16_b32 v3, v3, -1, -1
	; GFX1064-NEXT: v_min_i32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf			; GFX1064-NEXT: v_min_i32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1064-NEXT: v_readlane_b32 s2, v2, 31			; GFX1064-NEXT: v_readlane_b32 s4, v2, 31
	; GFX1064-NEXT: v_mov_b32_e32 v3, s2			; GFX1064-NEXT: v_mov_b32_e32 v3, s4
	; GFX1064-NEXT: v_min_i32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf			; GFX1064-NEXT: v_min_i32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
				; GFX1064-NEXT: v_readlane_b32 s4, v2, 15
	; GFX1064-NEXT: v_mov_b32_dpp v1, v2 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1064-NEXT: v_mov_b32_dpp v1, v2 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1064-NEXT: v_readlane_b32 s2, v2, 15			; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
	; GFX1064-NEXT: v_readlane_b32 s3, v2, 31			; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
				; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1064-NEXT: v_readlane_b32 s5, v2, 31
				; GFX1064-NEXT: v_writelane_b32 v1, s4, 16
				; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
				; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1064-NEXT: v_readlane_b32 s7, v2, 63
	; GFX1064-NEXT: v_readlane_b32 s6, v2, 47			; GFX1064-NEXT: v_readlane_b32 s6, v2, 47
	; GFX1064-NEXT: v_writelane_b32 v1, s2, 16			; GFX1064-NEXT: v_writelane_b32 v1, s5, 32
	; GFX1064-NEXT: s_mov_b32 s2, -1			; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
	; GFX1064-NEXT: v_writelane_b32 v1, s3, 32			; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, exec_hi, v0
	; GFX1064-NEXT: v_readlane_b32 s3, v2, 63			; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX1064-NEXT: v_writelane_b32 v1, s6, 48			; GFX1064-NEXT: v_writelane_b32 v1, s6, 48
	; GFX1064-NEXT: s_mov_b64 exec, s[4:5]			; GFX1064-NEXT: s_mov_b64 exec, s[4:5]
	; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4			; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
				; GFX1064-NEXT: s_mov_b32 s2, -1
	; GFX1064-NEXT: ; implicit-def: $vgpr0			; GFX1064-NEXT: ; implicit-def: $vgpr0
	; GFX1064-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX1064-NEXT: s_and_saveexec_b64 s[4:5], vcc
	; GFX1064-NEXT: s_cbranch_execz BB19_2			; GFX1064-NEXT: s_cbranch_execz BB19_2
	; GFX1064-NEXT: ; %bb.1:			; GFX1064-NEXT: ; %bb.1:
	; GFX1064-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo			; GFX1064-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo
	; GFX1064-NEXT: v_mov_b32_e32 v4, s3			; GFX1064-NEXT: v_mov_b32_e32 v4, s7
				; GFX1064-NEXT: s_mov_b32 s3, s7
	; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1064-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1064-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1064-NEXT: ds_min_rtn_i32 v0, v7, v4			; GFX1064-NEXT: ds_min_rtn_i32 v0, v7, v4
	; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1064-NEXT: buffer_gl0_inv			; GFX1064-NEXT: buffer_gl0_inv
	; GFX1064-NEXT: buffer_gl1_inv			; GFX1064-NEXT: buffer_gl1_inv
	; GFX1064-NEXT: BB19_2:			; GFX1064-NEXT: BB19_2:
	; GFX1064-NEXT: s_waitcnt_depctr 0xffe3			; GFX1064-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1064-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX1064-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX1064-NEXT: v_readfirstlane_b32 s3, v0			; GFX1064-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1064-NEXT: v_mov_b32_e32 v0, v1			; GFX1064-NEXT: v_mov_b32_e32 v0, v1
	; GFX1064-NEXT: v_min_i32_e32 v0, s3, v0			; GFX1064-NEXT: v_min_i32_e32 v0, s3, v0
	; GFX1064-NEXT: s_mov_b32 s3, 0x31016000			; GFX1064-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1064-NEXT: s_waitcnt lgkmcnt(0)			; GFX1064-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1064-NEXT: s_nop 0			; GFX1064-NEXT: s_nop 0
	; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX1064-NEXT: s_endpgm			; GFX1064-NEXT: s_endpgm
	;			;
	; GFX1032-LABEL: min_i32_varying:			; GFX1032-LABEL: min_i32_varying:
	; GFX1032: ; %bb.0: ; %entry			; GFX1032: ; %bb.0: ; %entry
	; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v4, exec_lo, 0
	; GFX1032-NEXT: ; implicit-def: $vcc_hi
	; GFX1032-NEXT: v_mov_b32_e32 v2, v0			; GFX1032-NEXT: v_mov_b32_e32 v2, v0
	; GFX1032-NEXT: s_or_saveexec_b32 s2, -1			; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1032-NEXT: v_bfrev_b32_e32 v1, -2			; GFX1032-NEXT: v_bfrev_b32_e32 v1, -2
	; GFX1032-NEXT: s_mov_b32 exec_lo, s2			; GFX1032-NEXT: s_mov_b32 exec_lo, s2
	; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1032-NEXT: v_mov_b32_e32 v2, v1			; GFX1032-NEXT: v_mov_b32_e32 v2, v1
	; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1032-NEXT: s_or_saveexec_b32 s4, -1			; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1032-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1032-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1032-NEXT: s_mov_b32 s2, -1
	; GFX1032-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf			; GFX1032-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf
	; GFX1032-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf			; GFX1032-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf
	; GFX1032-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf			; GFX1032-NEXT: v_min_i32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf
	; GFX1032-NEXT: v_mov_b32_e32 v3, v2			; GFX1032-NEXT: v_mov_b32_e32 v3, v2
	; GFX1032-NEXT: v_permlanex16_b32 v3, v3, -1, -1			; GFX1032-NEXT: v_permlanex16_b32 v3, v3, -1, -1
				; GFX1032-NEXT: s_mov_b32 exec_lo, s2
				; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
				; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1032-NEXT: v_min_i32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf			; GFX1032-NEXT: v_min_i32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1032-NEXT: v_readlane_b32 s3, v2, 31			; GFX1032-NEXT: v_readlane_b32 s3, v2, 15
				; GFX1032-NEXT: v_readlane_b32 s4, v2, 31
	; GFX1032-NEXT: v_mov_b32_dpp v1, v2 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1032-NEXT: v_mov_b32_dpp v1, v2 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1032-NEXT: v_readlane_b32 s5, v2, 15			; GFX1032-NEXT: s_mov_b32 exec_lo, s2
	; GFX1032-NEXT: v_writelane_b32 v1, s5, 16			; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
	; GFX1032-NEXT: s_mov_b32 exec_lo, s4			; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v4			; GFX1032-NEXT: v_writelane_b32 v1, s3, 16
				; GFX1032-NEXT: s_mov_b32 exec_lo, s2
				; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
				; GFX1032-NEXT: s_mov_b32 s2, -1
	; GFX1032-NEXT: ; implicit-def: $vgpr0			; GFX1032-NEXT: ; implicit-def: $vgpr0
	; GFX1032-NEXT: s_and_saveexec_b32 s4, vcc_lo			; GFX1032-NEXT: ; implicit-def: $vcc_hi
				; GFX1032-NEXT: s_and_saveexec_b32 s3, vcc_lo
	; GFX1032-NEXT: s_cbranch_execz BB19_2			; GFX1032-NEXT: s_cbranch_execz BB19_2
	; GFX1032-NEXT: ; %bb.1:			; GFX1032-NEXT: ; %bb.1:
	; GFX1032-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo			; GFX1032-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo
	; GFX1032-NEXT: v_mov_b32_e32 v4, s3			; GFX1032-NEXT: v_mov_b32_e32 v4, s4
	; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1032-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1032-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1032-NEXT: ds_min_rtn_i32 v0, v7, v4			; GFX1032-NEXT: ds_min_rtn_i32 v0, v7, v4
	; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1032-NEXT: buffer_gl0_inv			; GFX1032-NEXT: buffer_gl0_inv
	; GFX1032-NEXT: buffer_gl1_inv			; GFX1032-NEXT: buffer_gl1_inv
	; GFX1032-NEXT: BB19_2:			; GFX1032-NEXT: BB19_2:
	; GFX1032-NEXT: s_waitcnt_depctr 0xffe3			; GFX1032-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1032-NEXT: s_or_b32 exec_lo, exec_lo, s4			; GFX1032-NEXT: s_or_b32 exec_lo, exec_lo, s3
	; GFX1032-NEXT: v_readfirstlane_b32 s3, v0			; GFX1032-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1032-NEXT: v_mov_b32_e32 v0, v1			; GFX1032-NEXT: v_mov_b32_e32 v0, v1
	; GFX1032-NEXT: v_min_i32_e32 v0, s3, v0			; GFX1032-NEXT: v_min_i32_e32 v0, s3, v0
	; GFX1032-NEXT: s_mov_b32 s3, 0x31016000			; GFX1032-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1032-NEXT: s_waitcnt lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1032-NEXT: s_nop 0			; GFX1032-NEXT: s_nop 0
	; GFX1032-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX1032-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX1032-NEXT: s_endpgm			; GFX1032-NEXT: s_endpgm
	[204 lines not shown]
	; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000			; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000
	; GFX7LESS-NEXT: s_mov_b32 s2, -1			; GFX7LESS-NEXT: s_mov_b32 s2, -1
	; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX7LESS-NEXT: s_endpgm			; GFX7LESS-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: umax_i32_varying:			; GFX8-LABEL: umax_i32_varying:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX8-NEXT: s_mov_b64 s[2:3], exec
	; GFX8-NEXT: v_mov_b32_e32 v2, v0			; GFX8-NEXT: v_mov_b32_e32 v2, v0
	; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX8-NEXT: v_mov_b32_e32 v1, 0			; GFX8-NEXT: v_mov_b32_e32 v1, 0
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[2:3]
	; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0			; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0			; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX8-NEXT: s_not_b64 exec, exec			; GFX8-NEXT: s_not_b64 exec, exec
	; GFX8-NEXT: v_mov_b32_e32 v2, 0			; GFX8-NEXT: v_mov_b32_e32 v2, 0
	; GFX8-NEXT: s_not_b64 exec, exec			; GFX8-NEXT: s_not_b64 exec, exec
	; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX8-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX8-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX8-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX8-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX8-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_max_u32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf			; GFX8-NEXT: v_max_u32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_max_u32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf			; GFX8-NEXT: v_max_u32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf
	; GFX8-NEXT: v_readlane_b32 s2, v2, 63			; GFX8-NEXT: v_readlane_b32 s4, v2, 63
	; GFX8-NEXT: s_nop 0			; GFX8-NEXT: s_nop 0
	; GFX8-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf			; GFX8-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[2:3]
	; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX8-NEXT: ; implicit-def: $vgpr0			; GFX8-NEXT: ; implicit-def: $vgpr0
	; GFX8-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX8-NEXT: s_and_saveexec_b64 s[2:3], vcc
	; GFX8-NEXT: s_cbranch_execz BB21_2			; GFX8-NEXT: s_cbranch_execz BB21_2
	; GFX8-NEXT: ; %bb.1:			; GFX8-NEXT: ; %bb.1:
	; GFX8-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo			; GFX8-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo
	; GFX8-NEXT: v_mov_b32_e32 v3, s2			; GFX8-NEXT: v_mov_b32_e32 v3, s4
	; GFX8-NEXT: s_mov_b32 m0, -1			; GFX8-NEXT: s_mov_b32 m0, -1
	; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX8-NEXT: ds_max_rtn_u32 v0, v0, v3			; GFX8-NEXT: ds_max_rtn_u32 v0, v0, v3
	; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX8-NEXT: buffer_wbinvl1_vol			; GFX8-NEXT: buffer_wbinvl1_vol
	; GFX8-NEXT: BB21_2:			; GFX8-NEXT: BB21_2:
	; GFX8-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX8-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX8-NEXT: v_readfirstlane_b32 s2, v0			; GFX8-NEXT: v_readfirstlane_b32 s2, v0
	; GFX8-NEXT: v_mov_b32_e32 v0, v1			; GFX8-NEXT: v_mov_b32_e32 v0, v1
	; GFX8-NEXT: v_max_u32_e32 v0, s2, v0			; GFX8-NEXT: v_max_u32_e32 v0, s2, v0
	; GFX8-NEXT: s_mov_b32 s3, 0xf000			; GFX8-NEXT: s_mov_b32 s3, 0xf000
	; GFX8-NEXT: s_mov_b32 s2, -1			; GFX8-NEXT: s_mov_b32 s2, -1
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX9-LABEL: umax_i32_varying:			; GFX9-LABEL: umax_i32_varying:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX9-NEXT: s_mov_b64 s[2:3], exec
	; GFX9-NEXT: v_mov_b32_e32 v2, v0			; GFX9-NEXT: v_mov_b32_e32 v2, v0
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX9-NEXT: v_mov_b32_e32 v1, 0			; GFX9-NEXT: v_mov_b32_e32 v1, 0
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[2:3]
	; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0			; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0			; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX9-NEXT: s_not_b64 exec, exec			; GFX9-NEXT: s_not_b64 exec, exec
	; GFX9-NEXT: v_mov_b32_e32 v2, 0			; GFX9-NEXT: v_mov_b32_e32 v2, 0
	; GFX9-NEXT: s_not_b64 exec, exec			; GFX9-NEXT: s_not_b64 exec, exec
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX9-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX9-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX9-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX9-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX9-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_max_u32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf			; GFX9-NEXT: v_max_u32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_max_u32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf			; GFX9-NEXT: v_max_u32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf
	; GFX9-NEXT: v_readlane_b32 s2, v2, 63			; GFX9-NEXT: v_readlane_b32 s4, v2, 63
	; GFX9-NEXT: s_nop 0			; GFX9-NEXT: s_nop 0
	; GFX9-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf			; GFX9-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[2:3]
	; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX9-NEXT: ; implicit-def: $vgpr0			; GFX9-NEXT: ; implicit-def: $vgpr0
	; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX9-NEXT: s_and_saveexec_b64 s[2:3], vcc
	; GFX9-NEXT: s_cbranch_execz BB21_2			; GFX9-NEXT: s_cbranch_execz BB21_2
	; GFX9-NEXT: ; %bb.1:			; GFX9-NEXT: ; %bb.1:
	; GFX9-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo			; GFX9-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo
	; GFX9-NEXT: v_mov_b32_e32 v3, s2			; GFX9-NEXT: v_mov_b32_e32 v3, s4
	; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX9-NEXT: ds_max_rtn_u32 v0, v0, v3			; GFX9-NEXT: ds_max_rtn_u32 v0, v0, v3
	; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX9-NEXT: buffer_wbinvl1_vol			; GFX9-NEXT: buffer_wbinvl1_vol
	; GFX9-NEXT: BB21_2:			; GFX9-NEXT: BB21_2:
	; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX9-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX9-NEXT: v_readfirstlane_b32 s2, v0			; GFX9-NEXT: v_readfirstlane_b32 s2, v0
	; GFX9-NEXT: v_mov_b32_e32 v0, v1			; GFX9-NEXT: v_mov_b32_e32 v0, v1
	; GFX9-NEXT: v_max_u32_e32 v0, s2, v0			; GFX9-NEXT: v_max_u32_e32 v0, s2, v0
	; GFX9-NEXT: s_mov_b32 s3, 0xf000			; GFX9-NEXT: s_mov_b32 s3, 0xf000
	; GFX9-NEXT: s_mov_b32 s2, -1			; GFX9-NEXT: s_mov_b32 s2, -1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX1064-LABEL: umax_i32_varying:			; GFX1064-LABEL: umax_i32_varying:
	; GFX1064: ; %bb.0: ; %entry			; GFX1064: ; %bb.0: ; %entry
	; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX1064-NEXT: v_mov_b32_e32 v1, v0
	; GFX1064-NEXT: s_mov_b64 s[2:3], exec
	; GFX1064-NEXT: v_mov_b32_e32 v2, v0
	; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX1064-NEXT: v_mov_b32_e32 v1, 0
	; GFX1064-NEXT: s_mov_b64 exec, s[4:5]
	; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
	; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0
	; GFX1064-NEXT: s_not_b64 exec, exec			; GFX1064-NEXT: s_not_b64 exec, exec
	; GFX1064-NEXT: v_mov_b32_e32 v2, 0			; GFX1064-NEXT: v_mov_b32_e32 v1, 0
	; GFX1064-NEXT: s_not_b64 exec, exec			; GFX1064-NEXT: s_not_b64 exec, exec
				; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1064-NEXT: v_max_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX1064-NEXT: v_mov_b32_e32 v3, 0
				; GFX1064-NEXT: v_max_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX1064-NEXT: v_max_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX1064-NEXT: v_max_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
				; GFX1064-NEXT: v_mov_b32_e32 v2, v1
				; GFX1064-NEXT: v_permlanex16_b32 v2, v2, -1, -1
				; GFX1064-NEXT: v_max_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
				; GFX1064-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1064-NEXT: v_mov_b32_e32 v2, s4
				; GFX1064-NEXT: v_max_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
				; GFX1064-NEXT: v_readlane_b32 s4, v1, 15
				; GFX1064-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
				; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
				; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1064-NEXT: v_readlane_b32 s5, v1, 31
				; GFX1064-NEXT: v_writelane_b32 v3, s4, 16
				; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
				; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1064-NEXT: v_readlane_b32 s7, v1, 63
				; GFX1064-NEXT: v_readlane_b32 s6, v1, 47
				; GFX1064-NEXT: v_writelane_b32 v3, s5, 32
				; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, exec_hi, v0
	; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX1064-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1064-NEXT: v_writelane_b32 v3, s6, 48
	; GFX1064-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1064-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1064-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1064-NEXT: v_mov_b32_e32 v3, v2
	; GFX1064-NEXT: v_permlanex16_b32 v3, v3, -1, -1
	; GFX1064-NEXT: v_max_u32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1064-NEXT: v_readlane_b32 s2, v2, 31
	; GFX1064-NEXT: v_mov_b32_e32 v3, s2
	; GFX1064-NEXT: v_max_u32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
	; GFX1064-NEXT: v_mov_b32_dpp v1, v2 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1064-NEXT: v_readlane_b32 s2, v2, 15
	; GFX1064-NEXT: v_readlane_b32 s3, v2, 31
	; GFX1064-NEXT: v_readlane_b32 s6, v2, 47
	; GFX1064-NEXT: v_writelane_b32 v1, s2, 16
	; GFX1064-NEXT: s_mov_b32 s2, -1
	; GFX1064-NEXT: v_writelane_b32 v1, s3, 32
	; GFX1064-NEXT: v_readlane_b32 s3, v2, 63
	; GFX1064-NEXT: v_writelane_b32 v1, s6, 48
	; GFX1064-NEXT: s_mov_b64 exec, s[4:5]			; GFX1064-NEXT: s_mov_b64 exec, s[4:5]
	; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
				; GFX1064-NEXT: s_mov_b32 s2, -1
	; GFX1064-NEXT: ; implicit-def: $vgpr0			; GFX1064-NEXT: ; implicit-def: $vgpr0
	; GFX1064-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX1064-NEXT: s_and_saveexec_b64 s[4:5], vcc
	; GFX1064-NEXT: s_cbranch_execz BB21_2			; GFX1064-NEXT: s_cbranch_execz BB21_2
	; GFX1064-NEXT: ; %bb.1:			; GFX1064-NEXT: ; %bb.1:
	; GFX1064-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo			; GFX1064-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo
	; GFX1064-NEXT: v_mov_b32_e32 v4, s3			; GFX1064-NEXT: v_mov_b32_e32 v4, s7
				; GFX1064-NEXT: s_mov_b32 s3, s7
	; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1064-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1064-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1064-NEXT: ds_max_rtn_u32 v0, v7, v4			; GFX1064-NEXT: ds_max_rtn_u32 v0, v7, v4
	; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1064-NEXT: buffer_gl0_inv			; GFX1064-NEXT: buffer_gl0_inv
	; GFX1064-NEXT: buffer_gl1_inv			; GFX1064-NEXT: buffer_gl1_inv
	; GFX1064-NEXT: BB21_2:			; GFX1064-NEXT: BB21_2:
	; GFX1064-NEXT: s_waitcnt_depctr 0xffe3			; GFX1064-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1064-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX1064-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX1064-NEXT: v_readfirstlane_b32 s3, v0			; GFX1064-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1064-NEXT: v_mov_b32_e32 v0, v1			; GFX1064-NEXT: v_mov_b32_e32 v0, v3
	; GFX1064-NEXT: v_max_u32_e32 v0, s3, v0			; GFX1064-NEXT: v_max_u32_e32 v0, s3, v0
	; GFX1064-NEXT: s_mov_b32 s3, 0x31016000			; GFX1064-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1064-NEXT: s_waitcnt lgkmcnt(0)			; GFX1064-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1064-NEXT: s_nop 0			; GFX1064-NEXT: s_nop 0
	; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX1064-NEXT: s_endpgm			; GFX1064-NEXT: s_endpgm
	;			;
	; GFX1032-LABEL: umax_i32_varying:			; GFX1032-LABEL: umax_i32_varying:
	; GFX1032: ; %bb.0: ; %entry			; GFX1032: ; %bb.0: ; %entry
	; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX1032-NEXT: v_mov_b32_e32 v1, v0
	; GFX1032-NEXT: s_mov_b32 s2, exec_lo
	; GFX1032-NEXT: ; implicit-def: $vcc_hi
	; GFX1032-NEXT: v_mov_b32_e32 v2, v0
	; GFX1032-NEXT: s_or_saveexec_b32 s3, -1
	; GFX1032-NEXT: v_mov_b32_e32 v1, 0
	; GFX1032-NEXT: s_mov_b32 exec_lo, s3
	; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
	; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1032-NEXT: v_mov_b32_e32 v2, 0			; GFX1032-NEXT: v_mov_b32_e32 v1, 0
	; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1032-NEXT: s_or_saveexec_b32 s4, -1			; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1032-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_max_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1032-NEXT: s_mov_b32 s2, -1			; GFX1032-NEXT: v_max_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1032-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_max_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1032-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_max_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1032-NEXT: v_max_u32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_mov_b32_e32 v2, v1
	; GFX1032-NEXT: v_mov_b32_e32 v3, v2			; GFX1032-NEXT: v_permlanex16_b32 v2, v2, -1, -1
	; GFX1032-NEXT: v_permlanex16_b32 v3, v3, -1, -1			; GFX1032-NEXT: s_mov_b32 exec_lo, s2
	; GFX1032-NEXT: v_max_u32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf			; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX1032-NEXT: v_readlane_b32 s3, v2, 31			; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1032-NEXT: v_mov_b32_dpp v1, v2 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1032-NEXT: v_max_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1032-NEXT: v_readlane_b32 s5, v2, 15			; GFX1032-NEXT: v_mov_b32_e32 v3, 0
	; GFX1032-NEXT: v_writelane_b32 v1, s5, 16			; GFX1032-NEXT: v_readlane_b32 s3, v1, 15
	; GFX1032-NEXT: s_mov_b32 exec_lo, s4			; GFX1032-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1032-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
				; GFX1032-NEXT: s_mov_b32 exec_lo, s2
				; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
				; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
				; GFX1032-NEXT: v_writelane_b32 v3, s3, 16
				; GFX1032-NEXT: s_mov_b32 exec_lo, s2
	; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0			; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
				; GFX1032-NEXT: s_mov_b32 s2, -1
	; GFX1032-NEXT: ; implicit-def: $vgpr0			; GFX1032-NEXT: ; implicit-def: $vgpr0
	; GFX1032-NEXT: s_and_saveexec_b32 s4, vcc_lo			; GFX1032-NEXT: ; implicit-def: $vcc_hi
				; GFX1032-NEXT: s_and_saveexec_b32 s3, vcc_lo
	; GFX1032-NEXT: s_cbranch_execz BB21_2			; GFX1032-NEXT: s_cbranch_execz BB21_2
	; GFX1032-NEXT: ; %bb.1:			; GFX1032-NEXT: ; %bb.1:
	; GFX1032-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo			; GFX1032-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo
	; GFX1032-NEXT: v_mov_b32_e32 v4, s3			; GFX1032-NEXT: v_mov_b32_e32 v4, s4
	; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1032-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1032-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1032-NEXT: ds_max_rtn_u32 v0, v7, v4			; GFX1032-NEXT: ds_max_rtn_u32 v0, v7, v4
	; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1032-NEXT: buffer_gl0_inv			; GFX1032-NEXT: buffer_gl0_inv
	; GFX1032-NEXT: buffer_gl1_inv			; GFX1032-NEXT: buffer_gl1_inv
	; GFX1032-NEXT: BB21_2:			; GFX1032-NEXT: BB21_2:
	; GFX1032-NEXT: s_waitcnt_depctr 0xffe3			; GFX1032-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1032-NEXT: s_or_b32 exec_lo, exec_lo, s4			; GFX1032-NEXT: s_or_b32 exec_lo, exec_lo, s3
	; GFX1032-NEXT: v_readfirstlane_b32 s3, v0			; GFX1032-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1032-NEXT: v_mov_b32_e32 v0, v1			; GFX1032-NEXT: v_mov_b32_e32 v0, v3
	; GFX1032-NEXT: v_max_u32_e32 v0, s3, v0			; GFX1032-NEXT: v_max_u32_e32 v0, s3, v0
	; GFX1032-NEXT: s_mov_b32 s3, 0x31016000			; GFX1032-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1032-NEXT: s_waitcnt lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1032-NEXT: s_nop 0			; GFX1032-NEXT: s_nop 0
	; GFX1032-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX1032-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX1032-NEXT: s_endpgm			; GFX1032-NEXT: s_endpgm
	entry:			entry:
	%lane = call i32 @llvm.amdgcn.workitem.id.x()			%lane = call i32 @llvm.amdgcn.workitem.id.x()
	[199 lines not shown]
	; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000			; GFX7LESS-NEXT: s_mov_b32 s3, 0xf000
	; GFX7LESS-NEXT: s_mov_b32 s2, -1			; GFX7LESS-NEXT: s_mov_b32 s2, -1
	; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX7LESS-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX7LESS-NEXT: s_endpgm			; GFX7LESS-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: umin_i32_varying:			; GFX8-LABEL: umin_i32_varying:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX8-NEXT: v_mbcnt_lo_u32_b32 v3, exec_lo, 0
	; GFX8-NEXT: v_mbcnt_hi_u32_b32 v3, exec_hi, v3
	; GFX8-NEXT: v_mov_b32_e32 v2, v0			; GFX8-NEXT: v_mov_b32_e32 v2, v0
				; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
				; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX8-NEXT: v_mov_b32_e32 v1, -1			; GFX8-NEXT: v_mov_b32_e32 v1, -1
	; GFX8-NEXT: s_mov_b64 exec, s[2:3]			; GFX8-NEXT: s_mov_b64 exec, s[2:3]
	; GFX8-NEXT: s_not_b64 exec, exec			; GFX8-NEXT: s_not_b64 exec, exec
	; GFX8-NEXT: v_mov_b32_e32 v2, -1			; GFX8-NEXT: v_mov_b32_e32 v2, -1
	; GFX8-NEXT: s_not_b64 exec, exec			; GFX8-NEXT: s_not_b64 exec, exec
	; GFX8-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX8-NEXT: v_min_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX8-NEXT: v_min_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_min_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf			; GFX8-NEXT: v_min_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_min_u32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf			; GFX8-NEXT: v_min_u32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_min_u32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf			; GFX8-NEXT: v_min_u32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_min_u32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf			; GFX8-NEXT: v_min_u32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_min_u32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf			; GFX8-NEXT: v_min_u32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf
	; GFX8-NEXT: v_readlane_b32 s2, v2, 63			; GFX8-NEXT: v_readlane_b32 s4, v2, 63
	; GFX8-NEXT: s_nop 0			; GFX8-NEXT: s_nop 0
	; GFX8-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf			; GFX8-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[2:3]
	; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v3			; GFX8-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX8-NEXT: ; implicit-def: $vgpr0			; GFX8-NEXT: ; implicit-def: $vgpr0
	; GFX8-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX8-NEXT: s_and_saveexec_b64 s[2:3], vcc
	; GFX8-NEXT: s_cbranch_execz BB23_2			; GFX8-NEXT: s_cbranch_execz BB23_2
	; GFX8-NEXT: ; %bb.1:			; GFX8-NEXT: ; %bb.1:
	; GFX8-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo			; GFX8-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo
	; GFX8-NEXT: v_mov_b32_e32 v3, s2			; GFX8-NEXT: v_mov_b32_e32 v3, s4
	; GFX8-NEXT: s_mov_b32 m0, -1			; GFX8-NEXT: s_mov_b32 m0, -1
	; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX8-NEXT: ds_min_rtn_u32 v0, v0, v3			; GFX8-NEXT: ds_min_rtn_u32 v0, v0, v3
	; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX8-NEXT: buffer_wbinvl1_vol			; GFX8-NEXT: buffer_wbinvl1_vol
	; GFX8-NEXT: BB23_2:			; GFX8-NEXT: BB23_2:
	; GFX8-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX8-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX8-NEXT: v_readfirstlane_b32 s2, v0			; GFX8-NEXT: v_readfirstlane_b32 s2, v0
	; GFX8-NEXT: v_mov_b32_e32 v0, v1			; GFX8-NEXT: v_mov_b32_e32 v0, v1
	; GFX8-NEXT: v_min_u32_e32 v0, s2, v0			; GFX8-NEXT: v_min_u32_e32 v0, s2, v0
	; GFX8-NEXT: s_mov_b32 s3, 0xf000			; GFX8-NEXT: s_mov_b32 s3, 0xf000
	; GFX8-NEXT: s_mov_b32 s2, -1			; GFX8-NEXT: s_mov_b32 s2, -1
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX9-LABEL: umin_i32_varying:			; GFX9-LABEL: umin_i32_varying:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX9-NEXT: v_mbcnt_lo_u32_b32 v3, exec_lo, 0
	; GFX9-NEXT: v_mbcnt_hi_u32_b32 v3, exec_hi, v3
	; GFX9-NEXT: v_mov_b32_e32 v2, v0			; GFX9-NEXT: v_mov_b32_e32 v2, v0
				; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
				; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX9-NEXT: v_mov_b32_e32 v1, -1			; GFX9-NEXT: v_mov_b32_e32 v1, -1
	; GFX9-NEXT: s_mov_b64 exec, s[2:3]			; GFX9-NEXT: s_mov_b64 exec, s[2:3]
	; GFX9-NEXT: s_not_b64 exec, exec			; GFX9-NEXT: s_not_b64 exec, exec
	; GFX9-NEXT: v_mov_b32_e32 v2, -1			; GFX9-NEXT: v_mov_b32_e32 v2, -1
	; GFX9-NEXT: s_not_b64 exec, exec			; GFX9-NEXT: s_not_b64 exec, exec
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX9-NEXT: v_min_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX9-NEXT: v_min_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_min_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf			; GFX9-NEXT: v_min_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_min_u32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf			; GFX9-NEXT: v_min_u32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_min_u32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf			; GFX9-NEXT: v_min_u32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_min_u32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf			; GFX9-NEXT: v_min_u32_dpp v2, v2, v2 row_bcast:15 row_mask:0xa bank_mask:0xf
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_min_u32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf			; GFX9-NEXT: v_min_u32_dpp v2, v2, v2 row_bcast:31 row_mask:0xc bank_mask:0xf
	; GFX9-NEXT: v_readlane_b32 s2, v2, 63			; GFX9-NEXT: v_readlane_b32 s4, v2, 63
	; GFX9-NEXT: s_nop 0			; GFX9-NEXT: s_nop 0
	; GFX9-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf			; GFX9-NEXT: v_mov_b32_dpp v1, v2 wave_shr:1 row_mask:0xf bank_mask:0xf
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[2:3]
	; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v3			; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX9-NEXT: ; implicit-def: $vgpr0			; GFX9-NEXT: ; implicit-def: $vgpr0
	; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX9-NEXT: s_and_saveexec_b64 s[2:3], vcc
	; GFX9-NEXT: s_cbranch_execz BB23_2			; GFX9-NEXT: s_cbranch_execz BB23_2
	; GFX9-NEXT: ; %bb.1:			; GFX9-NEXT: ; %bb.1:
	; GFX9-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo			; GFX9-NEXT: v_mov_b32_e32 v0, local_var32@abs32@lo
	; GFX9-NEXT: v_mov_b32_e32 v3, s2			; GFX9-NEXT: v_mov_b32_e32 v3, s4
	; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX9-NEXT: ds_min_rtn_u32 v0, v0, v3			; GFX9-NEXT: ds_min_rtn_u32 v0, v0, v3
	; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX9-NEXT: buffer_wbinvl1_vol			; GFX9-NEXT: buffer_wbinvl1_vol
	; GFX9-NEXT: BB23_2:			; GFX9-NEXT: BB23_2:
	; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX9-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX9-NEXT: v_readfirstlane_b32 s2, v0			; GFX9-NEXT: v_readfirstlane_b32 s2, v0
	; GFX9-NEXT: v_mov_b32_e32 v0, v1			; GFX9-NEXT: v_mov_b32_e32 v0, v1
	; GFX9-NEXT: v_min_u32_e32 v0, s2, v0			; GFX9-NEXT: v_min_u32_e32 v0, s2, v0
	; GFX9-NEXT: s_mov_b32 s3, 0xf000			; GFX9-NEXT: s_mov_b32 s3, 0xf000
	; GFX9-NEXT: s_mov_b32 s2, -1			; GFX9-NEXT: s_mov_b32 s2, -1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX1064-LABEL: umin_i32_varying:			; GFX1064-LABEL: umin_i32_varying:
	; GFX1064: ; %bb.0: ; %entry			; GFX1064: ; %bb.0: ; %entry
				; GFX1064-NEXT: v_mov_b32_e32 v1, v0
				; GFX1064-NEXT: s_not_b64 exec, exec
				; GFX1064-NEXT: v_mov_b32_e32 v1, -1
				; GFX1064-NEXT: s_not_b64 exec, exec
				; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1064-NEXT: v_min_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf
				; GFX1064-NEXT: v_mov_b32_e32 v3, -1
				; GFX1064-NEXT: v_min_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf
				; GFX1064-NEXT: v_min_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf
				; GFX1064-NEXT: v_min_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf
				; GFX1064-NEXT: v_mov_b32_e32 v2, v1
				; GFX1064-NEXT: v_permlanex16_b32 v2, v2, -1, -1
				; GFX1064-NEXT: v_min_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
				; GFX1064-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1064-NEXT: v_mov_b32_e32 v2, s4
				; GFX1064-NEXT: v_min_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
				; GFX1064-NEXT: v_readlane_b32 s4, v1, 15
				; GFX1064-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
				; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
	; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX1064-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v4, exec_lo, 0
	; GFX1064-NEXT: v_mov_b32_e32 v2, v0
	; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v4, exec_hi, v4
	; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX1064-NEXT: v_mov_b32_e32 v1, -1			; GFX1064-NEXT: v_readlane_b32 s5, v1, 31
				; GFX1064-NEXT: v_writelane_b32 v3, s4, 16
	; GFX1064-NEXT: s_mov_b64 exec, s[2:3]			; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
	; GFX1064-NEXT: s_not_b64 exec, exec			; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
	; GFX1064-NEXT: v_mov_b32_e32 v2, -1			; GFX1064-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX1064-NEXT: s_not_b64 exec, exec			; GFX1064-NEXT: v_readlane_b32 s7, v1, 63
				; GFX1064-NEXT: v_readlane_b32 s6, v1, 47
				; GFX1064-NEXT: v_writelane_b32 v3, s5, 32
				; GFX1064-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, exec_hi, v0
	; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX1064-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX1064-NEXT: v_min_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1064-NEXT: v_writelane_b32 v3, s6, 48
	; GFX1064-NEXT: v_min_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf
	; GFX1064-NEXT: v_min_u32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf
	; GFX1064-NEXT: v_min_u32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf
	; GFX1064-NEXT: v_mov_b32_e32 v3, v2
	; GFX1064-NEXT: v_permlanex16_b32 v3, v3, -1, -1
	; GFX1064-NEXT: v_min_u32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1064-NEXT: v_readlane_b32 s2, v2, 31
	; GFX1064-NEXT: v_mov_b32_e32 v3, s2
	; GFX1064-NEXT: v_min_u32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
	; GFX1064-NEXT: v_mov_b32_dpp v1, v2 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1064-NEXT: v_readlane_b32 s2, v2, 15
	; GFX1064-NEXT: v_readlane_b32 s3, v2, 31
	; GFX1064-NEXT: v_readlane_b32 s6, v2, 47
	; GFX1064-NEXT: v_writelane_b32 v1, s2, 16
	; GFX1064-NEXT: s_mov_b32 s2, -1
	; GFX1064-NEXT: v_writelane_b32 v1, s3, 32
	; GFX1064-NEXT: v_readlane_b32 s3, v2, 63
	; GFX1064-NEXT: v_writelane_b32 v1, s6, 48
	; GFX1064-NEXT: s_mov_b64 exec, s[4:5]			; GFX1064-NEXT: s_mov_b64 exec, s[4:5]
	; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v4			; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
				; GFX1064-NEXT: s_mov_b32 s2, -1
	; GFX1064-NEXT: ; implicit-def: $vgpr0			; GFX1064-NEXT: ; implicit-def: $vgpr0
	; GFX1064-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX1064-NEXT: s_and_saveexec_b64 s[4:5], vcc
	; GFX1064-NEXT: s_cbranch_execz BB23_2			; GFX1064-NEXT: s_cbranch_execz BB23_2
	; GFX1064-NEXT: ; %bb.1:			; GFX1064-NEXT: ; %bb.1:
	; GFX1064-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo			; GFX1064-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo
	; GFX1064-NEXT: v_mov_b32_e32 v4, s3			; GFX1064-NEXT: v_mov_b32_e32 v4, s7
				; GFX1064-NEXT: s_mov_b32 s3, s7
	; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1064-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1064-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1064-NEXT: ds_min_rtn_u32 v0, v7, v4			; GFX1064-NEXT: ds_min_rtn_u32 v0, v7, v4
	; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1064-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1064-NEXT: buffer_gl0_inv			; GFX1064-NEXT: buffer_gl0_inv
	; GFX1064-NEXT: buffer_gl1_inv			; GFX1064-NEXT: buffer_gl1_inv
	; GFX1064-NEXT: BB23_2:			; GFX1064-NEXT: BB23_2:
	; GFX1064-NEXT: s_waitcnt_depctr 0xffe3			; GFX1064-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1064-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX1064-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX1064-NEXT: v_readfirstlane_b32 s3, v0			; GFX1064-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1064-NEXT: v_mov_b32_e32 v0, v1			; GFX1064-NEXT: v_mov_b32_e32 v0, v3
	; GFX1064-NEXT: v_min_u32_e32 v0, s3, v0			; GFX1064-NEXT: v_min_u32_e32 v0, s3, v0
	; GFX1064-NEXT: s_mov_b32 s3, 0x31016000			; GFX1064-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1064-NEXT: s_waitcnt lgkmcnt(0)			; GFX1064-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1064-NEXT: s_nop 0			; GFX1064-NEXT: s_nop 0
	; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX1064-NEXT: s_endpgm			; GFX1064-NEXT: s_endpgm
	;			;
	; GFX1032-LABEL: umin_i32_varying:			; GFX1032-LABEL: umin_i32_varying:
	; GFX1032: ; %bb.0: ; %entry			; GFX1032: ; %bb.0: ; %entry
				; GFX1032-NEXT: v_mov_b32_e32 v1, v0
				; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
				; GFX1032-NEXT: v_mov_b32_e32 v1, -1
				; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
				; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
				; GFX1032-NEXT: v_min_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf
				; GFX1032-NEXT: v_min_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf
				; GFX1032-NEXT: v_min_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf
				; GFX1032-NEXT: v_min_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf
				; GFX1032-NEXT: v_mov_b32_e32 v2, v1
				; GFX1032-NEXT: v_permlanex16_b32 v2, v2, -1, -1
				; GFX1032-NEXT: s_mov_b32 exec_lo, s2
	; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX1032-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v4, exec_lo, 0
	; GFX1032-NEXT: ; implicit-def: $vcc_hi
	; GFX1032-NEXT: v_mov_b32_e32 v2, v0
	; GFX1032-NEXT: s_or_saveexec_b32 s2, -1			; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1032-NEXT: v_mov_b32_e32 v1, -1			; GFX1032-NEXT: v_min_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
				; GFX1032-NEXT: v_mov_b32_e32 v3, -1
				; GFX1032-NEXT: v_readlane_b32 s3, v1, 15
				; GFX1032-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1032-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1032-NEXT: s_mov_b32 exec_lo, s2			; GFX1032-NEXT: s_mov_b32 exec_lo, s2
	; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
	; GFX1032-NEXT: v_mov_b32_e32 v2, -1			; GFX1032-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1032-NEXT: v_writelane_b32 v3, s3, 16
	; GFX1032-NEXT: s_or_saveexec_b32 s4, -1			; GFX1032-NEXT: s_mov_b32 exec_lo, s2
	; GFX1032-NEXT: v_min_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
	; GFX1032-NEXT: s_mov_b32 s2, -1			; GFX1032-NEXT: s_mov_b32 s2, -1
	; GFX1032-NEXT: v_min_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf
	; GFX1032-NEXT: v_min_u32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf
	; GFX1032-NEXT: v_min_u32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf
	; GFX1032-NEXT: v_mov_b32_e32 v3, v2
	; GFX1032-NEXT: v_permlanex16_b32 v3, v3, -1, -1
	; GFX1032-NEXT: v_min_u32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1032-NEXT: v_readlane_b32 s3, v2, 31
	; GFX1032-NEXT: v_mov_b32_dpp v1, v2 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1032-NEXT: v_readlane_b32 s5, v2, 15
	; GFX1032-NEXT: v_writelane_b32 v1, s5, 16
	; GFX1032-NEXT: s_mov_b32 exec_lo, s4
	; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v4
	; GFX1032-NEXT: ; implicit-def: $vgpr0			; GFX1032-NEXT: ; implicit-def: $vgpr0
	; GFX1032-NEXT: s_and_saveexec_b32 s4, vcc_lo			; GFX1032-NEXT: ; implicit-def: $vcc_hi
				; GFX1032-NEXT: s_and_saveexec_b32 s3, vcc_lo
	; GFX1032-NEXT: s_cbranch_execz BB23_2			; GFX1032-NEXT: s_cbranch_execz BB23_2
	; GFX1032-NEXT: ; %bb.1:			; GFX1032-NEXT: ; %bb.1:
	; GFX1032-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo			; GFX1032-NEXT: v_mov_b32_e32 v7, local_var32@abs32@lo
	; GFX1032-NEXT: v_mov_b32_e32 v4, s3			; GFX1032-NEXT: v_mov_b32_e32 v4, s4
	; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1032-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1032-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1032-NEXT: ds_min_rtn_u32 v0, v7, v4			; GFX1032-NEXT: ds_min_rtn_u32 v0, v7, v4
	; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1032-NEXT: buffer_gl0_inv			; GFX1032-NEXT: buffer_gl0_inv
	; GFX1032-NEXT: buffer_gl1_inv			; GFX1032-NEXT: buffer_gl1_inv
	; GFX1032-NEXT: BB23_2:			; GFX1032-NEXT: BB23_2:
	; GFX1032-NEXT: s_waitcnt_depctr 0xffe3			; GFX1032-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1032-NEXT: s_or_b32 exec_lo, exec_lo, s4			; GFX1032-NEXT: s_or_b32 exec_lo, exec_lo, s3
	; GFX1032-NEXT: v_readfirstlane_b32 s3, v0			; GFX1032-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1032-NEXT: v_mov_b32_e32 v0, v1			; GFX1032-NEXT: v_mov_b32_e32 v0, v3
	; GFX1032-NEXT: v_min_u32_e32 v0, s3, v0			; GFX1032-NEXT: v_min_u32_e32 v0, s3, v0
	; GFX1032-NEXT: s_mov_b32 s3, 0x31016000			; GFX1032-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1032-NEXT: s_waitcnt lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1032-NEXT: s_nop 0			; GFX1032-NEXT: s_nop 0
	; GFX1032-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX1032-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX1032-NEXT: s_endpgm			; GFX1032-NEXT: s_endpgm
	entry:			entry:
	%lane = call i32 @llvm.amdgcn.workitem.id.x()			%lane = call i32 @llvm.amdgcn.workitem.id.x()
	[186 lines elided]

llvm/test/CodeGen/AMDGPU/atomic_optimizations_pixelshader.ll

	[202 lines elided]
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0			; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX7-NEXT: BB1_2: ; %else			; GFX7-NEXT: BB1_2: ; %else
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: add_i32_varying:			; GFX8-LABEL: add_i32_varying:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_mov_b64 s[10:11], exec			; GFX8-NEXT: s_mov_b64 s[10:11], exec
	; GFX8-NEXT: ; implicit-def: $vgpr3
	; GFX8-NEXT: v_mov_b32_e32 v2, v0			; GFX8-NEXT: v_mov_b32_e32 v2, v0
				; GFX8-NEXT: ; implicit-def: $vgpr0
	; GFX8-NEXT: s_and_saveexec_b64 s[8:9], s[10:11]			; GFX8-NEXT: s_and_saveexec_b64 s[8:9], s[10:11]
	; GFX8-NEXT: s_cbranch_execz BB1_4			; GFX8-NEXT: s_cbranch_execz BB1_4
	; GFX8-NEXT: ; %bb.1:			; GFX8-NEXT: ; %bb.1:
	; GFX8-NEXT: s_mov_b64 s[10:11], exec			; GFX8-NEXT: s_or_saveexec_b64 s[10:11], -1
	; GFX8-NEXT: s_or_saveexec_b64 s[12:13], -1
	; GFX8-NEXT: v_mov_b32_e32 v1, 0			; GFX8-NEXT: v_mov_b32_e32 v1, 0
	; GFX8-NEXT: s_mov_b64 exec, s[12:13]			; GFX8-NEXT: s_mov_b64 exec, s[10:11]
	; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, s10, 0			; GFX8-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, s11, v0			; GFX8-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX8-NEXT: s_not_b64 exec, exec			; GFX8-NEXT: s_not_b64 exec, exec
	; GFX8-NEXT: v_mov_b32_e32 v2, 0			; GFX8-NEXT: v_mov_b32_e32 v2, 0
	; GFX8-NEXT: s_not_b64 exec, exec			; GFX8-NEXT: s_not_b64 exec, exec
	; GFX8-NEXT: s_or_saveexec_b64 s[10:11], -1			; GFX8-NEXT: s_or_saveexec_b64 s[10:11], -1
	; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX8-NEXT: v_add_u32_dpp v2, vcc, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX8-NEXT: s_nop 1			; GFX8-NEXT: s_nop 1
	[15 lines elided]
	; GFX8-NEXT: ; %bb.2:			; GFX8-NEXT: ; %bb.2:
	; GFX8-NEXT: v_mov_b32_e32 v0, s12			; GFX8-NEXT: v_mov_b32_e32 v0, s12
	; GFX8-NEXT: buffer_atomic_add v0, off, s[4:7], 0 glc			; GFX8-NEXT: buffer_atomic_add v0, off, s[4:7], 0 glc
	; GFX8-NEXT: BB1_3:			; GFX8-NEXT: BB1_3:
	; GFX8-NEXT: s_or_b64 exec, exec, s[10:11]			; GFX8-NEXT: s_or_b64 exec, exec, s[10:11]
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_readfirstlane_b32 s4, v0			; GFX8-NEXT: v_readfirstlane_b32 s4, v0
	; GFX8-NEXT: v_mov_b32_e32 v0, v1			; GFX8-NEXT: v_mov_b32_e32 v0, v1
	; GFX8-NEXT: v_add_u32_e32 v3, vcc, s4, v0			; GFX8-NEXT: v_add_u32_e32 v0, vcc, s4, v0
	; GFX8-NEXT: BB1_4: ; %Flow			; GFX8-NEXT: BB1_4: ; %Flow
	; GFX8-NEXT: s_or_b64 exec, exec, s[8:9]			; GFX8-NEXT: s_or_b64 exec, exec, s[8:9]
	; GFX8-NEXT: s_wqm_b64 s[4:5], -1			; GFX8-NEXT: s_wqm_b64 s[4:5], -1
	; GFX8-NEXT: s_andn2_b64 vcc, exec, s[4:5]			; GFX8-NEXT: s_andn2_b64 vcc, exec, s[4:5]
	; GFX8-NEXT: s_cbranch_vccnz BB1_6			; GFX8-NEXT: s_cbranch_vccnz BB1_6
	; GFX8-NEXT: ; %bb.5: ; %if			; GFX8-NEXT: ; %bb.5: ; %if
	; GFX8-NEXT: buffer_store_dword v3, off, s[0:3], 0			; GFX8-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX8-NEXT: BB1_6: ; %UnifiedReturnBlock			; GFX8-NEXT: BB1_6: ; %UnifiedReturnBlock
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX9-LABEL: add_i32_varying:			; GFX9-LABEL: add_i32_varying:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_mov_b64 s[10:11], exec			; GFX9-NEXT: s_mov_b64 s[10:11], exec
	; GFX9-NEXT: ; implicit-def: $vgpr3
	; GFX9-NEXT: v_mov_b32_e32 v2, v0			; GFX9-NEXT: v_mov_b32_e32 v2, v0
				; GFX9-NEXT: ; implicit-def: $vgpr0
	; GFX9-NEXT: s_and_saveexec_b64 s[8:9], s[10:11]			; GFX9-NEXT: s_and_saveexec_b64 s[8:9], s[10:11]
	; GFX9-NEXT: s_cbranch_execz BB1_4			; GFX9-NEXT: s_cbranch_execz BB1_4
	; GFX9-NEXT: ; %bb.1:			; GFX9-NEXT: ; %bb.1:
	; GFX9-NEXT: s_mov_b64 s[10:11], exec			; GFX9-NEXT: s_or_saveexec_b64 s[10:11], -1
	; GFX9-NEXT: s_or_saveexec_b64 s[12:13], -1
	; GFX9-NEXT: v_mov_b32_e32 v1, 0			; GFX9-NEXT: v_mov_b32_e32 v1, 0
	; GFX9-NEXT: s_mov_b64 exec, s[12:13]			; GFX9-NEXT: s_mov_b64 exec, s[10:11]
	; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, s10, 0			; GFX9-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, s11, v0			; GFX9-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX9-NEXT: s_not_b64 exec, exec			; GFX9-NEXT: s_not_b64 exec, exec
	; GFX9-NEXT: v_mov_b32_e32 v2, 0			; GFX9-NEXT: v_mov_b32_e32 v2, 0
	; GFX9-NEXT: s_not_b64 exec, exec			; GFX9-NEXT: s_not_b64 exec, exec
	; GFX9-NEXT: s_or_saveexec_b64 s[10:11], -1			; GFX9-NEXT: s_or_saveexec_b64 s[10:11], -1
	; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX9-NEXT: v_add_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX9-NEXT: s_nop 1			; GFX9-NEXT: s_nop 1
	[15 lines elided]
	; GFX9-NEXT: ; %bb.2:			; GFX9-NEXT: ; %bb.2:
	; GFX9-NEXT: v_mov_b32_e32 v0, s12			; GFX9-NEXT: v_mov_b32_e32 v0, s12
	; GFX9-NEXT: buffer_atomic_add v0, off, s[4:7], 0 glc			; GFX9-NEXT: buffer_atomic_add v0, off, s[4:7], 0 glc
	; GFX9-NEXT: BB1_3:			; GFX9-NEXT: BB1_3:
	; GFX9-NEXT: s_or_b64 exec, exec, s[10:11]			; GFX9-NEXT: s_or_b64 exec, exec, s[10:11]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_readfirstlane_b32 s4, v0			; GFX9-NEXT: v_readfirstlane_b32 s4, v0
	; GFX9-NEXT: v_mov_b32_e32 v0, v1			; GFX9-NEXT: v_mov_b32_e32 v0, v1
	; GFX9-NEXT: v_add_u32_e32 v3, s4, v0			; GFX9-NEXT: v_add_u32_e32 v0, s4, v0
	; GFX9-NEXT: BB1_4: ; %Flow			; GFX9-NEXT: BB1_4: ; %Flow
	; GFX9-NEXT: s_or_b64 exec, exec, s[8:9]			; GFX9-NEXT: s_or_b64 exec, exec, s[8:9]
	; GFX9-NEXT: s_wqm_b64 s[4:5], -1			; GFX9-NEXT: s_wqm_b64 s[4:5], -1
	; GFX9-NEXT: s_andn2_b64 vcc, exec, s[4:5]			; GFX9-NEXT: s_andn2_b64 vcc, exec, s[4:5]
	; GFX9-NEXT: s_cbranch_vccnz BB1_6			; GFX9-NEXT: s_cbranch_vccnz BB1_6
	; GFX9-NEXT: ; %bb.5: ; %if			; GFX9-NEXT: ; %bb.5: ; %if
	; GFX9-NEXT: buffer_store_dword v3, off, s[0:3], 0			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX9-NEXT: BB1_6: ; %UnifiedReturnBlock			; GFX9-NEXT: BB1_6: ; %UnifiedReturnBlock
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX1064-LABEL: add_i32_varying:			; GFX1064-LABEL: add_i32_varying:
	; GFX1064: ; %bb.0: ; %entry			; GFX1064: ; %bb.0: ; %entry
	; GFX1064-NEXT: s_mov_b64 s[10:11], exec			; GFX1064-NEXT: s_mov_b64 s[10:11], exec
	; GFX1064-NEXT: ; implicit-def: $vgpr4			; GFX1064-NEXT: v_mov_b32_e32 v1, v0
	; GFX1064-NEXT: v_mov_b32_e32 v2, v0			; GFX1064-NEXT: ; implicit-def: $vgpr0
	; GFX1064-NEXT: s_and_saveexec_b64 s[8:9], s[10:11]			; GFX1064-NEXT: s_and_saveexec_b64 s[8:9], s[10:11]
	; GFX1064-NEXT: s_cbranch_execz BB1_4			; GFX1064-NEXT: s_cbranch_execz BB1_4
	; GFX1064-NEXT: ; %bb.1:			; GFX1064-NEXT: ; %bb.1:
	; GFX1064-NEXT: s_mov_b64 s[10:11], exec
	; GFX1064-NEXT: s_or_saveexec_b64 s[12:13], -1
	; GFX1064-NEXT: v_mov_b32_e32 v1, 0
	; GFX1064-NEXT: s_mov_b64 exec, s[12:13]
	; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s10, 0
	; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s11, v0
	; GFX1064-NEXT: s_not_b64 exec, exec			; GFX1064-NEXT: s_not_b64 exec, exec
	; GFX1064-NEXT: v_mov_b32_e32 v2, 0			; GFX1064-NEXT: v_mov_b32_e32 v1, 0
	; GFX1064-NEXT: s_not_b64 exec, exec			; GFX1064-NEXT: s_not_b64 exec, exec
	; GFX1064-NEXT: s_or_saveexec_b64 s[10:11], -1			; GFX1064-NEXT: s_or_saveexec_b64 s[10:11], -1
	; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1064-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1064-NEXT: v_mov_b32_e32 v3, 0
	; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1064-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1064-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1064-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1064-NEXT: v_mov_b32_e32 v3, v2			; GFX1064-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1064-NEXT: v_permlanex16_b32 v3, v3, -1, -1			; GFX1064-NEXT: v_mov_b32_e32 v2, v1
	; GFX1064-NEXT: v_add_nc_u32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf			; GFX1064-NEXT: v_permlanex16_b32 v2, v2, -1, -1
	; GFX1064-NEXT: v_readlane_b32 s12, v2, 31			; GFX1064-NEXT: v_add_nc_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1064-NEXT: v_mov_b32_e32 v3, s12			; GFX1064-NEXT: v_readlane_b32 s12, v1, 31
	; GFX1064-NEXT: v_add_nc_u32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf			; GFX1064-NEXT: v_mov_b32_e32 v2, s12
	; GFX1064-NEXT: v_mov_b32_dpp v1, v2 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1064-NEXT: v_add_nc_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
	; GFX1064-NEXT: v_readlane_b32 s12, v2, 15			; GFX1064-NEXT: v_readlane_b32 s12, v1, 15
	; GFX1064-NEXT: v_readlane_b32 s13, v2, 31			; GFX1064-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1064-NEXT: v_writelane_b32 v1, s12, 16			; GFX1064-NEXT: v_readlane_b32 s13, v1, 31
	; GFX1064-NEXT: v_readlane_b32 s12, v2, 63			; GFX1064-NEXT: v_writelane_b32 v3, s12, 16
	; GFX1064-NEXT: v_writelane_b32 v1, s13, 32			; GFX1064-NEXT: s_mov_b64 exec, s[10:11]
	; GFX1064-NEXT: v_readlane_b32 s13, v2, 47			; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
	; GFX1064-NEXT: v_writelane_b32 v1, s13, 48			; GFX1064-NEXT: s_or_saveexec_b64 s[10:11], -1
				; GFX1064-NEXT: v_readlane_b32 s12, v1, 63
				; GFX1064-NEXT: v_readlane_b32 s14, v1, 47
				; GFX1064-NEXT: v_writelane_b32 v3, s13, 32
				; GFX1064-NEXT: s_mov_b64 exec, s[10:11]
				; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, exec_hi, v0
				; GFX1064-NEXT: s_or_saveexec_b64 s[10:11], -1
				; GFX1064-NEXT: v_writelane_b32 v3, s14, 48
	; GFX1064-NEXT: s_mov_b64 exec, s[10:11]			; GFX1064-NEXT: s_mov_b64 exec, s[10:11]
	; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX1064-NEXT: ; implicit-def: $vgpr0			; GFX1064-NEXT: ; implicit-def: $vgpr0
	; GFX1064-NEXT: s_and_saveexec_b64 s[30:31], vcc			; GFX1064-NEXT: s_and_saveexec_b64 s[30:31], vcc
	; GFX1064-NEXT: s_cbranch_execz BB1_3			; GFX1064-NEXT: s_cbranch_execz BB1_3
	; GFX1064-NEXT: ; %bb.2:			; GFX1064-NEXT: ; %bb.2:
	; GFX1064-NEXT: v_mov_b32_e32 v0, s12			; GFX1064-NEXT: v_mov_b32_e32 v0, s12
	; GFX1064-NEXT: buffer_atomic_add v0, off, s[4:7], 0 glc			; GFX1064-NEXT: buffer_atomic_add v0, off, s[4:7], 0 glc
	; GFX1064-NEXT: BB1_3:			; GFX1064-NEXT: BB1_3:
	; GFX1064-NEXT: s_waitcnt_depctr 0xffe3			; GFX1064-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1064-NEXT: s_or_b64 exec, exec, s[30:31]			; GFX1064-NEXT: s_or_b64 exec, exec, s[30:31]
	; GFX1064-NEXT: s_waitcnt vmcnt(0)			; GFX1064-NEXT: s_waitcnt vmcnt(0)
	; GFX1064-NEXT: v_readfirstlane_b32 s4, v0			; GFX1064-NEXT: v_readfirstlane_b32 s4, v0
	; GFX1064-NEXT: v_mov_b32_e32 v0, v1			; GFX1064-NEXT: v_mov_b32_e32 v0, v3
	; GFX1064-NEXT: v_add_nc_u32_e32 v4, s4, v0			; GFX1064-NEXT: v_add_nc_u32_e32 v0, s4, v0
	; GFX1064-NEXT: BB1_4: ; %Flow			; GFX1064-NEXT: BB1_4: ; %Flow
	; GFX1064-NEXT: s_or_b64 exec, exec, s[8:9]			; GFX1064-NEXT: s_or_b64 exec, exec, s[8:9]
	; GFX1064-NEXT: s_wqm_b64 s[4:5], -1			; GFX1064-NEXT: s_wqm_b64 s[4:5], -1
	; GFX1064-NEXT: s_andn2_b64 vcc, exec, s[4:5]			; GFX1064-NEXT: s_andn2_b64 vcc, exec, s[4:5]
	; GFX1064-NEXT: s_cbranch_vccnz BB1_6			; GFX1064-NEXT: s_cbranch_vccnz BB1_6
	; GFX1064-NEXT: ; %bb.5: ; %if			; GFX1064-NEXT: ; %bb.5: ; %if
	; GFX1064-NEXT: buffer_store_dword v4, off, s[0:3], 0			; GFX1064-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX1064-NEXT: BB1_6: ; %UnifiedReturnBlock			; GFX1064-NEXT: BB1_6: ; %UnifiedReturnBlock
	; GFX1064-NEXT: s_endpgm			; GFX1064-NEXT: s_endpgm
	;			;
	; GFX1032-LABEL: add_i32_varying:			; GFX1032-LABEL: add_i32_varying:
	; GFX1032: ; %bb.0: ; %entry			; GFX1032: ; %bb.0: ; %entry
	; GFX1032-NEXT: s_mov_b32 s9, exec_lo			; GFX1032-NEXT: s_mov_b32 s9, exec_lo
	; GFX1032-NEXT: ; implicit-def: $vgpr4			; GFX1032-NEXT: v_mov_b32_e32 v1, v0
				; GFX1032-NEXT: ; implicit-def: $vgpr0
	; GFX1032-NEXT: ; implicit-def: $vcc_hi			; GFX1032-NEXT: ; implicit-def: $vcc_hi
	; GFX1032-NEXT: v_mov_b32_e32 v2, v0
	; GFX1032-NEXT: s_and_saveexec_b32 s8, s9			; GFX1032-NEXT: s_and_saveexec_b32 s8, s9
	; GFX1032-NEXT: s_cbranch_execz BB1_4			; GFX1032-NEXT: s_cbranch_execz BB1_4
	; GFX1032-NEXT: ; %bb.1:			; GFX1032-NEXT: ; %bb.1:
	; GFX1032-NEXT: s_mov_b32 s9, exec_lo
	; GFX1032-NEXT: s_or_saveexec_b32 s10, -1
	; GFX1032-NEXT: v_mov_b32_e32 v1, 0
	; GFX1032-NEXT: s_mov_b32 exec_lo, s10
	; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s9, 0
	; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1032-NEXT: v_mov_b32_e32 v2, 0			; GFX1032-NEXT: v_mov_b32_e32 v1, 0
	; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1032-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1032-NEXT: s_or_saveexec_b32 s9, -1			; GFX1032-NEXT: s_or_saveexec_b32 s9, -1
	; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_mov_b32_e32 v3, 0
	; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1032-NEXT: v_add_nc_u32_dpp v2, v2, v2 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0			; GFX1032-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1032-NEXT: v_mov_b32_e32 v3, v2			; GFX1032-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:0
	; GFX1032-NEXT: v_permlanex16_b32 v3, v3, -1, -1			; GFX1032-NEXT: v_mov_b32_e32 v2, v1
	; GFX1032-NEXT: v_add_nc_u32_dpp v2, v3, v2 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf			; GFX1032-NEXT: v_permlanex16_b32 v2, v2, -1, -1
	; GFX1032-NEXT: v_readlane_b32 s10, v2, 31			; GFX1032-NEXT: v_add_nc_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1032-NEXT: v_mov_b32_dpp v1, v2 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1032-NEXT: v_readlane_b32 s11, v1, 31
	; GFX1032-NEXT: v_readlane_b32 s11, v2, 15			; GFX1032-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1032-NEXT: v_writelane_b32 v1, s11, 16			; GFX1032-NEXT: v_readlane_b32 s10, v1, 15
				; GFX1032-NEXT: s_mov_b32 exec_lo, s9
				; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, exec_lo, 0
				; GFX1032-NEXT: s_or_saveexec_b32 s9, -1
				; GFX1032-NEXT: v_writelane_b32 v3, s10, 16
	; GFX1032-NEXT: s_mov_b32 exec_lo, s9			; GFX1032-NEXT: s_mov_b32 exec_lo, s9
	; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0			; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
	; GFX1032-NEXT: ; implicit-def: $vgpr0			; GFX1032-NEXT: ; implicit-def: $vgpr0
	; GFX1032-NEXT: s_and_saveexec_b32 s9, vcc_lo			; GFX1032-NEXT: s_and_saveexec_b32 s9, vcc_lo
	; GFX1032-NEXT: s_cbranch_execz BB1_3			; GFX1032-NEXT: s_cbranch_execz BB1_3
	; GFX1032-NEXT: ; %bb.2:			; GFX1032-NEXT: ; %bb.2:
	; GFX1032-NEXT: v_mov_b32_e32 v0, s10			; GFX1032-NEXT: v_mov_b32_e32 v0, s11
				; GFX1032-NEXT: s_mov_b32 s10, s11
	; GFX1032-NEXT: buffer_atomic_add v0, off, s[4:7], 0 glc			; GFX1032-NEXT: buffer_atomic_add v0, off, s[4:7], 0 glc
	; GFX1032-NEXT: BB1_3:			; GFX1032-NEXT: BB1_3:
	; GFX1032-NEXT: s_waitcnt_depctr 0xffe3			; GFX1032-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1032-NEXT: s_or_b32 exec_lo, exec_lo, s9			; GFX1032-NEXT: s_or_b32 exec_lo, exec_lo, s9
	; GFX1032-NEXT: s_waitcnt vmcnt(0)			; GFX1032-NEXT: s_waitcnt vmcnt(0)
	; GFX1032-NEXT: v_readfirstlane_b32 s4, v0			; GFX1032-NEXT: v_readfirstlane_b32 s4, v0
	; GFX1032-NEXT: v_mov_b32_e32 v0, v1			; GFX1032-NEXT: v_mov_b32_e32 v0, v3
	; GFX1032-NEXT: v_add_nc_u32_e32 v4, s4, v0			; GFX1032-NEXT: v_add_nc_u32_e32 v0, s4, v0
	; GFX1032-NEXT: BB1_4: ; %Flow			; GFX1032-NEXT: BB1_4: ; %Flow
	; GFX1032-NEXT: s_or_b32 exec_lo, exec_lo, s8			; GFX1032-NEXT: s_or_b32 exec_lo, exec_lo, s8
	; GFX1032-NEXT: s_wqm_b32 s4, -1			; GFX1032-NEXT: s_wqm_b32 s4, -1
	; GFX1032-NEXT: s_andn2_b32 vcc_lo, exec_lo, s4			; GFX1032-NEXT: s_andn2_b32 vcc_lo, exec_lo, s4
	; GFX1032-NEXT: s_cbranch_vccnz BB1_6			; GFX1032-NEXT: s_cbranch_vccnz BB1_6
	; GFX1032-NEXT: ; %bb.5: ; %if			; GFX1032-NEXT: ; %bb.5: ; %if
	; GFX1032-NEXT: buffer_store_dword v4, off, s[0:3], 0			; GFX1032-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; GFX1032-NEXT: BB1_6: ; %UnifiedReturnBlock			; GFX1032-NEXT: BB1_6: ; %UnifiedReturnBlock
	; GFX1032-NEXT: s_endpgm			; GFX1032-NEXT: s_endpgm
	entry:			entry:
	%cond1 = call i1 @llvm.amdgcn.wqm.vote(i1 true)			%cond1 = call i1 @llvm.amdgcn.wqm.vote(i1 true)
	%old = call i32 @llvm.amdgcn.raw.buffer.atomic.add(i32 %val, <4 x i32> %inout, i32 0, i32 0, i32 0)			%old = call i32 @llvm.amdgcn.raw.buffer.atomic.add(i32 %val, <4 x i32> %inout, i32 0, i32 0, i32 0)
	%cond2 = call i1 @llvm.amdgcn.wqm.vote(i1 true)			%cond2 = call i1 @llvm.amdgcn.wqm.vote(i1 true)
	%cond = and i1 %cond1, %cond2			%cond = and i1 %cond1, %cond2
	br i1 %cond, label %if, label %else			br i1 %cond, label %if, label %else
	if:			if:
	%bitcast = bitcast i32 %old to float			%bitcast = bitcast i32 %old to float
	call void @llvm.amdgcn.raw.buffer.store.f32(float %bitcast, <4 x i32> %out, i32 0, i32 0, i32 0)			call void @llvm.amdgcn.raw.buffer.store.f32(float %bitcast, <4 x i32> %out, i32 0, i32 0, i32 0)
	ret void			ret void
	else:			else:
	ret void			ret void
	}			}

llvm/test/CodeGen/AMDGPU/atomic_optimizations_raw_buffer.ll

	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX7LESS %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX7LESS %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=tonga -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,DPPCOMB %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=tonga -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,GFX89,DPPCOMB %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx900 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,DPPCOMB %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx900 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,GFX89,DPPCOMB %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=-wavefrontsize32,+wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64 %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=-wavefrontsize32,+wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,GFX10 %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN32,GFX8MORE,GFX8MORE32 %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN32,GFX8MORE,GFX8MORE32,GFX10 %s

	declare i32 @llvm.amdgcn.workitem.id.x()			declare i32 @llvm.amdgcn.workitem.id.x()
	declare i32 @llvm.amdgcn.raw.buffer.atomic.add(i32, <4 x i32>, i32, i32, i32)			declare i32 @llvm.amdgcn.raw.buffer.atomic.add(i32, <4 x i32>, i32, i32, i32)
	declare i32 @llvm.amdgcn.raw.buffer.atomic.sub(i32, <4 x i32>, i32, i32, i32)			declare i32 @llvm.amdgcn.raw.buffer.atomic.sub(i32, <4 x i32>, i32, i32, i32)

	; Show what the atomic optimization pass will do for raw buffers.			; Show what the atomic optimization pass will do for raw buffers.

	; GCN-LABEL: add_i32_constant:			; GCN-LABEL: add_i32_constant:
	[37 lines elided]
	; GFX7LESS-NOT: v_mbcnt_lo_u32_b32			; GFX7LESS-NOT: v_mbcnt_lo_u32_b32
	; GFX7LESS-NOT: v_mbcnt_hi_u32_b32			; GFX7LESS-NOT: v_mbcnt_hi_u32_b32
	; GFX7LESS-NOT: s_bcnt1_i32_b64			; GFX7LESS-NOT: s_bcnt1_i32_b64
	; GFX7LESS: buffer_atomic_add v{{[0-9]+}}			; GFX7LESS: buffer_atomic_add v{{[0-9]+}}
	; DPPCOMB: v_add_u32_dpp			; DPPCOMB: v_add_u32_dpp
	; DPPCOMB: v_add_u32_dpp			; DPPCOMB: v_add_u32_dpp
	; GFX8MORE32: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 31			; GFX8MORE32: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 31
	; GFX8MORE64: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 63			; GFX8MORE64: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 63
	; GFX8MORE: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]			; GFX89: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]
				; GFX10: s_mov_b32 s[[copy_value:[0-9]+]], s[[scalar_value]]
				; GFX10: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[copy_value]]
	; GFX8MORE: buffer_atomic_add v[[value]]			; GFX8MORE: buffer_atomic_add v[[value]]
	define amdgpu_kernel void @add_i32_varying_vdata(i32 addrspace(1)* %out, <4 x i32> %inout) {			define amdgpu_kernel void @add_i32_varying_vdata(i32 addrspace(1)* %out, <4 x i32> %inout) {
	entry:			entry:
	%lane = call i32 @llvm.amdgcn.workitem.id.x()			%lane = call i32 @llvm.amdgcn.workitem.id.x()
	%old = call i32 @llvm.amdgcn.raw.buffer.atomic.add(i32 %lane, <4 x i32> %inout, i32 0, i32 0, i32 0)			%old = call i32 @llvm.amdgcn.raw.buffer.atomic.add(i32 %lane, <4 x i32> %inout, i32 0, i32 0, i32 0)
	store i32 %old, i32 addrspace(1)* %out			store i32 %old, i32 addrspace(1)* %out
	ret void			ret void
	}			}
	[52 lines elided]
	; GFX7LESS-NOT: v_mbcnt_lo_u32_b32			; GFX7LESS-NOT: v_mbcnt_lo_u32_b32
	; GFX7LESS-NOT: v_mbcnt_hi_u32_b32			; GFX7LESS-NOT: v_mbcnt_hi_u32_b32
	; GFX7LESS-NOT: s_bcnt1_i32_b64			; GFX7LESS-NOT: s_bcnt1_i32_b64
	; GFX7LESS: buffer_atomic_sub v{{[0-9]+}}			; GFX7LESS: buffer_atomic_sub v{{[0-9]+}}
	; DPPCOMB: v_add_u32_dpp			; DPPCOMB: v_add_u32_dpp
	; DPPCOMB: v_add_u32_dpp			; DPPCOMB: v_add_u32_dpp
	; GFX8MORE32: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 31			; GFX8MORE32: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 31
	; GFX8MORE64: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 63			; GFX8MORE64: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 63
	; GFX8MORE: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]			; GFX89: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]
				; GFX10: s_mov_b32 s[[copy_value:[0-9]+]], s[[scalar_value]]
				; GFX10: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[copy_value]]
	; GFX8MORE: buffer_atomic_sub v[[value]]			; GFX8MORE: buffer_atomic_sub v[[value]]
	define amdgpu_kernel void @sub_i32_varying_vdata(i32 addrspace(1)* %out, <4 x i32> %inout) {			define amdgpu_kernel void @sub_i32_varying_vdata(i32 addrspace(1)* %out, <4 x i32> %inout) {
	entry:			entry:
	%lane = call i32 @llvm.amdgcn.workitem.id.x()			%lane = call i32 @llvm.amdgcn.workitem.id.x()
	%old = call i32 @llvm.amdgcn.raw.buffer.atomic.sub(i32 %lane, <4 x i32> %inout, i32 0, i32 0, i32 0)			%old = call i32 @llvm.amdgcn.raw.buffer.atomic.sub(i32 %lane, <4 x i32> %inout, i32 0, i32 0, i32 0)
	store i32 %old, i32 addrspace(1)* %out			store i32 %old, i32 addrspace(1)* %out
	ret void			ret void
	}			}
	[13 lines elided]

llvm/test/CodeGen/AMDGPU/atomic_optimizations_struct_buffer.ll

	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX7LESS %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX7LESS %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=tonga -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,DPPCOMB %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=tonga -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,GFX89,DPPCOMB %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx900 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,DPPCOMB %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx900 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,GFX89,DPPCOMB %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=-wavefrontsize32,+wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64 %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=-wavefrontsize32,+wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN64,GFX8MORE,GFX8MORE64,GFX10 %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN32,GFX8MORE,GFX8MORE32 %s			; RUN: llc -march=amdgcn -mtriple=amdgcn---amdgiz -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GCN32,GFX8MORE,GFX8MORE32,GFX10 %s

	declare i32 @llvm.amdgcn.workitem.id.x()			declare i32 @llvm.amdgcn.workitem.id.x()
	declare i32 @llvm.amdgcn.struct.buffer.atomic.add(i32, <4 x i32>, i32, i32, i32, i32)			declare i32 @llvm.amdgcn.struct.buffer.atomic.add(i32, <4 x i32>, i32, i32, i32, i32)
	declare i32 @llvm.amdgcn.struct.buffer.atomic.sub(i32, <4 x i32>, i32, i32, i32, i32)			declare i32 @llvm.amdgcn.struct.buffer.atomic.sub(i32, <4 x i32>, i32, i32, i32, i32)

	; Show what the atomic optimization pass will do for struct buffers.			; Show what the atomic optimization pass will do for struct buffers.

	; GCN-LABEL: add_i32_constant:			; GCN-LABEL: add_i32_constant:
	[37 lines not shown]
	; GFX7LESS-NOT: v_mbcnt_lo_u32_b32			; GFX7LESS-NOT: v_mbcnt_lo_u32_b32
	; GFX7LESS-NOT: v_mbcnt_hi_u32_b32			; GFX7LESS-NOT: v_mbcnt_hi_u32_b32
	; GFX7LESS-NOT: s_bcnt1_i32_b64			; GFX7LESS-NOT: s_bcnt1_i32_b64
	; GFX7LESS: buffer_atomic_add v{{[0-9]+}}			; GFX7LESS: buffer_atomic_add v{{[0-9]+}}
	; DPPCOMB: v_add_u32_dpp			; DPPCOMB: v_add_u32_dpp
	; DPPCOMB: v_add_u32_dpp			; DPPCOMB: v_add_u32_dpp
	; GFX8MORE32: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 31			; GFX8MORE32: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 31
	; GFX8MORE64: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 63			; GFX8MORE64: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 63
	; GFX8MORE: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]			; GFX89: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]
				; GFX10: s_mov_b32 s[[copy_value:[0-9]+]], s[[scalar_value]]
				; GFX10: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[copy_value]]
	; GFX8MORE: buffer_atomic_add v[[value]]			; GFX8MORE: buffer_atomic_add v[[value]]
	define amdgpu_kernel void @add_i32_varying_vdata(i32 addrspace(1)* %out, <4 x i32> %inout) {			define amdgpu_kernel void @add_i32_varying_vdata(i32 addrspace(1)* %out, <4 x i32> %inout) {
	entry:			entry:
	%lane = call i32 @llvm.amdgcn.workitem.id.x()			%lane = call i32 @llvm.amdgcn.workitem.id.x()
	%old = call i32 @llvm.amdgcn.struct.buffer.atomic.add(i32 %lane, <4 x i32> %inout, i32 0, i32 0, i32 0, i32 0)			%old = call i32 @llvm.amdgcn.struct.buffer.atomic.add(i32 %lane, <4 x i32> %inout, i32 0, i32 0, i32 0, i32 0)
	store i32 %old, i32 addrspace(1)* %out			store i32 %old, i32 addrspace(1)* %out
	ret void			ret void
	}			}
	[65 lines not shown]
	; GFX7LESS-NOT: v_mbcnt_lo_u32_b32			; GFX7LESS-NOT: v_mbcnt_lo_u32_b32
	; GFX7LESS-NOT: v_mbcnt_hi_u32_b32			; GFX7LESS-NOT: v_mbcnt_hi_u32_b32
	; GFX7LESS-NOT: s_bcnt1_i32_b64			; GFX7LESS-NOT: s_bcnt1_i32_b64
	; GFX7LESS: buffer_atomic_sub v{{[0-9]+}}			; GFX7LESS: buffer_atomic_sub v{{[0-9]+}}
	; DPPCOMB: v_add_u32_dpp			; DPPCOMB: v_add_u32_dpp
	; DPPCOMB: v_add_u32_dpp			; DPPCOMB: v_add_u32_dpp
	; GFX8MORE32: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 31			; GFX8MORE32: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 31
	; GFX8MORE64: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 63			; GFX8MORE64: v_readlane_b32 s[[scalar_value:[0-9]+]], v{{[0-9]+}}, 63
	; GFX8MORE: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]			; GFX89: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[scalar_value]]
				; GFX10: s_mov_b32 s[[copy_value:[0-9]+]], s[[scalar_value]]
				; GFX10: v_mov_b32{{(_e[0-9]+)?}} v[[value:[0-9]+]], s[[copy_value]]
	; GFX8MORE: buffer_atomic_sub v[[value]]			; GFX8MORE: buffer_atomic_sub v[[value]]
	define amdgpu_kernel void @sub_i32_varying_vdata(i32 addrspace(1)* %out, <4 x i32> %inout) {			define amdgpu_kernel void @sub_i32_varying_vdata(i32 addrspace(1)* %out, <4 x i32> %inout) {
	entry:			entry:
	%lane = call i32 @llvm.amdgcn.workitem.id.x()			%lane = call i32 @llvm.amdgcn.workitem.id.x()
	%old = call i32 @llvm.amdgcn.struct.buffer.atomic.sub(i32 %lane, <4 x i32> %inout, i32 0, i32 0, i32 0, i32 0)			%old = call i32 @llvm.amdgcn.struct.buffer.atomic.sub(i32 %lane, <4 x i32> %inout, i32 0, i32 0, i32 0, i32 0)
	store i32 %old, i32 addrspace(1)* %out			store i32 %old, i32 addrspace(1)* %out
	ret void			ret void
	}			}
	[26 lines not shown]

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.a16.dim.ll

	[11 lines not shown]
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_gather4 v[0:3], v0, s[0:7], s[8:11] dmask:0x1 a16			; GFX9-NEXT: image_gather4 v[0:3], v0, s[0:7], s[8:11] dmask:0x1 a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: gather4_2d:			; GFX10-LABEL: gather4_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GFX10-NEXT: v_lshl_or_b32 v0, v1, 16, v0			; GFX10-NEXT: v_lshl_or_b32 v0, v1, 16, v0
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_gather4 v[0:3], v0, s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16			; GFX10-NEXT: image_gather4 v[0:3], v0, s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.2d.v4f32.f16(i32 1, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.2d.v4f32.f16(i32 1, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_cube(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %face) {			define amdgpu_ps <4 x float> @gather4_cube(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %face) {
	; GFX9-LABEL: gather4_cube:			; GFX9-LABEL: gather4_cube:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GFX9-NEXT: v_lshl_or_b32 v1, v1, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v1, v1, 16, v0
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_gather4 v[0:3], v[1:2], s[0:7], s[8:11] dmask:0x1 a16 da			; GFX9-NEXT: image_gather4 v[0:3], v[1:2], s[0:7], s[8:11] dmask:0x1 a16 da
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: gather4_cube:			; GFX10-LABEL: gather4_cube:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GFX10-NEXT: v_lshl_or_b32 v1, v1, 16, v0			; GFX10-NEXT: v_lshl_or_b32 v1, v1, 16, v0
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_gather4 v[0:3], v[1:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_CUBE a16			; GFX10-NEXT: image_gather4 v[0:3], v[1:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_CUBE a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.cube.v4f32.f16(i32 1, half %s, half %t, half %face, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.cube.v4f32.f16(i32 1, half %s, half %t, half %face, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_2darray(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %slice) {			define amdgpu_ps <4 x float> @gather4_2darray(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %slice) {
	; GFX9-LABEL: gather4_2darray:			; GFX9-LABEL: gather4_2darray:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GFX9-NEXT: v_lshl_or_b32 v1, v1, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v1, v1, 16, v0
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_gather4 v[0:3], v[1:2], s[0:7], s[8:11] dmask:0x1 a16 da			; GFX9-NEXT: image_gather4 v[0:3], v[1:2], s[0:7], s[8:11] dmask:0x1 a16 da
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: gather4_2darray:			; GFX10-LABEL: gather4_2darray:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GFX10-NEXT: v_lshl_or_b32 v1, v1, 16, v0			; GFX10-NEXT: v_lshl_or_b32 v1, v1, 16, v0
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_gather4 v[0:3], v[1:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D_ARRAY a16			; GFX10-NEXT: image_gather4 v[0:3], v[1:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D_ARRAY a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.2darray.v4f32.f16(i32 1, half %s, half %t, half %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.2darray.v4f32.f16(i32 1, half %s, half %t, half %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_c_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t) {			define amdgpu_ps <4 x float> @gather4_c_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t) {
	; GFX9-LABEL: gather4_c_2d:			; GFX9-LABEL: gather4_c_2d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX9-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GFX9-NEXT: v_lshl_or_b32 v1, v2, 16, v1			; GFX9-NEXT: v_lshl_or_b32 v1, v2, 16, v1
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_gather4_c v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 a16			; GFX9-NEXT: image_gather4_c v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: gather4_c_2d:			; GFX10-LABEL: gather4_c_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1			; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_gather4_c v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16			; GFX10-NEXT: image_gather4_c v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.c.2d.v4f32.f32(i32 1, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.c.2d.v4f32.f32(i32 1, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %clamp) {			define amdgpu_ps <4 x float> @gather4_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %clamp) {
	; GFX9-LABEL: gather4_cl_2d:			; GFX9-LABEL: gather4_cl_2d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GFX9-NEXT: v_lshl_or_b32 v1, v1, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v1, v1, 16, v0
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_gather4_cl v[0:3], v[1:2], s[0:7], s[8:11] dmask:0x1 a16			; GFX9-NEXT: image_gather4_cl v[0:3], v[1:2], s[0:7], s[8:11] dmask:0x1 a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: gather4_cl_2d:			; GFX10-LABEL: gather4_cl_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GFX10-NEXT: v_lshl_or_b32 v1, v1, 16, v0			; GFX10-NEXT: v_lshl_or_b32 v1, v1, 16, v0
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_gather4_cl v[0:3], v[1:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16			; GFX10-NEXT: image_gather4_cl v[0:3], v[1:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.cl.2d.v4f32.f16(i32 1, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.cl.2d.v4f32.f16(i32 1, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_c_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t, half %clamp) {			define amdgpu_ps <4 x float> @gather4_c_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t, half %clamp) {
	; GFX9-LABEL: gather4_c_cl_2d:			; GFX9-LABEL: gather4_c_cl_2d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_mov_b32_e32 v5, v3			; GFX9-NEXT: v_mov_b32_e32 v5, v3
	; GFX9-NEXT: v_mov_b32_e32 v3, v0			; GFX9-NEXT: v_mov_b32_e32 v3, v0
	; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v1			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v1
	; GFX9-NEXT: v_lshl_or_b32 v4, v2, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v4, v2, 16, v0
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_gather4_c_cl v[0:3], v[3:5], s[0:7], s[8:11] dmask:0x1 a16			; GFX9-NEXT: image_gather4_c_cl v[0:3], v[3:5], s[0:7], s[8:11] dmask:0x1 a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: gather4_c_cl_2d:			; GFX10-LABEL: gather4_c_cl_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1			; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_gather4_c_cl v[0:3], [v0, v1, v3], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16			; GFX10-NEXT: image_gather4_c_cl v[0:3], [v0, v1, v3], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.c.cl.2d.v4f32.f32(i32 1, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.c.cl.2d.v4f32.f32(i32 1, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %t) {			define amdgpu_ps <4 x float> @gather4_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %t) {
	; GFX9-LABEL: gather4_b_2d:			; GFX9-LABEL: gather4_b_2d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX9-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GFX9-NEXT: v_lshl_or_b32 v1, v2, 16, v1			; GFX9-NEXT: v_lshl_or_b32 v1, v2, 16, v1
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_gather4_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 a16			; GFX9-NEXT: image_gather4_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: gather4_b_2d:			; GFX10-LABEL: gather4_b_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1			; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_gather4_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16			; GFX10-NEXT: image_gather4_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.b.2d.v4f32.f32.f16(i32 1, float %bias, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.b.2d.v4f32.f32.f16(i32 1, float %bias, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_c_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %t) {			define amdgpu_ps <4 x float> @gather4_c_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %t) {
	; GFX9-LABEL: gather4_c_b_2d:			; GFX9-LABEL: gather4_c_b_2d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_and_b32_e32 v2, 0xffff, v2			; GFX9-NEXT: v_and_b32_e32 v2, 0xffff, v2
	; GFX9-NEXT: v_lshl_or_b32 v2, v3, 16, v2			; GFX9-NEXT: v_lshl_or_b32 v2, v3, 16, v2
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_gather4_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 a16			; GFX9-NEXT: image_gather4_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: gather4_c_b_2d:			; GFX10-LABEL: gather4_c_b_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v2, 0xffff, v2			; GFX10-NEXT: v_and_b32_e32 v2, 0xffff, v2
	; GFX10-NEXT: v_lshl_or_b32 v2, v3, 16, v2			; GFX10-NEXT: v_lshl_or_b32 v2, v3, 16, v2
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_gather4_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16			; GFX10-NEXT: image_gather4_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.2d.v4f32.f32.f16(i32 1, float %bias, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.2d.v4f32.f32.f16(i32 1, float %bias, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %t, half %clamp) {			define amdgpu_ps <4 x float> @gather4_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %t, half %clamp) {
	; GFX9-LABEL: gather4_b_cl_2d:			; GFX9-LABEL: gather4_b_cl_2d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_mov_b32_e32 v5, v3			; GFX9-NEXT: v_mov_b32_e32 v5, v3
	; GFX9-NEXT: v_mov_b32_e32 v3, v0			; GFX9-NEXT: v_mov_b32_e32 v3, v0
	; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v1			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v1
	; GFX9-NEXT: v_lshl_or_b32 v4, v2, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v4, v2, 16, v0
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_gather4_b_cl v[0:3], v[3:5], s[0:7], s[8:11] dmask:0x1 a16			; GFX9-NEXT: image_gather4_b_cl v[0:3], v[3:5], s[0:7], s[8:11] dmask:0x1 a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: gather4_b_cl_2d:			; GFX10-LABEL: gather4_b_cl_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1			; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_gather4_b_cl v[0:3], [v0, v1, v3], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16			; GFX10-NEXT: image_gather4_b_cl v[0:3], [v0, v1, v3], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.b.cl.2d.v4f32.f32.f16(i32 1, float %bias, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.b.cl.2d.v4f32.f32.f16(i32 1, float %bias, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_c_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %t, half %clamp) {			define amdgpu_ps <4 x float> @gather4_c_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %t, half %clamp) {
	[9 lines not shown]
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_gather4_c_b_cl v[0:3], v[4:7], s[0:7], s[8:11] dmask:0x1 a16			; GFX9-NEXT: image_gather4_c_b_cl v[0:3], v[4:7], s[0:7], s[8:11] dmask:0x1 a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: gather4_c_b_cl_2d:			; GFX10-LABEL: gather4_c_b_cl_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v2, 0xffff, v2			; GFX10-NEXT: v_and_b32_e32 v2, 0xffff, v2
	; GFX10-NEXT: v_lshl_or_b32 v2, v3, 16, v2			; GFX10-NEXT: v_lshl_or_b32 v2, v3, 16, v2
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_gather4_c_b_cl v[0:3], [v0, v1, v2, v4], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16			; GFX10-NEXT: image_gather4_c_b_cl v[0:3], [v0, v1, v2, v4], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.2d.v4f32.f32.f16(i32 1, float %bias, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.2d.v4f32.f32.f16(i32 1, float %bias, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @gather4_l_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %lod) {			define amdgpu_ps <4 x float> @gather4_l_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %lod) {
	[111 lines not shown]

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.a16.dim.ll

	[9 lines not shown]
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_1d:			; GFX10-LABEL: sample_1d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16			; GFX10-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f16(i32 15, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f16(i32 15, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t) {			define amdgpu_ps <4 x float> @sample_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t) {
	; GFX9-LABEL: sample_2d:			; GFX9-LABEL: sample_2d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GFX9-NEXT: v_lshl_or_b32 v0, v1, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v0, v1, 16, v0
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_2d:			; GFX10-LABEL: sample_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GFX10-NEXT: v_lshl_or_b32 v0, v1, 16, v0			; GFX10-NEXT: v_lshl_or_b32 v0, v1, 16, v0
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16			; GFX10-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f16(i32 15, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f16(i32 15, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_3d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %r) {			define amdgpu_ps <4 x float> @sample_3d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %r) {
	; GFX9-LABEL: sample_3d:			; GFX9-LABEL: sample_3d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GFX9-NEXT: v_lshl_or_b32 v1, v1, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v1, v1, 16, v0
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_sample v[0:3], v[1:2], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample v[0:3], v[1:2], s[0:7], s[8:11] dmask:0xf a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_3d:			; GFX10-LABEL: sample_3d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GFX10-NEXT: v_lshl_or_b32 v1, v1, 16, v0			; GFX10-NEXT: v_lshl_or_b32 v1, v1, 16, v0
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_sample v[0:3], v[1:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_3D a16			; GFX10-NEXT: image_sample v[0:3], v[1:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_3D a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.3d.v4f32.f16(i32 15, half %s, half %t, half %r, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.3d.v4f32.f16(i32 15, half %s, half %t, half %r, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_cube(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %face) {			define amdgpu_ps <4 x float> @sample_cube(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %face) {
	; GFX9-LABEL: sample_cube:			; GFX9-LABEL: sample_cube:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GFX9-NEXT: v_lshl_or_b32 v1, v1, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v1, v1, 16, v0
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_sample v[0:3], v[1:2], s[0:7], s[8:11] dmask:0xf a16 da			; GFX9-NEXT: image_sample v[0:3], v[1:2], s[0:7], s[8:11] dmask:0xf a16 da
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_cube:			; GFX10-LABEL: sample_cube:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GFX10-NEXT: v_lshl_or_b32 v1, v1, 16, v0			; GFX10-NEXT: v_lshl_or_b32 v1, v1, 16, v0
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_sample v[0:3], v[1:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_CUBE a16			; GFX10-NEXT: image_sample v[0:3], v[1:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_CUBE a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.cube.v4f32.f16(i32 15, half %s, half %t, half %face, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.cube.v4f32.f16(i32 15, half %s, half %t, half %face, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_1darray(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %slice) {			define amdgpu_ps <4 x float> @sample_1darray(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %slice) {
	; GFX9-LABEL: sample_1darray:			; GFX9-LABEL: sample_1darray:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GFX9-NEXT: v_lshl_or_b32 v0, v1, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v0, v1, 16, v0
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf a16 da			; GFX9-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf a16 da
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_1darray:			; GFX10-LABEL: sample_1darray:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GFX10-NEXT: v_lshl_or_b32 v0, v1, 16, v0			; GFX10-NEXT: v_lshl_or_b32 v0, v1, 16, v0
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D_ARRAY a16			; GFX10-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D_ARRAY a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.1darray.v4f32.f16(i32 15, half %s, half %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.1darray.v4f32.f16(i32 15, half %s, half %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_2darray(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %slice) {			define amdgpu_ps <4 x float> @sample_2darray(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %slice) {
	; GFX9-LABEL: sample_2darray:			; GFX9-LABEL: sample_2darray:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GFX9-NEXT: v_lshl_or_b32 v1, v1, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v1, v1, 16, v0
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_sample v[0:3], v[1:2], s[0:7], s[8:11] dmask:0xf a16 da			; GFX9-NEXT: image_sample v[0:3], v[1:2], s[0:7], s[8:11] dmask:0xf a16 da
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_2darray:			; GFX10-LABEL: sample_2darray:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GFX10-NEXT: v_lshl_or_b32 v1, v1, 16, v0			; GFX10-NEXT: v_lshl_or_b32 v1, v1, 16, v0
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_sample v[0:3], v[1:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D_ARRAY a16			; GFX10-NEXT: image_sample v[0:3], v[1:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D_ARRAY a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.2darray.v4f32.f16(i32 15, half %s, half %t, half %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.2darray.v4f32.f16(i32 15, half %s, half %t, half %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s) {			define amdgpu_ps <4 x float> @sample_c_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s) {
	; GFX9-LABEL: sample_c_1d:			; GFX9-LABEL: sample_c_1d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_sample_c v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_c v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_c_1d:			; GFX10-LABEL: sample_c_1d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_sample_c v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16			; GFX10-NEXT: image_sample_c v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.1d.v4f32.f16(i32 15, float %zcompare, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.1d.v4f32.f16(i32 15, float %zcompare, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t) {			define amdgpu_ps <4 x float> @sample_c_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t) {
	; GFX9-LABEL: sample_c_2d:			; GFX9-LABEL: sample_c_2d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX9-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GFX9-NEXT: v_lshl_or_b32 v1, v2, 16, v1			; GFX9-NEXT: v_lshl_or_b32 v1, v2, 16, v1
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_sample_c v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_c v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_c_2d:			; GFX10-LABEL: sample_c_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1			; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_sample_c v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16			; GFX10-NEXT: image_sample_c v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.2d.v4f32.f16(i32 15, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.2d.v4f32.f16(i32 15, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %clamp) {			define amdgpu_ps <4 x float> @sample_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %clamp) {
	; GFX9-LABEL: sample_cl_1d:			; GFX9-LABEL: sample_cl_1d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GFX9-NEXT: v_lshl_or_b32 v0, v1, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v0, v1, 16, v0
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_sample_cl v[0:3], v0, s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_cl v[0:3], v0, s[0:7], s[8:11] dmask:0xf a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_cl_1d:			; GFX10-LABEL: sample_cl_1d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GFX10-NEXT: v_lshl_or_b32 v0, v1, 16, v0			; GFX10-NEXT: v_lshl_or_b32 v0, v1, 16, v0
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_sample_cl v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16			; GFX10-NEXT: image_sample_cl v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.cl.1d.v4f32.f16(i32 15, half %s, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.cl.1d.v4f32.f16(i32 15, half %s, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %clamp) {			define amdgpu_ps <4 x float> @sample_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %clamp) {
	; GFX9-LABEL: sample_cl_2d:			; GFX9-LABEL: sample_cl_2d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GFX9-NEXT: v_lshl_or_b32 v1, v1, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v1, v1, 16, v0
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_sample_cl v[0:3], v[1:2], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_cl v[0:3], v[1:2], s[0:7], s[8:11] dmask:0xf a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_cl_2d:			; GFX10-LABEL: sample_cl_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX10-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GFX10-NEXT: v_lshl_or_b32 v1, v1, 16, v0			; GFX10-NEXT: v_lshl_or_b32 v1, v1, 16, v0
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_sample_cl v[0:3], v[1:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16			; GFX10-NEXT: image_sample_cl v[0:3], v[1:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.cl.2d.v4f32.f16(i32 15, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.cl.2d.v4f32.f16(i32 15, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %clamp) {			define amdgpu_ps <4 x float> @sample_c_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %clamp) {
	; GFX9-LABEL: sample_c_cl_1d:			; GFX9-LABEL: sample_c_cl_1d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX9-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GFX9-NEXT: v_lshl_or_b32 v1, v2, 16, v1			; GFX9-NEXT: v_lshl_or_b32 v1, v2, 16, v1
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_sample_c_cl v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_c_cl v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_c_cl_1d:			; GFX10-LABEL: sample_c_cl_1d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1			; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_sample_c_cl v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16			; GFX10-NEXT: image_sample_c_cl v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.cl.1d.v4f32.f16(i32 15, float %zcompare, half %s, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.cl.1d.v4f32.f16(i32 15, float %zcompare, half %s, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t, half %clamp) {			define amdgpu_ps <4 x float> @sample_c_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t, half %clamp) {
	; GFX9-LABEL: sample_c_cl_2d:			; GFX9-LABEL: sample_c_cl_2d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_mov_b32_e32 v5, v3			; GFX9-NEXT: v_mov_b32_e32 v5, v3
	; GFX9-NEXT: v_mov_b32_e32 v3, v0			; GFX9-NEXT: v_mov_b32_e32 v3, v0
	; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v1			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v1
	; GFX9-NEXT: v_lshl_or_b32 v4, v2, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v4, v2, 16, v0
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_sample_c_cl v[0:3], v[3:5], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_c_cl v[0:3], v[3:5], s[0:7], s[8:11] dmask:0xf a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_c_cl_2d:			; GFX10-LABEL: sample_c_cl_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1			; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_sample_c_cl v[0:3], [v0, v1, v3], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16			; GFX10-NEXT: image_sample_c_cl v[0:3], [v0, v1, v3], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.cl.2d.v4f32.f16(i32 15, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.cl.2d.v4f32.f16(i32 15, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_b_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s) {			define amdgpu_ps <4 x float> @sample_b_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s) {
	; GFX9-LABEL: sample_b_1d:			; GFX9-LABEL: sample_b_1d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_sample_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_b_1d:			; GFX10-LABEL: sample_b_1d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_sample_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16			; GFX10-NEXT: image_sample_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f32.f16(i32 15, float %bias, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f32.f16(i32 15, float %bias, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %t) {			define amdgpu_ps <4 x float> @sample_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %t) {
	; GFX9-LABEL: sample_b_2d:			; GFX9-LABEL: sample_b_2d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX9-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GFX9-NEXT: v_lshl_or_b32 v1, v2, 16, v1			; GFX9-NEXT: v_lshl_or_b32 v1, v2, 16, v1
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_sample_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_b_2d:			; GFX10-LABEL: sample_b_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1			; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_sample_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16			; GFX10-NEXT: image_sample_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.b.2d.v4f32.f32.f16(i32 15, float %bias, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.b.2d.v4f32.f32.f16(i32 15, float %bias, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_b_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s) {			define amdgpu_ps <4 x float> @sample_c_b_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s) {
	; GFX9-LABEL: sample_c_b_1d:			; GFX9-LABEL: sample_c_b_1d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_sample_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_c_b_1d:			; GFX10-LABEL: sample_c_b_1d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_sample_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16			; GFX10-NEXT: image_sample_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.1d.v4f32.f32.f16(i32 15, float %bias, float %zcompare, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.1d.v4f32.f32.f16(i32 15, float %bias, float %zcompare, half %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %t) {			define amdgpu_ps <4 x float> @sample_c_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %t) {
	; GFX9-LABEL: sample_c_b_2d:			; GFX9-LABEL: sample_c_b_2d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_and_b32_e32 v2, 0xffff, v2			; GFX9-NEXT: v_and_b32_e32 v2, 0xffff, v2
	; GFX9-NEXT: v_lshl_or_b32 v2, v3, 16, v2			; GFX9-NEXT: v_lshl_or_b32 v2, v3, 16, v2
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_sample_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_c_b_2d:			; GFX10-LABEL: sample_c_b_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v2, 0xffff, v2			; GFX10-NEXT: v_and_b32_e32 v2, 0xffff, v2
	; GFX10-NEXT: v_lshl_or_b32 v2, v3, 16, v2			; GFX10-NEXT: v_lshl_or_b32 v2, v3, 16, v2
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_sample_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16			; GFX10-NEXT: image_sample_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.2d.v4f32.f32.f16(i32 15, float %bias, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.2d.v4f32.f32.f16(i32 15, float %bias, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_b_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %clamp) {			define amdgpu_ps <4 x float> @sample_b_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %clamp) {
	; GFX9-LABEL: sample_b_cl_1d:			; GFX9-LABEL: sample_b_cl_1d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX9-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GFX9-NEXT: v_lshl_or_b32 v1, v2, 16, v1			; GFX9-NEXT: v_lshl_or_b32 v1, v2, 16, v1
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_sample_b_cl v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_b_cl v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_b_cl_1d:			; GFX10-LABEL: sample_b_cl_1d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1			; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_sample_b_cl v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16			; GFX10-NEXT: image_sample_b_cl v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.b.cl.1d.v4f32.f32.f16(i32 15, float %bias, half %s, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.b.cl.1d.v4f32.f32.f16(i32 15, float %bias, half %s, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %t, half %clamp) {			define amdgpu_ps <4 x float> @sample_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %t, half %clamp) {
	; GFX9-LABEL: sample_b_cl_2d:			; GFX9-LABEL: sample_b_cl_2d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_mov_b32_e32 v5, v3			; GFX9-NEXT: v_mov_b32_e32 v5, v3
	; GFX9-NEXT: v_mov_b32_e32 v3, v0			; GFX9-NEXT: v_mov_b32_e32 v3, v0
	; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v1			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v1
	; GFX9-NEXT: v_lshl_or_b32 v4, v2, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v4, v2, 16, v0
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_sample_b_cl v[0:3], v[3:5], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_b_cl v[0:3], v[3:5], s[0:7], s[8:11] dmask:0xf a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_b_cl_2d:			; GFX10-LABEL: sample_b_cl_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1			; GFX10-NEXT: v_and_b32_e32 v1, 0xffff, v1
	; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1			; GFX10-NEXT: v_lshl_or_b32 v1, v2, 16, v1
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_sample_b_cl v[0:3], [v0, v1, v3], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16			; GFX10-NEXT: image_sample_b_cl v[0:3], [v0, v1, v3], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.b.cl.2d.v4f32.f32.f16(i32 15, float %bias, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.b.cl.2d.v4f32.f32.f16(i32 15, float %bias, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_b_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %clamp) {			define amdgpu_ps <4 x float> @sample_c_b_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %clamp) {
	; GFX9-LABEL: sample_c_b_cl_1d:			; GFX9-LABEL: sample_c_b_cl_1d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_and_b32_e32 v2, 0xffff, v2			; GFX9-NEXT: v_and_b32_e32 v2, 0xffff, v2
	; GFX9-NEXT: v_lshl_or_b32 v2, v3, 16, v2			; GFX9-NEXT: v_lshl_or_b32 v2, v3, 16, v2
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_sample_c_b_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_c_b_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_c_b_cl_1d:			; GFX10-LABEL: sample_c_b_cl_1d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v2, 0xffff, v2			; GFX10-NEXT: v_and_b32_e32 v2, 0xffff, v2
	; GFX10-NEXT: v_lshl_or_b32 v2, v3, 16, v2			; GFX10-NEXT: v_lshl_or_b32 v2, v3, 16, v2
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_sample_c_b_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16			; GFX10-NEXT: image_sample_c_b_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.1d.v4f32.f32.f16(i32 15, float %bias, float %zcompare, half %s, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.1d.v4f32.f32.f16(i32 15, float %bias, float %zcompare, half %s, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %t, half %clamp) {			define amdgpu_ps <4 x float> @sample_c_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %t, half %clamp) {
	[... 9 lines not shown ...]
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_sample_c_b_cl v[0:3], v[4:7], s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample_c_b_cl v[0:3], v[4:7], s[0:7], s[8:11] dmask:0xf a16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_c_b_cl_2d:			; GFX10-LABEL: sample_c_b_cl_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_and_b32_e32 v2, 0xffff, v2			; GFX10-NEXT: v_and_b32_e32 v2, 0xffff, v2
	; GFX10-NEXT: v_lshl_or_b32 v2, v3, 16, v2			; GFX10-NEXT: v_lshl_or_b32 v2, v3, 16, v2
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_sample_c_b_cl v[0:3], [v0, v1, v2, v4], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16			; GFX10-NEXT: image_sample_c_b_cl v[0:3], [v0, v1, v2, v4], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D a16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.2d.v4f32.f32.f16(i32 15, float %bias, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.2d.v4f32.f32.f16(i32 15, float %bias, float %zcompare, half %s, half %t, half %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_d_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dsdv, half %s) {			define amdgpu_ps <4 x float> @sample_d_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dsdv, half %s) {
	[... 757 lines not shown ...]

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.d16.dim.ll

	[... 29 lines not shown ...]
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_sample v0, v[0:1], s[0:7], s[8:11] dmask:0x1 d16			; GFX9-NEXT: image_sample v0, v[0:1], s[0:7], s[8:11] dmask:0x1 d16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: image_sample_2d_f16:			; GFX10-LABEL: image_sample_2d_f16:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_sample v0, v[0:1], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D d16			; GFX10-NEXT: image_sample v0, v[0:1], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D d16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%tex = call half @llvm.amdgcn.image.sample.2d.f16.f32(i32 1, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%tex = call half @llvm.amdgcn.image.sample.2d.f16.f32(i32 1, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	ret half %tex			ret half %tex
	}			}

	define amdgpu_ps half @image_sample_2d_f16_tfe(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t, i32 addrspace(1)* inreg %out) {			define amdgpu_ps half @image_sample_2d_f16_tfe(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t, i32 addrspace(1)* inreg %out) {
	; TONGA-LABEL: image_sample_2d_f16_tfe:			; TONGA-LABEL: image_sample_2d_f16_tfe:
	; TONGA: ; %bb.0: ; %main_body			; TONGA: ; %bb.0: ; %main_body
	; TONGA-NEXT: s_mov_b64 s[14:15], exec			; TONGA-NEXT: s_mov_b64 s[14:15], exec
	; TONGA-NEXT: s_wqm_b64 exec, exec			; TONGA-NEXT: s_wqm_b64 exec, exec
	; TONGA-NEXT: v_mov_b32_e32 v2, 0			; TONGA-NEXT: v_mov_b32_e32 v2, 0
	; TONGA-NEXT: v_mov_b32_e32 v4, s12
	; TONGA-NEXT: v_mov_b32_e32 v5, s13
	; TONGA-NEXT: v_mov_b32_e32 v3, v2			; TONGA-NEXT: v_mov_b32_e32 v3, v2
	; TONGA-NEXT: s_and_b64 exec, exec, s[14:15]			; TONGA-NEXT: s_and_b64 exec, exec, s[14:15]
	; TONGA-NEXT: image_sample v[2:3], v[0:1], s[0:7], s[8:11] dmask:0x1 tfe d16			; TONGA-NEXT: image_sample v[2:3], v[0:1], s[0:7], s[8:11] dmask:0x1 tfe d16
				; TONGA-NEXT: v_mov_b32_e32 v0, s12
				; TONGA-NEXT: v_mov_b32_e32 v1, s13
	; TONGA-NEXT: s_waitcnt vmcnt(0)			; TONGA-NEXT: s_waitcnt vmcnt(0)
				; TONGA-NEXT: flat_store_dword v[0:1], v3
	; TONGA-NEXT: v_mov_b32_e32 v0, v2			; TONGA-NEXT: v_mov_b32_e32 v0, v2
	; TONGA-NEXT: flat_store_dword v[4:5], v3
	; TONGA-NEXT: s_waitcnt vmcnt(0)			; TONGA-NEXT: s_waitcnt vmcnt(0)
	; TONGA-NEXT: ; return to shader part epilog			; TONGA-NEXT: ; return to shader part epilog
	;			;
	; GFX81-LABEL: image_sample_2d_f16_tfe:			; GFX81-LABEL: image_sample_2d_f16_tfe:
	; GFX81: ; %bb.0: ; %main_body			; GFX81: ; %bb.0: ; %main_body
	; GFX81-NEXT: s_mov_b64 s[14:15], exec			; GFX81-NEXT: s_mov_b64 s[14:15], exec
	; GFX81-NEXT: s_wqm_b64 exec, exec			; GFX81-NEXT: s_wqm_b64 exec, exec
	; GFX81-NEXT: v_mov_b32_e32 v2, 0			; GFX81-NEXT: v_mov_b32_e32 v2, 0
	; GFX81-NEXT: v_mov_b32_e32 v4, s12
	; GFX81-NEXT: v_mov_b32_e32 v5, s13
	; GFX81-NEXT: v_mov_b32_e32 v3, v2			; GFX81-NEXT: v_mov_b32_e32 v3, v2
	; GFX81-NEXT: s_and_b64 exec, exec, s[14:15]			; GFX81-NEXT: s_and_b64 exec, exec, s[14:15]
	; GFX81-NEXT: image_sample v[2:3], v[0:1], s[0:7], s[8:11] dmask:0x1 tfe d16			; GFX81-NEXT: image_sample v[2:3], v[0:1], s[0:7], s[8:11] dmask:0x1 tfe d16
				; GFX81-NEXT: v_mov_b32_e32 v0, s12
				; GFX81-NEXT: v_mov_b32_e32 v1, s13
	; GFX81-NEXT: s_waitcnt vmcnt(0)			; GFX81-NEXT: s_waitcnt vmcnt(0)
				; GFX81-NEXT: flat_store_dword v[0:1], v3
	; GFX81-NEXT: v_mov_b32_e32 v0, v2			; GFX81-NEXT: v_mov_b32_e32 v0, v2
	; GFX81-NEXT: flat_store_dword v[4:5], v3
	; GFX81-NEXT: s_waitcnt vmcnt(0)			; GFX81-NEXT: s_waitcnt vmcnt(0)
	; GFX81-NEXT: ; return to shader part epilog			; GFX81-NEXT: ; return to shader part epilog
	;			;
	; GFX9-LABEL: image_sample_2d_f16_tfe:			; GFX9-LABEL: image_sample_2d_f16_tfe:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[14:15], exec			; GFX9-NEXT: s_mov_b64 s[14:15], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_mov_b32_e32 v2, 0			; GFX9-NEXT: v_mov_b32_e32 v2, 0
	; GFX9-NEXT: v_mov_b32_e32 v4, s12
	; GFX9-NEXT: v_mov_b32_e32 v5, s13
	; GFX9-NEXT: v_mov_b32_e32 v3, v2			; GFX9-NEXT: v_mov_b32_e32 v3, v2
	; GFX9-NEXT: s_and_b64 exec, exec, s[14:15]			; GFX9-NEXT: s_and_b64 exec, exec, s[14:15]
	; GFX9-NEXT: image_sample v[2:3], v[0:1], s[0:7], s[8:11] dmask:0x1 tfe d16			; GFX9-NEXT: image_sample v[2:3], v[0:1], s[0:7], s[8:11] dmask:0x1 tfe d16
				; GFX9-NEXT: v_mov_b32_e32 v0, s12
				; GFX9-NEXT: v_mov_b32_e32 v1, s13
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: global_store_dword v[0:1], v3, off
	; GFX9-NEXT: v_mov_b32_e32 v0, v2			; GFX9-NEXT: v_mov_b32_e32 v0, v2
	; GFX9-NEXT: global_store_dword v[4:5], v3, off
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: image_sample_2d_f16_tfe:			; GFX10-LABEL: image_sample_2d_f16_tfe:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s14, exec_lo			; GFX10-NEXT: s_mov_b32 s14, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_mov_b32_e32 v2, 0			; GFX10-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-NEXT: v_mov_b32_e32 v4, s12
	; GFX10-NEXT: v_mov_b32_e32 v5, s13
	; GFX10-NEXT: v_mov_b32_e32 v3, v2			; GFX10-NEXT: v_mov_b32_e32 v3, v2
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s14			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s14
	; GFX10-NEXT: image_sample v[2:3], v[0:1], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe d16			; GFX10-NEXT: image_sample v[2:3], v[0:1], s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_2D tfe d16
				; GFX10-NEXT: v_mov_b32_e32 v0, s12
				; GFX10-NEXT: v_mov_b32_e32 v1, s13
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: global_store_dword v[0:1], v3, off
	; GFX10-NEXT: v_mov_b32_e32 v0, v2			; GFX10-NEXT: v_mov_b32_e32 v0, v2
	; GFX10-NEXT: global_store_dword v[4:5], v3, off
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%tex = call {half,i32} @llvm.amdgcn.image.sample.2d.f16i32.f32(i32 1, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 1, i32 0)			%tex = call {half,i32} @llvm.amdgcn.image.sample.2d.f16i32.f32(i32 1, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 1, i32 0)
	%tex.vec = extractvalue {half, i32} %tex, 0			%tex.vec = extractvalue {half, i32} %tex, 0
	%tex.err = extractvalue {half, i32} %tex, 1			%tex.err = extractvalue {half, i32} %tex, 1
	store i32 %tex.err, i32 addrspace(1)* %out, align 4			store i32 %tex.err, i32 addrspace(1)* %out, align 4
	ret half %tex.vec			ret half %tex.vec
	[... 115 lines not shown ...]
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_sample_b v[0:1], v[0:2], s[0:7], s[8:11] dmask:0x7 d16			; GFX9-NEXT: image_sample_b v[0:1], v[0:2], s[0:7], s[8:11] dmask:0x7 d16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: image_sample_b_2d_v3f16:			; GFX10-LABEL: image_sample_b_2d_v3f16:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_sample_b v[0:1], v[0:2], s[0:7], s[8:11] dmask:0x7 dim:SQ_RSRC_IMG_2D d16			; GFX10-NEXT: image_sample_b v[0:1], v[0:2], s[0:7], s[8:11] dmask:0x7 dim:SQ_RSRC_IMG_2D d16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%tex = call <3 x half> @llvm.amdgcn.image.sample.b.2d.v3f16.f32.f32(i32 7, float %bias, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%tex = call <3 x half> @llvm.amdgcn.image.sample.b.2d.v3f16.f32.f32(i32 7, float %bias, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	%tex_wide = shufflevector <3 x half> %tex, <3 x half> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%tex_wide = shufflevector <3 x half> %tex, <3 x half> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	%r = bitcast <4 x half> %tex_wide to <2 x float>			%r = bitcast <4 x half> %tex_wide to <2 x float>
	ret <2 x float> %r			ret <2 x float> %r
	}			}
	[... 44 lines not shown ...]
	; GFX9-NEXT: v_mov_b32_e32 v0, v3			; GFX9-NEXT: v_mov_b32_e32 v0, v3
	; GFX9-NEXT: v_mov_b32_e32 v1, v4			; GFX9-NEXT: v_mov_b32_e32 v1, v4
	; GFX9-NEXT: v_mov_b32_e32 v2, v5			; GFX9-NEXT: v_mov_b32_e32 v2, v5
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: image_sample_b_2d_v3f16_tfe:			; GFX10-LABEL: image_sample_b_2d_v3f16_tfe:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_mov_b32_e32 v3, v0			; GFX10-NEXT: v_mov_b32_e32 v3, v0
	; GFX10-NEXT: v_mov_b32_e32 v0, 0			; GFX10-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-NEXT: v_mov_b32_e32 v5, v2			; GFX10-NEXT: v_mov_b32_e32 v5, v2
	; GFX10-NEXT: v_mov_b32_e32 v4, v1			; GFX10-NEXT: v_mov_b32_e32 v4, v1
	; GFX10-NEXT: v_mov_b32_e32 v1, v0			; GFX10-NEXT: v_mov_b32_e32 v1, v0
	; GFX10-NEXT: v_mov_b32_e32 v2, v0			; GFX10-NEXT: v_mov_b32_e32 v2, v0
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_sample_b v[0:2], v[3:5], s[0:7], s[8:11] dmask:0x7 dim:SQ_RSRC_IMG_2D tfe d16			; GFX10-NEXT: image_sample_b v[0:2], v[3:5], s[0:7], s[8:11] dmask:0x7 dim:SQ_RSRC_IMG_2D tfe d16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%tex = call {<3 x half>,i32} @llvm.amdgcn.image.sample.b.2d.v3f16i32.f32.f32(i32 7, float %bias, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 1, i32 0)			%tex = call {<3 x half>,i32} @llvm.amdgcn.image.sample.b.2d.v3f16i32.f32.f32(i32 7, float %bias, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 1, i32 0)
	%tex.vec = extractvalue {<3 x half>, i32} %tex, 0			%tex.vec = extractvalue {<3 x half>, i32} %tex, 0
	%tex.vec_wide = shufflevector <3 x half> %tex.vec, <3 x half> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%tex.vec_wide = shufflevector <3 x half> %tex.vec, <3 x half> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	%tex.err = extractvalue {<3 x half>, i32} %tex, 1			%tex.err = extractvalue {<3 x half>, i32} %tex, 1
	%tex.vecf = bitcast <4 x half> %tex.vec_wide to <2 x float>			%tex.vecf = bitcast <4 x half> %tex.vec_wide to <2 x float>
	[... 36 lines not shown ...]
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_sample_b v[0:1], v[0:2], s[0:7], s[8:11] dmask:0xf d16			; GFX9-NEXT: image_sample_b v[0:1], v[0:2], s[0:7], s[8:11] dmask:0xf d16
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: image_sample_b_2d_v4f16:			; GFX10-LABEL: image_sample_b_2d_v4f16:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_sample_b v[0:1], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D d16			; GFX10-NEXT: image_sample_b v[0:1], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D d16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%tex = call <4 x half> @llvm.amdgcn.image.sample.b.2d.v4f16.f32.f32(i32 15, float %bias, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%tex = call <4 x half> @llvm.amdgcn.image.sample.b.2d.v4f16.f32.f32(i32 15, float %bias, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	%r = bitcast <4 x half> %tex to <2 x float>			%r = bitcast <4 x half> %tex to <2 x float>
	ret <2 x float> %r			ret <2 x float> %r
	}			}

	[... 45 lines not shown ...]
	; GFX9-NEXT: v_mov_b32_e32 v0, v3			; GFX9-NEXT: v_mov_b32_e32 v0, v3
	; GFX9-NEXT: v_mov_b32_e32 v1, v4			; GFX9-NEXT: v_mov_b32_e32 v1, v4
	; GFX9-NEXT: v_mov_b32_e32 v2, v5			; GFX9-NEXT: v_mov_b32_e32 v2, v5
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: image_sample_b_2d_v4f16_tfe:			; GFX10-LABEL: image_sample_b_2d_v4f16_tfe:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo			; GFX10-NEXT: s_mov_b32 s12, exec_lo
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GFX10-NEXT: v_mov_b32_e32 v3, v0			; GFX10-NEXT: v_mov_b32_e32 v3, v0
	; GFX10-NEXT: v_mov_b32_e32 v0, 0			; GFX10-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-NEXT: v_mov_b32_e32 v5, v2			; GFX10-NEXT: v_mov_b32_e32 v5, v2
	; GFX10-NEXT: v_mov_b32_e32 v4, v1			; GFX10-NEXT: v_mov_b32_e32 v4, v1
	; GFX10-NEXT: v_mov_b32_e32 v1, v0			; GFX10-NEXT: v_mov_b32_e32 v1, v0
	; GFX10-NEXT: v_mov_b32_e32 v2, v0			; GFX10-NEXT: v_mov_b32_e32 v2, v0
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12
	; GFX10-NEXT: image_sample_b v[0:2], v[3:5], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D tfe d16			; GFX10-NEXT: image_sample_b v[0:2], v[3:5], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D tfe d16
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%tex = call {<4 x half>,i32} @llvm.amdgcn.image.sample.b.2d.v4f16i32.f32.f32(i32 15, float %bias, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 1, i32 0)			%tex = call {<4 x half>,i32} @llvm.amdgcn.image.sample.b.2d.v4f16i32.f32.f32(i32 15, float %bias, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 1, i32 0)
	%tex.vec = extractvalue {<4 x half>, i32} %tex, 0			%tex.vec = extractvalue {<4 x half>, i32} %tex, 0
	%tex.err = extractvalue {<4 x half>, i32} %tex, 1			%tex.err = extractvalue {<4 x half>, i32} %tex, 1
	%tex.vecf = bitcast <4 x half> %tex.vec to <2 x float>			%tex.vecf = bitcast <4 x half> %tex.vec to <2 x float>
	%tex.vecf.0 = extractelement <2 x float> %tex.vecf, i32 0			%tex.vecf.0 = extractelement <2 x float> %tex.vecf, i32 0
	[23 lines not shown]

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.dim.ll

	[19 lines not shown]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf			; GFX6789-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_1d:			; GFX10-LABEL: sample_1d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x0f,0x80,0xf0,0x00,0x00,0x40,0x00]			; GFX10-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x0f,0x80,0xf0,0x00,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_1d_tfe(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 addrspace(1)* inreg %out, float %s) {			define amdgpu_ps <4 x float> @sample_1d_tfe(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 addrspace(1)* inreg %out, float %s) {
	; VERDE-LABEL: sample_1d_tfe:			; VERDE-LABEL: sample_1d_tfe:
	; VERDE: ; %bb.0: ; %main_body			; VERDE: ; %bb.0: ; %main_body
	; VERDE-NEXT: s_mov_b64 s[16:17], exec			; VERDE-NEXT: s_mov_b64 s[14:15], exec
	; VERDE-NEXT: s_wqm_b64 exec, exec			; VERDE-NEXT: s_wqm_b64 exec, exec
	; VERDE-NEXT: v_mov_b32_e32 v5, v0			; VERDE-NEXT: v_mov_b32_e32 v5, v0
	; VERDE-NEXT: v_mov_b32_e32 v0, 0			; VERDE-NEXT: v_mov_b32_e32 v0, 0
	; VERDE-NEXT: s_mov_b32 s15, 0xf000
	; VERDE-NEXT: s_mov_b32 s14, -1
	; VERDE-NEXT: v_mov_b32_e32 v1, v0			; VERDE-NEXT: v_mov_b32_e32 v1, v0
	; VERDE-NEXT: v_mov_b32_e32 v2, v0			; VERDE-NEXT: v_mov_b32_e32 v2, v0
	; VERDE-NEXT: v_mov_b32_e32 v3, v0			; VERDE-NEXT: v_mov_b32_e32 v3, v0
	; VERDE-NEXT: v_mov_b32_e32 v4, v0			; VERDE-NEXT: v_mov_b32_e32 v4, v0
	; VERDE-NEXT: s_and_b64 exec, exec, s[16:17]			; VERDE-NEXT: s_and_b64 exec, exec, s[14:15]
	; VERDE-NEXT: image_sample v[0:4], v5, s[0:7], s[8:11] dmask:0xf tfe			; VERDE-NEXT: image_sample v[0:4], v5, s[0:7], s[8:11] dmask:0xf tfe
				; VERDE-NEXT: s_mov_b32 s15, 0xf000
				; VERDE-NEXT: s_mov_b32 s14, -1
	; VERDE-NEXT: s_waitcnt vmcnt(0)			; VERDE-NEXT: s_waitcnt vmcnt(0)
	; VERDE-NEXT: buffer_store_dword v4, off, s[12:15], 0			; VERDE-NEXT: buffer_store_dword v4, off, s[12:15], 0
	; VERDE-NEXT: s_waitcnt vmcnt(0) expcnt(0)			; VERDE-NEXT: s_waitcnt vmcnt(0) expcnt(0)
	; VERDE-NEXT: ; return to shader part epilog			; VERDE-NEXT: ; return to shader part epilog
	;			;
	; GFX6789-LABEL: sample_1d_tfe:			; GFX6789-LABEL: sample_1d_tfe:
	; GFX6789: ; %bb.0: ; %main_body			; GFX6789: ; %bb.0: ; %main_body
	; GFX6789-NEXT: s_mov_b64 s[14:15], exec			; GFX6789-NEXT: s_mov_b64 s[14:15], exec
	; GFX6789-NEXT: s_wqm_b64 exec, exec			; GFX6789-NEXT: s_wqm_b64 exec, exec
	; GFX6789-NEXT: v_mov_b32_e32 v5, v0			; GFX6789-NEXT: v_mov_b32_e32 v5, v0
	; GFX6789-NEXT: v_mov_b32_e32 v0, 0			; GFX6789-NEXT: v_mov_b32_e32 v0, 0
	; GFX6789-NEXT: v_mov_b32_e32 v6, s12
	; GFX6789-NEXT: v_mov_b32_e32 v7, s13
	; GFX6789-NEXT: v_mov_b32_e32 v1, v0			; GFX6789-NEXT: v_mov_b32_e32 v1, v0
	; GFX6789-NEXT: v_mov_b32_e32 v2, v0			; GFX6789-NEXT: v_mov_b32_e32 v2, v0
	; GFX6789-NEXT: v_mov_b32_e32 v3, v0			; GFX6789-NEXT: v_mov_b32_e32 v3, v0
	; GFX6789-NEXT: v_mov_b32_e32 v4, v0			; GFX6789-NEXT: v_mov_b32_e32 v4, v0
	; GFX6789-NEXT: s_and_b64 exec, exec, s[14:15]			; GFX6789-NEXT: s_and_b64 exec, exec, s[14:15]
	; GFX6789-NEXT: image_sample v[0:4], v5, s[0:7], s[8:11] dmask:0xf tfe			; GFX6789-NEXT: image_sample v[0:4], v5, s[0:7], s[8:11] dmask:0xf tfe
				; GFX6789-NEXT: v_mov_b32_e32 v5, s12
				; GFX6789-NEXT: v_mov_b32_e32 v6, s13
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: global_store_dword v[6:7], v4, off			; GFX6789-NEXT: global_store_dword v[5:6], v4, off
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_1d_tfe:			; GFX10-LABEL: sample_1d_tfe:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s14, exec_lo ; encoding: [0x7e,0x03,0x8e,0xbe]			; GFX10-NEXT: s_mov_b32 s14, exec_lo ; encoding: [0x7e,0x03,0x8e,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: v_mov_b32_e32 v5, v0 ; encoding: [0x00,0x03,0x0a,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v5, v0 ; encoding: [0x00,0x03,0x0a,0x7e]
	; GFX10-NEXT: v_mov_b32_e32 v0, 0 ; encoding: [0x80,0x02,0x00,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v0, 0 ; encoding: [0x80,0x02,0x00,0x7e]
	; GFX10-NEXT: v_mov_b32_e32 v6, s12 ; encoding: [0x0c,0x02,0x0c,0x7e]
	; GFX10-NEXT: v_mov_b32_e32 v7, s13 ; encoding: [0x0d,0x02,0x0e,0x7e]
	; GFX10-NEXT: v_mov_b32_e32 v1, v0 ; encoding: [0x00,0x03,0x02,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v1, v0 ; encoding: [0x00,0x03,0x02,0x7e]
	; GFX10-NEXT: v_mov_b32_e32 v2, v0 ; encoding: [0x00,0x03,0x04,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v2, v0 ; encoding: [0x00,0x03,0x04,0x7e]
	; GFX10-NEXT: v_mov_b32_e32 v3, v0 ; encoding: [0x00,0x03,0x06,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v3, v0 ; encoding: [0x00,0x03,0x06,0x7e]
	; GFX10-NEXT: v_mov_b32_e32 v4, v0 ; encoding: [0x00,0x03,0x08,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v4, v0 ; encoding: [0x00,0x03,0x08,0x7e]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s14 ; encoding: [0x7e,0x0e,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s14 ; encoding: [0x7e,0x0e,0x7e,0x87]
	; GFX10-NEXT: image_sample v[0:4], v5, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D tfe ; encoding: [0x00,0x0f,0x81,0xf0,0x05,0x00,0x40,0x00]			; GFX10-NEXT: image_sample v[0:4], v5, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D tfe ; encoding: [0x00,0x0f,0x81,0xf0,0x05,0x00,0x40,0x00]
				; GFX10-NEXT: v_mov_b32_e32 v5, s12 ; encoding: [0x0c,0x02,0x0a,0x7e]
				; GFX10-NEXT: v_mov_b32_e32 v6, s13 ; encoding: [0x0d,0x02,0x0c,0x7e]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: global_store_dword v[6:7], v4, off ; encoding: [0x00,0x80,0x70,0xdc,0x06,0x04,0x7d,0x00]			; GFX10-NEXT: global_store_dword v[5:6], v4, off ; encoding: [0x00,0x80,0x70,0xdc,0x05,0x04,0x7d,0x00]
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0 ; encoding: [0x00,0x00,0xfd,0xbb]			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0 ; encoding: [0x00,0x00,0xfd,0xbb]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call {<4 x float>,i32} @llvm.amdgcn.image.sample.1d.v4f32i32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 1, i32 0)			%v = call {<4 x float>,i32} @llvm.amdgcn.image.sample.1d.v4f32i32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 1, i32 0)
	%v.vec = extractvalue {<4 x float>, i32} %v, 0			%v.vec = extractvalue {<4 x float>, i32} %v, 0
	%v.err = extractvalue {<4 x float>, i32} %v, 1			%v.err = extractvalue {<4 x float>, i32} %v, 1
	store i32 %v.err, i32 addrspace(1)* %out, align 4			store i32 %v.err, i32 addrspace(1)* %out, align 4
	ret <4 x float> %v.vec			ret <4 x float> %v.vec
	[22 lines not shown]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample v[0:1], v2, s[0:7], s[8:11] dmask:0x1 tfe			; GFX6789-NEXT: image_sample v[0:1], v2, s[0:7], s[8:11] dmask:0x1 tfe
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_1d_tfe_adjust_writemask_1:			; GFX10-LABEL: sample_1d_tfe_adjust_writemask_1:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: v_mov_b32_e32 v2, v0 ; encoding: [0x00,0x03,0x04,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v2, v0 ; encoding: [0x00,0x03,0x04,0x7e]
	; GFX10-NEXT: v_mov_b32_e32 v0, 0 ; encoding: [0x80,0x02,0x00,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v0, 0 ; encoding: [0x80,0x02,0x00,0x7e]
	; GFX10-NEXT: v_mov_b32_e32 v1, v0 ; encoding: [0x00,0x03,0x02,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v1, v0 ; encoding: [0x00,0x03,0x02,0x7e]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample v[0:1], v2, s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_1D tfe ; encoding: [0x00,0x01,0x81,0xf0,0x02,0x00,0x40,0x00]			; GFX10-NEXT: image_sample v[0:1], v2, s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_1D tfe ; encoding: [0x00,0x01,0x81,0xf0,0x02,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call {<4 x float>,i32} @llvm.amdgcn.image.sample.1d.v4f32i32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 1, i32 0)			%v = call {<4 x float>,i32} @llvm.amdgcn.image.sample.1d.v4f32i32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 1, i32 0)
	%res.vec = extractvalue {<4 x float>,i32} %v, 0			%res.vec = extractvalue {<4 x float>,i32} %v, 0
	%res.f = extractelement <4 x float> %res.vec, i32 0			%res.f = extractelement <4 x float> %res.vec, i32 0
	%res.err = extractvalue {<4 x float>,i32} %v, 1			%res.err = extractvalue {<4 x float>,i32} %v, 1
	%res.errf = bitcast i32 %res.err to float			%res.errf = bitcast i32 %res.err to float
	[25 lines not shown]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample v[0:1], v2, s[0:7], s[8:11] dmask:0x2 tfe			; GFX6789-NEXT: image_sample v[0:1], v2, s[0:7], s[8:11] dmask:0x2 tfe
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_1d_tfe_adjust_writemask_2:			; GFX10-LABEL: sample_1d_tfe_adjust_writemask_2:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: v_mov_b32_e32 v2, v0 ; encoding: [0x00,0x03,0x04,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v2, v0 ; encoding: [0x00,0x03,0x04,0x7e]
	; GFX10-NEXT: v_mov_b32_e32 v0, 0 ; encoding: [0x80,0x02,0x00,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v0, 0 ; encoding: [0x80,0x02,0x00,0x7e]
	; GFX10-NEXT: v_mov_b32_e32 v1, v0 ; encoding: [0x00,0x03,0x02,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v1, v0 ; encoding: [0x00,0x03,0x02,0x7e]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample v[0:1], v2, s[0:7], s[8:11] dmask:0x2 dim:SQ_RSRC_IMG_1D tfe ; encoding: [0x00,0x02,0x81,0xf0,0x02,0x00,0x40,0x00]			; GFX10-NEXT: image_sample v[0:1], v2, s[0:7], s[8:11] dmask:0x2 dim:SQ_RSRC_IMG_1D tfe ; encoding: [0x00,0x02,0x81,0xf0,0x02,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call {<4 x float>,i32} @llvm.amdgcn.image.sample.1d.v4f32i32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 1, i32 0)			%v = call {<4 x float>,i32} @llvm.amdgcn.image.sample.1d.v4f32i32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 1, i32 0)
	%res.vec = extractvalue {<4 x float>,i32} %v, 0			%res.vec = extractvalue {<4 x float>,i32} %v, 0
	%res.f = extractelement <4 x float> %res.vec, i32 1			%res.f = extractelement <4 x float> %res.vec, i32 1
	%res.err = extractvalue {<4 x float>,i32} %v, 1			%res.err = extractvalue {<4 x float>,i32} %v, 1
	%res.errf = bitcast i32 %res.err to float			%res.errf = bitcast i32 %res.err to float
	[25 lines not shown]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample v[0:1], v2, s[0:7], s[8:11] dmask:0x4 tfe			; GFX6789-NEXT: image_sample v[0:1], v2, s[0:7], s[8:11] dmask:0x4 tfe
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_1d_tfe_adjust_writemask_3:			; GFX10-LABEL: sample_1d_tfe_adjust_writemask_3:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: v_mov_b32_e32 v2, v0 ; encoding: [0x00,0x03,0x04,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v2, v0 ; encoding: [0x00,0x03,0x04,0x7e]
	; GFX10-NEXT: v_mov_b32_e32 v0, 0 ; encoding: [0x80,0x02,0x00,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v0, 0 ; encoding: [0x80,0x02,0x00,0x7e]
	; GFX10-NEXT: v_mov_b32_e32 v1, v0 ; encoding: [0x00,0x03,0x02,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v1, v0 ; encoding: [0x00,0x03,0x02,0x7e]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample v[0:1], v2, s[0:7], s[8:11] dmask:0x4 dim:SQ_RSRC_IMG_1D tfe ; encoding: [0x00,0x04,0x81,0xf0,0x02,0x00,0x40,0x00]			; GFX10-NEXT: image_sample v[0:1], v2, s[0:7], s[8:11] dmask:0x4 dim:SQ_RSRC_IMG_1D tfe ; encoding: [0x00,0x04,0x81,0xf0,0x02,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call {<4 x float>,i32} @llvm.amdgcn.image.sample.1d.v4f32i32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 1, i32 0)			%v = call {<4 x float>,i32} @llvm.amdgcn.image.sample.1d.v4f32i32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 1, i32 0)
	%res.vec = extractvalue {<4 x float>,i32} %v, 0			%res.vec = extractvalue {<4 x float>,i32} %v, 0
	%res.f = extractelement <4 x float> %res.vec, i32 2			%res.f = extractelement <4 x float> %res.vec, i32 2
	%res.err = extractvalue {<4 x float>,i32} %v, 1			%res.err = extractvalue {<4 x float>,i32} %v, 1
	%res.errf = bitcast i32 %res.err to float			%res.errf = bitcast i32 %res.err to float
	[25 lines not shown]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample v[0:1], v2, s[0:7], s[8:11] dmask:0x8 tfe			; GFX6789-NEXT: image_sample v[0:1], v2, s[0:7], s[8:11] dmask:0x8 tfe
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_1d_tfe_adjust_writemask_4:			; GFX10-LABEL: sample_1d_tfe_adjust_writemask_4:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: v_mov_b32_e32 v2, v0 ; encoding: [0x00,0x03,0x04,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v2, v0 ; encoding: [0x00,0x03,0x04,0x7e]
	; GFX10-NEXT: v_mov_b32_e32 v0, 0 ; encoding: [0x80,0x02,0x00,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v0, 0 ; encoding: [0x80,0x02,0x00,0x7e]
	; GFX10-NEXT: v_mov_b32_e32 v1, v0 ; encoding: [0x00,0x03,0x02,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v1, v0 ; encoding: [0x00,0x03,0x02,0x7e]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample v[0:1], v2, s[0:7], s[8:11] dmask:0x8 dim:SQ_RSRC_IMG_1D tfe ; encoding: [0x00,0x08,0x81,0xf0,0x02,0x00,0x40,0x00]			; GFX10-NEXT: image_sample v[0:1], v2, s[0:7], s[8:11] dmask:0x8 dim:SQ_RSRC_IMG_1D tfe ; encoding: [0x00,0x08,0x81,0xf0,0x02,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call {<4 x float>,i32} @llvm.amdgcn.image.sample.1d.v4f32i32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 1, i32 0)			%v = call {<4 x float>,i32} @llvm.amdgcn.image.sample.1d.v4f32i32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 1, i32 0)
	%res.vec = extractvalue {<4 x float>,i32} %v, 0			%res.vec = extractvalue {<4 x float>,i32} %v, 0
	%res.f = extractelement <4 x float> %res.vec, i32 3			%res.f = extractelement <4 x float> %res.vec, i32 3
	%res.err = extractvalue {<4 x float>,i32} %v, 1			%res.err = extractvalue {<4 x float>,i32} %v, 1
	%res.errf = bitcast i32 %res.err to float			%res.errf = bitcast i32 %res.err to float
	[27 lines not shown]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample v[0:2], v3, s[0:7], s[8:11] dmask:0x3 tfe			; GFX6789-NEXT: image_sample v[0:2], v3, s[0:7], s[8:11] dmask:0x3 tfe
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_1d_tfe_adjust_writemask_12:			; GFX10-LABEL: sample_1d_tfe_adjust_writemask_12:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: v_mov_b32_e32 v3, v0 ; encoding: [0x00,0x03,0x06,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v3, v0 ; encoding: [0x00,0x03,0x06,0x7e]
	; GFX10-NEXT: v_mov_b32_e32 v0, 0 ; encoding: [0x80,0x02,0x00,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v0, 0 ; encoding: [0x80,0x02,0x00,0x7e]
	; GFX10-NEXT: v_mov_b32_e32 v1, v0 ; encoding: [0x00,0x03,0x02,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v1, v0 ; encoding: [0x00,0x03,0x02,0x7e]
	; GFX10-NEXT: v_mov_b32_e32 v2, v0 ; encoding: [0x00,0x03,0x04,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v2, v0 ; encoding: [0x00,0x03,0x04,0x7e]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample v[0:2], v3, s[0:7], s[8:11] dmask:0x3 dim:SQ_RSRC_IMG_1D tfe ; encoding: [0x00,0x03,0x81,0xf0,0x03,0x00,0x40,0x00]			; GFX10-NEXT: image_sample v[0:2], v3, s[0:7], s[8:11] dmask:0x3 dim:SQ_RSRC_IMG_1D tfe ; encoding: [0x00,0x03,0x81,0xf0,0x03,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call {<4 x float>,i32} @llvm.amdgcn.image.sample.1d.v4f32i32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 1, i32 0)			%v = call {<4 x float>,i32} @llvm.amdgcn.image.sample.1d.v4f32i32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 1, i32 0)
	%res.vec = extractvalue {<4 x float>,i32} %v, 0			%res.vec = extractvalue {<4 x float>,i32} %v, 0
	%res.f1 = extractelement <4 x float> %res.vec, i32 0			%res.f1 = extractelement <4 x float> %res.vec, i32 0
	%res.f2 = extractelement <4 x float> %res.vec, i32 1			%res.f2 = extractelement <4 x float> %res.vec, i32 1
	%res.err = extractvalue {<4 x float>,i32} %v, 1			%res.err = extractvalue {<4 x float>,i32} %v, 1
	[29 lines not shown]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample v[0:2], v3, s[0:7], s[8:11] dmask:0xa tfe			; GFX6789-NEXT: image_sample v[0:2], v3, s[0:7], s[8:11] dmask:0xa tfe
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_1d_tfe_adjust_writemask_24:			; GFX10-LABEL: sample_1d_tfe_adjust_writemask_24:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: v_mov_b32_e32 v3, v0 ; encoding: [0x00,0x03,0x06,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v3, v0 ; encoding: [0x00,0x03,0x06,0x7e]
	; GFX10-NEXT: v_mov_b32_e32 v0, 0 ; encoding: [0x80,0x02,0x00,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v0, 0 ; encoding: [0x80,0x02,0x00,0x7e]
	; GFX10-NEXT: v_mov_b32_e32 v1, v0 ; encoding: [0x00,0x03,0x02,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v1, v0 ; encoding: [0x00,0x03,0x02,0x7e]
	; GFX10-NEXT: v_mov_b32_e32 v2, v0 ; encoding: [0x00,0x03,0x04,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v2, v0 ; encoding: [0x00,0x03,0x04,0x7e]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample v[0:2], v3, s[0:7], s[8:11] dmask:0xa dim:SQ_RSRC_IMG_1D tfe ; encoding: [0x00,0x0a,0x81,0xf0,0x03,0x00,0x40,0x00]			; GFX10-NEXT: image_sample v[0:2], v3, s[0:7], s[8:11] dmask:0xa dim:SQ_RSRC_IMG_1D tfe ; encoding: [0x00,0x0a,0x81,0xf0,0x03,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call {<4 x float>,i32} @llvm.amdgcn.image.sample.1d.v4f32i32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 1, i32 0)			%v = call {<4 x float>,i32} @llvm.amdgcn.image.sample.1d.v4f32i32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 1, i32 0)
	%res.vec = extractvalue {<4 x float>,i32} %v, 0			%res.vec = extractvalue {<4 x float>,i32} %v, 0
	%res.f1 = extractelement <4 x float> %res.vec, i32 1			%res.f1 = extractelement <4 x float> %res.vec, i32 1
	%res.f2 = extractelement <4 x float> %res.vec, i32 3			%res.f2 = extractelement <4 x float> %res.vec, i32 3
	%res.err = extractvalue {<4 x float>,i32} %v, 1			%res.err = extractvalue {<4 x float>,i32} %v, 1
	[31 lines not shown]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample v[0:3], v4, s[0:7], s[8:11] dmask:0xd tfe			; GFX6789-NEXT: image_sample v[0:3], v4, s[0:7], s[8:11] dmask:0xd tfe
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_1d_tfe_adjust_writemask_134:			; GFX10-LABEL: sample_1d_tfe_adjust_writemask_134:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: v_mov_b32_e32 v4, v0 ; encoding: [0x00,0x03,0x08,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v4, v0 ; encoding: [0x00,0x03,0x08,0x7e]
	; GFX10-NEXT: v_mov_b32_e32 v0, 0 ; encoding: [0x80,0x02,0x00,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v0, 0 ; encoding: [0x80,0x02,0x00,0x7e]
	; GFX10-NEXT: v_mov_b32_e32 v1, v0 ; encoding: [0x00,0x03,0x02,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v1, v0 ; encoding: [0x00,0x03,0x02,0x7e]
	; GFX10-NEXT: v_mov_b32_e32 v2, v0 ; encoding: [0x00,0x03,0x04,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v2, v0 ; encoding: [0x00,0x03,0x04,0x7e]
	; GFX10-NEXT: v_mov_b32_e32 v3, v0 ; encoding: [0x00,0x03,0x06,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v3, v0 ; encoding: [0x00,0x03,0x06,0x7e]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample v[0:3], v4, s[0:7], s[8:11] dmask:0xd dim:SQ_RSRC_IMG_1D tfe ; encoding: [0x00,0x0d,0x81,0xf0,0x04,0x00,0x40,0x00]			; GFX10-NEXT: image_sample v[0:3], v4, s[0:7], s[8:11] dmask:0xd dim:SQ_RSRC_IMG_1D tfe ; encoding: [0x00,0x0d,0x81,0xf0,0x04,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call {<4 x float>,i32} @llvm.amdgcn.image.sample.1d.v4f32i32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 1, i32 0)			%v = call {<4 x float>,i32} @llvm.amdgcn.image.sample.1d.v4f32i32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 1, i32 0)
	%res.vec = extractvalue {<4 x float>,i32} %v, 0			%res.vec = extractvalue {<4 x float>,i32} %v, 0
	%res.f1 = extractelement <4 x float> %res.vec, i32 0			%res.f1 = extractelement <4 x float> %res.vec, i32 0
	%res.f2 = extractelement <4 x float> %res.vec, i32 2			%res.f2 = extractelement <4 x float> %res.vec, i32 2
	%res.f3 = extractelement <4 x float> %res.vec, i32 3			%res.f3 = extractelement <4 x float> %res.vec, i32 3
	%res.err = extractvalue {<4 x float>,i32} %v, 1			%res.err = extractvalue {<4 x float>,i32} %v, 1
	%res.errf = bitcast i32 %res.err to float			%res.errf = bitcast i32 %res.err to float
	%res.tmp1 = insertelement <4 x float> undef, float %res.f1, i32 0			%res.tmp1 = insertelement <4 x float> undef, float %res.f1, i32 0
	%res.tmp2 = insertelement <4 x float> %res.tmp1, float %res.f2, i32 1			%res.tmp2 = insertelement <4 x float> %res.tmp1, float %res.f2, i32 1
	%res.tmp3 = insertelement <4 x float> %res.tmp2, float %res.f3, i32 2			%res.tmp3 = insertelement <4 x float> %res.tmp2, float %res.f3, i32 2
	%res = insertelement <4 x float> %res.tmp3, float %res.errf, i32 3			%res = insertelement <4 x float> %res.tmp3, float %res.errf, i32 3
	ret <4 x float> %res			ret <4 x float> %res
	}			}

	define amdgpu_ps <4 x float> @sample_1d_lwe(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 addrspace(1)* inreg %out, float %s) {			define amdgpu_ps <4 x float> @sample_1d_lwe(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 addrspace(1)* inreg %out, float %s) {
	; VERDE-LABEL: sample_1d_lwe:			; VERDE-LABEL: sample_1d_lwe:
	; VERDE: ; %bb.0: ; %main_body			; VERDE: ; %bb.0: ; %main_body
	; VERDE-NEXT: s_mov_b64 s[16:17], exec			; VERDE-NEXT: s_mov_b64 s[14:15], exec
	; VERDE-NEXT: s_wqm_b64 exec, exec			; VERDE-NEXT: s_wqm_b64 exec, exec
	; VERDE-NEXT: v_mov_b32_e32 v5, v0			; VERDE-NEXT: v_mov_b32_e32 v5, v0
	; VERDE-NEXT: v_mov_b32_e32 v0, 0			; VERDE-NEXT: v_mov_b32_e32 v0, 0
	; VERDE-NEXT: s_mov_b32 s15, 0xf000
	; VERDE-NEXT: s_mov_b32 s14, -1
	; VERDE-NEXT: v_mov_b32_e32 v1, v0			; VERDE-NEXT: v_mov_b32_e32 v1, v0
	; VERDE-NEXT: v_mov_b32_e32 v2, v0			; VERDE-NEXT: v_mov_b32_e32 v2, v0
	; VERDE-NEXT: v_mov_b32_e32 v3, v0			; VERDE-NEXT: v_mov_b32_e32 v3, v0
	; VERDE-NEXT: v_mov_b32_e32 v4, v0			; VERDE-NEXT: v_mov_b32_e32 v4, v0
	; VERDE-NEXT: s_and_b64 exec, exec, s[16:17]			; VERDE-NEXT: s_and_b64 exec, exec, s[14:15]
	; VERDE-NEXT: image_sample v[0:4], v5, s[0:7], s[8:11] dmask:0xf lwe			; VERDE-NEXT: image_sample v[0:4], v5, s[0:7], s[8:11] dmask:0xf lwe
				; VERDE-NEXT: s_mov_b32 s15, 0xf000
				; VERDE-NEXT: s_mov_b32 s14, -1
	; VERDE-NEXT: s_waitcnt vmcnt(0)			; VERDE-NEXT: s_waitcnt vmcnt(0)
	; VERDE-NEXT: buffer_store_dword v4, off, s[12:15], 0			; VERDE-NEXT: buffer_store_dword v4, off, s[12:15], 0
	; VERDE-NEXT: s_waitcnt vmcnt(0) expcnt(0)			; VERDE-NEXT: s_waitcnt vmcnt(0) expcnt(0)
	; VERDE-NEXT: ; return to shader part epilog			; VERDE-NEXT: ; return to shader part epilog
	;			;
	; GFX6789-LABEL: sample_1d_lwe:			; GFX6789-LABEL: sample_1d_lwe:
	; GFX6789: ; %bb.0: ; %main_body			; GFX6789: ; %bb.0: ; %main_body
	; GFX6789-NEXT: s_mov_b64 s[14:15], exec			; GFX6789-NEXT: s_mov_b64 s[14:15], exec
	; GFX6789-NEXT: s_wqm_b64 exec, exec			; GFX6789-NEXT: s_wqm_b64 exec, exec
	; GFX6789-NEXT: v_mov_b32_e32 v5, v0			; GFX6789-NEXT: v_mov_b32_e32 v5, v0
	; GFX6789-NEXT: v_mov_b32_e32 v0, 0			; GFX6789-NEXT: v_mov_b32_e32 v0, 0
	; GFX6789-NEXT: v_mov_b32_e32 v6, s12
	; GFX6789-NEXT: v_mov_b32_e32 v7, s13
	; GFX6789-NEXT: v_mov_b32_e32 v1, v0			; GFX6789-NEXT: v_mov_b32_e32 v1, v0
	; GFX6789-NEXT: v_mov_b32_e32 v2, v0			; GFX6789-NEXT: v_mov_b32_e32 v2, v0
	; GFX6789-NEXT: v_mov_b32_e32 v3, v0			; GFX6789-NEXT: v_mov_b32_e32 v3, v0
	; GFX6789-NEXT: v_mov_b32_e32 v4, v0			; GFX6789-NEXT: v_mov_b32_e32 v4, v0
	; GFX6789-NEXT: s_and_b64 exec, exec, s[14:15]			; GFX6789-NEXT: s_and_b64 exec, exec, s[14:15]
	; GFX6789-NEXT: image_sample v[0:4], v5, s[0:7], s[8:11] dmask:0xf lwe			; GFX6789-NEXT: image_sample v[0:4], v5, s[0:7], s[8:11] dmask:0xf lwe
				; GFX6789-NEXT: v_mov_b32_e32 v5, s12
				; GFX6789-NEXT: v_mov_b32_e32 v6, s13
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: global_store_dword v[6:7], v4, off			; GFX6789-NEXT: global_store_dword v[5:6], v4, off
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_1d_lwe:			; GFX10-LABEL: sample_1d_lwe:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s14, exec_lo ; encoding: [0x7e,0x03,0x8e,0xbe]			; GFX10-NEXT: s_mov_b32 s14, exec_lo ; encoding: [0x7e,0x03,0x8e,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: v_mov_b32_e32 v5, v0 ; encoding: [0x00,0x03,0x0a,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v5, v0 ; encoding: [0x00,0x03,0x0a,0x7e]
	; GFX10-NEXT: v_mov_b32_e32 v0, 0 ; encoding: [0x80,0x02,0x00,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v0, 0 ; encoding: [0x80,0x02,0x00,0x7e]
	; GFX10-NEXT: v_mov_b32_e32 v6, s12 ; encoding: [0x0c,0x02,0x0c,0x7e]
	; GFX10-NEXT: v_mov_b32_e32 v7, s13 ; encoding: [0x0d,0x02,0x0e,0x7e]
	; GFX10-NEXT: v_mov_b32_e32 v1, v0 ; encoding: [0x00,0x03,0x02,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v1, v0 ; encoding: [0x00,0x03,0x02,0x7e]
	; GFX10-NEXT: v_mov_b32_e32 v2, v0 ; encoding: [0x00,0x03,0x04,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v2, v0 ; encoding: [0x00,0x03,0x04,0x7e]
	; GFX10-NEXT: v_mov_b32_e32 v3, v0 ; encoding: [0x00,0x03,0x06,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v3, v0 ; encoding: [0x00,0x03,0x06,0x7e]
	; GFX10-NEXT: v_mov_b32_e32 v4, v0 ; encoding: [0x00,0x03,0x08,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v4, v0 ; encoding: [0x00,0x03,0x08,0x7e]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s14 ; encoding: [0x7e,0x0e,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s14 ; encoding: [0x7e,0x0e,0x7e,0x87]
	; GFX10-NEXT: image_sample v[0:4], v5, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D lwe ; encoding: [0x00,0x0f,0x82,0xf0,0x05,0x00,0x40,0x00]			; GFX10-NEXT: image_sample v[0:4], v5, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D lwe ; encoding: [0x00,0x0f,0x82,0xf0,0x05,0x00,0x40,0x00]
				; GFX10-NEXT: v_mov_b32_e32 v5, s12 ; encoding: [0x0c,0x02,0x0a,0x7e]
				; GFX10-NEXT: v_mov_b32_e32 v6, s13 ; encoding: [0x0d,0x02,0x0c,0x7e]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: global_store_dword v[6:7], v4, off ; encoding: [0x00,0x80,0x70,0xdc,0x06,0x04,0x7d,0x00]			; GFX10-NEXT: global_store_dword v[5:6], v4, off ; encoding: [0x00,0x80,0x70,0xdc,0x05,0x04,0x7d,0x00]
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0 ; encoding: [0x00,0x00,0xfd,0xbb]			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0 ; encoding: [0x00,0x00,0xfd,0xbb]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call {<4 x float>,i32} @llvm.amdgcn.image.sample.1d.v4f32i32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 2, i32 0)			%v = call {<4 x float>,i32} @llvm.amdgcn.image.sample.1d.v4f32i32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 2, i32 0)
	%v.vec = extractvalue {<4 x float>, i32} %v, 0			%v.vec = extractvalue {<4 x float>, i32} %v, 0
	%v.err = extractvalue {<4 x float>, i32} %v, 1			%v.err = extractvalue {<4 x float>, i32} %v, 1
	store i32 %v.err, i32 addrspace(1)* %out, align 4			store i32 %v.err, i32 addrspace(1)* %out, align 4
	ret <4 x float> %v.vec			ret <4 x float> %v.vec
	[16 unchanged lines not shown]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf			; GFX6789-NEXT: image_sample v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_2d:			; GFX10-LABEL: sample_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D ; encoding: [0x08,0x0f,0x80,0xf0,0x00,0x00,0x40,0x00]			; GFX10-NEXT: image_sample v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D ; encoding: [0x08,0x0f,0x80,0xf0,0x00,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f32(i32 15, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f32(i32 15, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_3d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t, float %r) {			define amdgpu_ps <4 x float> @sample_3d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t, float %r) {
	[13 unchanged lines not shown]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf			; GFX6789-NEXT: image_sample v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_3d:			; GFX10-LABEL: sample_3d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_3D ; encoding: [0x10,0x0f,0x80,0xf0,0x00,0x00,0x40,0x00]			; GFX10-NEXT: image_sample v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_3D ; encoding: [0x10,0x0f,0x80,0xf0,0x00,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.3d.v4f32.f32(i32 15, float %s, float %t, float %r, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.3d.v4f32.f32(i32 15, float %s, float %t, float %r, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_cube(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t, float %face) {			define amdgpu_ps <4 x float> @sample_cube(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t, float %face) {
	[13 unchanged lines not shown]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf da			; GFX6789-NEXT: image_sample v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf da
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_cube:			; GFX10-LABEL: sample_cube:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_CUBE ; encoding: [0x18,0x0f,0x80,0xf0,0x00,0x00,0x40,0x00]			; GFX10-NEXT: image_sample v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_CUBE ; encoding: [0x18,0x0f,0x80,0xf0,0x00,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.cube.v4f32.f32(i32 15, float %s, float %t, float %face, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.cube.v4f32.f32(i32 15, float %s, float %t, float %face, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_1darray(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %slice) {			define amdgpu_ps <4 x float> @sample_1darray(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %slice) {
	[13 unchanged lines not shown]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf da			; GFX6789-NEXT: image_sample v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf da
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_1darray:			; GFX10-LABEL: sample_1darray:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D_ARRAY ; encoding: [0x20,0x0f,0x80,0xf0,0x00,0x00,0x40,0x00]			; GFX10-NEXT: image_sample v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D_ARRAY ; encoding: [0x20,0x0f,0x80,0xf0,0x00,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.1darray.v4f32.f32(i32 15, float %s, float %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.1darray.v4f32.f32(i32 15, float %s, float %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_2darray(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t, float %slice) {			define amdgpu_ps <4 x float> @sample_2darray(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t, float %slice) {
	[13 unchanged lines not shown]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf da			; GFX6789-NEXT: image_sample v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf da
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_2darray:			; GFX10-LABEL: sample_2darray:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D_ARRAY ; encoding: [0x28,0x0f,0x80,0xf0,0x00,0x00,0x40,0x00]			; GFX10-NEXT: image_sample v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D_ARRAY ; encoding: [0x28,0x0f,0x80,0xf0,0x00,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.2darray.v4f32.f32(i32 15, float %s, float %t, float %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.2darray.v4f32.f32(i32 15, float %s, float %t, float %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s) {			define amdgpu_ps <4 x float> @sample_c_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s) {
	[13 unchanged lines not shown]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample_c v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf			; GFX6789-NEXT: image_sample_c v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_c_1d:			; GFX10-LABEL: sample_c_1d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample_c v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x0f,0xa0,0xf0,0x00,0x00,0x40,0x00]			; GFX10-NEXT: image_sample_c v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x0f,0xa0,0xf0,0x00,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.1d.v4f32.f32(i32 15, float %zcompare, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.1d.v4f32.f32(i32 15, float %zcompare, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s, float %t) {			define amdgpu_ps <4 x float> @sample_c_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s, float %t) {
	[13 unchanged lines not shown]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample_c v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf			; GFX6789-NEXT: image_sample_c v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_c_2d:			; GFX10-LABEL: sample_c_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample_c v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D ; encoding: [0x08,0x0f,0xa0,0xf0,0x00,0x00,0x40,0x00]			; GFX10-NEXT: image_sample_c v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D ; encoding: [0x08,0x0f,0xa0,0xf0,0x00,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.2d.v4f32.f32(i32 15, float %zcompare, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.2d.v4f32.f32(i32 15, float %zcompare, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %clamp) {			define amdgpu_ps <4 x float> @sample_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %clamp) {
	[13 unchanged lines not shown]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample_cl v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf			; GFX6789-NEXT: image_sample_cl v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_cl_1d:			; GFX10-LABEL: sample_cl_1d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample_cl v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x0f,0x84,0xf0,0x00,0x00,0x40,0x00]			; GFX10-NEXT: image_sample_cl v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x0f,0x84,0xf0,0x00,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.cl.1d.v4f32.f32(i32 15, float %s, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.cl.1d.v4f32.f32(i32 15, float %s, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t, float %clamp) {			define amdgpu_ps <4 x float> @sample_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t, float %clamp) {
	[13 unchanged lines not shown]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf			; GFX6789-NEXT: image_sample_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_cl_2d:			; GFX10-LABEL: sample_cl_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D ; encoding: [0x08,0x0f,0x84,0xf0,0x00,0x00,0x40,0x00]			; GFX10-NEXT: image_sample_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D ; encoding: [0x08,0x0f,0x84,0xf0,0x00,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.cl.2d.v4f32.f32(i32 15, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.cl.2d.v4f32.f32(i32 15, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s, float %clamp) {			define amdgpu_ps <4 x float> @sample_c_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s, float %clamp) {
	[13 unchanged lines not shown]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample_c_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf			; GFX6789-NEXT: image_sample_c_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_c_cl_1d:			; GFX10-LABEL: sample_c_cl_1d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample_c_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x0f,0xa4,0xf0,0x00,0x00,0x40,0x00]			; GFX10-NEXT: image_sample_c_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x0f,0xa4,0xf0,0x00,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.cl.1d.v4f32.f32(i32 15, float %zcompare, float %s, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.cl.1d.v4f32.f32(i32 15, float %zcompare, float %s, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s, float %t, float %clamp) {			define amdgpu_ps <4 x float> @sample_c_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s, float %t, float %clamp) {
	[13 unchanged lines not shown]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample_c_cl v[0:3], v[0:3], s[0:7], s[8:11] dmask:0xf			; GFX6789-NEXT: image_sample_c_cl v[0:3], v[0:3], s[0:7], s[8:11] dmask:0xf
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_c_cl_2d:			; GFX10-LABEL: sample_c_cl_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample_c_cl v[0:3], v[0:3], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D ; encoding: [0x08,0x0f,0xa4,0xf0,0x00,0x00,0x40,0x00]			; GFX10-NEXT: image_sample_c_cl v[0:3], v[0:3], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D ; encoding: [0x08,0x0f,0xa4,0xf0,0x00,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.cl.2d.v4f32.f32(i32 15, float %zcompare, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.cl.2d.v4f32.f32(i32 15, float %zcompare, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_b_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %s) {			define amdgpu_ps <4 x float> @sample_b_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %s) {
	[13 unchanged lines not shown]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf			; GFX6789-NEXT: image_sample_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_b_1d:			; GFX10-LABEL: sample_b_1d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x0f,0x94,0xf0,0x00,0x00,0x40,0x00]			; GFX10-NEXT: image_sample_b v[0:3], v[0:1], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x0f,0x94,0xf0,0x00,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f32.f32(i32 15, float %bias, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f32.f32(i32 15, float %bias, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %s, float %t) {			define amdgpu_ps <4 x float> @sample_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %s, float %t) {
	[13 unchanged lines not shown]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf			; GFX6789-NEXT: image_sample_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_b_2d:			; GFX10-LABEL: sample_b_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D ; encoding: [0x08,0x0f,0x94,0xf0,0x00,0x00,0x40,0x00]			; GFX10-NEXT: image_sample_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D ; encoding: [0x08,0x0f,0x94,0xf0,0x00,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.b.2d.v4f32.f32.f32(i32 15, float %bias, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.b.2d.v4f32.f32.f32(i32 15, float %bias, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_b_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, float %s) {			define amdgpu_ps <4 x float> @sample_c_b_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, float %s) {
	[... 13 lines collapsed ...]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf			; GFX6789-NEXT: image_sample_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_c_b_1d:			; GFX10-LABEL: sample_c_b_1d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x0f,0xb4,0xf0,0x00,0x00,0x40,0x00]			; GFX10-NEXT: image_sample_c_b v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x0f,0xb4,0xf0,0x00,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.1d.v4f32.f32.f32(i32 15, float %bias, float %zcompare, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.1d.v4f32.f32.f32(i32 15, float %bias, float %zcompare, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, float %s, float %t) {			define amdgpu_ps <4 x float> @sample_c_b_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, float %s, float %t) {
	[... 13 lines collapsed ...]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample_c_b v[0:3], v[0:3], s[0:7], s[8:11] dmask:0xf			; GFX6789-NEXT: image_sample_c_b v[0:3], v[0:3], s[0:7], s[8:11] dmask:0xf
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_c_b_2d:			; GFX10-LABEL: sample_c_b_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample_c_b v[0:3], v[0:3], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D ; encoding: [0x08,0x0f,0xb4,0xf0,0x00,0x00,0x40,0x00]			; GFX10-NEXT: image_sample_c_b v[0:3], v[0:3], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D ; encoding: [0x08,0x0f,0xb4,0xf0,0x00,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.2d.v4f32.f32.f32(i32 15, float %bias, float %zcompare, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.2d.v4f32.f32.f32(i32 15, float %bias, float %zcompare, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_b_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %s, float %clamp) {			define amdgpu_ps <4 x float> @sample_b_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %s, float %clamp) {
	[... 13 lines collapsed ...]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample_b_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf			; GFX6789-NEXT: image_sample_b_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_b_cl_1d:			; GFX10-LABEL: sample_b_cl_1d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample_b_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x0f,0x98,0xf0,0x00,0x00,0x40,0x00]			; GFX10-NEXT: image_sample_b_cl v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x0f,0x98,0xf0,0x00,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.b.cl.1d.v4f32.f32.f32(i32 15, float %bias, float %s, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.b.cl.1d.v4f32.f32.f32(i32 15, float %bias, float %s, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %s, float %t, float %clamp) {			define amdgpu_ps <4 x float> @sample_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %s, float %t, float %clamp) {
	[... 13 lines collapsed ...]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample_b_cl v[0:3], v[0:3], s[0:7], s[8:11] dmask:0xf			; GFX6789-NEXT: image_sample_b_cl v[0:3], v[0:3], s[0:7], s[8:11] dmask:0xf
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_b_cl_2d:			; GFX10-LABEL: sample_b_cl_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample_b_cl v[0:3], v[0:3], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D ; encoding: [0x08,0x0f,0x98,0xf0,0x00,0x00,0x40,0x00]			; GFX10-NEXT: image_sample_b_cl v[0:3], v[0:3], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D ; encoding: [0x08,0x0f,0x98,0xf0,0x00,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.b.cl.2d.v4f32.f32.f32(i32 15, float %bias, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.b.cl.2d.v4f32.f32.f32(i32 15, float %bias, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_b_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, float %s, float %clamp) {			define amdgpu_ps <4 x float> @sample_c_b_cl_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, float %s, float %clamp) {
	[... 13 lines collapsed ...]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample_c_b_cl v[0:3], v[0:3], s[0:7], s[8:11] dmask:0xf			; GFX6789-NEXT: image_sample_c_b_cl v[0:3], v[0:3], s[0:7], s[8:11] dmask:0xf
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_c_b_cl_1d:			; GFX10-LABEL: sample_c_b_cl_1d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample_c_b_cl v[0:3], v[0:3], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x0f,0xb8,0xf0,0x00,0x00,0x40,0x00]			; GFX10-NEXT: image_sample_c_b_cl v[0:3], v[0:3], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x0f,0xb8,0xf0,0x00,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.1d.v4f32.f32.f32(i32 15, float %bias, float %zcompare, float %s, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.1d.v4f32.f32.f32(i32 15, float %bias, float %zcompare, float %s, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_c_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, float %s, float %t, float %clamp) {			define amdgpu_ps <4 x float> @sample_c_b_cl_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, float %s, float %t, float %clamp) {
	[... 13 lines collapsed ...]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample_c_b_cl v[0:3], v[0:7], s[0:7], s[8:11] dmask:0xf			; GFX6789-NEXT: image_sample_c_b_cl v[0:3], v[0:7], s[0:7], s[8:11] dmask:0xf
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_c_b_cl_2d:			; GFX10-LABEL: sample_c_b_cl_2d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample_c_b_cl v[0:3], v[0:7], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D ; encoding: [0x08,0x0f,0xb8,0xf0,0x00,0x00,0x40,0x00]			; GFX10-NEXT: image_sample_c_b_cl v[0:3], v[0:7], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D ; encoding: [0x08,0x0f,0xb8,0xf0,0x00,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.2d.v4f32.f32.f32(i32 15, float %bias, float %zcompare, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.2d.v4f32.f32.f32(i32 15, float %bias, float %zcompare, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_d_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %dsdh, float %dsdv, float %s) {			define amdgpu_ps <4 x float> @sample_d_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %dsdh, float %dsdv, float %s) {
	[... 736 lines collapsed ...]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf unorm			; GFX6789-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf unorm
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_1d_unorm:			; GFX10-LABEL: sample_1d_unorm:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D unorm ; encoding: [0x00,0x1f,0x80,0xf0,0x00,0x00,0x40,0x00]			; GFX10-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D unorm ; encoding: [0x00,0x1f,0x80,0xf0,0x00,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 1, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 1, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_1d_glc(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s) {			define amdgpu_ps <4 x float> @sample_1d_glc(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s) {
	[... 13 lines collapsed ...]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf glc			; GFX6789-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf glc
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_1d_glc:			; GFX10-LABEL: sample_1d_glc:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D glc ; encoding: [0x00,0x2f,0x80,0xf0,0x00,0x00,0x40,0x00]			; GFX10-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D glc ; encoding: [0x00,0x2f,0x80,0xf0,0x00,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 1)			%v = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 1)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_1d_slc(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s) {			define amdgpu_ps <4 x float> @sample_1d_slc(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s) {
	[... 13 lines collapsed ...]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf slc			; GFX6789-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf slc
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_1d_slc:			; GFX10-LABEL: sample_1d_slc:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D slc ; encoding: [0x00,0x0f,0x80,0xf2,0x00,0x00,0x40,0x00]			; GFX10-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D slc ; encoding: [0x00,0x0f,0x80,0xf2,0x00,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 2)			%v = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 2)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps <4 x float> @sample_1d_glc_slc(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s) {			define amdgpu_ps <4 x float> @sample_1d_glc_slc(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s) {
	[... 13 lines collapsed ...]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf glc slc			; GFX6789-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf glc slc
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: sample_1d_glc_slc:			; GFX10-LABEL: sample_1d_glc_slc:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D glc slc ; encoding: [0x00,0x2f,0x80,0xf2,0x00,0x00,0x40,0x00]			; GFX10-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D glc slc ; encoding: [0x00,0x2f,0x80,0xf2,0x00,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 3)			%v = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 3)
	ret <4 x float> %v			ret <4 x float> %v
	}			}

	define amdgpu_ps float @adjust_writemask_sample_0(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s) {			define amdgpu_ps float @adjust_writemask_sample_0(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s) {
	[... 13 lines collapsed ...]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample v0, v0, s[0:7], s[8:11] dmask:0x1			; GFX6789-NEXT: image_sample v0, v0, s[0:7], s[8:11] dmask:0x1
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: adjust_writemask_sample_0:			; GFX10-LABEL: adjust_writemask_sample_0:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample v0, v0, s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x01,0x80,0xf0,0x00,0x00,0x40,0x00]			; GFX10-NEXT: image_sample v0, v0, s[0:7], s[8:11] dmask:0x1 dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x01,0x80,0xf0,0x00,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%r = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%r = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	%elt0 = extractelement <4 x float> %r, i32 0			%elt0 = extractelement <4 x float> %r, i32 0
	ret float %elt0			ret float %elt0
	}			}

	[... 14 lines collapsed ...]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample v[0:1], v0, s[0:7], s[8:11] dmask:0x3			; GFX6789-NEXT: image_sample v[0:1], v0, s[0:7], s[8:11] dmask:0x3
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: adjust_writemask_sample_01:			; GFX10-LABEL: adjust_writemask_sample_01:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample v[0:1], v0, s[0:7], s[8:11] dmask:0x3 dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x03,0x80,0xf0,0x00,0x00,0x40,0x00]			; GFX10-NEXT: image_sample v[0:1], v0, s[0:7], s[8:11] dmask:0x3 dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x03,0x80,0xf0,0x00,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%r = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%r = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	%out = shufflevector <4 x float> %r, <4 x float> undef, <2 x i32> <i32 0, i32 1>			%out = shufflevector <4 x float> %r, <4 x float> undef, <2 x i32> <i32 0, i32 1>
	ret <2 x float> %out			ret <2 x float> %out
	}			}

	[... 14 lines collapsed ...]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample v[0:2], v0, s[0:7], s[8:11] dmask:0x7			; GFX6789-NEXT: image_sample v[0:2], v0, s[0:7], s[8:11] dmask:0x7
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: adjust_writemask_sample_012:			; GFX10-LABEL: adjust_writemask_sample_012:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample v[0:2], v0, s[0:7], s[8:11] dmask:0x7 dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x07,0x80,0xf0,0x00,0x00,0x40,0x00]			; GFX10-NEXT: image_sample v[0:2], v0, s[0:7], s[8:11] dmask:0x7 dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x07,0x80,0xf0,0x00,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%r = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%r = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	%out = shufflevector <4 x float> %r, <4 x float> undef, <3 x i32> <i32 0, i32 1, i32 2>			%out = shufflevector <4 x float> %r, <4 x float> undef, <3 x i32> <i32 0, i32 1, i32 2>
	ret <3 x float> %out			ret <3 x float> %out
	}			}

	[... 14 lines collapsed ...]
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample v[0:1], v0, s[0:7], s[8:11] dmask:0x6			; GFX6789-NEXT: image_sample v[0:1], v0, s[0:7], s[8:11] dmask:0x6
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: adjust_writemask_sample_12:			; GFX10-LABEL: adjust_writemask_sample_12:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample v[0:1], v0, s[0:7], s[8:11] dmask:0x6 dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x06,0x80,0xf0,0x00,0x00,0x40,0x00]			; GFX10-NEXT: image_sample v[0:1], v0, s[0:7], s[8:11] dmask:0x6 dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x06,0x80,0xf0,0x00,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%r = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%r = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	%out = shufflevector <4 x float> %r, <4 x float> undef, <2 x i32> <i32 1, i32 2>			%out = shufflevector <4 x float> %r, <4 x float> undef, <2 x i32> <i32 1, i32 2>
	ret <2 x float> %out			ret <2 x float> %out
	}			}

	Show All 14 Lines
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample v[0:1], v0, s[0:7], s[8:11] dmask:0x9			; GFX6789-NEXT: image_sample v[0:1], v0, s[0:7], s[8:11] dmask:0x9
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: adjust_writemask_sample_03:			; GFX10-LABEL: adjust_writemask_sample_03:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample v[0:1], v0, s[0:7], s[8:11] dmask:0x9 dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x09,0x80,0xf0,0x00,0x00,0x40,0x00]			; GFX10-NEXT: image_sample v[0:1], v0, s[0:7], s[8:11] dmask:0x9 dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x09,0x80,0xf0,0x00,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%r = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%r = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	%out = shufflevector <4 x float> %r, <4 x float> undef, <2 x i32> <i32 0, i32 3>			%out = shufflevector <4 x float> %r, <4 x float> undef, <2 x i32> <i32 0, i32 3>
	ret <2 x float> %out			ret <2 x float> %out
	}			}

	Show All 14 Lines
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample v[0:1], v0, s[0:7], s[8:11] dmask:0xa			; GFX6789-NEXT: image_sample v[0:1], v0, s[0:7], s[8:11] dmask:0xa
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: adjust_writemask_sample_13:			; GFX10-LABEL: adjust_writemask_sample_13:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample v[0:1], v0, s[0:7], s[8:11] dmask:0xa dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x0a,0x80,0xf0,0x00,0x00,0x40,0x00]			; GFX10-NEXT: image_sample v[0:1], v0, s[0:7], s[8:11] dmask:0xa dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x0a,0x80,0xf0,0x00,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%r = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%r = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	%out = shufflevector <4 x float> %r, <4 x float> undef, <2 x i32> <i32 1, i32 3>			%out = shufflevector <4 x float> %r, <4 x float> undef, <2 x i32> <i32 1, i32 3>
	ret <2 x float> %out			ret <2 x float> %out
	}			}

	Show All 14 Lines
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample v[0:2], v0, s[0:7], s[8:11] dmask:0xe			; GFX6789-NEXT: image_sample v[0:2], v0, s[0:7], s[8:11] dmask:0xe
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: adjust_writemask_sample_123:			; GFX10-LABEL: adjust_writemask_sample_123:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample v[0:2], v0, s[0:7], s[8:11] dmask:0xe dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x0e,0x80,0xf0,0x00,0x00,0x40,0x00]			; GFX10-NEXT: image_sample v[0:2], v0, s[0:7], s[8:11] dmask:0xe dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x0e,0x80,0xf0,0x00,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%r = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%r = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	%out = shufflevector <4 x float> %r, <4 x float> undef, <3 x i32> <i32 1, i32 2, i32 3>			%out = shufflevector <4 x float> %r, <4 x float> undef, <3 x i32> <i32 1, i32 2, i32 3>
	ret <3 x float> %out			ret <3 x float> %out
	}			}

	Show All 32 Lines
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample v[0:1], v0, s[0:7], s[8:11] dmask:0x6			; GFX6789-NEXT: image_sample v[0:1], v0, s[0:7], s[8:11] dmask:0x6
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: adjust_writemask_sample_123_to_12:			; GFX10-LABEL: adjust_writemask_sample_123_to_12:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample v[0:1], v0, s[0:7], s[8:11] dmask:0x6 dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x06,0x80,0xf0,0x00,0x00,0x40,0x00]			; GFX10-NEXT: image_sample v[0:1], v0, s[0:7], s[8:11] dmask:0x6 dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x06,0x80,0xf0,0x00,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%r = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 14, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%r = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 14, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	%out = shufflevector <4 x float> %r, <4 x float> undef, <2 x i32> <i32 0, i32 1>			%out = shufflevector <4 x float> %r, <4 x float> undef, <2 x i32> <i32 0, i32 1>
	ret <2 x float> %out			ret <2 x float> %out
	}			}

	Show All 14 Lines
	; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX6789-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX6789-NEXT: image_sample v[0:1], v0, s[0:7], s[8:11] dmask:0xa			; GFX6789-NEXT: image_sample v[0:1], v0, s[0:7], s[8:11] dmask:0xa
	; GFX6789-NEXT: s_waitcnt vmcnt(0)			; GFX6789-NEXT: s_waitcnt vmcnt(0)
	; GFX6789-NEXT: ; return to shader part epilog			; GFX6789-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: adjust_writemask_sample_013_to_13:			; GFX10-LABEL: adjust_writemask_sample_013_to_13:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]			; GFX10-NEXT: s_mov_b32 s12, exec_lo ; encoding: [0x7e,0x03,0x8c,0xbe]
	; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]			; GFX10-NEXT: s_wqm_b32 exec_lo, exec_lo ; encoding: [0x7e,0x09,0xfe,0xbe]
	; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]			; GFX10-NEXT: s_and_b32 exec_lo, exec_lo, s12 ; encoding: [0x7e,0x0c,0x7e,0x87]
	; GFX10-NEXT: image_sample v[0:1], v0, s[0:7], s[8:11] dmask:0xa dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x0a,0x80,0xf0,0x00,0x00,0x40,0x00]			; GFX10-NEXT: image_sample v[0:1], v0, s[0:7], s[8:11] dmask:0xa dim:SQ_RSRC_IMG_1D ; encoding: [0x00,0x0a,0x80,0xf0,0x00,0x00,0x40,0x00]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%r = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 11, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%r = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 11, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	%out = shufflevector <4 x float> %r, <4 x float> undef, <2 x i32> <i32 1, i32 2>			%out = shufflevector <4 x float> %r, <4 x float> undef, <2 x i32> <i32 1, i32 2>
	ret <2 x float> %out			ret <2 x float> %out
	}			}

	▲ Show 20 Lines • Show All 60 Lines • Show Last 20 Lines

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ps.live.ll

Show All 13 Lines	define amdgpu_ps float @test1() #0 {
%live = call i1 @llvm.amdgcn.ps.live()		%live = call i1 @llvm.amdgcn.ps.live()
%live.32 = zext i1 %live to i32		%live.32 = zext i1 %live to i32
%r = bitcast i32 %live.32 to float		%r = bitcast i32 %live.32 to float
ret float %r		ret float %r
}		}

; CHECK-LABEL: {{^}}test2:		; CHECK-LABEL: {{^}}test2:
; CHECK: s_mov_b64 [[LIVE:s\[[0-9]+:[0-9]+\]]], exec		; CHECK: s_mov_b64 [[LIVE:s\[[0-9]+:[0-9]+\]]], exec
		; Following copy should go away:
		; CHECK: s_mov_b64 [[COPY:s\[[0-9]+:[0-9]+\]]], [[LIVE]]
; CHECK-DAG: s_wqm_b64 exec, exec		; CHECK-DAG: s_wqm_b64 exec, exec
; CHECK-DAG: v_cndmask_b32_e64 [[VAR:v[0-9]+]], 0, 1, [[LIVE]]		; CHECK-DAG: v_cndmask_b32_e64 [[VAR:v[0-9]+]], 0, 1, [[COPY]]
; CHECK: image_sample v0, [[VAR]],		; CHECK: image_sample v0, [[VAR]],
define amdgpu_ps float @test2() #0 {		define amdgpu_ps float @test2() #0 {
%live = call i1 @llvm.amdgcn.ps.live()		%live = call i1 @llvm.amdgcn.ps.live()
%live.32 = zext i1 %live to i32		%live.32 = zext i1 %live to i32
%live.32.bc = bitcast i32 %live.32 to float		%live.32.bc = bitcast i32 %live.32 to float
%t = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %live.32.bc, <8 x i32> undef, <4 x i32> undef, i1 0, i32 0, i32 0)		%t = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %live.32.bc, <8 x i32> undef, <4 x i32> undef, i1 0, i32 0, i32 0)
%r = extractelement <4 x float> %t, i32 0		%r = extractelement <4 x float> %t, i32 0
ret float %r		ret float %r
Show All 31 Lines

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.softwqm.ll

	Show First 20 Lines • Show All 55 Lines • ▼ Show 20 Lines

	; Make sure the transition from WQM to Exact to softwqm does trigger WQM.			; Make sure the transition from WQM to Exact to softwqm does trigger WQM.
	;			;
	;CHECK-LABEL: {{^}}test_softwqm2:			;CHECK-LABEL: {{^}}test_softwqm2:
	;CHECK: s_mov_b64 [[ORIG:s\[[0-9]+:[0-9]+\]]], exec			;CHECK: s_mov_b64 [[ORIG:s\[[0-9]+:[0-9]+\]]], exec
	;CHECK: s_wqm_b64 exec, exec			;CHECK: s_wqm_b64 exec, exec
	;CHECK: buffer_load_dword			;CHECK: buffer_load_dword
	;CHECK: buffer_load_dword			;CHECK: buffer_load_dword
				;CHECK: v_add_f32_e32
				;CHECK: v_add_f32_e32
	;CHECK: s_and_b64 exec, exec, [[ORIG]]			;CHECK: s_and_b64 exec, exec, [[ORIG]]
	;CHECK: buffer_store_dword			;CHECK: buffer_store_dword
	;CHECK; s_wqm_b64 exec, exec
	;CHECK: v_add_f32_e32
	define amdgpu_ps float @test_softwqm2(i32 inreg %idx0, i32 inreg %idx1) {			define amdgpu_ps float @test_softwqm2(i32 inreg %idx0, i32 inreg %idx1) {
	main_body:			main_body:
	%src0 = call float @llvm.amdgcn.struct.buffer.load.f32(<4 x i32> undef, i32 %idx0, i32 0, i32 0, i32 0)			%src0 = call float @llvm.amdgcn.struct.buffer.load.f32(<4 x i32> undef, i32 %idx0, i32 0, i32 0, i32 0)
	%src1 = call float @llvm.amdgcn.struct.buffer.load.f32(<4 x i32> undef, i32 %idx1, i32 0, i32 0, i32 0)			%src1 = call float @llvm.amdgcn.struct.buffer.load.f32(<4 x i32> undef, i32 %idx1, i32 0, i32 0, i32 0)
	%temp = fadd float %src0, %src1			%temp = fadd float %src0, %src1
	%temp.0 = call float @llvm.amdgcn.wqm.f32(float %temp)			%temp.0 = call float @llvm.amdgcn.wqm.f32(float %temp)
	call void @llvm.amdgcn.struct.buffer.store.f32(float %temp.0, <4 x i32> undef, i32 %idx0, i32 0, i32 0, i32 0)			call void @llvm.amdgcn.struct.buffer.store.f32(float %temp.0, <4 x i32> undef, i32 %idx0, i32 0, i32 0, i32 0)
	%out = fadd float %temp, %temp			%out = fadd float %temp, %temp
	%out.0 = call float @llvm.amdgcn.softwqm.f32(float %out)			%out.0 = call float @llvm.amdgcn.softwqm.f32(float %out)
	ret float %out.0			ret float %out.0
	}			}

	; Make sure the transition from Exact to WWM then softwqm does not trigger WQM.			; Make sure the transition from Exact to WWM then softwqm does not trigger WQM.
	;			;
	;CHECK-LABEL: {{^}}test_wwm1:			;CHECK-LABEL: {{^}}test_wwm1:
				;CHECK: s_or_saveexec_b64 [[ORIG0:s\[[0-9]+:[0-9]+\]]], -1
	;CHECK: buffer_load_dword			;CHECK: buffer_load_dword
				;CHECK: s_mov_b64 exec, [[ORIG0]]
	;CHECK: buffer_store_dword			;CHECK: buffer_store_dword
	;CHECK: s_or_saveexec_b64 [[ORIG:s\[[0-9]+:[0-9]+\]]], -1			;CHECK: s_or_saveexec_b64 [[ORIG1:s\[[0-9]+:[0-9]+\]]], -1
	;CHECK: buffer_load_dword			;CHECK: buffer_load_dword
	;CHECK: v_add_f32_e32			;CHECK: v_add_f32_e32
	;CHECK: s_mov_b64 exec, [[ORIG]]			;CHECK: s_mov_b64 exec, [[ORIG1]]
	;CHECK-NOT: s_wqm_b64			;CHECK-NOT: s_wqm_b64
	define amdgpu_ps float @test_wwm1(i32 inreg %idx0, i32 inreg %idx1) {			define amdgpu_ps float @test_wwm1(i32 inreg %idx0, i32 inreg %idx1) {
	main_body:			main_body:
	%src0 = call float @llvm.amdgcn.struct.buffer.load.f32(<4 x i32> undef, i32 %idx0, i32 0, i32 0, i32 0)			%src0 = call float @llvm.amdgcn.struct.buffer.load.f32(<4 x i32> undef, i32 %idx0, i32 0, i32 0, i32 0)
	call void @llvm.amdgcn.struct.buffer.store.f32(float %src0, <4 x i32> undef, i32 %idx0, i32 0, i32 0, i32 0)			call void @llvm.amdgcn.struct.buffer.store.f32(float %src0, <4 x i32> undef, i32 %idx0, i32 0, i32 0, i32 0)
	%src1 = call float @llvm.amdgcn.struct.buffer.load.f32(<4 x i32> undef, i32 %idx1, i32 0, i32 0, i32 0)			%src1 = call float @llvm.amdgcn.struct.buffer.load.f32(<4 x i32> undef, i32 %idx1, i32 0, i32 0, i32 0)
	%temp = fadd float %src0, %src1			%temp = fadd float %src0, %src1
	%temp.0 = call float @llvm.amdgcn.wwm.f32(float %temp)			%temp.0 = call float @llvm.amdgcn.wwm.f32(float %temp)
	▲ Show 20 Lines • Show All 92 Lines • Show Last 20 Lines

llvm/test/CodeGen/AMDGPU/wqm.ll

	Show First 20 Lines • Show All 87 Lines • ▼ Show 20 Lines

	; Check that WQM is re-enabled when required.			; Check that WQM is re-enabled when required.
	;			;
	;CHECK-LABEL: {{^}}test4:			;CHECK-LABEL: {{^}}test4:
	;CHECK-NEXT: ; %main_body			;CHECK-NEXT: ; %main_body
	;CHECK-NEXT: s_mov_b64 [[ORIG:s\[[0-9]+:[0-9]+\]]], exec			;CHECK-NEXT: s_mov_b64 [[ORIG:s\[[0-9]+:[0-9]+\]]], exec
	;CHECK-NEXT: s_wqm_b64 exec, exec			;CHECK-NEXT: s_wqm_b64 exec, exec
	;CHECK: v_mul_lo_u32 [[MUL:v[0-9]+]], v0, v1			;CHECK: v_mul_lo_u32 [[MUL:v[0-9]+]], v0, v1
	;CHECK: s_and_b64 exec, exec, [[ORIG]]
	;CHECK: store
	;CHECK: s_wqm_b64 exec, exec
	;CHECK: image_sample			;CHECK: image_sample
				;CHECK: s_and_b64 exec, exec, [[ORIG]]
	;CHECK: image_sample			;CHECK: image_sample
				;CHECK: store
	define amdgpu_ps <4 x float> @test4(<8 x i32> inreg %rsrc, <4 x i32> inreg %sampler, float addrspace(1)* inreg %ptr, i32 %c, i32 %d, float %data) {			define amdgpu_ps <4 x float> @test4(<8 x i32> inreg %rsrc, <4 x i32> inreg %sampler, float addrspace(1)* inreg %ptr, i32 %c, i32 %d, float %data) {
	main_body:			main_body:
	%c.1 = mul i32 %c, %d			%c.1 = mul i32 %c, %d

	call void @llvm.amdgcn.struct.buffer.store.v4f32(<4 x float> undef, <4 x i32> undef, i32 %c.1, i32 0, i32 0, i32 0)			call void @llvm.amdgcn.struct.buffer.store.v4f32(<4 x float> undef, <4 x i32> undef, i32 %c.1, i32 0, i32 0, i32 0)
	%c.1.bc = bitcast i32 %c.1 to float			%c.1.bc = bitcast i32 %c.1 to float
	%tex = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %c.1.bc, <8 x i32> %rsrc, <4 x i32> %sampler, i1 false, i32 0, i32 0) #0			%tex = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %c.1.bc, <8 x i32> %rsrc, <4 x i32> %sampler, i1 false, i32 0, i32 0) #0
	%tex0 = extractelement <4 x float> %tex, i32 0			%tex0 = extractelement <4 x float> %tex, i32 0
	▲ Show 20 Lines • Show All 464 Lines • ▼ Show 20 Lines
	;CHECK-NEXT: ; %main_body			;CHECK-NEXT: ; %main_body
	;CHECK-NEXT: s_mov_b64 [[ORIG:s\[[0-9]+:[0-9]+\]]], exec			;CHECK-NEXT: s_mov_b64 [[ORIG:s\[[0-9]+:[0-9]+\]]], exec
	;CHECK-NEXT: s_wqm_b64 exec, exec			;CHECK-NEXT: s_wqm_b64 exec, exec
	;CHECK: s_and_b64 exec, exec, [[ORIG]]			;CHECK: s_and_b64 exec, exec, [[ORIG]]
	;CHECK: image_sample			;CHECK: image_sample
	;CHECK: buffer_store_dword			;CHECK: buffer_store_dword
	;CHECK: s_wqm_b64 exec, exec			;CHECK: s_wqm_b64 exec, exec
	;CHECK: v_cmpx_			;CHECK: v_cmpx_
	;CHECK: s_and_saveexec_b64 [[SAVE:s\[[0-9]+:[0-9]+\]]], [[ORIG]]
	;CHECK: buffer_store_dword
	;CHECK: s_mov_b64 exec, [[SAVE]]
	;CHECK: image_sample			;CHECK: image_sample
				;CHECK: s_and_b64 exec, exec, [[ORIG]]
				;CHECK: image_sample
				;CHECK: buffer_store_dword
	define amdgpu_ps <4 x float> @test_kill_0(<8 x i32> inreg %rsrc, <4 x i32> inreg %sampler, float addrspace(1)* inreg %ptr, <2 x i32> %idx, <2 x float> %data, float %coord, float %coord2, float %z) {			define amdgpu_ps <4 x float> @test_kill_0(<8 x i32> inreg %rsrc, <4 x i32> inreg %sampler, float addrspace(1)* inreg %ptr, <2 x i32> %idx, <2 x float> %data, float %coord, float %coord2, float %z) {
	main_body:			main_body:
	%tex = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %coord, <8 x i32> %rsrc, <4 x i32> %sampler, i1 false, i32 0, i32 0) #0			%tex = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %coord, <8 x i32> %rsrc, <4 x i32> %sampler, i1 false, i32 0, i32 0) #0
	%idx.0 = extractelement <2 x i32> %idx, i32 0			%idx.0 = extractelement <2 x i32> %idx, i32 0
	%data.0 = extractelement <2 x float> %data, i32 0			%data.0 = extractelement <2 x float> %data, i32 0
	call void @llvm.amdgcn.struct.buffer.store.f32(float %data.0, <4 x i32> undef, i32 %idx.0, i32 0, i32 0, i32 0)			call void @llvm.amdgcn.struct.buffer.store.f32(float %data.0, <4 x i32> undef, i32 %idx.0, i32 0, i32 0, i32 0)

	%z.cmp = fcmp olt float %z, 0.0			%z.cmp = fcmp olt float %z, 0.0
	▲ Show 20 Lines • Show All 266 Lines • Show Last 20 Lines

llvm/test/CodeGen/AMDGPU/wwm-reserved.ll

	; RUN: llc -O0 -march=amdgcn -mcpu=gfx900 -amdgpu-dpp-combine=false -verify-machineinstrs < %s \| FileCheck -check-prefixes=GFX9,GFX9-O0 %s			; RUN: llc -O0 -march=amdgcn -mcpu=gfx900 -amdgpu-dpp-combine=false -verify-machineinstrs < %s \| FileCheck -check-prefixes=GFX9,GFX9-O0 %s
	; RUN: llc -march=amdgcn -mcpu=gfx900 -amdgpu-dpp-combine=false -verify-machineinstrs < %s \| FileCheck -check-prefixes=GFX9,GFX9-O3 %s			; RUN: llc -march=amdgcn -mcpu=gfx900 -amdgpu-dpp-combine=false -verify-machineinstrs < %s \| FileCheck -check-prefixes=GFX9,GFX9-O3 %s

	; GFX9-LABEL: {{^}}no_cfg:			; GFX9-LABEL: {{^}}no_cfg:
	define amdgpu_cs void @no_cfg(<4 x i32> inreg %tmp14) {			define amdgpu_cs void @no_cfg(<4 x i32> inreg %tmp14) {
	%tmp100 = call <2 x float> @llvm.amdgcn.raw.buffer.load.v2f32(<4 x i32> %tmp14, i32 0, i32 0, i32 0)			%tmp100 = call <2 x float> @llvm.amdgcn.raw.buffer.load.v2f32(<4 x i32> %tmp14, i32 0, i32 0, i32 0)
	%tmp101 = bitcast <2 x float> %tmp100 to <2 x i32>			%tmp101 = bitcast <2 x float> %tmp100 to <2 x i32>
	%tmp102 = extractelement <2 x i32> %tmp101, i32 0			%tmp102 = extractelement <2 x i32> %tmp101, i32 0
	%tmp103 = extractelement <2 x i32> %tmp101, i32 1			%tmp103 = extractelement <2 x i32> %tmp101, i32 1
	%tmp105 = tail call i32 @llvm.amdgcn.set.inactive.i32(i32 %tmp102, i32 0)			%tmp105 = tail call i32 @llvm.amdgcn.set.inactive.i32(i32 %tmp102, i32 0)
	%tmp107 = tail call i32 @llvm.amdgcn.set.inactive.i32(i32 %tmp103, i32 0)			%tmp107 = tail call i32 @llvm.amdgcn.set.inactive.i32(i32 %tmp103, i32 0)

	; GFX9: v_mov_b32_dpp v[[FIRST_MOV:[0-9]+]], v{{[0-9]+}} row_bcast:31 row_mask:0xc bank_mask:0xf			; GFX9: s_or_saveexec_b64 s{{\[}}{{[0-9]+}}:{{[0-9]+}}{{\]}}, -1
	; GFX9: v_add_u32_e32 v[[FIRST_ADD:[0-9]+]], v{{[0-9]+}}, v[[FIRST_MOV]]
	; GFX9: v_mov_b32_e32 v[[FIRST:[0-9]+]], v[[FIRST_ADD]]			; GFX9-DAG: v_mov_b32_dpp v[[FIRST_MOV:[0-9]+]], v{{[0-9]+}} row_bcast:31 row_mask:0xc bank_mask:0xf
				; GFX9-DAG: v_add_u32_e32 v[[FIRST_ADD:[0-9]+]], v{{[0-9]+}}, v[[FIRST_MOV]]
				; GFX9-DAG: v_mov_b32_e32 v[[FIRST:[0-9]+]], v[[FIRST_ADD]]
	%tmp120 = tail call i32 @llvm.amdgcn.update.dpp.i32(i32 0, i32 %tmp105, i32 323, i32 12, i32 15, i1 false)			%tmp120 = tail call i32 @llvm.amdgcn.update.dpp.i32(i32 0, i32 %tmp105, i32 323, i32 12, i32 15, i1 false)
	%tmp121 = add i32 %tmp105, %tmp120			%tmp121 = add i32 %tmp105, %tmp120
	%tmp122 = tail call i32 @llvm.amdgcn.wwm.i32(i32 %tmp121)			%tmp122 = tail call i32 @llvm.amdgcn.wwm.i32(i32 %tmp121)

	; GFX9: v_mov_b32_dpp v[[SECOND_MOV:[0-9]+]], v{{[0-9]+}} row_bcast:31 row_mask:0xc bank_mask:0xf			; GFX9-DAG: v_mov_b32_dpp v[[SECOND_MOV:[0-9]+]], v{{[0-9]+}} row_bcast:31 row_mask:0xc bank_mask:0xf
	; GFX9: v_add_u32_e32 v[[SECOND_ADD:[0-9]+]], v{{[0-9]+}}, v[[SECOND_MOV]]			; GFX9-DAG: v_add_u32_e32 v[[SECOND_ADD:[0-9]+]], v{{[0-9]+}}, v[[SECOND_MOV]]
	; GFX9: v_mov_b32_e32 v[[SECOND:[0-9]+]], v[[SECOND_ADD]]			; GFX9-DAG: v_mov_b32_e32 v[[SECOND:[0-9]+]], v[[SECOND_ADD]]
	%tmp135 = tail call i32 @llvm.amdgcn.update.dpp.i32(i32 0, i32 %tmp107, i32 323, i32 12, i32 15, i1 false)			%tmp135 = tail call i32 @llvm.amdgcn.update.dpp.i32(i32 0, i32 %tmp107, i32 323, i32 12, i32 15, i1 false)
	%tmp136 = add i32 %tmp107, %tmp135			%tmp136 = add i32 %tmp107, %tmp135
	%tmp137 = tail call i32 @llvm.amdgcn.wwm.i32(i32 %tmp136)			%tmp137 = tail call i32 @llvm.amdgcn.wwm.i32(i32 %tmp136)

	; GFX9-O3: v_cmp_eq_u32_e32 vcc, v[[FIRST]], v[[SECOND]]			; GFX9-O3: v_cmp_eq_u32_e32 vcc, v[[FIRST]], v[[SECOND]]
	; GFX9-O0: v_cmp_eq_u32_e64 s{{\[}}{{[0-9]+}}:{{[0-9]+}}{{\]}}, v[[FIRST]], v[[SECOND]]			; GFX9-O0: v_cmp_eq_u32_e64 s{{\[}}{{[0-9]+}}:{{[0-9]+}}{{\]}}, v[[FIRST]], v[[SECOND]]
	%tmp138 = icmp eq i32 %tmp122, %tmp137			%tmp138 = icmp eq i32 %tmp122, %tmp137
	%tmp139 = sext i1 %tmp138 to i32			%tmp139 = sext i1 %tmp138 to i32
	▲ Show 20 Lines • Show All 64 Lines • ▼ Show 20 Lines
	; GFX9-LABEL: {{^}}call:			; GFX9-LABEL: {{^}}call:
	define amdgpu_kernel void @call(<4 x i32> inreg %tmp14, i32 inreg %arg) {			define amdgpu_kernel void @call(<4 x i32> inreg %tmp14, i32 inreg %arg) {
	; GFX9-DAG: s_load_dword [[ARG:s[0-9]+]]			; GFX9-DAG: s_load_dword [[ARG:s[0-9]+]]
	; GFX9-O0-DAG: s_mov_b32 s0, 0{{$}}			; GFX9-O0-DAG: s_mov_b32 s0, 0{{$}}
	; GFX9-O0-DAG: v_mov_b32_e32 v0, [[ARG]]			; GFX9-O0-DAG: v_mov_b32_e32 v0, [[ARG]]

	; GFX9-O3: v_mov_b32_e32 v2, [[ARG]]			; GFX9-O3: v_mov_b32_e32 v2, [[ARG]]


	; GFX9-NEXT: s_not_b64 exec, exec			; GFX9-NEXT: s_not_b64 exec, exec
	; GFX9-O0-NEXT: v_mov_b32_e32 v0, s0			; GFX9-O0-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-O3-NEXT: v_mov_b32_e32 v2, 0			; GFX9-O3-NEXT: v_mov_b32_e32 v2, 0
	; GFX9-NEXT: s_not_b64 exec, exec			; GFX9-NEXT: s_not_b64 exec, exec
	%tmp107 = tail call i32 @llvm.amdgcn.set.inactive.i32(i32 %arg, i32 0)			%tmp107 = tail call i32 @llvm.amdgcn.set.inactive.i32(i32 %arg, i32 0)
	; GFX9-O0: buffer_store_dword v0			; GFX9-O0: buffer_store_dword v0
	; GFX9-O3: v_mov_b32_e32 v0, v2			; GFX9-O3: v_mov_b32_e32 v0, v2
	; GFX9: s_swappc_b64			; GFX9: s_swappc_b64
	▲ Show 20 Lines • Show All 93 Lines • Show Last 20 Lines

Revision Contents

Diff 300853
