This is an archive of the discontinued LLVM Phabricator instance.

llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp
119	Just break the scan at 64 and restart. This also needs a test.
125	Wrap the check into GCNSubtarget::hasHardClauses()? Also add skipFunction() check.

Harbormaster failed remote builds in B56449: Diff 263456! May 12 2020, 10:44 AM

Harbormaster failed remote builds in B56448: Diff 263455!

rampitec added inline comments. May 12 2020, 10:44 AM

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
1048	Need to check OptLevel probably, the pass is optional.

Address review comments.

foad marked 4 inline comments as done. May 12 2020, 12:55 PM

foad added inline comments.

llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp
119	OK, but my way was simpler. I tried to add a test (hard-clauses.mir) but this pass is guided by the SIInstrInfo::shouldClusterMemOps heuristic, which never clusters that many loads. I still think this pass should handle it correctly, in case the heuristic ever changes.

arsenm added inline comments. May 12 2020, 12:59 PM

llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp
117–119	Do we need to bundle the claused instructions? What prevents inserting something else between these?
153	Probably should use a named constant for the limit

rampitec added inline comments. May 12 2020, 1:08 PM

llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp
117–119	Looking at the passes that run after this one, it does not seem that anything will be modified here. But you are probably right, it is worth doing just to be on the safe side.
119	This way you will restart a new clause after the break, so it is better. We may also want to add an option to shouldClusterMemOps() controlling the maximum cluster size for this test. This option will be useful anyway for clustering experiments.

foad marked 2 inline comments as done. May 12 2020, 1:54 PM

foad added inline comments.

llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp
117–119	I don't know. I haven't noticed any problems in testing. I guess it just relies on running late enough in the pipeline.

Harbormaster failed remote builds in B56473: Diff 263498! May 12 2020, 2:00 PM

Actually, depending on shouldCluster is a problem. First soft clauses are formed, and then you turn them into hard clauses. Soft clauses impose higher register pressure, but then you simply break them here, essentially wasting registers. Either both passes should be guided by the same heuristic, or neither should.

In D79792#2033138, @rampitec wrote:

Actually, depending on shouldCluster is a problem. First soft clauses are formed, and then you turn them into hard clauses. Soft clauses impose higher register pressure, but then you simply break them here, essentially wasting registers. Either both passes should be guided by the same heuristic, or neither should.

I don't understand what you're suggesting. In this pass we have to impose a limit of 64 instructions for correctness, not performance, because that's how the s_clause instruction's operand is encoded.

Perhaps we should teach shouldCluster not to bother clausing more than 64 loads, because the s_clause won't be able to honor it, but it already has a much lower limit than that, so I don't think there's any need to change it.
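To make the correctness limit concrete, here is a minimal sketch of breaking a long run of clausable loads at 64 and restarting. It assumes the s_clause immediate holds the clause length minus one, which is what the patch's `.addImm(CI.Length - 1)` and its 64-instruction assert suggest. `MaxClauseLength`, `splitIntoClauses` and `clauseOperand` are hypothetical names used only for illustration; they are not part of the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical named constant for the 64-instruction limit discussed above.
constexpr unsigned MaxClauseLength = 64;

// Split a run of NumLoads adjacent, clausable loads into clause lengths,
// starting a new clause each time the limit is reached.
std::vector<unsigned> splitIntoClauses(unsigned NumLoads) {
  std::vector<unsigned> Lengths;
  while (NumLoads > 0) {
    unsigned Len = NumLoads < MaxClauseLength ? NumLoads : MaxClauseLength;
    Lengths.push_back(Len);
    NumLoads -= Len;
  }
  return Lengths;
}

// Compute the s_clause operand for a clause of the given length.
// Returns false for clauses shorter than 2, which get no s_clause at all.
// Assumption: the operand is Length - 1, so valid values are 1..63.
bool clauseOperand(unsigned Length, unsigned &Imm) {
  if (Length < 2)
    return false;
  assert(Length <= MaxClauseLength && "Hard clause is too long!");
  Imm = Length - 1;
  return true;
}
```

For a run of 130 loads, `splitIntoClauses` returns lengths 64, 64 and 2, which `clauseOperand` maps to operands 63, 63 and 1.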

Rebase. Use LAST_REAL_HARDCLAUSE_TYPE.

In D79792#2033282, @foad wrote:

In D79792#2033138, @rampitec wrote:

Actually, depending on shouldCluster is a problem. First soft clauses are formed, and then you turn them into hard clauses. Soft clauses impose higher register pressure, but then you simply break them here, essentially wasting registers. Either both passes should be guided by the same heuristic, or neither should.

I don't understand what you're suggesting. In this pass we have to impose a limit of 64 instructions for correctness, not performance, because that's how the s_clause instruction's operand is encoded.

Perhaps we should teach shouldCluster not to bother clausing more than 64 loads, because the s_clause won't be able to honor it, but it already has a much lower limit than that, so I don't think there's any need to change it.

In fact it seems clauses are broken every 4 instructions, which means there is no reason to form up to 15 instructions into a soft clause that will then be broken here. So what I propose: either do not call shouldClusterMemOps here, or also call it in SIFormMemoryClauses. I do not really have a preference, but when the two passes disagree, the registers used by soft clauses are spent for nothing. Does that make sense? For example, consider the case where xnack is also enabled.

BTW, what happens to the counters if a hard clause exceeds 16 instructions? Every instruction increments the counter and there are only 4 bits available. Am I getting this wrong?

Harbormaster failed remote builds in B56555: Diff 263630! May 13 2020, 1:34 AM

In D79792#2033325, @rampitec wrote:

In D79792#2033282, @foad wrote:

In D79792#2033138, @rampitec wrote:

Actually, depending on shouldCluster is a problem. First soft clauses are formed, and then you turn them into hard clauses. Soft clauses impose higher register pressure, but then you simply break them here, essentially wasting registers. Either both passes should be guided by the same heuristic, or neither should.

I don't understand what you're suggesting. In this pass we have to impose a limit of 64 instructions for correctness, not performance, because that's how the s_clause instruction's operand is encoded.

Perhaps we should teach shouldCluster not to bother clausing more than 64 loads, because the s_clause won't be able to honor it, but it already has a much lower limit than that, so I don't think there's any need to change it.

In fact it seems clauses are broken every 4 instructions, which means there is no reason to form up to 15 instructions into a soft clause that will then be broken here. So what I propose: either do not call shouldClusterMemOps here, or also call it in SIFormMemoryClauses. I do not really have a preference, but when the two passes disagree, the registers used by soft clauses are spent for nothing. Does that make sense? For example, consider the case where xnack is also enabled.

Using gfx10 terminology: SIFormMemoryClauses deals with restartable groups but SIInsertHardClauses deals with hard clauses. They are similar but not the same. That's what I tried to explain in the big comment at the top of SIInsertHardClauses.cpp. Hard clauses are all about performance. Groups are all about correctness in the presence of XNACK.

shouldClusterMemOps is a heuristic to decide whether loads should be claused for performance. We have to call it from SIInsertHardClauses, so it can tell us not to bother clausing loads that have a different base address.

I'm not at all sure that calling shouldClusterMemOps from SIFormMemoryClauses is a good idea, but in any case I think that should be a separate patch with its own discussion.
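The difference between hard clauses and restartable groups can be illustrated with a toy model. The file comment in SIInsertHardClauses.cpp gives one example of the differing rules: an s_nop breaks a restartable group but may appear in the middle of a hard clause. The sketch below is only an illustration of that one rule; `ToyOp` and `longestRun` are hypothetical and do not correspond to anything in LLVM.

```cpp
#include <vector>

// Hypothetical, simplified instruction kinds.
enum class ToyOp { Load, Nop, Other };

// Return the length of the longest run of loads, measured from the first
// load to the last load inclusive. If NopIsInternal is true, nops between
// two loads are counted as part of the run (hard-clause-like behaviour);
// otherwise a nop ends the run (restartable-group-like behaviour).
unsigned longestRun(const std::vector<ToyOp> &Ops, bool NopIsInternal) {
  unsigned Best = 0;
  unsigned Length = 0;  // first load through the most recent load
  unsigned Pending = 0; // nops seen since the most recent load
  for (ToyOp Op : Ops) {
    if (Op == ToyOp::Load) {
      // Absorb any nops that turned out to be in the middle of the run.
      Length = Length ? Length + Pending + 1 : 1;
      Pending = 0;
      if (Length > Best)
        Best = Length;
    } else if (Op == ToyOp::Nop && NopIsInternal && Length) {
      ++Pending; // only counts if another load follows
    } else {
      Length = 0;
      Pending = 0;
    }
  }
  return Best;
}
```

With the sequence load, nop, load, load, the hard-clause-like setting yields one run of length 4, while the group-like setting yields a longest run of 2.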

BTW, what happens to the counters if a hard clause exceeds 16 instructions? Every instruction increments the counter and there are only 4 bits available. Am I getting this wrong?

I don't know, sorry. Maybe it's one of the conditions that can cause the hardware to break a hard clause?

foad marked an inline comment as done. May 13 2020, 2:41 AM

foad added inline comments.

llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp
117–119	Would bundling actually fix the problem? What about other code that modifies bundles, like GCNHazardRecognizer::insertNoopInBundle? It feels like there's an arms race where one side invents mechanisms like bundling to stop instructions being inserted, and the other side goes and inserts instructions in the bundles anyway. Having said that, I do have some patches in progress to remove GCNHazardRecognizer::insertNoopInBundle. But I don't know if there are any other places where bundles could be modified.

arsenm added inline comments. May 13 2020, 7:44 AM

llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp
117–119	I think the hazard recognizer is the only place with a potentially legitimate reason to try inserting anything else in a bundle

Bundle the clauses.

Harbormaster failed remote builds in B56596: Diff 263725! May 13 2020, 9:11 AM

arsenm added inline comments. May 13 2020, 9:15 AM

llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp
115	Start with lowercase, and should use SIInstrInfo
124–132	You didn't finalize the bundle, so the register operands on the BUNDLE are missing

arsenm added inline comments. May 13 2020, 9:16 AM

llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp
124–132	Could also use MIBundleBuilder

In D79792#2033414, @foad wrote:

Using gfx10 terminology: SIFormMemoryClauses deals with restartable groups but SIInsertHardClauses deals with hard clauses. They are similar but not the same. That's what I tried to explain in the big comment at the top of SIInsertHardClauses.cpp. Hard clauses are all about performance. Groups are all about correctness in the presence of XNACK.

SIFormMemoryClauses is an optimization pass. If it does not run, nothing breaks; the clauses will simply be broken by the hazard recognizer. So in fact it does the same thing.

shouldClusterMemOps is a heuristic to decide whether loads should be claused for performance. We have to call it from SIInsertHardClauses, so it can tell us not to bother clausing loads that have a different base address.

Actually, the very small limit set by shouldClusterMemOps is driven by register pressure. There seems to be no reason to call it here at all, because this pass runs after RA.
Note that SIFormMemoryClauses does not call it, but uses the RP tracker to maintain occupancy instead.

Finalize bundles.

foad marked 5 inline comments as done. May 13 2020, 11:36 AM

Harbormaster completed remote builds in B56637: Diff 263804. May 13 2020, 2:11 PM

In D79792#2034306, @rampitec wrote:

shouldClusterMemOps is a heuristic to decide whether loads should be claused for performance. We have to call it from SIInsertHardClauses, so it can tell us not to bother clausing loads that have a different base address.

Actually, the very small limit set by shouldClusterMemOps is driven by register pressure. There seems to be no reason to call it here at all, because this pass runs after RA.
Note that SIFormMemoryClauses does not call it, but uses the RP tracker to maintain occupancy instead.

shouldClusterMemOps decides whether it is beneficial for performance to cluster two loads. For example: two VMEM instructions with the same base address should be clustered, unless one has a sampler and the other doesn't, etc etc. All that sort of logic belongs in shouldClusterMemOps, and we have to call it here because we don't want to insert s_clause instructions if it is not going to improve performance.

Yes, shouldClusterMemOps also imposes a limit on the length of the cluster. If that is *only* to help with register pressure, then perhaps I can bypass that check by always calling it with NumLoads=2 instead of NumLoads=CI.Size+1. What do you think?
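The NumLoads=2 idea can be sketched with a toy heuristic. This is a simplified stand-in under stated assumptions, not the real SIInstrInfo::shouldClusterMemOps signature: `ToyLoad`, `ToyMaxClusterSize`, `toyShouldCluster` and `toyShouldExtendClause` are all hypothetical, and the size cap of 4 is chosen only for illustration.

```cpp
// Hypothetical description of a load, reduced to what the toy heuristic needs.
struct ToyLoad {
  int BaseReg;     // assumed: identifies the base address
  bool HasSampler; // assumed: whether the load uses a sampler
};

// Hypothetical cap standing in for the scheduler's register-pressure limit.
constexpr unsigned ToyMaxClusterSize = 4;

// Toy heuristic: refuse clusters that are too large, otherwise cluster two
// loads only if they look similar enough.
bool toyShouldCluster(const ToyLoad &A, const ToyLoad &B, unsigned NumLoads) {
  if (NumLoads > ToyMaxClusterSize)
    return false; // size limit, motivated by register pressure
  return A.BaseReg == B.BaseReg && A.HasSampler == B.HasSampler;
}

// A caller that runs after register allocation only wants the pairwise
// answer, so it always reports a cluster of two. This deliberately
// misreports the cluster size so that the size limit never triggers.
bool toyShouldExtendClause(const ToyLoad &Last, const ToyLoad &Next) {
  return toyShouldCluster(Last, Next, /*NumLoads=*/2);
}
```

In this model a fifth similar load is rejected by `toyShouldCluster` when the true cluster size is passed, but accepted by `toyShouldExtendClause`, while loads with different bases are rejected either way.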

In D79792#2035964, @foad wrote:

Yes, shouldClusterMemOps also imposes a limit on the length of the cluster. If that is *only* to help with register pressure, then perhaps I can bypass that check by always calling it with NumLoads=2 instead of NumLoads=CI.Size+1. What do you think?

That's actually a very good idea. Just add a comment explaining it!

Don't let shouldClusterMemOps limit clause size.

LGTM

This revision is now accepted and ready to land. May 14 2020, 10:44 AM

Harbormaster failed remote builds in B56747: Diff 264019! May 14 2020, 10:50 AM

Closed by commit rG42a556050346: [AMDGPU] New SIInsertHardClauses pass (authored by foad). May 14 2020, 11:24 AM

This revision was automatically updated to reflect the committed changes.

This pretty much broke the world with radeonsi on Navi 14:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fc20abf155b in __GI_abort () at abort.c:79
#2  0x00007fc20abf142f in __assert_fail_base
    (fmt=0x7fc20ad57b48 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x7fc1ecec6cf6 "CI.Length == std::distance(CI.First->getIterator(), CI.Last->getIterator()) + 1", file=0x7fc1ec73aa57 "/home/daenzer/src/llvm-git/llvm-project/llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp", line=114, function=<optimized out>) at assert.c:92
#3  0x00007fc20ac00092 in __GI___assert_fail
    (assertion=0x7fc1ecec6cf6 "CI.Length == std::distance(CI.First->getIterator(), CI.Last->getIterator()) + 1", file=0x7fc1ec73aa57 "/home/daenzer/src/llvm-git/llvm-project/llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp", line=114, function=0x7fc1ecf1e303 "bool (anonymous namespace)::SIInsertHardClauses::emitClause(const (anonymous namespace)::SIInsertHardClauses::ClauseInfo &, const llvm::SIInstrInfo *)") at assert.c:101
#4  0x00007fc1ef7bfb58 in (anonymous namespace)::SIInsertHardClauses::emitClause((anonymous namespace)::SIInsertHardClauses::ClauseInfo const&, llvm::SIInstrInfo const*) (this=<optimized out>, CI=..., SII=<optimized out>)
    at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp:113
#5  0x00007fc1ef7bf6c7 in (anonymous namespace)::SIInsertHardClauses::runOnMachineFunction(llvm::MachineFunction&) (this=<optimized out>, MF=...)
    at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp:167
#6  0x00007fc1ee1abcad in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (this=0x7fc1c00458f0, F=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:73
#7  0x00007fc1edf953a5 in llvm::FPPassManager::runOnFunction(llvm::Function&) (this=<optimized out>, F=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1482
#8  0x00007fc1eefa2493 in (anonymous namespace)::CGPassManager::RunPassOnSCC(llvm::Pass*, llvm::CallGraphSCC&, llvm::CallGraph&, bool&, bool&)
    (this=0x7fc1c0024970, P=0x7fc1c0025020, CurSCC=..., CG=..., CallGraphUpToDate=<optimized out>, DevirtualizedCall=<optimized out>) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Analysis/CallGraphSCCPass.cpp:176
#9  (anonymous namespace)::CGPassManager::RunAllPassesOnSCC(llvm::CallGraphSCC&, llvm::CallGraph&, bool&) (this=0x7fc1c0024970, CurSCC=..., CG=..., DevirtualizedCall=<optimized out>)
    at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Analysis/CallGraphSCCPass.cpp:441
#10 (anonymous namespace)::CGPassManager::runOnModule(llvm::Module&) (this=<optimized out>, M=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/Analysis/CallGraphSCCPass.cpp:497
#11 0x00007fc1edf95c83 in (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) (this=<optimized out>, M=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1583
#12 llvm::legacy::PassManagerImpl::run(llvm::Module&) (this=0x7fc1c0009670, M=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1695
#13 0x00007fc1edf9613e in llvm::legacy::PassManager::run(llvm::Module&) (this=<optimized out>, this@entry=0x7fc1c0009650, M=...) at /home/daenzer/src/llvm-git/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1726
#14 0x00007fc1f788eb9f in ac_compile_module_to_elf(ac_compiler_passes*, LLVMModuleRef, char**, size_t*) (p=p@entry=
    0x7fc1c0009610, module=<optimized out>, pelf_buffer=pelf_buffer@entry=0x563be208bc20, pelf_size=pelf_size@entry=0x563be208bc28) at /home/daenzer/src/llvm-git/llvm-project/llvm/include/llvm/IR/Module.h:892
#15 0x00007fc1f77b6c51 in si_compile_llvm
    (sscreen=sscreen@entry=0x563be1f066a0, binary=binary@entry=0x563be208bc20, conf=conf@entry=0x563be208bc38, compiler=compiler@entry=0x563be1f06f30, ac=ac@entry=0x7fc1e8f5e240, debug=debug@entry=0x563be208b5b0, shader_type=PIPE_SHADER_COMPUTE, name=0x7fc1f7aab177 "Compute Shader", less_optimized=false) at ../src/gallium/drivers/radeonsi/si_shader_llvm.c:104
#16 0x00007fc1f77b4103 in si_llvm_compile_shader (sscreen=sscreen@entry=0x563be1f066a0, compiler=compiler@entry=0x563be1f06f30, shader=shader@entry=0x563be208bb68, debug=debug@entry=0x563be208b5b0, nir=<optimized out>, 
    nir@entry=0x563be200e1b0, free_nir=<optimized out>) at ../src/gallium/drivers/radeonsi/si_shader.c:1581
#17 0x00007fc1f77b561d in si_compile_shader (sscreen=0x563be1f066a0, compiler=0x563be1f06f30, shader=<optimized out>, debug=0x563be208b5b0) at ../src/gallium/drivers/radeonsi/si_shader.c:1855
#18 0x00007fc1f77b64b7 in si_create_shader_variant (sscreen=sscreen@entry=0x563be1f066a0, compiler=compiler@entry=0x563be1f06f30, shader=shader@entry=0x563be208bb68, debug=debug@entry=0x563be208b5b0)
    at ../src/gallium/drivers/radeonsi/si_shader.c:2384
#19 0x00007fc1f78126e1 in si_create_compute_state_async (job=job@entry=0x563be208b580, thread_index=thread_index@entry=3) at ../src/gallium/drivers/radeonsi/si_compute.c:161
#20 0x00007fc1f72322f1 in util_queue_thread_func (input=input@entry=0x563be1e96bd0) at ../src/util/u_queue.c:308
#21 0x00007fc1f7231d48 in impl_thrd_routine (p=<optimized out>) at ../include/c11/threads_posix.h:87
#22 0x00007fc208cc8f27 in start_thread (arg=<optimized out>) at pthread_create.c:479
#23 0x00007fc20acc931f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

In D79792#2038431, @michel.daenzer wrote:

This pretty much broke the world with radeonsi on Navi 14:

Sorry. Can you please try D80007.

tmatheson mentioned this in D94949: [AArch64][RegAllocFast] Add findSpillBefore to TargetRegisterInfo. Mar 17 2021, 5:49 AM

Revision Contents

Path                                             Size
llvm/lib/Target/AMDGPU/
AMDGPU.h                                         3 lines
AMDGPUSubtarget.h                                2 lines
AMDGPUTargetMachine.cpp                          3 lines
CMakeLists.txt                                   1 line
SIInsertHardClauses.cpp                          200 lines
test/CodeGen/AMDGPU/GlobalISel/
llvm.amdgcn.div.fmas.ll                          18 lines
llvm.amdgcn.div.scale.ll                         10 lines
llvm.amdgcn.end.cf.i32.ll                        1 line
llvm.amdgcn.if.break.i32.ll                      1 line
llvm.amdgcn.mov.dpp.ll                           1 line
llvm.amdgcn.update.dpp.ll                        1 line
atomic_optimizations_local_pointer.ll            4 lines
(path not shown)                                 178 lines
(path not shown)                                 19 lines
(path not shown)                                 3 lines
(path not shown)                                 4 lines
(path not shown)                                 3 lines
(path not shown)                                 4 lines
llvm.amdgcn.raw.buffer.load.ll                   6 lines
shrink-add-sub-constant.ll                       2 lines
smrd.ll                                          2 lines
vgpr-descriptor-waterfall-loop-idom-update.ll    1 line
vgpr-tuple-allocation.ll                         20 lines

Diff 264042

llvm/lib/Target/AMDGPU/AMDGPU.h

[...]
 extern char &SIAnnotateControlFlowPassID;

 void initializeSIMemoryLegalizerPass(PassRegistry&);
 extern char &SIMemoryLegalizerID;

 void initializeSIModeRegisterPass(PassRegistry&);
 extern char &SIModeRegisterID;

+void initializeSIInsertHardClausesPass(PassRegistry &);
+extern char &SIInsertHardClausesID;
+
 void initializeSIInsertWaitcntsPass(PassRegistry&);
 extern char &SIInsertWaitcntsID;

 void initializeSIFormMemoryClausesPass(PassRegistry&);
 extern char &SIFormMemoryClausesID;

 void initializeSIPostRABundlerPass(PassRegistry&);
 extern char &SIPostRABundlerID;
[...]

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h

[...]
 public:
[...]
   bool hasLdsBranchVmemWARHazard() const {
     return HasLdsBranchVmemWARHazard;
   }

   bool hasNSAtoVMEMBug() const {
     return HasNSAtoVMEMBug;
   }

+  bool hasHardClauses() const { return getGeneration() >= GFX10; }
+
   /// Return the maximum number of waves per SIMD for kernels using \p SGPRs
   /// SGPRs
   unsigned getOccupancyWithNumSGPRs(unsigned SGPRs) const;

   /// Return the maximum number of waves per SIMD for kernels using \p VGPRs
   /// VGPRs
   unsigned getOccupancyWithNumVGPRs(unsigned VGPRs) const;

[...]

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

[...]
 extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAMDGPUTarget() {
[...]
   initializeAMDGPUPreLegalizerCombinerPass(*PR);
   initializeAMDGPUPromoteAllocaPass(*PR);
   initializeAMDGPUCodeGenPreparePass(*PR);
   initializeAMDGPUPropagateAttributesEarlyPass(*PR);
   initializeAMDGPUPropagateAttributesLatePass(*PR);
   initializeAMDGPURewriteOutArgumentsPass(*PR);
   initializeAMDGPUUnifyMetadataPass(*PR);
   initializeSIAnnotateControlFlowPass(*PR);
+  initializeSIInsertHardClausesPass(*PR);
   initializeSIInsertWaitcntsPass(*PR);
   initializeSIModeRegisterPass(*PR);
   initializeSIWholeQuadModePass(*PR);
   initializeSILowerControlFlowPass(*PR);
   initializeSIRemoveShortExecBranchesPass(*PR);
   initializeSIPreEmitPeepholePass(*PR);
   initializeSIInsertSkipsPass(*PR);
   initializeSIMemoryLegalizerPass(*PR);
[...]
 void GCNPassConfig::addPreEmitPass() {
[...]
   // instructions were emitted directly before it.
   //
   // Here we add a stand-alone hazard recognizer pass which can handle all
   // cases.
   //
   // FIXME: This stand-alone pass will emit indiv. S_NOP 0, as needed. It would
   // be better for it to emit S_NOP <N> when possible.
   addPass(&PostRAHazardRecognizerID);
+  if (getOptLevel() > CodeGenOpt::None)
+    addPass(&SIInsertHardClausesID);

   addPass(&SIRemoveShortExecBranchesID);
   addPass(&SIPreEmitPeepholeID);
   addPass(&SIInsertSkipsPassID);
   addPass(&BranchRelaxationPassID);
 }

 TargetPassConfig *GCNTargetMachine::createPassConfig(PassManagerBase &PM) {
[...]

llvm/lib/Target/AMDGPU/CMakeLists.txt

[...]
 add_llvm_target(AMDGPUCodeGen
[...]
   SIAnnotateControlFlow.cpp
   SIFixSGPRCopies.cpp
   SIFixupVectorISel.cpp
   SIFixVGPRCopies.cpp
   SIPreAllocateWWMRegs.cpp
   SIFoldOperands.cpp
   SIFormMemoryClauses.cpp
   SIFrameLowering.cpp
+  SIInsertHardClauses.cpp
   SIInsertSkips.cpp
   SIInsertWaitcnts.cpp
   SIInstrInfo.cpp
   SIISelLowering.cpp
   SILoadStoreOptimizer.cpp
   SILowerControlFlow.cpp
   SILowerI1Copies.cpp
   SILowerSGPRSpills.cpp
[...]

llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp

This file was added.

				//===- SIInsertHardClauses.cpp - Insert Hard Clauses ----------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				/// \file
				/// Insert s_clause instructions to form hard clauses.
				///
				/// Clausing load instructions can give cache coherency benefits. Before gfx10,
				/// the hardware automatically detected "soft clauses", which were sequences of
				/// memory instructions of the same type. In gfx10 this detection was removed,
				/// and the s_clause instruction was introduced to explicitly mark "hard
				/// clauses".
				///
				/// It's the scheduler's job to form the clauses by putting similar memory
				/// instructions next to each other. Our job is just to insert an s_clause
				/// instruction to mark the start of each clause.
				///
				/// Note that hard clauses are very similar to, but logically distinct from, the
				/// groups of instructions that have to be restartable when XNACK is enabled.
				/// The rules are slightly different in each case. For example an s_nop
				/// instruction breaks a restartable group, but can appear in the middle of a
				/// hard clause. (Before gfx10 there wasn't a distinction, and both were called
				/// "soft clauses" or just "clauses".)
				///
				/// The SIFormMemoryClauses pass and GCNHazardRecognizer deal with restartable
				/// groups, not hard clauses.
				//
				//===----------------------------------------------------------------------===//

				#include "AMDGPUSubtarget.h"
				#include "SIInstrInfo.h"
				#include "llvm/ADT/SmallVector.h"

				using namespace llvm;

				#define DEBUG_TYPE "si-insert-hard-clauses"

				namespace {

				enum HardClauseType {
				// Texture, buffer, global or scratch memory instructions.
				HARDCLAUSE_VMEM,
				// Flat (not global or scratch) memory instructions.
				HARDCLAUSE_FLAT,
				// Instructions that access LDS.
				HARDCLAUSE_LDS,
				// Scalar memory instructions.
				HARDCLAUSE_SMEM,
				// VALU instructions.
				HARDCLAUSE_VALU,
				LAST_REAL_HARDCLAUSE_TYPE = HARDCLAUSE_VALU,

				// Internal instructions, which are allowed in the middle of a hard clause,
				// except for s_waitcnt.
				HARDCLAUSE_INTERNAL,
				// Instructions that are not allowed in a hard clause: SALU, export, branch,
				// message, GDS, s_waitcnt and anything else not mentioned above.
				HARDCLAUSE_ILLEGAL,
				};

				HardClauseType getHardClauseType(const MachineInstr &MI) {
				// On current architectures we only get a benefit from clausing loads.
				if (MI.mayLoad()) {
				if (SIInstrInfo::isVMEM(MI) || SIInstrInfo::isSegmentSpecificFLAT(MI))
				return HARDCLAUSE_VMEM;
				if (SIInstrInfo::isFLAT(MI))
				return HARDCLAUSE_FLAT;
				// TODO: LDS
				if (SIInstrInfo::isSMRD(MI))
				return HARDCLAUSE_SMEM;
				}

				// Don't form VALU clauses. It's not clear what benefit they give, if any.

				// In practice s_nop is the only internal instructions we're likely to see.
				// It's safe to treat the rest as illegal.
				if (MI.getOpcode() == AMDGPU::S_NOP)
				return HARDCLAUSE_INTERNAL;
				return HARDCLAUSE_ILLEGAL;
				}

				class SIInsertHardClauses : public MachineFunctionPass {
				public:
				static char ID;

				SIInsertHardClauses() : MachineFunctionPass(ID) {}

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.setPreservesCFG();
				MachineFunctionPass::getAnalysisUsage(AU);
				}

				// Track information about a clause as we discover it.
				struct ClauseInfo {
				// The type of all (non-internal) instructions in the clause.
				HardClauseType Type = HARDCLAUSE_ILLEGAL;
				// The first (necessarily non-internal) instruction in the clause.
				MachineInstr *First = nullptr;
				// The last non-internal instruction in the clause.
				MachineInstr *Last = nullptr;
				// The length of the clause including any internal instructions in the
				// middle.
				unsigned Length = 0;
				// The base operands of *Last.
				SmallVector<const MachineOperand *, 4> BaseOps;
				};

				bool emitClause(const ClauseInfo &CI, const SIInstrInfo *SII) {
				assert(CI.Length ==
				std::distance(CI.First->getIterator(), CI.Last->getIterator()) + 1);
				if (CI.Length < 2)
				return false;
				assert(CI.Length <= 64 && "Hard clause is too long!");

				auto &MBB = *CI.First->getParent();
				    auto ClauseMI =
				        BuildMI(MBB, *CI.First, DebugLoc(), SII->get(AMDGPU::S_CLAUSE))
				            .addImm(CI.Length - 1);
				    finalizeBundle(MBB, ClauseMI->getIterator(),
				                   std::next(CI.Last->getIterator()));
				    return true;
				rampitec [Done]: Wrap the check into GCNSubtarget::hasHardClauses()? Also add skipFunction() check.
				  }

				  bool runOnMachineFunction(MachineFunction &MF) override {
				    if (skipFunction(MF.getFunction()))
				      return false;

				    const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
				arsenm [Done]: You didn't finalize the bundle, so the register operands on the BUNDLE are missing
				arsenm [Done]: Could also use MIBundleBuilder
				    if (!ST.hasHardClauses())
				      return false;

				    const SIInstrInfo *SII = ST.getInstrInfo();
				    const TargetRegisterInfo *TRI = ST.getRegisterInfo();

				    bool Changed = false;
				    for (auto &MBB : MF) {
				      ClauseInfo CI;
				      for (auto &MI : MBB) {
				        HardClauseType Type = getHardClauseType(MI);

				        int64_t Dummy1;
				        bool Dummy2;
				        SmallVector<const MachineOperand *, 4> BaseOps;
				        if (Type <= LAST_REAL_HARDCLAUSE_TYPE) {
				          if (!SII->getMemOperandsWithOffset(MI, BaseOps, Dummy1, Dummy2,
				                                             TRI)) {
				            // We failed to get the base operands, so we'll never clause this
				            // instruction with any other, so pretend it's illegal.
				            Type = HARDCLAUSE_ILLEGAL;
				arsenm [Not Done]: Probably should use a named constant for the limit
				          }
				        }

				        if (CI.Length == 64 ||
				            (CI.Length && Type != HARDCLAUSE_INTERNAL &&
				             (Type != CI.Type ||
				              // Note that we lie to shouldClusterMemOps about the size of the
				              // cluster. When shouldClusterMemOps is called from the machine
				              // scheduler it limits the size of the cluster to avoid increasing
				              // register pressure too much, but this pass runs after register
				              // allocation so there is no need for that kind of limit.
				              !SII->shouldClusterMemOps(CI.BaseOps, BaseOps, 2)))) {
				          // Finish the current clause.
				          Changed |= emitClause(CI, SII);
				          CI = ClauseInfo();
				        }

				        if (CI.Length) {
				          // Extend the current clause.
				          ++CI.Length;
				          if (Type != HARDCLAUSE_INTERNAL) {
				            CI.Last = &MI;
				            CI.BaseOps = std::move(BaseOps);
				          }
				        } else if (Type <= LAST_REAL_HARDCLAUSE_TYPE) {
				          // Start a new clause.
				          CI = ClauseInfo{Type, &MI, &MI, 1, std::move(BaseOps)};
				        }
				      }

				      // Finish the last clause in the basic block if any.
				      if (CI.Length)
				        Changed |= emitClause(CI, SII);
				    }

				    return Changed;
				  }
				};

				} // namespace

				char SIInsertHardClauses::ID = 0;

				char &llvm::SIInsertHardClausesID = SIInsertHardClauses::ID;

				INITIALIZE_PASS(SIInsertHardClauses, DEBUG_TYPE, "SI Insert Hard Clauses",
				false, false)
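
The splitting rule in the loop above (finish a run when it reaches the 64-instruction limit or the clause type changes, then start a fresh one) and the `Length - 1` operand passed to S_CLAUSE can be illustrated with a small standalone sketch. It is not the pass itself: `ClauseType`, `Group`, `formGroups`, `clauseImmediate` and the constant name `MaxInstructionsInClause` are hypothetical names chosen for illustration, the shouldClusterMemOps() check and the HARDCLAUSE_INTERNAL handling are left out, and the sketch does not model whether a one-instruction group receives an S_CLAUSE at all.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical named constant for the limit raised in review; the patch
// itself compares against the literal 64.
constexpr unsigned MaxInstructionsInClause = 64;

// Hypothetical stand-in for an instruction: only its clause type matters.
// None marks an instruction that cannot take part in any clause.
enum class ClauseType { Vmem, Smem, None };

// A run of consecutive same-type instructions.
struct Group {
  ClauseType Type;
  unsigned Length;
};

// Operand for S_CLAUSE: the group length minus one, mirroring
// `.addImm(CI.Length - 1)` in the patch.
inline std::int64_t clauseImmediate(const Group &G) {
  assert(G.Length >= 1 && G.Length <= MaxInstructionsInClause);
  return static_cast<std::int64_t>(G.Length) - 1;
}

// Split a straight-line sequence into groups. A group is finished when it is
// full or when the next instruction has a different type; the scan then
// carries on, so a long run restarts as a new group after the break.
inline std::vector<Group> formGroups(const std::vector<ClauseType> &Insts) {
  std::vector<Group> Groups;
  Group Cur{ClauseType::None, 0};
  for (ClauseType T : Insts) {
    if (Cur.Length == MaxInstructionsInClause ||
        (Cur.Length && T != Cur.Type)) {
      // Finish the current group.
      Groups.push_back(Cur);
      Cur = Group{ClauseType::None, 0};
    }
    if (Cur.Length) {
      // Extend the current group.
      ++Cur.Length;
    } else if (T != ClauseType::None) {
      // Start a new group.
      Cur = Group{T, 1};
    }
  }
  // Finish the last group, if any.
  if (Cur.Length)
    Groups.push_back(Cur);
  return Groups;
}
```

Under these assumptions, a run of 70 same-type loads splits into groups of 64 and 6, and a five-instruction group yields the immediate 4, which lines up with the `s_clause 0x4` that precedes five s_load instructions in the test updates.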

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.div.fmas.ll

	[249 lines not shown]
	; GFX8-NEXT: v_div_fmas_f32 v2, v0, v1, v2			; GFX8-NEXT: v_div_fmas_f32 v2, v0, v1, v2
	; GFX8-NEXT: v_mov_b32_e32 v0, s0			; GFX8-NEXT: v_mov_b32_e32 v0, s0
	; GFX8-NEXT: v_mov_b32_e32 v1, s1			; GFX8-NEXT: v_mov_b32_e32 v1, s1
	; GFX8-NEXT: flat_store_dword v[0:1], v2			; GFX8-NEXT: flat_store_dword v[0:1], v2
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10_W32-LABEL: test_div_fmas_f32:			; GFX10_W32-LABEL: test_div_fmas_f32:
	; GFX10_W32: ; %bb.0:			; GFX10_W32: ; %bb.0:
				; GFX10_W32-NEXT: s_clause 0x4
	; GFX10_W32-NEXT: s_load_dword s2, s[0:1], 0xb8			; GFX10_W32-NEXT: s_load_dword s2, s[0:1], 0xb8
	; GFX10_W32-NEXT: s_load_dword s3, s[0:1], 0x70			; GFX10_W32-NEXT: s_load_dword s3, s[0:1], 0x70
	; GFX10_W32-NEXT: s_load_dword s4, s[0:1], 0x94			; GFX10_W32-NEXT: s_load_dword s4, s[0:1], 0x94
	; GFX10_W32-NEXT: s_load_dword s5, s[0:1], 0x4c			; GFX10_W32-NEXT: s_load_dword s5, s[0:1], 0x4c
	; GFX10_W32-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX10_W32-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX10_W32-NEXT: ; implicit-def: $vcc_hi			; GFX10_W32-NEXT: ; implicit-def: $vcc_hi
	; GFX10_W32-NEXT: s_waitcnt lgkmcnt(0)			; GFX10_W32-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10_W32-NEXT: s_and_b32 s2, 1, s2			; GFX10_W32-NEXT: s_and_b32 s2, 1, s2
	; GFX10_W32-NEXT: v_mov_b32_e32 v0, s3			; GFX10_W32-NEXT: v_mov_b32_e32 v0, s3
	; GFX10_W32-NEXT: v_mov_b32_e32 v1, s4			; GFX10_W32-NEXT: v_mov_b32_e32 v1, s4
	; GFX10_W32-NEXT: v_cmp_ne_u32_e64 vcc_lo, 0, s2			; GFX10_W32-NEXT: v_cmp_ne_u32_e64 vcc_lo, 0, s2
	; GFX10_W32-NEXT: v_div_fmas_f32 v2, s5, v0, v1			; GFX10_W32-NEXT: v_div_fmas_f32 v2, s5, v0, v1
	; GFX10_W32-NEXT: v_mov_b32_e32 v0, s0			; GFX10_W32-NEXT: v_mov_b32_e32 v0, s0
	; GFX10_W32-NEXT: v_mov_b32_e32 v1, s1			; GFX10_W32-NEXT: v_mov_b32_e32 v1, s1
	; GFX10_W32-NEXT: global_store_dword v[0:1], v2, off			; GFX10_W32-NEXT: global_store_dword v[0:1], v2, off
	; GFX10_W32-NEXT: s_endpgm			; GFX10_W32-NEXT: s_endpgm
	;			;
	; GFX10_W64-LABEL: test_div_fmas_f32:			; GFX10_W64-LABEL: test_div_fmas_f32:
	; GFX10_W64: ; %bb.0:			; GFX10_W64: ; %bb.0:
				; GFX10_W64-NEXT: s_clause 0x4
	; GFX10_W64-NEXT: s_load_dword s2, s[0:1], 0xb8			; GFX10_W64-NEXT: s_load_dword s2, s[0:1], 0xb8
	; GFX10_W64-NEXT: s_load_dword s3, s[0:1], 0x70			; GFX10_W64-NEXT: s_load_dword s3, s[0:1], 0x70
	; GFX10_W64-NEXT: s_load_dword s4, s[0:1], 0x94			; GFX10_W64-NEXT: s_load_dword s4, s[0:1], 0x94
	; GFX10_W64-NEXT: s_load_dword s5, s[0:1], 0x4c			; GFX10_W64-NEXT: s_load_dword s5, s[0:1], 0x4c
	; GFX10_W64-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX10_W64-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX10_W64-NEXT: s_waitcnt lgkmcnt(0)			; GFX10_W64-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10_W64-NEXT: s_and_b32 s2, 1, s2			; GFX10_W64-NEXT: s_and_b32 s2, 1, s2
	; GFX10_W64-NEXT: v_mov_b32_e32 v0, s3			; GFX10_W64-NEXT: v_mov_b32_e32 v0, s3
	[43 lines not shown]
	; GFX8-NEXT: v_div_fmas_f32 v2, 1.0, v0, v1			; GFX8-NEXT: v_div_fmas_f32 v2, 1.0, v0, v1
	; GFX8-NEXT: v_mov_b32_e32 v0, s0			; GFX8-NEXT: v_mov_b32_e32 v0, s0
	; GFX8-NEXT: v_mov_b32_e32 v1, s1			; GFX8-NEXT: v_mov_b32_e32 v1, s1
	; GFX8-NEXT: flat_store_dword v[0:1], v2			; GFX8-NEXT: flat_store_dword v[0:1], v2
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10_W32-LABEL: test_div_fmas_f32_inline_imm_0:			; GFX10_W32-LABEL: test_div_fmas_f32_inline_imm_0:
	; GFX10_W32: ; %bb.0:			; GFX10_W32: ; %bb.0:
				; GFX10_W32-NEXT: s_clause 0x3
	; GFX10_W32-NEXT: s_load_dword s2, s[0:1], 0xb8			; GFX10_W32-NEXT: s_load_dword s2, s[0:1], 0xb8
	; GFX10_W32-NEXT: s_load_dword s3, s[0:1], 0x94			; GFX10_W32-NEXT: s_load_dword s3, s[0:1], 0x94
	; GFX10_W32-NEXT: s_load_dword s4, s[0:1], 0x70			; GFX10_W32-NEXT: s_load_dword s4, s[0:1], 0x70
	; GFX10_W32-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX10_W32-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX10_W32-NEXT: ; implicit-def: $vcc_hi			; GFX10_W32-NEXT: ; implicit-def: $vcc_hi
	; GFX10_W32-NEXT: s_waitcnt lgkmcnt(0)			; GFX10_W32-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10_W32-NEXT: s_and_b32 s2, 1, s2			; GFX10_W32-NEXT: s_and_b32 s2, 1, s2
	; GFX10_W32-NEXT: v_mov_b32_e32 v0, s3			; GFX10_W32-NEXT: v_mov_b32_e32 v0, s3
	; GFX10_W32-NEXT: v_cmp_ne_u32_e64 vcc_lo, 0, s2			; GFX10_W32-NEXT: v_cmp_ne_u32_e64 vcc_lo, 0, s2
	; GFX10_W32-NEXT: v_div_fmas_f32 v2, 1.0, s4, v0			; GFX10_W32-NEXT: v_div_fmas_f32 v2, 1.0, s4, v0
	; GFX10_W32-NEXT: v_mov_b32_e32 v0, s0			; GFX10_W32-NEXT: v_mov_b32_e32 v0, s0
	; GFX10_W32-NEXT: v_mov_b32_e32 v1, s1			; GFX10_W32-NEXT: v_mov_b32_e32 v1, s1
	; GFX10_W32-NEXT: global_store_dword v[0:1], v2, off			; GFX10_W32-NEXT: global_store_dword v[0:1], v2, off
	; GFX10_W32-NEXT: s_endpgm			; GFX10_W32-NEXT: s_endpgm
	;			;
	; GFX10_W64-LABEL: test_div_fmas_f32_inline_imm_0:			; GFX10_W64-LABEL: test_div_fmas_f32_inline_imm_0:
	; GFX10_W64: ; %bb.0:			; GFX10_W64: ; %bb.0:
				; GFX10_W64-NEXT: s_clause 0x3
	; GFX10_W64-NEXT: s_load_dword s2, s[0:1], 0xb8			; GFX10_W64-NEXT: s_load_dword s2, s[0:1], 0xb8
	; GFX10_W64-NEXT: s_load_dword s3, s[0:1], 0x94			; GFX10_W64-NEXT: s_load_dword s3, s[0:1], 0x94
	; GFX10_W64-NEXT: s_load_dword s4, s[0:1], 0x70			; GFX10_W64-NEXT: s_load_dword s4, s[0:1], 0x70
	; GFX10_W64-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX10_W64-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX10_W64-NEXT: s_waitcnt lgkmcnt(0)			; GFX10_W64-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10_W64-NEXT: s_and_b32 s2, 1, s2			; GFX10_W64-NEXT: s_and_b32 s2, 1, s2
	; GFX10_W64-NEXT: v_mov_b32_e32 v0, s3			; GFX10_W64-NEXT: v_mov_b32_e32 v0, s3
	; GFX10_W64-NEXT: v_cmp_ne_u32_e64 vcc, 0, s2			; GFX10_W64-NEXT: v_cmp_ne_u32_e64 vcc, 0, s2
	[41 lines not shown]
	; GFX8-NEXT: v_div_fmas_f32 v2, v0, 1.0, v1			; GFX8-NEXT: v_div_fmas_f32 v2, v0, 1.0, v1
	; GFX8-NEXT: v_mov_b32_e32 v0, s0			; GFX8-NEXT: v_mov_b32_e32 v0, s0
	; GFX8-NEXT: v_mov_b32_e32 v1, s1			; GFX8-NEXT: v_mov_b32_e32 v1, s1
	; GFX8-NEXT: flat_store_dword v[0:1], v2			; GFX8-NEXT: flat_store_dword v[0:1], v2
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10_W32-LABEL: test_div_fmas_f32_inline_imm_1:			; GFX10_W32-LABEL: test_div_fmas_f32_inline_imm_1:
	; GFX10_W32: ; %bb.0:			; GFX10_W32: ; %bb.0:
				; GFX10_W32-NEXT: s_clause 0x3
	; GFX10_W32-NEXT: s_load_dword s2, s[0:1], 0x58			; GFX10_W32-NEXT: s_load_dword s2, s[0:1], 0x58
	; GFX10_W32-NEXT: s_load_dword s3, s[0:1], 0x34			; GFX10_W32-NEXT: s_load_dword s3, s[0:1], 0x34
	; GFX10_W32-NEXT: s_load_dword s4, s[0:1], 0x2c			; GFX10_W32-NEXT: s_load_dword s4, s[0:1], 0x2c
	; GFX10_W32-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX10_W32-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX10_W32-NEXT: ; implicit-def: $vcc_hi			; GFX10_W32-NEXT: ; implicit-def: $vcc_hi
	; GFX10_W32-NEXT: s_waitcnt lgkmcnt(0)			; GFX10_W32-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10_W32-NEXT: s_and_b32 s2, 1, s2			; GFX10_W32-NEXT: s_and_b32 s2, 1, s2
	; GFX10_W32-NEXT: v_mov_b32_e32 v0, s3			; GFX10_W32-NEXT: v_mov_b32_e32 v0, s3
	; GFX10_W32-NEXT: v_cmp_ne_u32_e64 vcc_lo, 0, s2			; GFX10_W32-NEXT: v_cmp_ne_u32_e64 vcc_lo, 0, s2
	; GFX10_W32-NEXT: v_div_fmas_f32 v2, s4, 1.0, v0			; GFX10_W32-NEXT: v_div_fmas_f32 v2, s4, 1.0, v0
	; GFX10_W32-NEXT: v_mov_b32_e32 v0, s0			; GFX10_W32-NEXT: v_mov_b32_e32 v0, s0
	; GFX10_W32-NEXT: v_mov_b32_e32 v1, s1			; GFX10_W32-NEXT: v_mov_b32_e32 v1, s1
	; GFX10_W32-NEXT: global_store_dword v[0:1], v2, off			; GFX10_W32-NEXT: global_store_dword v[0:1], v2, off
	; GFX10_W32-NEXT: s_endpgm			; GFX10_W32-NEXT: s_endpgm
	;			;
	; GFX10_W64-LABEL: test_div_fmas_f32_inline_imm_1:			; GFX10_W64-LABEL: test_div_fmas_f32_inline_imm_1:
	; GFX10_W64: ; %bb.0:			; GFX10_W64: ; %bb.0:
				; GFX10_W64-NEXT: s_clause 0x3
	; GFX10_W64-NEXT: s_load_dword s2, s[0:1], 0x58			; GFX10_W64-NEXT: s_load_dword s2, s[0:1], 0x58
	; GFX10_W64-NEXT: s_load_dword s3, s[0:1], 0x34			; GFX10_W64-NEXT: s_load_dword s3, s[0:1], 0x34
	; GFX10_W64-NEXT: s_load_dword s4, s[0:1], 0x2c			; GFX10_W64-NEXT: s_load_dword s4, s[0:1], 0x2c
	; GFX10_W64-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX10_W64-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX10_W64-NEXT: s_waitcnt lgkmcnt(0)			; GFX10_W64-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10_W64-NEXT: s_and_b32 s2, 1, s2			; GFX10_W64-NEXT: s_and_b32 s2, 1, s2
	; GFX10_W64-NEXT: v_mov_b32_e32 v0, s3			; GFX10_W64-NEXT: v_mov_b32_e32 v0, s3
	; GFX10_W64-NEXT: v_cmp_ne_u32_e64 vcc, 0, s2			; GFX10_W64-NEXT: v_cmp_ne_u32_e64 vcc, 0, s2
	[41 lines not shown]
	; GFX8-NEXT: v_div_fmas_f32 v2, v0, v1, 1.0			; GFX8-NEXT: v_div_fmas_f32 v2, v0, v1, 1.0
	; GFX8-NEXT: v_mov_b32_e32 v0, s0			; GFX8-NEXT: v_mov_b32_e32 v0, s0
	; GFX8-NEXT: v_mov_b32_e32 v1, s1			; GFX8-NEXT: v_mov_b32_e32 v1, s1
	; GFX8-NEXT: flat_store_dword v[0:1], v2			; GFX8-NEXT: flat_store_dword v[0:1], v2
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10_W32-LABEL: test_div_fmas_f32_inline_imm_2:			; GFX10_W32-LABEL: test_div_fmas_f32_inline_imm_2:
	; GFX10_W32: ; %bb.0:			; GFX10_W32: ; %bb.0:
				; GFX10_W32-NEXT: s_clause 0x3
	; GFX10_W32-NEXT: s_load_dword s2, s[0:1], 0xb8			; GFX10_W32-NEXT: s_load_dword s2, s[0:1], 0xb8
	; GFX10_W32-NEXT: s_load_dword s3, s[0:1], 0x70			; GFX10_W32-NEXT: s_load_dword s3, s[0:1], 0x70
	; GFX10_W32-NEXT: s_load_dword s4, s[0:1], 0x4c			; GFX10_W32-NEXT: s_load_dword s4, s[0:1], 0x4c
	; GFX10_W32-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX10_W32-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX10_W32-NEXT: ; implicit-def: $vcc_hi			; GFX10_W32-NEXT: ; implicit-def: $vcc_hi
	; GFX10_W32-NEXT: s_waitcnt lgkmcnt(0)			; GFX10_W32-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10_W32-NEXT: s_and_b32 s2, 1, s2			; GFX10_W32-NEXT: s_and_b32 s2, 1, s2
	; GFX10_W32-NEXT: v_mov_b32_e32 v0, s3			; GFX10_W32-NEXT: v_mov_b32_e32 v0, s3
	; GFX10_W32-NEXT: v_cmp_ne_u32_e64 vcc_lo, 0, s2			; GFX10_W32-NEXT: v_cmp_ne_u32_e64 vcc_lo, 0, s2
	; GFX10_W32-NEXT: v_div_fmas_f32 v2, s4, v0, 1.0			; GFX10_W32-NEXT: v_div_fmas_f32 v2, s4, v0, 1.0
	; GFX10_W32-NEXT: v_mov_b32_e32 v0, s0			; GFX10_W32-NEXT: v_mov_b32_e32 v0, s0
	; GFX10_W32-NEXT: v_mov_b32_e32 v1, s1			; GFX10_W32-NEXT: v_mov_b32_e32 v1, s1
	; GFX10_W32-NEXT: global_store_dword v[0:1], v2, off			; GFX10_W32-NEXT: global_store_dword v[0:1], v2, off
	; GFX10_W32-NEXT: s_endpgm			; GFX10_W32-NEXT: s_endpgm
	;			;
	; GFX10_W64-LABEL: test_div_fmas_f32_inline_imm_2:			; GFX10_W64-LABEL: test_div_fmas_f32_inline_imm_2:
	; GFX10_W64: ; %bb.0:			; GFX10_W64: ; %bb.0:
				; GFX10_W64-NEXT: s_clause 0x3
	; GFX10_W64-NEXT: s_load_dword s2, s[0:1], 0xb8			; GFX10_W64-NEXT: s_load_dword s2, s[0:1], 0xb8
	; GFX10_W64-NEXT: s_load_dword s3, s[0:1], 0x70			; GFX10_W64-NEXT: s_load_dword s3, s[0:1], 0x70
	; GFX10_W64-NEXT: s_load_dword s4, s[0:1], 0x4c			; GFX10_W64-NEXT: s_load_dword s4, s[0:1], 0x4c
	; GFX10_W64-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX10_W64-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX10_W64-NEXT: s_waitcnt lgkmcnt(0)			; GFX10_W64-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10_W64-NEXT: s_and_b32 s2, 1, s2			; GFX10_W64-NEXT: s_and_b32 s2, 1, s2
	; GFX10_W64-NEXT: v_mov_b32_e32 v0, s3			; GFX10_W64-NEXT: v_mov_b32_e32 v0, s3
	; GFX10_W64-NEXT: v_cmp_ne_u32_e64 vcc, 0, s2			; GFX10_W64-NEXT: v_cmp_ne_u32_e64 vcc, 0, s2
	[45 lines not shown]
	; GFX8-NEXT: v_div_fmas_f64 v[0:1], v[0:1], v[2:3], v[4:5]			; GFX8-NEXT: v_div_fmas_f64 v[0:1], v[0:1], v[2:3], v[4:5]
	; GFX8-NEXT: v_mov_b32_e32 v3, s1			; GFX8-NEXT: v_mov_b32_e32 v3, s1
	; GFX8-NEXT: v_mov_b32_e32 v2, s0			; GFX8-NEXT: v_mov_b32_e32 v2, s0
	; GFX8-NEXT: flat_store_dwordx2 v[2:3], v[0:1]			; GFX8-NEXT: flat_store_dwordx2 v[2:3], v[0:1]
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10_W32-LABEL: test_div_fmas_f64:			; GFX10_W32-LABEL: test_div_fmas_f64:
	; GFX10_W32: ; %bb.0:			; GFX10_W32: ; %bb.0:
				; GFX10_W32-NEXT: s_clause 0x1
	; GFX10_W32-NEXT: s_load_dword s8, s[0:1], 0x44			; GFX10_W32-NEXT: s_load_dword s8, s[0:1], 0x44
	; GFX10_W32-NEXT: s_load_dwordx8 s[0:7], s[0:1], 0x24			; GFX10_W32-NEXT: s_load_dwordx8 s[0:7], s[0:1], 0x24
	; GFX10_W32-NEXT: ; implicit-def: $vcc_hi			; GFX10_W32-NEXT: ; implicit-def: $vcc_hi
	; GFX10_W32-NEXT: s_waitcnt lgkmcnt(0)			; GFX10_W32-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10_W32-NEXT: s_and_b32 s8, 1, s8			; GFX10_W32-NEXT: s_and_b32 s8, 1, s8
	; GFX10_W32-NEXT: v_mov_b32_e32 v0, s4			; GFX10_W32-NEXT: v_mov_b32_e32 v0, s4
	; GFX10_W32-NEXT: v_mov_b32_e32 v2, s6			; GFX10_W32-NEXT: v_mov_b32_e32 v2, s6
	; GFX10_W32-NEXT: v_mov_b32_e32 v1, s5			; GFX10_W32-NEXT: v_mov_b32_e32 v1, s5
	; GFX10_W32-NEXT: v_mov_b32_e32 v3, s7			; GFX10_W32-NEXT: v_mov_b32_e32 v3, s7
	; GFX10_W32-NEXT: v_cmp_ne_u32_e64 vcc_lo, 0, s8			; GFX10_W32-NEXT: v_cmp_ne_u32_e64 vcc_lo, 0, s8
	; GFX10_W32-NEXT: v_div_fmas_f64 v[0:1], s[2:3], v[0:1], v[2:3]			; GFX10_W32-NEXT: v_div_fmas_f64 v[0:1], s[2:3], v[0:1], v[2:3]
	; GFX10_W32-NEXT: v_mov_b32_e32 v3, s1			; GFX10_W32-NEXT: v_mov_b32_e32 v3, s1
	; GFX10_W32-NEXT: v_mov_b32_e32 v2, s0			; GFX10_W32-NEXT: v_mov_b32_e32 v2, s0
	; GFX10_W32-NEXT: global_store_dwordx2 v[2:3], v[0:1], off			; GFX10_W32-NEXT: global_store_dwordx2 v[2:3], v[0:1], off
	; GFX10_W32-NEXT: s_endpgm			; GFX10_W32-NEXT: s_endpgm
	;			;
	; GFX10_W64-LABEL: test_div_fmas_f64:			; GFX10_W64-LABEL: test_div_fmas_f64:
	; GFX10_W64: ; %bb.0:			; GFX10_W64: ; %bb.0:
				; GFX10_W64-NEXT: s_clause 0x1
	; GFX10_W64-NEXT: s_load_dword s8, s[0:1], 0x44			; GFX10_W64-NEXT: s_load_dword s8, s[0:1], 0x44
	; GFX10_W64-NEXT: s_load_dwordx8 s[0:7], s[0:1], 0x24			; GFX10_W64-NEXT: s_load_dwordx8 s[0:7], s[0:1], 0x24
	; GFX10_W64-NEXT: s_waitcnt lgkmcnt(0)			; GFX10_W64-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10_W64-NEXT: s_and_b32 s8, 1, s8			; GFX10_W64-NEXT: s_and_b32 s8, 1, s8
	; GFX10_W64-NEXT: v_mov_b32_e32 v0, s4			; GFX10_W64-NEXT: v_mov_b32_e32 v0, s4
	; GFX10_W64-NEXT: v_mov_b32_e32 v2, s6			; GFX10_W64-NEXT: v_mov_b32_e32 v2, s6
	; GFX10_W64-NEXT: v_mov_b32_e32 v1, s5			; GFX10_W64-NEXT: v_mov_b32_e32 v1, s5
	; GFX10_W64-NEXT: v_mov_b32_e32 v3, s7			; GFX10_W64-NEXT: v_mov_b32_e32 v3, s7
	[44 lines not shown]
	; GFX8-NEXT: v_div_fmas_f32 v2, v0, v1, v2			; GFX8-NEXT: v_div_fmas_f32 v2, v0, v1, v2
	; GFX8-NEXT: v_mov_b32_e32 v0, s0			; GFX8-NEXT: v_mov_b32_e32 v0, s0
	; GFX8-NEXT: v_mov_b32_e32 v1, s1			; GFX8-NEXT: v_mov_b32_e32 v1, s1
	; GFX8-NEXT: flat_store_dword v[0:1], v2			; GFX8-NEXT: flat_store_dword v[0:1], v2
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10_W32-LABEL: test_div_fmas_f32_cond_to_vcc:			; GFX10_W32-LABEL: test_div_fmas_f32_cond_to_vcc:
	; GFX10_W32: ; %bb.0:			; GFX10_W32: ; %bb.0:
				; GFX10_W32-NEXT: s_clause 0x1
	; GFX10_W32-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x2c			; GFX10_W32-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x2c
	; GFX10_W32-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX10_W32-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX10_W32-NEXT: ; implicit-def: $vcc_hi			; GFX10_W32-NEXT: ; implicit-def: $vcc_hi
	; GFX10_W32-NEXT: s_waitcnt lgkmcnt(0)			; GFX10_W32-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10_W32-NEXT: s_cmp_eq_u32 s7, 0			; GFX10_W32-NEXT: s_cmp_eq_u32 s7, 0
	; GFX10_W32-NEXT: v_mov_b32_e32 v0, s5			; GFX10_W32-NEXT: v_mov_b32_e32 v0, s5
	; GFX10_W32-NEXT: v_mov_b32_e32 v1, s6			; GFX10_W32-NEXT: v_mov_b32_e32 v1, s6
	; GFX10_W32-NEXT: s_cselect_b32 s2, 1, 0			; GFX10_W32-NEXT: s_cselect_b32 s2, 1, 0
	; GFX10_W32-NEXT: s_and_b32 s2, 1, s2			; GFX10_W32-NEXT: s_and_b32 s2, 1, s2
	; GFX10_W32-NEXT: v_cmp_ne_u32_e64 vcc_lo, 0, s2			; GFX10_W32-NEXT: v_cmp_ne_u32_e64 vcc_lo, 0, s2
	; GFX10_W32-NEXT: v_div_fmas_f32 v2, s4, v0, v1			; GFX10_W32-NEXT: v_div_fmas_f32 v2, s4, v0, v1
	; GFX10_W32-NEXT: v_mov_b32_e32 v0, s0			; GFX10_W32-NEXT: v_mov_b32_e32 v0, s0
	; GFX10_W32-NEXT: v_mov_b32_e32 v1, s1			; GFX10_W32-NEXT: v_mov_b32_e32 v1, s1
	; GFX10_W32-NEXT: global_store_dword v[0:1], v2, off			; GFX10_W32-NEXT: global_store_dword v[0:1], v2, off
	; GFX10_W32-NEXT: s_endpgm			; GFX10_W32-NEXT: s_endpgm
	;			;
	; GFX10_W64-LABEL: test_div_fmas_f32_cond_to_vcc:			; GFX10_W64-LABEL: test_div_fmas_f32_cond_to_vcc:
	; GFX10_W64: ; %bb.0:			; GFX10_W64: ; %bb.0:
				; GFX10_W64-NEXT: s_clause 0x1
	; GFX10_W64-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x2c			; GFX10_W64-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x2c
	; GFX10_W64-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX10_W64-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX10_W64-NEXT: s_waitcnt lgkmcnt(0)			; GFX10_W64-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10_W64-NEXT: s_cmp_eq_u32 s7, 0			; GFX10_W64-NEXT: s_cmp_eq_u32 s7, 0
	; GFX10_W64-NEXT: v_mov_b32_e32 v0, s5			; GFX10_W64-NEXT: v_mov_b32_e32 v0, s5
	; GFX10_W64-NEXT: v_mov_b32_e32 v1, s6			; GFX10_W64-NEXT: v_mov_b32_e32 v1, s6
	; GFX10_W64-NEXT: s_cselect_b32 s2, 1, 0			; GFX10_W64-NEXT: s_cselect_b32 s2, 1, 0
	; GFX10_W64-NEXT: s_and_b32 s2, 1, s2			; GFX10_W64-NEXT: s_and_b32 s2, 1, s2
	[41 lines not shown]
	; GFX8-NEXT: v_div_fmas_f32 v2, v0, v1, v2			; GFX8-NEXT: v_div_fmas_f32 v2, v0, v1, v2
	; GFX8-NEXT: v_mov_b32_e32 v0, s0			; GFX8-NEXT: v_mov_b32_e32 v0, s0
	; GFX8-NEXT: v_mov_b32_e32 v1, s1			; GFX8-NEXT: v_mov_b32_e32 v1, s1
	; GFX8-NEXT: flat_store_dword v[0:1], v2			; GFX8-NEXT: flat_store_dword v[0:1], v2
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10_W32-LABEL: test_div_fmas_f32_imm_false_cond_to_vcc:			; GFX10_W32-LABEL: test_div_fmas_f32_imm_false_cond_to_vcc:
	; GFX10_W32: ; %bb.0:			; GFX10_W32: ; %bb.0:
				; GFX10_W32-NEXT: s_clause 0x3
	; GFX10_W32-NEXT: s_load_dword s2, s[0:1], 0x70			; GFX10_W32-NEXT: s_load_dword s2, s[0:1], 0x70
	; GFX10_W32-NEXT: s_load_dword s3, s[0:1], 0x94			; GFX10_W32-NEXT: s_load_dword s3, s[0:1], 0x94
	; GFX10_W32-NEXT: s_load_dword s4, s[0:1], 0x4c			; GFX10_W32-NEXT: s_load_dword s4, s[0:1], 0x4c
	; GFX10_W32-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX10_W32-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX10_W32-NEXT: v_cmp_ne_u32_e64 vcc_lo, 0, 0			; GFX10_W32-NEXT: v_cmp_ne_u32_e64 vcc_lo, 0, 0
	; GFX10_W32-NEXT: ; implicit-def: $vcc_hi			; GFX10_W32-NEXT: ; implicit-def: $vcc_hi
	; GFX10_W32-NEXT: s_waitcnt lgkmcnt(0)			; GFX10_W32-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10_W32-NEXT: v_mov_b32_e32 v0, s2			; GFX10_W32-NEXT: v_mov_b32_e32 v0, s2
	; GFX10_W32-NEXT: v_mov_b32_e32 v1, s3			; GFX10_W32-NEXT: v_mov_b32_e32 v1, s3
	; GFX10_W32-NEXT: v_div_fmas_f32 v2, s4, v0, v1			; GFX10_W32-NEXT: v_div_fmas_f32 v2, s4, v0, v1
	; GFX10_W32-NEXT: v_mov_b32_e32 v0, s0			; GFX10_W32-NEXT: v_mov_b32_e32 v0, s0
	; GFX10_W32-NEXT: v_mov_b32_e32 v1, s1			; GFX10_W32-NEXT: v_mov_b32_e32 v1, s1
	; GFX10_W32-NEXT: global_store_dword v[0:1], v2, off			; GFX10_W32-NEXT: global_store_dword v[0:1], v2, off
	; GFX10_W32-NEXT: s_endpgm			; GFX10_W32-NEXT: s_endpgm
	;			;
	; GFX10_W64-LABEL: test_div_fmas_f32_imm_false_cond_to_vcc:			; GFX10_W64-LABEL: test_div_fmas_f32_imm_false_cond_to_vcc:
	; GFX10_W64: ; %bb.0:			; GFX10_W64: ; %bb.0:
				; GFX10_W64-NEXT: s_clause 0x3
	; GFX10_W64-NEXT: s_load_dword s2, s[0:1], 0x70			; GFX10_W64-NEXT: s_load_dword s2, s[0:1], 0x70
	; GFX10_W64-NEXT: s_load_dword s3, s[0:1], 0x94			; GFX10_W64-NEXT: s_load_dword s3, s[0:1], 0x94
	; GFX10_W64-NEXT: s_load_dword s4, s[0:1], 0x4c			; GFX10_W64-NEXT: s_load_dword s4, s[0:1], 0x4c
	; GFX10_W64-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX10_W64-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX10_W64-NEXT: v_cmp_ne_u32_e64 vcc, 0, 0			; GFX10_W64-NEXT: v_cmp_ne_u32_e64 vcc, 0, 0
	; GFX10_W64-NEXT: s_waitcnt lgkmcnt(0)			; GFX10_W64-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10_W64-NEXT: v_mov_b32_e32 v0, s2			; GFX10_W64-NEXT: v_mov_b32_e32 v0, s2
	; GFX10_W64-NEXT: v_mov_b32_e32 v1, s3			; GFX10_W64-NEXT: v_mov_b32_e32 v1, s3
	[39 lines not shown]
	; GFX8-NEXT: v_div_fmas_f32 v2, v0, v1, v2			; GFX8-NEXT: v_div_fmas_f32 v2, v0, v1, v2
	; GFX8-NEXT: v_mov_b32_e32 v0, s0			; GFX8-NEXT: v_mov_b32_e32 v0, s0
	; GFX8-NEXT: v_mov_b32_e32 v1, s1			; GFX8-NEXT: v_mov_b32_e32 v1, s1
	; GFX8-NEXT: flat_store_dword v[0:1], v2			; GFX8-NEXT: flat_store_dword v[0:1], v2
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10_W32-LABEL: test_div_fmas_f32_imm_true_cond_to_vcc:			; GFX10_W32-LABEL: test_div_fmas_f32_imm_true_cond_to_vcc:
	; GFX10_W32: ; %bb.0:			; GFX10_W32: ; %bb.0:
				; GFX10_W32-NEXT: s_clause 0x3
	; GFX10_W32-NEXT: s_load_dword s2, s[0:1], 0x70			; GFX10_W32-NEXT: s_load_dword s2, s[0:1], 0x70
	; GFX10_W32-NEXT: s_load_dword s3, s[0:1], 0x94			; GFX10_W32-NEXT: s_load_dword s3, s[0:1], 0x94
	; GFX10_W32-NEXT: s_load_dword s4, s[0:1], 0x4c			; GFX10_W32-NEXT: s_load_dword s4, s[0:1], 0x4c
	; GFX10_W32-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX10_W32-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX10_W32-NEXT: v_cmp_ne_u32_e64 vcc_lo, 0, 1			; GFX10_W32-NEXT: v_cmp_ne_u32_e64 vcc_lo, 0, 1
	; GFX10_W32-NEXT: ; implicit-def: $vcc_hi			; GFX10_W32-NEXT: ; implicit-def: $vcc_hi
	; GFX10_W32-NEXT: s_waitcnt lgkmcnt(0)			; GFX10_W32-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10_W32-NEXT: v_mov_b32_e32 v0, s2			; GFX10_W32-NEXT: v_mov_b32_e32 v0, s2
	; GFX10_W32-NEXT: v_mov_b32_e32 v1, s3			; GFX10_W32-NEXT: v_mov_b32_e32 v1, s3
	; GFX10_W32-NEXT: v_div_fmas_f32 v2, s4, v0, v1			; GFX10_W32-NEXT: v_div_fmas_f32 v2, s4, v0, v1
	; GFX10_W32-NEXT: v_mov_b32_e32 v0, s0			; GFX10_W32-NEXT: v_mov_b32_e32 v0, s0
	; GFX10_W32-NEXT: v_mov_b32_e32 v1, s1			; GFX10_W32-NEXT: v_mov_b32_e32 v1, s1
	; GFX10_W32-NEXT: global_store_dword v[0:1], v2, off			; GFX10_W32-NEXT: global_store_dword v[0:1], v2, off
	; GFX10_W32-NEXT: s_endpgm			; GFX10_W32-NEXT: s_endpgm
	;			;
	; GFX10_W64-LABEL: test_div_fmas_f32_imm_true_cond_to_vcc:			; GFX10_W64-LABEL: test_div_fmas_f32_imm_true_cond_to_vcc:
	; GFX10_W64: ; %bb.0:			; GFX10_W64: ; %bb.0:
				; GFX10_W64-NEXT: s_clause 0x3
	; GFX10_W64-NEXT: s_load_dword s2, s[0:1], 0x70			; GFX10_W64-NEXT: s_load_dword s2, s[0:1], 0x70
	; GFX10_W64-NEXT: s_load_dword s3, s[0:1], 0x94			; GFX10_W64-NEXT: s_load_dword s3, s[0:1], 0x94
	; GFX10_W64-NEXT: s_load_dword s4, s[0:1], 0x4c			; GFX10_W64-NEXT: s_load_dword s4, s[0:1], 0x4c
	; GFX10_W64-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX10_W64-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX10_W64-NEXT: v_cmp_ne_u32_e64 vcc, 0, 1			; GFX10_W64-NEXT: v_cmp_ne_u32_e64 vcc, 0, 1
	; GFX10_W64-NEXT: s_waitcnt lgkmcnt(0)			; GFX10_W64-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10_W64-NEXT: v_mov_b32_e32 v0, s2			; GFX10_W64-NEXT: v_mov_b32_e32 v0, s2
	; GFX10_W64-NEXT: v_mov_b32_e32 v1, s3			; GFX10_W64-NEXT: v_mov_b32_e32 v1, s3
	[84 lines not shown]
	; GFX10_W32-NEXT: v_add_co_u32_e64 v1, vcc_lo, v3, v1			; GFX10_W32-NEXT: v_add_co_u32_e64 v1, vcc_lo, v3, v1
	; GFX10_W32-NEXT: s_cselect_b32 s2, 1, 0			; GFX10_W32-NEXT: s_cselect_b32 s2, 1, 0
	; GFX10_W32-NEXT: v_add_co_ci_u32_e32 v2, vcc_lo, v4, v2, vcc_lo			; GFX10_W32-NEXT: v_add_co_ci_u32_e32 v2, vcc_lo, v4, v2, vcc_lo
	; GFX10_W32-NEXT: v_add_co_u32_e64 v3, vcc_lo, v1, 8			; GFX10_W32-NEXT: v_add_co_u32_e64 v3, vcc_lo, v1, 8
	; GFX10_W32-NEXT: s_and_b32 s2, 1, s2			; GFX10_W32-NEXT: s_and_b32 s2, 1, s2
	; GFX10_W32-NEXT: v_add_co_ci_u32_e32 v4, vcc_lo, 0, v2, vcc_lo			; GFX10_W32-NEXT: v_add_co_ci_u32_e32 v4, vcc_lo, 0, v2, vcc_lo
	; GFX10_W32-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0			; GFX10_W32-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
	; GFX10_W32-NEXT: v_cmp_ne_u32_e64 s2, 0, s2			; GFX10_W32-NEXT: v_cmp_ne_u32_e64 s2, 0, s2
				; GFX10_W32-NEXT: s_clause 0x2
	; GFX10_W32-NEXT: global_load_dword v1, v[1:2], off			; GFX10_W32-NEXT: global_load_dword v1, v[1:2], off
	; GFX10_W32-NEXT: global_load_dword v2, v[3:4], off offset:-4			; GFX10_W32-NEXT: global_load_dword v2, v[3:4], off offset:-4
	; GFX10_W32-NEXT: global_load_dword v3, v[3:4], off			; GFX10_W32-NEXT: global_load_dword v3, v[3:4], off
	; GFX10_W32-NEXT: s_and_b32 vcc_lo, vcc_lo, s2			; GFX10_W32-NEXT: s_and_b32 vcc_lo, vcc_lo, s2
	; GFX10_W32-NEXT: s_waitcnt vmcnt(0)			; GFX10_W32-NEXT: s_waitcnt vmcnt(0)
	; GFX10_W32-NEXT: v_div_fmas_f32 v2, v1, v2, v3			; GFX10_W32-NEXT: v_div_fmas_f32 v2, v1, v2, v3
	; GFX10_W32-NEXT: v_mov_b32_e32 v0, s0			; GFX10_W32-NEXT: v_mov_b32_e32 v0, s0
	; GFX10_W32-NEXT: v_mov_b32_e32 v1, s1			; GFX10_W32-NEXT: v_mov_b32_e32 v1, s1
	[15 lines not shown]
	; GFX10_W64-NEXT: v_add_co_u32_e64 v1, vcc, v3, v1			; GFX10_W64-NEXT: v_add_co_u32_e64 v1, vcc, v3, v1
	; GFX10_W64-NEXT: s_cselect_b32 s2, 1, 0			; GFX10_W64-NEXT: s_cselect_b32 s2, 1, 0
	; GFX10_W64-NEXT: v_add_co_ci_u32_e32 v2, vcc, v4, v2, vcc			; GFX10_W64-NEXT: v_add_co_ci_u32_e32 v2, vcc, v4, v2, vcc
	; GFX10_W64-NEXT: v_add_co_u32_e64 v3, vcc, v1, 8			; GFX10_W64-NEXT: v_add_co_u32_e64 v3, vcc, v1, 8
	; GFX10_W64-NEXT: s_and_b32 s2, 1, s2			; GFX10_W64-NEXT: s_and_b32 s2, 1, s2
	; GFX10_W64-NEXT: v_add_co_ci_u32_e32 v4, vcc, 0, v2, vcc			; GFX10_W64-NEXT: v_add_co_ci_u32_e32 v4, vcc, 0, v2, vcc
	; GFX10_W64-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX10_W64-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX10_W64-NEXT: v_cmp_ne_u32_e64 s[2:3], 0, s2			; GFX10_W64-NEXT: v_cmp_ne_u32_e64 s[2:3], 0, s2
				; GFX10_W64-NEXT: s_clause 0x2
	; GFX10_W64-NEXT: global_load_dword v1, v[1:2], off			; GFX10_W64-NEXT: global_load_dword v1, v[1:2], off
	; GFX10_W64-NEXT: global_load_dword v2, v[3:4], off offset:-4			; GFX10_W64-NEXT: global_load_dword v2, v[3:4], off offset:-4
	; GFX10_W64-NEXT: global_load_dword v3, v[3:4], off			; GFX10_W64-NEXT: global_load_dword v3, v[3:4], off
	; GFX10_W64-NEXT: s_and_b64 vcc, vcc, s[2:3]			; GFX10_W64-NEXT: s_and_b64 vcc, vcc, s[2:3]
	; GFX10_W64-NEXT: s_waitcnt vmcnt(0)			; GFX10_W64-NEXT: s_waitcnt vmcnt(0)
	; GFX10_W64-NEXT: v_div_fmas_f32 v2, v1, v2, v3			; GFX10_W64-NEXT: v_div_fmas_f32 v2, v1, v2, v3
	; GFX10_W64-NEXT: v_mov_b32_e32 v0, s0			; GFX10_W64-NEXT: v_mov_b32_e32 v0, s0
	; GFX10_W64-NEXT: v_mov_b32_e32 v1, s1			; GFX10_W64-NEXT: v_mov_b32_e32 v1, s1
	[199 lines not shown]

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.div.scale.ll

	[51 lines not shown]
	; GFX10-NEXT: v_lshlrev_b64 v[0:1], 2, v[0:1]			; GFX10-NEXT: v_lshlrev_b64 v[0:1], 2, v[0:1]
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: v_mov_b32_e32 v2, s2			; GFX10-NEXT: v_mov_b32_e32 v2, s2
	; GFX10-NEXT: v_mov_b32_e32 v3, s3			; GFX10-NEXT: v_mov_b32_e32 v3, s3
	; GFX10-NEXT: v_add_co_u32_e64 v0, vcc_lo, v2, v0			; GFX10-NEXT: v_add_co_u32_e64 v0, vcc_lo, v2, v0
	; GFX10-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, v3, v1, vcc_lo			; GFX10-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, v3, v1, vcc_lo
	; GFX10-NEXT: v_add_co_u32_e64 v2, vcc_lo, v0, 4			; GFX10-NEXT: v_add_co_u32_e64 v2, vcc_lo, v0, 4
	; GFX10-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v1, vcc_lo			; GFX10-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v1, vcc_lo
				; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: global_load_dword v0, v[0:1], off			; GFX10-NEXT: global_load_dword v0, v[0:1], off
	; GFX10-NEXT: global_load_dword v1, v[2:3], off			; GFX10-NEXT: global_load_dword v1, v[2:3], off
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_div_scale_f32 v2, s2, v1, v1, v0			; GFX10-NEXT: v_div_scale_f32 v2, s2, v1, v1, v0
	; GFX10-NEXT: v_mov_b32_e32 v0, s0			; GFX10-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-NEXT: v_mov_b32_e32 v1, s1			; GFX10-NEXT: v_mov_b32_e32 v1, s1
	; GFX10-NEXT: global_store_dword v[0:1], v2, off			; GFX10-NEXT: global_store_dword v[0:1], v2, off
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	[58 lines not shown]
	; GFX10-NEXT: v_lshlrev_b64 v[0:1], 2, v[0:1]			; GFX10-NEXT: v_lshlrev_b64 v[0:1], 2, v[0:1]
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: v_mov_b32_e32 v2, s2			; GFX10-NEXT: v_mov_b32_e32 v2, s2
	; GFX10-NEXT: v_mov_b32_e32 v3, s3			; GFX10-NEXT: v_mov_b32_e32 v3, s3
	; GFX10-NEXT: v_add_co_u32_e64 v0, vcc_lo, v2, v0			; GFX10-NEXT: v_add_co_u32_e64 v0, vcc_lo, v2, v0
	; GFX10-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, v3, v1, vcc_lo			; GFX10-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, v3, v1, vcc_lo
	; GFX10-NEXT: v_add_co_u32_e64 v2, vcc_lo, v0, 4			; GFX10-NEXT: v_add_co_u32_e64 v2, vcc_lo, v0, 4
	; GFX10-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v1, vcc_lo			; GFX10-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v1, vcc_lo
				; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: global_load_dword v0, v[0:1], off			; GFX10-NEXT: global_load_dword v0, v[0:1], off
	; GFX10-NEXT: global_load_dword v1, v[2:3], off			; GFX10-NEXT: global_load_dword v1, v[2:3], off
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_div_scale_f32 v2, s2, v0, v1, v0			; GFX10-NEXT: v_div_scale_f32 v2, s2, v0, v1, v0
	; GFX10-NEXT: v_mov_b32_e32 v0, s0			; GFX10-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-NEXT: v_mov_b32_e32 v1, s1			; GFX10-NEXT: v_mov_b32_e32 v1, s1
	; GFX10-NEXT: global_store_dword v[0:1], v2, off			; GFX10-NEXT: global_store_dword v[0:1], v2, off
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	[64 lines not shown]
	; GFX10-NEXT: v_lshlrev_b64 v[0:1], 3, v[0:1]			; GFX10-NEXT: v_lshlrev_b64 v[0:1], 3, v[0:1]
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: v_mov_b32_e32 v2, s2			; GFX10-NEXT: v_mov_b32_e32 v2, s2
	; GFX10-NEXT: v_mov_b32_e32 v3, s3			; GFX10-NEXT: v_mov_b32_e32 v3, s3
	; GFX10-NEXT: v_add_co_u32_e64 v0, vcc_lo, v2, v0			; GFX10-NEXT: v_add_co_u32_e64 v0, vcc_lo, v2, v0
	; GFX10-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, v3, v1, vcc_lo			; GFX10-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, v3, v1, vcc_lo
	; GFX10-NEXT: v_add_co_u32_e64 v2, vcc_lo, v0, 8			; GFX10-NEXT: v_add_co_u32_e64 v2, vcc_lo, v0, 8
	; GFX10-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v1, vcc_lo			; GFX10-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v1, vcc_lo
				; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX10-NEXT: global_load_dwordx2 v[2:3], v[2:3], off			; GFX10-NEXT: global_load_dwordx2 v[2:3], v[2:3], off
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_div_scale_f64 v[0:1], s2, v[2:3], v[2:3], v[0:1]			; GFX10-NEXT: v_div_scale_f64 v[0:1], s2, v[2:3], v[2:3], v[0:1]
	; GFX10-NEXT: v_mov_b32_e32 v3, s1			; GFX10-NEXT: v_mov_b32_e32 v3, s1
	; GFX10-NEXT: v_mov_b32_e32 v2, s0			; GFX10-NEXT: v_mov_b32_e32 v2, s0
	; GFX10-NEXT: global_store_dwordx2 v[2:3], v[0:1], off			; GFX10-NEXT: global_store_dwordx2 v[2:3], v[0:1], off
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	[64 lines not shown]
	; GFX10-NEXT: v_lshlrev_b64 v[0:1], 3, v[0:1]			; GFX10-NEXT: v_lshlrev_b64 v[0:1], 3, v[0:1]
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: v_mov_b32_e32 v2, s2			; GFX10-NEXT: v_mov_b32_e32 v2, s2
	; GFX10-NEXT: v_mov_b32_e32 v3, s3			; GFX10-NEXT: v_mov_b32_e32 v3, s3
	; GFX10-NEXT: v_add_co_u32_e64 v0, vcc_lo, v2, v0			; GFX10-NEXT: v_add_co_u32_e64 v0, vcc_lo, v2, v0
	; GFX10-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, v3, v1, vcc_lo			; GFX10-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, v3, v1, vcc_lo
	; GFX10-NEXT: v_add_co_u32_e64 v2, vcc_lo, v0, 8			; GFX10-NEXT: v_add_co_u32_e64 v2, vcc_lo, v0, 8
	; GFX10-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v1, vcc_lo			; GFX10-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v1, vcc_lo
				; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX10-NEXT: global_load_dwordx2 v[2:3], v[2:3], off			; GFX10-NEXT: global_load_dwordx2 v[2:3], v[2:3], off
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_div_scale_f64 v[0:1], s2, v[0:1], v[2:3], v[0:1]			; GFX10-NEXT: v_div_scale_f64 v[0:1], s2, v[0:1], v[2:3], v[0:1]
	; GFX10-NEXT: v_mov_b32_e32 v3, s1			; GFX10-NEXT: v_mov_b32_e32 v3, s1
	; GFX10-NEXT: v_mov_b32_e32 v2, s0			; GFX10-NEXT: v_mov_b32_e32 v2, s0
	; GFX10-NEXT: global_store_dwordx2 v[2:3], v[0:1], off			; GFX10-NEXT: global_store_dwordx2 v[2:3], v[0:1], off
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	[582 lines not shown]
	; GFX8-NEXT: v_div_scale_f32 v2, s[2:3], v0, v0, s2			; GFX8-NEXT: v_div_scale_f32 v2, s[2:3], v0, v0, s2
	; GFX8-NEXT: v_mov_b32_e32 v0, s0			; GFX8-NEXT: v_mov_b32_e32 v0, s0
	; GFX8-NEXT: v_mov_b32_e32 v1, s1			; GFX8-NEXT: v_mov_b32_e32 v1, s1
	; GFX8-NEXT: flat_store_dword v[0:1], v2			; GFX8-NEXT: flat_store_dword v[0:1], v2
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: test_div_scale_f32_all_scalar_1:			; GFX10-LABEL: test_div_scale_f32_all_scalar_1:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
				; GFX10-NEXT: s_clause 0x2
	; GFX10-NEXT: s_load_dword s2, s[0:1], 0x4c			; GFX10-NEXT: s_load_dword s2, s[0:1], 0x4c
	; GFX10-NEXT: s_load_dword s3, s[0:1], 0x70			; GFX10-NEXT: s_load_dword s3, s[0:1], 0x70
	; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX10-NEXT: ; implicit-def: $vcc_hi			; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: v_div_scale_f32 v2, s2, s3, s3, s2			; GFX10-NEXT: v_div_scale_f32 v2, s2, s3, s3, s2
	; GFX10-NEXT: v_mov_b32_e32 v0, s0			; GFX10-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-NEXT: v_mov_b32_e32 v1, s1			; GFX10-NEXT: v_mov_b32_e32 v1, s1
	[29 lines not shown]
	; GFX8-NEXT: v_div_scale_f32 v2, s[2:3], s2, v0, s2			; GFX8-NEXT: v_div_scale_f32 v2, s[2:3], s2, v0, s2
	; GFX8-NEXT: v_mov_b32_e32 v0, s0			; GFX8-NEXT: v_mov_b32_e32 v0, s0
	; GFX8-NEXT: v_mov_b32_e32 v1, s1			; GFX8-NEXT: v_mov_b32_e32 v1, s1
	; GFX8-NEXT: flat_store_dword v[0:1], v2			; GFX8-NEXT: flat_store_dword v[0:1], v2
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: test_div_scale_f32_all_scalar_2:			; GFX10-LABEL: test_div_scale_f32_all_scalar_2:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
				; GFX10-NEXT: s_clause 0x2
	; GFX10-NEXT: s_load_dword s2, s[0:1], 0x4c			; GFX10-NEXT: s_load_dword s2, s[0:1], 0x4c
	; GFX10-NEXT: s_load_dword s3, s[0:1], 0x70			; GFX10-NEXT: s_load_dword s3, s[0:1], 0x70
	; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX10-NEXT: ; implicit-def: $vcc_hi			; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: v_div_scale_f32 v2, s2, s2, s3, s2			; GFX10-NEXT: v_div_scale_f32 v2, s2, s2, s3, s2
	; GFX10-NEXT: v_mov_b32_e32 v0, s0			; GFX10-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-NEXT: v_mov_b32_e32 v1, s1			; GFX10-NEXT: v_mov_b32_e32 v1, s1
	[31 lines not shown]
	; GFX8-NEXT: v_div_scale_f64 v[0:1], s[2:3], v[0:1], v[0:1], s[2:3]			; GFX8-NEXT: v_div_scale_f64 v[0:1], s[2:3], v[0:1], v[0:1], s[2:3]
	; GFX8-NEXT: v_mov_b32_e32 v3, s1			; GFX8-NEXT: v_mov_b32_e32 v3, s1
	; GFX8-NEXT: v_mov_b32_e32 v2, s0			; GFX8-NEXT: v_mov_b32_e32 v2, s0
	; GFX8-NEXT: flat_store_dwordx2 v[2:3], v[0:1]			; GFX8-NEXT: flat_store_dwordx2 v[2:3], v[0:1]
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: test_div_scale_f64_all_scalar_1:			; GFX10-LABEL: test_div_scale_f64_all_scalar_1:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
				; GFX10-NEXT: s_clause 0x2
	; GFX10-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x4c			; GFX10-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x4c
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x74			; GFX10-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x74
	; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX10-NEXT: ; implicit-def: $vcc_hi			; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: v_div_scale_f64 v[0:1], s2, s[4:5], s[4:5], s[2:3]			; GFX10-NEXT: v_div_scale_f64 v[0:1], s2, s[4:5], s[4:5], s[2:3]
	; GFX10-NEXT: v_mov_b32_e32 v3, s1			; GFX10-NEXT: v_mov_b32_e32 v3, s1
	; GFX10-NEXT: v_mov_b32_e32 v2, s0			; GFX10-NEXT: v_mov_b32_e32 v2, s0
	[31 lines not shown]
	; GFX8-NEXT: v_div_scale_f64 v[0:1], s[2:3], s[2:3], v[0:1], s[2:3]			; GFX8-NEXT: v_div_scale_f64 v[0:1], s[2:3], s[2:3], v[0:1], s[2:3]
	; GFX8-NEXT: v_mov_b32_e32 v3, s1			; GFX8-NEXT: v_mov_b32_e32 v3, s1
	; GFX8-NEXT: v_mov_b32_e32 v2, s0			; GFX8-NEXT: v_mov_b32_e32 v2, s0
	; GFX8-NEXT: flat_store_dwordx2 v[2:3], v[0:1]			; GFX8-NEXT: flat_store_dwordx2 v[2:3], v[0:1]
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: test_div_scale_f64_all_scalar_2:			; GFX10-LABEL: test_div_scale_f64_all_scalar_2:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
				; GFX10-NEXT: s_clause 0x2
	; GFX10-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x4c			; GFX10-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x4c
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x74			; GFX10-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x74
	; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX10-NEXT: ; implicit-def: $vcc_hi			; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: v_div_scale_f64 v[0:1], s2, s[2:3], s[4:5], s[2:3]			; GFX10-NEXT: v_div_scale_f64 v[0:1], s2, s[2:3], s[4:5], s[2:3]
	; GFX10-NEXT: v_mov_b32_e32 v3, s1			; GFX10-NEXT: v_mov_b32_e32 v3, s1
	; GFX10-NEXT: v_mov_b32_e32 v2, s0			; GFX10-NEXT: v_mov_b32_e32 v2, s0
	[185 lines not shown]
	; GFX10-NEXT: v_lshlrev_b64 v[0:1], 2, v[0:1]			; GFX10-NEXT: v_lshlrev_b64 v[0:1], 2, v[0:1]
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: v_mov_b32_e32 v2, s2			; GFX10-NEXT: v_mov_b32_e32 v2, s2
	; GFX10-NEXT: v_mov_b32_e32 v3, s3			; GFX10-NEXT: v_mov_b32_e32 v3, s3
	; GFX10-NEXT: v_add_co_u32_e64 v0, vcc_lo, v2, v0			; GFX10-NEXT: v_add_co_u32_e64 v0, vcc_lo, v2, v0
	; GFX10-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, v3, v1, vcc_lo			; GFX10-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, v3, v1, vcc_lo
	; GFX10-NEXT: v_add_co_u32_e64 v2, vcc_lo, v0, 4			; GFX10-NEXT: v_add_co_u32_e64 v2, vcc_lo, v0, 4
	; GFX10-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v1, vcc_lo			; GFX10-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v1, vcc_lo
				; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: global_load_dword v0, v[0:1], off			; GFX10-NEXT: global_load_dword v0, v[0:1], off
	; GFX10-NEXT: global_load_dword v1, v[2:3], off			; GFX10-NEXT: global_load_dword v1, v[2:3], off
	; GFX10-NEXT: s_waitcnt vmcnt(1)			; GFX10-NEXT: s_waitcnt vmcnt(1)
	; GFX10-NEXT: v_and_b32_e32 v0, 0x7fffffff, v0			; GFX10-NEXT: v_and_b32_e32 v0, 0x7fffffff, v0
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_div_scale_f32 v2, s2, v1, v1, v0			; GFX10-NEXT: v_div_scale_f32 v2, s2, v1, v1, v0
	; GFX10-NEXT: v_mov_b32_e32 v0, s0			; GFX10-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-NEXT: v_mov_b32_e32 v1, s1			; GFX10-NEXT: v_mov_b32_e32 v1, s1
	[64 lines not shown]
	; GFX10-NEXT: v_lshlrev_b64 v[0:1], 2, v[0:1]			; GFX10-NEXT: v_lshlrev_b64 v[0:1], 2, v[0:1]
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: v_mov_b32_e32 v2, s2			; GFX10-NEXT: v_mov_b32_e32 v2, s2
	; GFX10-NEXT: v_mov_b32_e32 v3, s3			; GFX10-NEXT: v_mov_b32_e32 v3, s3
	; GFX10-NEXT: v_add_co_u32_e64 v0, vcc_lo, v2, v0			; GFX10-NEXT: v_add_co_u32_e64 v0, vcc_lo, v2, v0
	; GFX10-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, v3, v1, vcc_lo			; GFX10-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, v3, v1, vcc_lo
	; GFX10-NEXT: v_add_co_u32_e64 v2, vcc_lo, v0, 4			; GFX10-NEXT: v_add_co_u32_e64 v2, vcc_lo, v0, 4
	; GFX10-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v1, vcc_lo			; GFX10-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v1, vcc_lo
				; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: global_load_dword v0, v[0:1], off			; GFX10-NEXT: global_load_dword v0, v[0:1], off
	; GFX10-NEXT: global_load_dword v1, v[2:3], off			; GFX10-NEXT: global_load_dword v1, v[2:3], off
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_and_b32_e32 v1, 0x7fffffff, v1			; GFX10-NEXT: v_and_b32_e32 v1, 0x7fffffff, v1
	; GFX10-NEXT: v_div_scale_f32 v2, s2, v1, v1, v0			; GFX10-NEXT: v_div_scale_f32 v2, s2, v1, v1, v0
	; GFX10-NEXT: v_mov_b32_e32 v0, s0			; GFX10-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-NEXT: v_mov_b32_e32 v1, s1			; GFX10-NEXT: v_mov_b32_e32 v1, s1
	; GFX10-NEXT: global_store_dword v[0:1], v2, off			; GFX10-NEXT: global_store_dword v[0:1], v2, off
	[182 lines not shown]

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.end.cf.i32.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -global-isel -mtriple=amdgcn--amdhsa -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -global-isel -mtriple=amdgcn--amdhsa -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

	define amdgpu_kernel void @test_wave32(i32 %arg0, [8 x i32], i32 %saved) {			define amdgpu_kernel void @test_wave32(i32 %arg0, [8 x i32], i32 %saved) {
	; GCN-LABEL: test_wave32:			; GCN-LABEL: test_wave32:
	; GCN: ; %bb.0: ; %entry			; GCN: ; %bb.0: ; %entry
				; GCN-NEXT: s_clause 0x1
	; GCN-NEXT: s_load_dword s1, s[4:5], 0x0			; GCN-NEXT: s_load_dword s1, s[4:5], 0x0
	; GCN-NEXT: s_load_dword s0, s[4:5], 0x24			; GCN-NEXT: s_load_dword s0, s[4:5], 0x24
	; GCN-NEXT: ; implicit-def: $vcc_hi			; GCN-NEXT: ; implicit-def: $vcc_hi
	; GCN-NEXT: s_waitcnt lgkmcnt(0)			; GCN-NEXT: s_waitcnt lgkmcnt(0)
	; GCN-NEXT: s_cmp_lg_u32 s1, 0			; GCN-NEXT: s_cmp_lg_u32 s1, 0
	; GCN-NEXT: s_cselect_b32 s1, 1, 0			; GCN-NEXT: s_cselect_b32 s1, 1, 0
	; GCN-NEXT: s_and_b32 s1, s1, 1			; GCN-NEXT: s_and_b32 s1, s1, 1
	; GCN-NEXT: s_cmp_lg_u32 s1, 0			; GCN-NEXT: s_cmp_lg_u32 s1, 0
	[25 lines not shown]

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.if.break.i32.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -global-isel -mtriple=amdgcn--amdhsa -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -global-isel -mtriple=amdgcn--amdhsa -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

	define amdgpu_kernel void @test_wave32(i32 %arg0, [8 x i32], i32 %saved) {			define amdgpu_kernel void @test_wave32(i32 %arg0, [8 x i32], i32 %saved) {
	; GCN-LABEL: test_wave32:			; GCN-LABEL: test_wave32:
	; GCN: ; %bb.0: ; %entry			; GCN: ; %bb.0: ; %entry
				; GCN-NEXT: s_clause 0x1
	; GCN-NEXT: s_load_dword s0, s[4:5], 0x0			; GCN-NEXT: s_load_dword s0, s[4:5], 0x0
	; GCN-NEXT: s_load_dword s1, s[4:5], 0x24			; GCN-NEXT: s_load_dword s1, s[4:5], 0x24
	; GCN-NEXT: ; implicit-def: $vcc_hi			; GCN-NEXT: ; implicit-def: $vcc_hi
	; GCN-NEXT: s_waitcnt lgkmcnt(0)			; GCN-NEXT: s_waitcnt lgkmcnt(0)
	; GCN-NEXT: s_cmp_eq_u32 s0, 0			; GCN-NEXT: s_cmp_eq_u32 s0, 0
	; GCN-NEXT: s_cselect_b32 s0, 1, 0			; GCN-NEXT: s_cselect_b32 s0, 1, 0
	; GCN-NEXT: s_and_b32 s0, 1, s0			; GCN-NEXT: s_and_b32 s0, 1, s0
	; GCN-NEXT: v_cmp_ne_u32_e64 s0, 0, s0			; GCN-NEXT: v_cmp_ne_u32_e64 s0, 0, s0
	[12 lines not shown]

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.mov.dpp.ll

	[14 lines not shown]
	; GFX8-NEXT: v_mov_b32_e32 v1, s3			; GFX8-NEXT: v_mov_b32_e32 v1, s3
	; GFX8-NEXT: s_nop 0			; GFX8-NEXT: s_nop 0
	; GFX8-NEXT: v_mov_b32_dpp v2, v2 quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1 bound_ctrl:0			; GFX8-NEXT: v_mov_b32_dpp v2, v2 quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1 bound_ctrl:0
	; GFX8-NEXT: flat_store_dword v[0:1], v2			; GFX8-NEXT: flat_store_dword v[0:1], v2
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: dpp_test:			; GFX10-LABEL: dpp_test:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
				; GFX10-NEXT: s_clause 0x1 ; encoding: [0x01,0x00,0xa1,0xbf]
	; GFX10-NEXT: s_load_dword s2, s[0:1], 0x2c ; encoding: [0x80,0x00,0x00,0xf4,0x2c,0x00,0x00,0xfa]			; GFX10-NEXT: s_load_dword s2, s[0:1], 0x2c ; encoding: [0x80,0x00,0x00,0xf4,0x2c,0x00,0x00,0xfa]
	; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24 ; encoding: [0x00,0x00,0x04,0xf4,0x24,0x00,0x00,0xfa]			; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24 ; encoding: [0x00,0x00,0x04,0xf4,0x24,0x00,0x00,0xfa]
	; GFX10-NEXT: ; implicit-def: $vcc_hi			; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt lgkmcnt(0) ; encoding: [0x7f,0xc0,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt lgkmcnt(0) ; encoding: [0x7f,0xc0,0x8c,0xbf]
	; GFX10-NEXT: v_mov_b32_e32 v2, s2 ; encoding: [0x02,0x02,0x04,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v2, s2 ; encoding: [0x02,0x02,0x04,0x7e]
	; GFX10-NEXT: v_mov_b32_e32 v0, s0 ; encoding: [0x00,0x02,0x00,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v0, s0 ; encoding: [0x00,0x02,0x00,0x7e]
	; GFX10-NEXT: v_mov_b32_e32 v1, s1 ; encoding: [0x01,0x02,0x02,0x7e]			; GFX10-NEXT: v_mov_b32_e32 v1, s1 ; encoding: [0x01,0x02,0x02,0x7e]
	; GFX10-NEXT: v_mov_b32_dpp v2, v2 quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1 bound_ctrl:0 ; encoding: [0xfa,0x02,0x04,0x7e,0x02,0x01,0x08,0x11]			; GFX10-NEXT: v_mov_b32_dpp v2, v2 quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1 bound_ctrl:0 ; encoding: [0xfa,0x02,0x04,0x7e,0x02,0x01,0x08,0x11]
	[42 lines not shown]

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.update.dpp.ll

	[13 lines not shown]
	; GFX8-NEXT: v_mov_b32_dpp v2, v0 quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1			; GFX8-NEXT: v_mov_b32_dpp v2, v0 quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1
	; GFX8-NEXT: v_mov_b32_e32 v0, s2			; GFX8-NEXT: v_mov_b32_e32 v0, s2
	; GFX8-NEXT: v_mov_b32_e32 v1, s3			; GFX8-NEXT: v_mov_b32_e32 v1, s3
	; GFX8-NEXT: flat_store_dword v[0:1], v2			; GFX8-NEXT: flat_store_dword v[0:1], v2
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: dpp_test:			; GFX10-LABEL: dpp_test:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
				; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x2c			; GFX10-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x2c
	; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; GFX10-NEXT: ; implicit-def: $vcc_hi			; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: v_mov_b32_e32 v2, s2			; GFX10-NEXT: v_mov_b32_e32 v2, s2
	; GFX10-NEXT: v_mov_b32_e32 v0, s3			; GFX10-NEXT: v_mov_b32_e32 v0, s3
	; GFX10-NEXT: v_mov_b32_dpp v2, v0 quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1			; GFX10-NEXT: v_mov_b32_dpp v2, v0 quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1
	; GFX10-NEXT: v_mov_b32_e32 v0, s0			; GFX10-NEXT: v_mov_b32_e32 v0, s0
	[61 lines not shown]

llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll

	[270 lines not shown]
	; GFX9-NEXT: s_mov_b32 s6, -1			; GFX9-NEXT: s_mov_b32 s6, -1
	; GFX9-NEXT: v_add_u32_e32 v0, s0, v0			; GFX9-NEXT: v_add_u32_e32 v0, s0, v0
	; GFX9-NEXT: buffer_store_dword v0, off, s[4:7], 0			; GFX9-NEXT: buffer_store_dword v0, off, s[4:7], 0
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX1064-LABEL: add_i32_uniform:			; GFX1064-LABEL: add_i32_uniform:
	; GFX1064: ; %bb.0: ; %entry			; GFX1064: ; %bb.0: ; %entry
	; GFX1064-NEXT: s_mov_b64 s[2:3], exec			; GFX1064-NEXT: s_mov_b64 s[2:3], exec
				; GFX1064-NEXT: s_clause 0x1
	; GFX1064-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24			; GFX1064-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24
	; GFX1064-NEXT: s_load_dword s0, s[0:1], 0x2c			; GFX1064-NEXT: s_load_dword s0, s[0:1], 0x2c
	; GFX1064-NEXT: ; implicit-def: $vgpr1			; GFX1064-NEXT: ; implicit-def: $vgpr1
	; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0			; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
	; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0			; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0
	; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX1064-NEXT: s_and_saveexec_b64 s[6:7], vcc			; GFX1064-NEXT: s_and_saveexec_b64 s[6:7], vcc
	; GFX1064-NEXT: s_cbranch_execz BB1_2			; GFX1064-NEXT: s_cbranch_execz BB1_2
	[18 lines not shown]
	; GFX1064-NEXT: s_mov_b32 s7, 0x31016000			; GFX1064-NEXT: s_mov_b32 s7, 0x31016000
	; GFX1064-NEXT: s_mov_b32 s6, -1			; GFX1064-NEXT: s_mov_b32 s6, -1
	; GFX1064-NEXT: v_add_nc_u32_e32 v0, s0, v0			; GFX1064-NEXT: v_add_nc_u32_e32 v0, s0, v0
	; GFX1064-NEXT: buffer_store_dword v0, off, s[4:7], 0			; GFX1064-NEXT: buffer_store_dword v0, off, s[4:7], 0
	; GFX1064-NEXT: s_endpgm			; GFX1064-NEXT: s_endpgm
	;			;
	; GFX1032-LABEL: add_i32_uniform:			; GFX1032-LABEL: add_i32_uniform:
	; GFX1032: ; %bb.0: ; %entry			; GFX1032: ; %bb.0: ; %entry
				; GFX1032-NEXT: s_clause 0x1
	; GFX1032-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24			; GFX1032-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24
	; GFX1032-NEXT: s_load_dword s0, s[0:1], 0x2c			; GFX1032-NEXT: s_load_dword s0, s[0:1], 0x2c
	; GFX1032-NEXT: s_mov_b32 s2, exec_lo			; GFX1032-NEXT: s_mov_b32 s2, exec_lo
	; GFX1032-NEXT: ; implicit-def: $vcc_hi			; GFX1032-NEXT: ; implicit-def: $vcc_hi
	; GFX1032-NEXT: ; implicit-def: $vgpr1			; GFX1032-NEXT: ; implicit-def: $vgpr1
	; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0			; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
	; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0			; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
	; GFX1032-NEXT: s_and_saveexec_b32 s1, vcc_lo			; GFX1032-NEXT: s_and_saveexec_b32 s1, vcc_lo
	[1,501 lines not shown]
	; GFX9-NEXT: s_mov_b32 s6, -1			; GFX9-NEXT: s_mov_b32 s6, -1
	; GFX9-NEXT: v_sub_u32_e32 v0, s0, v0			; GFX9-NEXT: v_sub_u32_e32 v0, s0, v0
	; GFX9-NEXT: buffer_store_dword v0, off, s[4:7], 0			; GFX9-NEXT: buffer_store_dword v0, off, s[4:7], 0
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX1064-LABEL: sub_i32_uniform:			; GFX1064-LABEL: sub_i32_uniform:
	; GFX1064: ; %bb.0: ; %entry			; GFX1064: ; %bb.0: ; %entry
	; GFX1064-NEXT: s_mov_b64 s[2:3], exec			; GFX1064-NEXT: s_mov_b64 s[2:3], exec
				; GFX1064-NEXT: s_clause 0x1
	; GFX1064-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24			; GFX1064-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24
	; GFX1064-NEXT: s_load_dword s0, s[0:1], 0x2c			; GFX1064-NEXT: s_load_dword s0, s[0:1], 0x2c
	; GFX1064-NEXT: ; implicit-def: $vgpr1			; GFX1064-NEXT: ; implicit-def: $vgpr1
	; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0			; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
	; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0			; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0
	; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX1064-NEXT: s_and_saveexec_b64 s[6:7], vcc			; GFX1064-NEXT: s_and_saveexec_b64 s[6:7], vcc
	; GFX1064-NEXT: s_cbranch_execz BB9_2			; GFX1064-NEXT: s_cbranch_execz BB9_2
	[18 lines not shown]
	; GFX1064-NEXT: s_mov_b32 s7, 0x31016000			; GFX1064-NEXT: s_mov_b32 s7, 0x31016000
	; GFX1064-NEXT: s_mov_b32 s6, -1			; GFX1064-NEXT: s_mov_b32 s6, -1
	; GFX1064-NEXT: v_sub_nc_u32_e32 v0, s0, v0			; GFX1064-NEXT: v_sub_nc_u32_e32 v0, s0, v0
	; GFX1064-NEXT: buffer_store_dword v0, off, s[4:7], 0			; GFX1064-NEXT: buffer_store_dword v0, off, s[4:7], 0
	; GFX1064-NEXT: s_endpgm			; GFX1064-NEXT: s_endpgm
	;			;
	; GFX1032-LABEL: sub_i32_uniform:			; GFX1032-LABEL: sub_i32_uniform:
	; GFX1032: ; %bb.0: ; %entry			; GFX1032: ; %bb.0: ; %entry
				; GFX1032-NEXT: s_clause 0x1
	; GFX1032-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24			; GFX1032-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24
	; GFX1032-NEXT: s_load_dword s0, s[0:1], 0x2c			; GFX1032-NEXT: s_load_dword s0, s[0:1], 0x2c
	; GFX1032-NEXT: s_mov_b32 s2, exec_lo			; GFX1032-NEXT: s_mov_b32 s2, exec_lo
	; GFX1032-NEXT: ; implicit-def: $vcc_hi			; GFX1032-NEXT: ; implicit-def: $vcc_hi
	; GFX1032-NEXT: ; implicit-def: $vgpr1			; GFX1032-NEXT: ; implicit-def: $vgpr1
	; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0			; GFX1032-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
	; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0			; GFX1032-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
	; GFX1032-NEXT: s_and_saveexec_b32 s1, vcc_lo			; GFX1032-NEXT: s_and_saveexec_b32 s1, vcc_lo
	[3,158 lines not shown]

llvm/test/CodeGen/AMDGPU/hard-clauses.mir

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				# RUN: llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs -run-pass si-insert-hard-clauses %s -o - | FileCheck %s

				---
				name: long_clause
				tracksRegLiveness: true
				body: |
				bb.0:
				liveins: $sgpr0_sgpr1_sgpr2_sgpr3, $vgpr0
				; CHECK-LABEL: name: long_clause
				; CHECK: liveins: $sgpr0_sgpr1_sgpr2_sgpr3, $vgpr0
				; CHECK: BUNDLE implicit-def $vgpr1, implicit-def $vgpr1_lo16, implicit-def $vgpr1_hi16, implicit-def $vgpr2, implicit-def $vgpr2_lo16, implicit-def $vgpr2_hi16, implicit-def $vgpr3, implicit-def $vgpr3_lo16, implicit-def $vgpr3_hi16, implicit-def $vgpr4, implicit-def $vgpr4_lo16, implicit-def $vgpr4_hi16, implicit-def $vgpr5, implicit-def $vgpr5_lo16, implicit-def $vgpr5_hi16, implicit-def $vgpr6, implicit-def $vgpr6_lo16, implicit-def $vgpr6_hi16, implicit-def $vgpr7, implicit-def $vgpr7_lo16, implicit-def $vgpr7_hi16, implicit-def $vgpr8, implicit-def $vgpr8_lo16, implicit-def $vgpr8_hi16, implicit-def $vgpr9, implicit-def $vgpr9_lo16, implicit-def $vgpr9_hi16, implicit-def $vgpr10, implicit-def $vgpr10_lo16, implicit-def $vgpr10_hi16, implicit-def $vgpr11, implicit-def $vgpr11_lo16, implicit-def $vgpr11_hi16, implicit-def $vgpr12, implicit-def $vgpr12_lo16, implicit-def $vgpr12_hi16, implicit-def $vgpr13, implicit-def $vgpr13_lo16, implicit-def $vgpr13_hi16, implicit-def $vgpr14, implicit-def $vgpr14_lo16, implicit-def $vgpr14_hi16, implicit-def $vgpr15, implicit-def $vgpr15_lo16, implicit-def $vgpr15_hi16, implicit-def $vgpr16, implicit-def $vgpr16_lo16, implicit-def $vgpr16_hi16, implicit-def $vgpr17, implicit-def $vgpr17_lo16, implicit-def $vgpr17_hi16, implicit-def $vgpr18, implicit-def $vgpr18_lo16, implicit-def $vgpr18_hi16, implicit-def $vgpr19, implicit-def $vgpr19_lo16, implicit-def $vgpr19_hi16, implicit-def $vgpr20, implicit-def $vgpr20_lo16, implicit-def $vgpr20_hi16, implicit-def $vgpr21, implicit-def $vgpr21_lo16, implicit-def $vgpr21_hi16, implicit-def $vgpr22, implicit-def $vgpr22_lo16, implicit-def $vgpr22_hi16, implicit-def $vgpr23, implicit-def $vgpr23_lo16, implicit-def $vgpr23_hi16, implicit-def $vgpr24, implicit-def $vgpr24_lo16, implicit-def $vgpr24_hi16, implicit-def $vgpr25, implicit-def $vgpr25_lo16, implicit-def $vgpr25_hi16, implicit-def $vgpr26, implicit-def $vgpr26_lo16, implicit-def $vgpr26_hi16, implicit-def $vgpr27, implicit-def $vgpr27_lo16, implicit-def $vgpr27_hi16, implicit-def $vgpr28, implicit-def $vgpr28_lo16, implicit-def $vgpr28_hi16, implicit-def $vgpr29, implicit-def $vgpr29_lo16, implicit-def $vgpr29_hi16, implicit-def $vgpr30, implicit-def $vgpr30_lo16, implicit-def $vgpr30_hi16, implicit-def $vgpr31, implicit-def $vgpr31_lo16, implicit-def $vgpr31_hi16, implicit-def $vgpr32, implicit-def $vgpr32_lo16, implicit-def $vgpr32_hi16, implicit-def $vgpr33, implicit-def $vgpr33_lo16, implicit-def $vgpr33_hi16, implicit-def $vgpr34, implicit-def $vgpr34_lo16, implicit-def $vgpr34_hi16, implicit-def $vgpr35, implicit-def $vgpr35_lo16, implicit-def $vgpr35_hi16, implicit-def $vgpr36, implicit-def $vgpr36_lo16, implicit-def $vgpr36_hi16, implicit-def $vgpr37, implicit-def $vgpr37_lo16, implicit-def $vgpr37_hi16, implicit-def $vgpr38, implicit-def $vgpr38_lo16, implicit-def $vgpr38_hi16, implicit-def $vgpr39, implicit-def $vgpr39_lo16, implicit-def $vgpr39_hi16, implicit-def $vgpr40, implicit-def $vgpr40_lo16, implicit-def $vgpr40_hi16, implicit-def $vgpr41, implicit-def $vgpr41_lo16, implicit-def $vgpr41_hi16, implicit-def $vgpr42, implicit-def $vgpr42_lo16, implicit-def $vgpr42_hi16, implicit-def $vgpr43, implicit-def $vgpr43_lo16, implicit-def $vgpr43_hi16, implicit-def $vgpr44, implicit-def $vgpr44_lo16, implicit-def $vgpr44_hi16, implicit-def $vgpr45, implicit-def $vgpr45_lo16, implicit-def $vgpr45_hi16, implicit-def $vgpr46, implicit-def $vgpr46_lo16, implicit-def $vgpr46_hi16, implicit-def $vgpr47, implicit-def $vgpr47_lo16, implicit-def $vgpr47_hi16, implicit-def $vgpr48, implicit-def $vgpr48_lo16, implicit-def $vgpr48_hi16, implicit-def $vgpr49, implicit-def $vgpr49_lo16, implicit-def $vgpr49_hi16, implicit-def $vgpr50, implicit-def $vgpr50_lo16, implicit-def $vgpr50_hi16, implicit-def $vgpr51, implicit-def $vgpr51_lo16, implicit-def $vgpr51_hi16, implicit-def $vgpr52, implicit-def $vgpr52_lo16, implicit-def $vgpr52_hi16, implicit-def $vgpr53, implicit-def $vgpr53_lo16, implicit-def $vgpr53_hi16, implicit-def $vgpr54, implicit-def $vgpr54_lo16, implicit-def $vgpr54_hi16, implicit-def $vgpr55, implicit-def $vgpr55_lo16, implicit-def $vgpr55_hi16, implicit-def $vgpr56, implicit-def $vgpr56_lo16, implicit-def $vgpr56_hi16, implicit-def $vgpr57, implicit-def $vgpr57_lo16, implicit-def $vgpr57_hi16, implicit-def $vgpr58, implicit-def $vgpr58_lo16, implicit-def $vgpr58_hi16, implicit-def $vgpr59, implicit-def $vgpr59_lo16, implicit-def $vgpr59_hi16, implicit-def $vgpr60, implicit-def $vgpr60_lo16, implicit-def $vgpr60_hi16, implicit-def $vgpr61, implicit-def $vgpr61_lo16, implicit-def $vgpr61_hi16, implicit-def $vgpr62, implicit-def $vgpr62_lo16, implicit-def $vgpr62_hi16, implicit-def $vgpr63, implicit-def $vgpr63_lo16, implicit-def $vgpr63_hi16, implicit-def $vgpr64, implicit-def $vgpr64_lo16, implicit-def $vgpr64_hi16, implicit $vgpr0, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $exec {
				; CHECK: S_CLAUSE 63
				; CHECK: $vgpr1 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 4, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr2 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 8, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr3 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 12, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr4 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 16, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr5 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 20, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr6 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 24, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr7 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 28, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr8 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 32, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr9 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 36, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr10 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 40, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr11 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 44, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr12 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 48, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr13 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 52, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr14 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 56, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr15 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 60, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr16 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 64, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr17 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 68, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr18 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 72, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr19 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 76, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr20 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 80, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr21 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 84, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr22 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 88, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr23 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 92, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr24 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 96, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr25 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 100, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr26 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 104, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr27 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 108, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr28 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 112, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr29 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 116, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr30 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 120, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr31 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 124, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr32 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 128, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr33 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 132, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr34 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 136, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr35 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 140, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr36 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 144, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr37 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 148, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr38 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 152, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr39 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 156, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr40 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 160, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr41 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 164, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr42 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 168, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr43 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 172, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr44 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 176, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr45 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 180, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr46 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 184, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr47 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 188, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr48 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 192, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr49 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 196, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr50 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 200, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr51 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 204, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr52 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 208, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr53 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 212, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr54 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 216, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr55 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 220, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr56 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 224, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr57 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 228, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr58 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 232, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr59 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 236, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr60 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 240, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr61 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 244, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr62 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 248, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr63 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 252, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr64 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 256, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: }
				; CHECK: BUNDLE implicit-def $vgpr65, implicit-def $vgpr65_lo16, implicit-def $vgpr65_hi16, implicit-def $vgpr66, implicit-def $vgpr66_lo16, implicit-def $vgpr66_hi16, implicit-def $vgpr67, implicit-def $vgpr67_lo16, implicit-def $vgpr67_hi16, implicit-def $vgpr68, implicit-def $vgpr68_lo16, implicit-def $vgpr68_hi16, implicit-def $vgpr69, implicit-def $vgpr69_lo16, implicit-def $vgpr69_hi16, implicit-def $vgpr70, implicit-def $vgpr70_lo16, implicit-def $vgpr70_hi16, implicit-def $vgpr71, implicit-def $vgpr71_lo16, implicit-def $vgpr71_hi16, implicit-def $vgpr72, implicit-def $vgpr72_lo16, implicit-def $vgpr72_hi16, implicit-def $vgpr73, implicit-def $vgpr73_lo16, implicit-def $vgpr73_hi16, implicit-def $vgpr74, implicit-def $vgpr74_lo16, implicit-def $vgpr74_hi16, implicit-def $vgpr75, implicit-def $vgpr75_lo16, implicit-def $vgpr75_hi16, implicit-def $vgpr76, implicit-def $vgpr76_lo16, implicit-def $vgpr76_hi16, implicit-def $vgpr77, implicit-def $vgpr77_lo16, implicit-def $vgpr77_hi16, implicit-def $vgpr78, implicit-def $vgpr78_lo16, implicit-def $vgpr78_hi16, implicit-def $vgpr79, implicit-def $vgpr79_lo16, implicit-def $vgpr79_hi16, implicit-def $vgpr80, implicit-def $vgpr80_lo16, implicit-def $vgpr80_hi16, implicit $vgpr0, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $exec {
				; CHECK: S_CLAUSE 15
				; CHECK: $vgpr65 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 260, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr66 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 264, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr67 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 268, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr68 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 272, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr69 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 276, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr70 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 280, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr71 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 284, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr72 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 288, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr73 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 292, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr74 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 296, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr75 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 300, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr76 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 304, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr77 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 308, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr78 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 312, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr79 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 316, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: $vgpr80 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 320, 0, 0, 0, 0, 0, implicit $exec
				; CHECK: }
				$vgpr1 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 4, 0, 0, 0, 0, 0, implicit $exec
				$vgpr2 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 8, 0, 0, 0, 0, 0, implicit $exec
				$vgpr3 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 12, 0, 0, 0, 0, 0, implicit $exec
				$vgpr4 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 16, 0, 0, 0, 0, 0, implicit $exec
				$vgpr5 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 20, 0, 0, 0, 0, 0, implicit $exec
				$vgpr6 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 24, 0, 0, 0, 0, 0, implicit $exec
				$vgpr7 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 28, 0, 0, 0, 0, 0, implicit $exec
				$vgpr8 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 32, 0, 0, 0, 0, 0, implicit $exec
				$vgpr9 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 36, 0, 0, 0, 0, 0, implicit $exec
				$vgpr10 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 40, 0, 0, 0, 0, 0, implicit $exec
				$vgpr11 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 44, 0, 0, 0, 0, 0, implicit $exec
				$vgpr12 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 48, 0, 0, 0, 0, 0, implicit $exec
				$vgpr13 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 52, 0, 0, 0, 0, 0, implicit $exec
				$vgpr14 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 56, 0, 0, 0, 0, 0, implicit $exec
				$vgpr15 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 60, 0, 0, 0, 0, 0, implicit $exec
				$vgpr16 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 64, 0, 0, 0, 0, 0, implicit $exec
				$vgpr17 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 68, 0, 0, 0, 0, 0, implicit $exec
				$vgpr18 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 72, 0, 0, 0, 0, 0, implicit $exec
				$vgpr19 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 76, 0, 0, 0, 0, 0, implicit $exec
				$vgpr20 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 80, 0, 0, 0, 0, 0, implicit $exec
				$vgpr21 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 84, 0, 0, 0, 0, 0, implicit $exec
				$vgpr22 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 88, 0, 0, 0, 0, 0, implicit $exec
				$vgpr23 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 92, 0, 0, 0, 0, 0, implicit $exec
				$vgpr24 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 96, 0, 0, 0, 0, 0, implicit $exec
				$vgpr25 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 100, 0, 0, 0, 0, 0, implicit $exec
				$vgpr26 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 104, 0, 0, 0, 0, 0, implicit $exec
				$vgpr27 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 108, 0, 0, 0, 0, 0, implicit $exec
				$vgpr28 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 112, 0, 0, 0, 0, 0, implicit $exec
				$vgpr29 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 116, 0, 0, 0, 0, 0, implicit $exec
				$vgpr30 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 120, 0, 0, 0, 0, 0, implicit $exec
				$vgpr31 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 124, 0, 0, 0, 0, 0, implicit $exec
				$vgpr32 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 128, 0, 0, 0, 0, 0, implicit $exec
				$vgpr33 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 132, 0, 0, 0, 0, 0, implicit $exec
				$vgpr34 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 136, 0, 0, 0, 0, 0, implicit $exec
				$vgpr35 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 140, 0, 0, 0, 0, 0, implicit $exec
				$vgpr36 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 144, 0, 0, 0, 0, 0, implicit $exec
				$vgpr37 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 148, 0, 0, 0, 0, 0, implicit $exec
				$vgpr38 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 152, 0, 0, 0, 0, 0, implicit $exec
				$vgpr39 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 156, 0, 0, 0, 0, 0, implicit $exec
				$vgpr40 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 160, 0, 0, 0, 0, 0, implicit $exec
				$vgpr41 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 164, 0, 0, 0, 0, 0, implicit $exec
				$vgpr42 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 168, 0, 0, 0, 0, 0, implicit $exec
				$vgpr43 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 172, 0, 0, 0, 0, 0, implicit $exec
				$vgpr44 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 176, 0, 0, 0, 0, 0, implicit $exec
				$vgpr45 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 180, 0, 0, 0, 0, 0, implicit $exec
				$vgpr46 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 184, 0, 0, 0, 0, 0, implicit $exec
				$vgpr47 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 188, 0, 0, 0, 0, 0, implicit $exec
				$vgpr48 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 192, 0, 0, 0, 0, 0, implicit $exec
				$vgpr49 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 196, 0, 0, 0, 0, 0, implicit $exec
				$vgpr50 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 200, 0, 0, 0, 0, 0, implicit $exec
				$vgpr51 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 204, 0, 0, 0, 0, 0, implicit $exec
				$vgpr52 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 208, 0, 0, 0, 0, 0, implicit $exec
				$vgpr53 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 212, 0, 0, 0, 0, 0, implicit $exec
				$vgpr54 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 216, 0, 0, 0, 0, 0, implicit $exec
				$vgpr55 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 220, 0, 0, 0, 0, 0, implicit $exec
				$vgpr56 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 224, 0, 0, 0, 0, 0, implicit $exec
				$vgpr57 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 228, 0, 0, 0, 0, 0, implicit $exec
				$vgpr58 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 232, 0, 0, 0, 0, 0, implicit $exec
				$vgpr59 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 236, 0, 0, 0, 0, 0, implicit $exec
				$vgpr60 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 240, 0, 0, 0, 0, 0, implicit $exec
				$vgpr61 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 244, 0, 0, 0, 0, 0, implicit $exec
				$vgpr62 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 248, 0, 0, 0, 0, 0, implicit $exec
				$vgpr63 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 252, 0, 0, 0, 0, 0, implicit $exec
				$vgpr64 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 256, 0, 0, 0, 0, 0, implicit $exec
				$vgpr65 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 260, 0, 0, 0, 0, 0, implicit $exec
				$vgpr66 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 264, 0, 0, 0, 0, 0, implicit $exec
				$vgpr67 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 268, 0, 0, 0, 0, 0, implicit $exec
				$vgpr68 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 272, 0, 0, 0, 0, 0, implicit $exec
				$vgpr69 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 276, 0, 0, 0, 0, 0, implicit $exec
				$vgpr70 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 280, 0, 0, 0, 0, 0, implicit $exec
				$vgpr71 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 284, 0, 0, 0, 0, 0, implicit $exec
				$vgpr72 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 288, 0, 0, 0, 0, 0, implicit $exec
				$vgpr73 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 292, 0, 0, 0, 0, 0, implicit $exec
				$vgpr74 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 296, 0, 0, 0, 0, 0, implicit $exec
				$vgpr75 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 300, 0, 0, 0, 0, 0, implicit $exec
				$vgpr76 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 304, 0, 0, 0, 0, 0, implicit $exec
				$vgpr77 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 308, 0, 0, 0, 0, 0, implicit $exec
				$vgpr78 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 312, 0, 0, 0, 0, 0, implicit $exec
				$vgpr79 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 316, 0, 0, 0, 0, 0, implicit $exec
				$vgpr80 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 320, 0, 0, 0, 0, 0, implicit $exec
				...
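
Note on the expected output above: the 80 consecutive loads are split into two bundles. The first ends at $vgpr64 and the second covers $vgpr65 through $vgpr80 (16 loads) under S_CLAUSE 15. The idot2.ll checks below likewise show s_clause 0x1 in front of two loads. A minimal sketch of that arithmetic follows. It assumes the S_CLAUSE operand is the clause length minus one, that the scan breaks at 64 and restarts as discussed in the review, and that a single leftover instruction gets no clause. The helper name clauseOperands is hypothetical and is not part of the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not taken from SIInsertHardClauses.cpp: given a run of
// `numInsts` consecutive clausable instructions, return the S_CLAUSE operand
// of each clause emitted when the scan breaks at `maxLen` and restarts.
// Assumption: the operand encodes (clause length - 1).
std::vector<unsigned> clauseOperands(unsigned numInsts, unsigned maxLen = 64) {
  assert(maxLen >= 2 && "a clause needs at least two instructions");
  std::vector<unsigned> operands;
  // Assumption: a lone trailing instruction is left without a clause.
  while (numInsts >= 2) {
    unsigned len = numInsts < maxLen ? numInsts : maxLen;
    operands.push_back(len - 1);
    numInsts -= len;
  }
  return operands;
}
```

Under these assumptions, clauseOperands(80) yields two clauses, the second with operand 15 as in the CHECK lines above, and clauseOperands(2) yields a single operand of 1, matching s_clause 0x1.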

llvm/test/CodeGen/AMDGPU/idot2.ll

	; GFX9-DL-NEXT: v_dot2_u32_u16 v2, s4, v0, v1			; GFX9-DL-NEXT: v_dot2_u32_u16 v2, s4, v0, v1
	; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off			; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
	; GFX9-DL-NEXT: s_endpgm			; GFX9-DL-NEXT: s_endpgm
	;			;
	; GFX10-DL-LABEL: udot2:			; GFX10-DL-LABEL: udot2:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
				; GFX10-DL-NEXT: s_clause 0x1
	; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34
	; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_load_dword s6, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s6, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0			; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
	; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0			; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-DL-NEXT: v_add_u32_e32 v2, s5, v0			; GFX9-DL-NEXT: v_add_u32_e32 v2, s5, v0
	; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off			; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
	; GFX9-DL-NEXT: s_endpgm			; GFX9-DL-NEXT: s_endpgm
	;			;
	; GFX10-DL-LABEL: udot2_MulMul:			; GFX10-DL-LABEL: udot2_MulMul:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
				; GFX10-DL-NEXT: s_clause 0x1
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0
	; GFX10-DL-NEXT: s_mov_b32 s4, 0xffff			; GFX10-DL-NEXT: s_mov_b32 s4, 0xffff
	; GFX10-DL-NEXT: s_load_dword s5, s[0:1], 0x0			; GFX10-DL-NEXT: s_load_dword s5, s[0:1], 0x0
	; GFX9-DL-NEXT: v_dot2_i32_i16 v2, s4, v0, v1			; GFX9-DL-NEXT: v_dot2_i32_i16 v2, s4, v0, v1
	; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off			; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
	; GFX9-DL-NEXT: s_endpgm			; GFX9-DL-NEXT: s_endpgm
	;			;
	; GFX10-DL-LABEL: idot2:			; GFX10-DL-LABEL: idot2:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
				; GFX10-DL-NEXT: s_clause 0x1
	; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34
	; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_load_dword s6, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s6, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0			; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
	; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0			; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-DL-NEXT: v_mad_i32_i24 v2, s6, v2, v0			; GFX9-DL-NEXT: v_mad_i32_i24 v2, s6, v2, v0
	; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off			; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
	; GFX9-DL-NEXT: s_endpgm			; GFX9-DL-NEXT: s_endpgm
	;			;
	; GFX10-DL-LABEL: idot2_MixedTypedMul:			; GFX10-DL-LABEL: idot2_MixedTypedMul:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
				; GFX10-DL-NEXT: s_clause 0x1
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0
	; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0			; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-DL-NEXT: v_dot2_u32_u16 v2, s4, v0, v1			; GFX9-DL-NEXT: v_dot2_u32_u16 v2, s4, v0, v1
	; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off			; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
	; GFX9-DL-NEXT: s_endpgm			; GFX9-DL-NEXT: s_endpgm
	;			;
	; GFX10-DL-LABEL: udot2_alt_AddOperands:			; GFX10-DL-LABEL: udot2_alt_AddOperands:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
				; GFX10-DL-NEXT: s_clause 0x1
	; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34
	; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_load_dword s6, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s6, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0			; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
	; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0			; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-DL-NEXT: v_mad_i32_i24 v2, s6, v2, v0			; GFX9-DL-NEXT: v_mad_i32_i24 v2, s6, v2, v0
	; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off			; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
	; GFX9-DL-NEXT: s_endpgm			; GFX9-DL-NEXT: s_endpgm
	;			;
	; GFX10-DL-LABEL: idot2_MixedExt:			; GFX10-DL-LABEL: idot2_MixedExt:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
				; GFX10-DL-NEXT: s_clause 0x1
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0
	; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0			; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-DL-NEXT: v_mad_u32_u24 v2, s4, s4, v0			; GFX9-DL-NEXT: v_mad_u32_u24 v2, s4, s4, v0
	; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off			; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
	; GFX9-DL-NEXT: s_endpgm			; GFX9-DL-NEXT: s_endpgm
	;			;
	; GFX10-DL-LABEL: notudot2_SameVec:			; GFX10-DL-LABEL: notudot2_SameVec:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
				; GFX10-DL-NEXT: s_clause 0x1
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_load_dword s2, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s2, s[6:7], 0x0
	; GFX10-DL-NEXT: s_load_dword s3, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s3, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0			; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-DL-NEXT: v_dot2_u32_u16 v2, s4, v0, v1			; GFX9-DL-NEXT: v_dot2_u32_u16 v2, s4, v0, v1
	; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off			; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
	; GFX9-DL-NEXT: s_endpgm			; GFX9-DL-NEXT: s_endpgm
	;			;
	; GFX10-DL-LABEL: udot2_v4i16:			; GFX10-DL-LABEL: udot2_v4i16:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
				; GFX10-DL-NEXT: s_clause 0x1
	; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34
	; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_load_dword s6, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s6, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0			; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
	; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0			; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-DL-NEXT: v_dot2_u32_u16 v2, s4, v0, v1			; GFX9-DL-NEXT: v_dot2_u32_u16 v2, s4, v0, v1
	; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off			; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
	; GFX9-DL-NEXT: s_endpgm			; GFX9-DL-NEXT: s_endpgm
	;			;
	; GFX10-DL-LABEL: udot2_v4i16_Hi:			; GFX10-DL-LABEL: udot2_v4i16_Hi:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
				; GFX10-DL-NEXT: s_clause 0x1
	; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34
	; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_load_dword s6, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s6, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x4			; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x4
	; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x4			; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x4
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-DL-NEXT: v_mad_u32_u24 v2, s4, v1, v0			; GFX9-DL-NEXT: v_mad_u32_u24 v2, s4, v1, v0
	; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off			; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
	; GFX9-DL-NEXT: s_endpgm			; GFX9-DL-NEXT: s_endpgm
	;			;
	; GFX10-DL-LABEL: notudot2_v4i16_Even:			; GFX10-DL-LABEL: notudot2_v4i16_Even:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
				; GFX10-DL-NEXT: s_clause 0x1
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_load_dwordx2 s[2:3], s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dwordx2 s[2:3], s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[6:7], 0x0
	; GFX10-DL-NEXT: s_load_dword s6, s[0:1], 0x0			; GFX10-DL-NEXT: s_load_dword s6, s[0:1], 0x0
	; GFX10-DL-NEXT: s_mov_b32 s7, 0xffff			; GFX10-DL-NEXT: s_mov_b32 s7, 0xffff
	; GFX9-DL-NEXT: v_mad_u32_u24 v2, s4, v1, v0			; GFX9-DL-NEXT: v_mad_u32_u24 v2, s4, v1, v0
	; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off			; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
	; GFX9-DL-NEXT: s_endpgm			; GFX9-DL-NEXT: s_endpgm
	;			;
	; GFX10-DL-LABEL: notudot2_v4i16_Middle:			; GFX10-DL-LABEL: notudot2_v4i16_Middle:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
				; GFX10-DL-NEXT: s_clause 0x1
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_load_dwordx2 s[2:3], s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dwordx2 s[2:3], s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[6:7], 0x0
	; GFX10-DL-NEXT: s_load_dword s6, s[0:1], 0x0			; GFX10-DL-NEXT: s_load_dword s6, s[0:1], 0x0
	; GFX10-DL-NEXT: s_mov_b32 s7, 0xffff			; GFX10-DL-NEXT: s_mov_b32 s7, 0xffff
	[129 lines not shown]
	; GFX9-DL-NEXT: v_mad_u32_u24 v2, s7, v1, v0			; GFX9-DL-NEXT: v_mad_u32_u24 v2, s7, v1, v0
	; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off			; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
	; GFX9-DL-NEXT: s_endpgm			; GFX9-DL-NEXT: s_endpgm
	;			;
	; GFX10-DL-LABEL: notudot2_DiffIndex:			; GFX10-DL-LABEL: notudot2_DiffIndex:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
				; GFX10-DL-NEXT: s_clause 0x1
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0
	; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0			; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0
	; GFX10-DL-NEXT: s_mov_b32 s5, 0xffff			; GFX10-DL-NEXT: s_mov_b32 s5, 0xffff
	[133 lines not shown]
	; GFX9-DL-NEXT: v_add_u32_e32 v2, v1, v0			; GFX9-DL-NEXT: v_add_u32_e32 v2, v1, v0
	; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off			; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
	; GFX9-DL-NEXT: s_endpgm			; GFX9-DL-NEXT: s_endpgm
	;			;
	; GFX10-DL-LABEL: udot2_MultipleUses_add1:			; GFX10-DL-LABEL: udot2_MultipleUses_add1:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
				; GFX10-DL-NEXT: s_clause 0x1
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0
	; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0			; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0
	; GFX10-DL-NEXT: s_mov_b32 s5, 0xffff			; GFX10-DL-NEXT: s_mov_b32 s5, 0xffff
	[131 lines not shown]
	; GFX9-DL-NEXT: v_add_u32_e32 v2, v1, v0			; GFX9-DL-NEXT: v_add_u32_e32 v2, v1, v0
	; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off			; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
	; GFX9-DL-NEXT: s_endpgm			; GFX9-DL-NEXT: s_endpgm
	;			;
	; GFX10-DL-LABEL: idot2_MultipleUses_add1:			; GFX10-DL-LABEL: idot2_MultipleUses_add1:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
				; GFX10-DL-NEXT: s_clause 0x1
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0
	; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0			; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	[135 lines not shown]
	; GFX9-DL-NEXT: v_mad_u32_u24 v2, s2, v1, v0			; GFX9-DL-NEXT: v_mad_u32_u24 v2, s2, v1, v0
	; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off			; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
	; GFX9-DL-NEXT: s_endpgm			; GFX9-DL-NEXT: s_endpgm
	;			;
	; GFX10-DL-LABEL: udot2_MultipleUses_mul1:			; GFX10-DL-LABEL: udot2_MultipleUses_mul1:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
				; GFX10-DL-NEXT: s_clause 0x1
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0
	; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0			; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0
	; GFX10-DL-NEXT: s_mov_b32 s5, 0xffff			; GFX10-DL-NEXT: s_mov_b32 s5, 0xffff
	[132 lines not shown]
	; GFX9-DL-NEXT: v_mad_i32_i24 v2, s6, v1, v0			; GFX9-DL-NEXT: v_mad_i32_i24 v2, s6, v1, v0
	; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off			; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
	; GFX9-DL-NEXT: s_endpgm			; GFX9-DL-NEXT: s_endpgm
	;			;
	; GFX10-DL-LABEL: idot2_MultipleUses_mul1:			; GFX10-DL-LABEL: idot2_MultipleUses_mul1:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
				; GFX10-DL-NEXT: s_clause 0x1
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0
	; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0			; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	[136 lines not shown]
	; GFX9-DL-NEXT: v_mad_u32_u24 v2, s2, v1, v0			; GFX9-DL-NEXT: v_mad_u32_u24 v2, s2, v1, v0
	; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off			; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
	; GFX9-DL-NEXT: s_endpgm			; GFX9-DL-NEXT: s_endpgm
	;			;
	; GFX10-DL-LABEL: udot2_MultipleUses_mul2:			; GFX10-DL-LABEL: udot2_MultipleUses_mul2:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
				; GFX10-DL-NEXT: s_clause 0x1
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0
	; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0			; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	[132 lines not shown]
	; GFX9-DL-NEXT: v_mad_i32_i24 v2, s6, v2, v0			; GFX9-DL-NEXT: v_mad_i32_i24 v2, s6, v2, v0
	; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off			; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
	; GFX9-DL-NEXT: s_endpgm			; GFX9-DL-NEXT: s_endpgm
	;			;
	; GFX10-DL-LABEL: idot2_MultipleUses_mul2:			; GFX10-DL-LABEL: idot2_MultipleUses_mul2:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
				; GFX10-DL-NEXT: s_clause 0x1
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0
	; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0			; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	[275 lines not shown]
	; GFX9-DL-NEXT: v_mad_i32_i24 v2, v3, v1, v0			; GFX9-DL-NEXT: v_mad_i32_i24 v2, v3, v1, v0
	; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off			; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
	; GFX9-DL-NEXT: s_endpgm			; GFX9-DL-NEXT: s_endpgm
	;			;
	; GFX10-DL-LABEL: notsdot2_sext8:			; GFX10-DL-LABEL: notsdot2_sext8:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
				; GFX10-DL-NEXT: s_clause 0x1
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: v_mov_b32_e32 v0, s4			; GFX10-DL-NEXT: v_mov_b32_e32 v0, s4
	; GFX10-DL-NEXT: v_mov_b32_e32 v1, s5			; GFX10-DL-NEXT: v_mov_b32_e32 v1, s5
	; GFX10-DL-NEXT: v_mov_b32_e32 v2, s6			; GFX10-DL-NEXT: v_mov_b32_e32 v2, s6
	; GFX10-DL-NEXT: v_mov_b32_e32 v3, s7			; GFX10-DL-NEXT: v_mov_b32_e32 v3, s7
	[42 lines not shown]

llvm/test/CodeGen/AMDGPU/idot4s.ll

	[113 lines not shown]
	; GFX9-DL-NEXT: v_dot4_i32_i8 v2, s4, v0, v1			; GFX9-DL-NEXT: v_dot4_i32_i8 v2, s4, v0, v1
	; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off			; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
	; GFX9-DL-NEXT: s_endpgm			; GFX9-DL-NEXT: s_endpgm
	;			;
	; GFX10-DL-LABEL: idot4_acc32:			; GFX10-DL-LABEL: idot4_acc32:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
				; GFX10-DL-NEXT: s_clause 0x1
	; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34
	; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_load_dword s6, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s6, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0			; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
	; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0			; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	[506 lines not shown]
	; GFX9-DL-NEXT: v_mad_i32_i24 v2, s2, v1, v0			; GFX9-DL-NEXT: v_mad_i32_i24 v2, s2, v1, v0
	; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off			; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
	; GFX9-DL-NEXT: s_endpgm			; GFX9-DL-NEXT: s_endpgm
	;			;
	; GFX10-DL-LABEL: idot4_multiuse_mul1:			; GFX10-DL-LABEL: idot4_multiuse_mul1:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
				; GFX10-DL-NEXT: s_clause 0x1
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0
	; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0			; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	[182 lines not shown]
	; GFX9-DL-NEXT: v_mad_i32_i24 v2, s4, v1, v0			; GFX9-DL-NEXT: v_mad_i32_i24 v2, s4, v1, v0
	; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off			; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
	; GFX9-DL-NEXT: s_endpgm			; GFX9-DL-NEXT: s_endpgm
	;			;
	; GFX10-DL-LABEL: idot4_acc32_vecMul:			; GFX10-DL-LABEL: idot4_acc32_vecMul:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
				; GFX10-DL-NEXT: s_clause 0x1
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0
	; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0			; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	[253 lines not shown]

llvm/test/CodeGen/AMDGPU/idot4u.ll

	[116 lines not shown]
	; GFX9-DL-NEXT: v_dot4_u32_u8 v2, s4, v0, v1			; GFX9-DL-NEXT: v_dot4_u32_u8 v2, s4, v0, v1
	; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off			; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
	; GFX9-DL-NEXT: s_endpgm			; GFX9-DL-NEXT: s_endpgm
	;			;
	; GFX10-DL-LABEL: udot4_acc32:			; GFX10-DL-LABEL: udot4_acc32:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
				; GFX10-DL-NEXT: s_clause 0x1
	; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34
	; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_load_dword s6, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s6, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0			; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
	; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0			; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	[1,000 lines not shown]
	; GFX9-DL-NEXT: v_mad_u32_u24 v2, s3, v1, v0			; GFX9-DL-NEXT: v_mad_u32_u24 v2, s3, v1, v0
	; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off			; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
	; GFX9-DL-NEXT: s_endpgm			; GFX9-DL-NEXT: s_endpgm
	;			;
	; GFX10-DL-LABEL: udot4_multiuse_mul1:			; GFX10-DL-LABEL: udot4_multiuse_mul1:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
				; GFX10-DL-NEXT: s_clause 0x1
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0
	; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0			; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0
	; GFX10-DL-NEXT: s_movk_i32 s5, 0xff			; GFX10-DL-NEXT: s_movk_i32 s5, 0xff
	[191 lines not shown]
	; GFX9-DL-NEXT: v_add_u32_e32 v2, v0, v1			; GFX9-DL-NEXT: v_add_u32_e32 v2, v0, v1
	; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off			; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
	; GFX9-DL-NEXT: s_endpgm			; GFX9-DL-NEXT: s_endpgm
	;			;
	; GFX10-DL-LABEL: udot4_multiuse_add1:			; GFX10-DL-LABEL: udot4_multiuse_add1:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
				; GFX10-DL-NEXT: s_clause 0x1
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0
	; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0			; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0
	; GFX10-DL-NEXT: s_movk_i32 s5, 0xff			; GFX10-DL-NEXT: s_movk_i32 s5, 0xff
	[382 lines not shown]
	; GFX9-DL-NEXT: v_mad_u32_u24 v2, s5, v1, v0			; GFX9-DL-NEXT: v_mad_u32_u24 v2, s5, v1, v0
	; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off			; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
	; GFX9-DL-NEXT: s_endpgm			; GFX9-DL-NEXT: s_endpgm
	;			;
	; GFX10-DL-LABEL: udot4_acc32_vecMul:			; GFX10-DL-LABEL: udot4_acc32_vecMul:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
				; GFX10-DL-NEXT: s_clause 0x1
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0
	; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0			; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0
	; GFX10-DL-NEXT: s_movk_i32 s5, 0xff			; GFX10-DL-NEXT: s_movk_i32 s5, 0xff
	[462 lines not shown]

llvm/test/CodeGen/AMDGPU/idot8s.ll

	[161 lines not shown]
	; GFX9-DL-NEXT: v_dot8_i32_i4 v2, s4, v0, v1			; GFX9-DL-NEXT: v_dot8_i32_i4 v2, s4, v0, v1
	; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off			; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
	; GFX9-DL-NEXT: s_endpgm			; GFX9-DL-NEXT: s_endpgm
	;			;
	; GFX10-DL-LABEL: idot8_acc32:			; GFX10-DL-LABEL: idot8_acc32:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
				; GFX10-DL-NEXT: s_clause 0x1
	; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34
	; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_load_dword s6, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s6, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0			; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
	; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0			; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	[947 lines not shown]
	; GFX9-DL-NEXT: v_add_u32_e32 v2, v1, v0			; GFX9-DL-NEXT: v_add_u32_e32 v2, v1, v0
	; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off			; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
	; GFX9-DL-NEXT: s_endpgm			; GFX9-DL-NEXT: s_endpgm
	;			;
	; GFX10-DL-LABEL: idot8_multiuses_mul1:			; GFX10-DL-LABEL: idot8_multiuses_mul1:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
				; GFX10-DL-NEXT: s_clause 0x1
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0
	; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0			; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	[255 lines not shown]
	; GFX9-DL-NEXT: v_dot8_i32_i4 v2, s4, v0, v1			; GFX9-DL-NEXT: v_dot8_i32_i4 v2, s4, v0, v1
	; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off			; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
	; GFX9-DL-NEXT: s_endpgm			; GFX9-DL-NEXT: s_endpgm
	;			;
	; GFX10-DL-LABEL: idot8_acc32_vecMul:			; GFX10-DL-LABEL: idot8_acc32_vecMul:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
				; GFX10-DL-NEXT: s_clause 0x1
	; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34
	; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_load_dword s6, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s6, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0			; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
	; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0			; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	[838 lines not shown]

llvm/test/CodeGen/AMDGPU/idot8u.ll

	[161 lines not shown]
	; GFX9-DL-NEXT: v_dot8_u32_u4 v2, s4, v0, v1			; GFX9-DL-NEXT: v_dot8_u32_u4 v2, s4, v0, v1
	; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off			; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
	; GFX9-DL-NEXT: s_endpgm			; GFX9-DL-NEXT: s_endpgm
	;			;
	; GFX10-DL-LABEL: udot8_acc32:			; GFX10-DL-LABEL: udot8_acc32:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
				; GFX10-DL-NEXT: s_clause 0x1
	; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34
	; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_load_dword s6, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s6, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0			; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
	; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0			; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	[1,471 lines not shown]
	; GFX9-DL-NEXT: v_add_u32_e32 v2, v0, v1			; GFX9-DL-NEXT: v_add_u32_e32 v2, v0, v1
	; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off			; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
	; GFX9-DL-NEXT: s_endpgm			; GFX9-DL-NEXT: s_endpgm
	;			;
	; GFX10-DL-LABEL: udot8_multiuses_mul1:			; GFX10-DL-LABEL: udot8_multiuses_mul1:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
				; GFX10-DL-NEXT: s_clause 0x1
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s2, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s3, s[6:7], 0x0
	; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0			; GFX10-DL-NEXT: s_load_dword s4, s[0:1], 0x0
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	[254 lines not shown]
	; GFX9-DL-NEXT: v_dot8_u32_u4 v2, s4, v0, v1			; GFX9-DL-NEXT: v_dot8_u32_u4 v2, s4, v0, v1
	; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off			; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
	; GFX9-DL-NEXT: s_endpgm			; GFX9-DL-NEXT: s_endpgm
	;			;
	; GFX10-DL-LABEL: udot8_acc32_vecMul:			; GFX10-DL-LABEL: udot8_acc32_vecMul:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
				; GFX10-DL-NEXT: s_clause 0x1
	; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34
	; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_load_dword s6, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s6, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0			; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
	; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0			; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	[1,145 lines not shown]
	; GFX9-DL-NEXT: v_dot8_u32_u4 v2, s4, v0, v1			; GFX9-DL-NEXT: v_dot8_u32_u4 v2, s4, v0, v1
	; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off			; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
	; GFX9-DL-NEXT: s_endpgm			; GFX9-DL-NEXT: s_endpgm
	;			;
	; GFX10-DL-LABEL: udot8_variant1:			; GFX10-DL-LABEL: udot8_variant1:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
				; GFX10-DL-NEXT: s_clause 0x1
	; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34
	; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_load_dword s6, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s6, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0			; GFX10-DL-NEXT: s_load_dword s0, s[0:1], 0x0
	; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0			; GFX10-DL-NEXT: s_load_dword s1, s[2:3], 0x0
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	[67 lines not shown]

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.buffer.load.ll

[114 lines not shown]	entry:
%val = call float @llvm.amdgcn.raw.buffer.load.f32(<4 x i32> %rsrc, i32 0, i32 0, i32 0)		%val = call float @llvm.amdgcn.raw.buffer.load.f32(<4 x i32> %rsrc, i32 0, i32 0, i32 0)
%tmp2 = getelementptr float, float addrspace(3)* %lds, i32 4		%tmp2 = getelementptr float, float addrspace(3)* %lds, i32 4
store float 0.0, float addrspace(3)* %tmp2		store float 0.0, float addrspace(3)* %tmp2
ret float %val		ret float %val
}		}

;CHECK-LABEL: {{^}}buffer_load_x1_offen_merged_and:		;CHECK-LABEL: {{^}}buffer_load_x1_offen_merged_and:
;CHECK-NEXT: %bb.		;CHECK-NEXT: %bb.
		;GFX10-NEXT: s_clause
;CHECK-NEXT: buffer_load_dwordx4 v[{{[0-9]}}:{{[0-9]}}], v0, s[0:3], 0 offen offset:4		;CHECK-NEXT: buffer_load_dwordx4 v[{{[0-9]}}:{{[0-9]}}], v0, s[0:3], 0 offen offset:4
;CHECK-NEXT: buffer_load_dwordx2 v[{{[0-9]}}:{{[0-9]}}], v0, s[0:3], 0 offen offset:28		;CHECK-NEXT: buffer_load_dwordx2 v[{{[0-9]}}:{{[0-9]}}], v0, s[0:3], 0 offen offset:28
;CHECK: s_waitcnt		;CHECK: s_waitcnt
define amdgpu_ps void @buffer_load_x1_offen_merged_and(<4 x i32> inreg %rsrc, i32 %a) {		define amdgpu_ps void @buffer_load_x1_offen_merged_and(<4 x i32> inreg %rsrc, i32 %a) {
main_body:		main_body:
%a1 = add i32 %a, 4		%a1 = add i32 %a, 4
%a2 = add i32 %a, 8		%a2 = add i32 %a, 8
%a3 = add i32 %a, 12		%a3 = add i32 %a, 12
(9 lines not shown)
call void @llvm.amdgcn.exp.f32(i32 0, i32 15, float %r1, float %r2, float %r3, float %r4, i1 true, i1 true)		call void @llvm.amdgcn.exp.f32(i32 0, i32 15, float %r1, float %r2, float %r3, float %r4, i1 true, i1 true)
call void @llvm.amdgcn.exp.f32(i32 0, i32 15, float %r5, float %r6, float undef, float undef, i1 true, i1 true)		call void @llvm.amdgcn.exp.f32(i32 0, i32 15, float %r5, float %r6, float undef, float undef, i1 true, i1 true)
ret void		ret void
}		}

;CHECK-LABEL: {{^}}buffer_load_x1_offen_merged_or:		;CHECK-LABEL: {{^}}buffer_load_x1_offen_merged_or:
;CHECK-NEXT: %bb.		;CHECK-NEXT: %bb.
;CHECK-NEXT: v_lshlrev_b32_e32 v{{[0-9]}}, 6, v0		;CHECK-NEXT: v_lshlrev_b32_e32 v{{[0-9]}}, 6, v0
		;GFX10-NEXT: s_clause
;CHECK-NEXT: buffer_load_dwordx4 v[{{[0-9]}}:{{[0-9]}}], v{{[0-9]}}, s[0:3], 0 offen offset:4		;CHECK-NEXT: buffer_load_dwordx4 v[{{[0-9]}}:{{[0-9]}}], v{{[0-9]}}, s[0:3], 0 offen offset:4
;CHECK-NEXT: buffer_load_dwordx2 v[{{[0-9]}}:{{[0-9]}}], v{{[0-9]}}, s[0:3], 0 offen offset:28		;CHECK-NEXT: buffer_load_dwordx2 v[{{[0-9]}}:{{[0-9]}}], v{{[0-9]}}, s[0:3], 0 offen offset:28
;CHECK: s_waitcnt		;CHECK: s_waitcnt
define amdgpu_ps void @buffer_load_x1_offen_merged_or(<4 x i32> inreg %rsrc, i32 %inp) {		define amdgpu_ps void @buffer_load_x1_offen_merged_or(<4 x i32> inreg %rsrc, i32 %inp) {
main_body:		main_body:
%a = shl i32 %inp, 6		%a = shl i32 %inp, 6
%a1 = or i32 %a, 4		%a1 = or i32 %a, 4
%a2 = or i32 %a, 8		%a2 = or i32 %a, 8
(9 lines not shown)
%r6 = call float @llvm.amdgcn.raw.buffer.load.f32(<4 x i32> %rsrc, i32 %a6, i32 0, i32 0)		%r6 = call float @llvm.amdgcn.raw.buffer.load.f32(<4 x i32> %rsrc, i32 %a6, i32 0, i32 0)
call void @llvm.amdgcn.exp.f32(i32 0, i32 15, float %r1, float %r2, float %r3, float %r4, i1 true, i1 true)		call void @llvm.amdgcn.exp.f32(i32 0, i32 15, float %r1, float %r2, float %r3, float %r4, i1 true, i1 true)
call void @llvm.amdgcn.exp.f32(i32 0, i32 15, float %r5, float %r6, float undef, float undef, i1 true, i1 true)		call void @llvm.amdgcn.exp.f32(i32 0, i32 15, float %r5, float %r6, float undef, float undef, i1 true, i1 true)
ret void		ret void
}		}

;CHECK-LABEL: {{^}}buffer_load_x1_offen_merged_glc_slc:		;CHECK-LABEL: {{^}}buffer_load_x1_offen_merged_glc_slc:
;CHECK-NEXT: %bb.		;CHECK-NEXT: %bb.
		;GFX10-NEXT: s_clause
;CHECK-NEXT: buffer_load_dwordx2 v[{{[0-9]}}:{{[0-9]}}], v0, s[0:3], 0 offen offset:4{{$}}		;CHECK-NEXT: buffer_load_dwordx2 v[{{[0-9]}}:{{[0-9]}}], v0, s[0:3], 0 offen offset:4{{$}}
;CHECK-NEXT: buffer_load_dwordx2 v[{{[0-9]}}:{{[0-9]}}], v0, s[0:3], 0 offen offset:12 glc{{$}}		;CHECK-NEXT: buffer_load_dwordx2 v[{{[0-9]}}:{{[0-9]}}], v0, s[0:3], 0 offen offset:12 glc{{$}}
;CHECK-NEXT: buffer_load_dwordx2 v[{{[0-9]}}:{{[0-9]}}], v0, s[0:3], 0 offen offset:28 glc slc{{$}}		;CHECK-NEXT: buffer_load_dwordx2 v[{{[0-9]}}:{{[0-9]}}], v0, s[0:3], 0 offen offset:28 glc slc{{$}}
;CHECK: s_waitcnt		;CHECK: s_waitcnt
define amdgpu_ps void @buffer_load_x1_offen_merged_glc_slc(<4 x i32> inreg %rsrc, i32 %a) {		define amdgpu_ps void @buffer_load_x1_offen_merged_glc_slc(<4 x i32> inreg %rsrc, i32 %a) {
main_body:		main_body:
%a1 = add i32 %a, 4		%a1 = add i32 %a, 4
%a2 = add i32 %a, 8		%a2 = add i32 %a, 8
(47 lines not shown)
%r3 = extractelement <2 x float> %vr2, i32 0		%r3 = extractelement <2 x float> %vr2, i32 0
%r4 = extractelement <2 x float> %vr2, i32 1		%r4 = extractelement <2 x float> %vr2, i32 1
call void @llvm.amdgcn.exp.f32(i32 0, i32 15, float %r1, float %r2, float %r3, float %r4, i1 true, i1 true)		call void @llvm.amdgcn.exp.f32(i32 0, i32 15, float %r1, float %r2, float %r3, float %r4, i1 true, i1 true)
ret void		ret void
}		}

;CHECK-LABEL: {{^}}buffer_load_x1_offset_merged:		;CHECK-LABEL: {{^}}buffer_load_x1_offset_merged:
;CHECK-NEXT: %bb.		;CHECK-NEXT: %bb.
		;GFX10-NEXT: s_clause
;CHECK-NEXT: buffer_load_dwordx4 v[{{[0-9]}}:{{[0-9]}}], off, s[0:3], 0 offset:4		;CHECK-NEXT: buffer_load_dwordx4 v[{{[0-9]}}:{{[0-9]}}], off, s[0:3], 0 offset:4
;CHECK-NEXT: buffer_load_dwordx2 v[{{[0-9]}}:{{[0-9]}}], off, s[0:3], 0 offset:28		;CHECK-NEXT: buffer_load_dwordx2 v[{{[0-9]}}:{{[0-9]}}], off, s[0:3], 0 offset:28
;CHECK: s_waitcnt		;CHECK: s_waitcnt
define amdgpu_ps void @buffer_load_x1_offset_merged(<4 x i32> inreg %rsrc) {		define amdgpu_ps void @buffer_load_x1_offset_merged(<4 x i32> inreg %rsrc) {
main_body:		main_body:
%r1 = call float @llvm.amdgcn.raw.buffer.load.f32(<4 x i32> %rsrc, i32 4, i32 0, i32 0)		%r1 = call float @llvm.amdgcn.raw.buffer.load.f32(<4 x i32> %rsrc, i32 4, i32 0, i32 0)
%r2 = call float @llvm.amdgcn.raw.buffer.load.f32(<4 x i32> %rsrc, i32 8, i32 0, i32 0)		%r2 = call float @llvm.amdgcn.raw.buffer.load.f32(<4 x i32> %rsrc, i32 8, i32 0, i32 0)
%r3 = call float @llvm.amdgcn.raw.buffer.load.f32(<4 x i32> %rsrc, i32 12, i32 0, i32 0)		%r3 = call float @llvm.amdgcn.raw.buffer.load.f32(<4 x i32> %rsrc, i32 12, i32 0, i32 0)
(153 lines not shown)
main_body:		main_body:
%val = call <4 x i16> @llvm.amdgcn.raw.buffer.load.v4i16(<4 x i32> %rsrc, i32 0, i32 0, i32 0)		%val = call <4 x i16> @llvm.amdgcn.raw.buffer.load.v4i16(<4 x i32> %rsrc, i32 0, i32 0, i32 0)
store <4 x i16> %val, <4 x i16> addrspace(3)* %ptr		store <4 x i16> %val, <4 x i16> addrspace(3)* %ptr
ret void		ret void
}		}

;CHECK-LABEL: {{^}}raw_buffer_load_x1_offset_merged:		;CHECK-LABEL: {{^}}raw_buffer_load_x1_offset_merged:
;CHECK-NEXT: %bb.		;CHECK-NEXT: %bb.
		;GFX10-NEXT: s_clause
;CHECK-NEXT: buffer_load_dwordx4 v[{{[0-9]}}:{{[0-9]}}], off, s[0:3], 0 offset:4		;CHECK-NEXT: buffer_load_dwordx4 v[{{[0-9]}}:{{[0-9]}}], off, s[0:3], 0 offset:4
;CHECK-NEXT: buffer_load_dwordx2 v[{{[0-9]}}:{{[0-9]}}], off, s[0:3], 0 offset:28		;CHECK-NEXT: buffer_load_dwordx2 v[{{[0-9]}}:{{[0-9]}}], off, s[0:3], 0 offset:28
;CHECK: s_waitcnt		;CHECK: s_waitcnt
define amdgpu_ps void @raw_buffer_load_x1_offset_merged(<4 x i32> inreg %rsrc) {		define amdgpu_ps void @raw_buffer_load_x1_offset_merged(<4 x i32> inreg %rsrc) {
main_body:		main_body:
%r1 = call float @llvm.amdgcn.raw.buffer.load.f32(<4 x i32> %rsrc, i32 4, i32 0, i32 0)		%r1 = call float @llvm.amdgcn.raw.buffer.load.f32(<4 x i32> %rsrc, i32 4, i32 0, i32 0)
%r2 = call float @llvm.amdgcn.raw.buffer.load.f32(<4 x i32> %rsrc, i32 8, i32 0, i32 0)		%r2 = call float @llvm.amdgcn.raw.buffer.load.f32(<4 x i32> %rsrc, i32 8, i32 0, i32 0)
%r3 = call float @llvm.amdgcn.raw.buffer.load.f32(<4 x i32> %rsrc, i32 12, i32 0, i32 0)		%r3 = call float @llvm.amdgcn.raw.buffer.load.f32(<4 x i32> %rsrc, i32 12, i32 0, i32 0)
%r4 = call float @llvm.amdgcn.raw.buffer.load.f32(<4 x i32> %rsrc, i32 16, i32 0, i32 0)		%r4 = call float @llvm.amdgcn.raw.buffer.load.f32(<4 x i32> %rsrc, i32 16, i32 0, i32 0)
%r5 = call float @llvm.amdgcn.raw.buffer.load.f32(<4 x i32> %rsrc, i32 28, i32 0, i32 0)		%r5 = call float @llvm.amdgcn.raw.buffer.load.f32(<4 x i32> %rsrc, i32 28, i32 0, i32 0)
%r6 = call float @llvm.amdgcn.raw.buffer.load.f32(<4 x i32> %rsrc, i32 32, i32 0, i32 0)		%r6 = call float @llvm.amdgcn.raw.buffer.load.f32(<4 x i32> %rsrc, i32 32, i32 0, i32 0)
call void @llvm.amdgcn.exp.f32(i32 0, i32 15, float %r1, float %r2, float %r3, float %r4, i1 true, i1 true)		call void @llvm.amdgcn.exp.f32(i32 0, i32 15, float %r1, float %r2, float %r3, float %r4, i1 true, i1 true)
call void @llvm.amdgcn.exp.f32(i32 0, i32 15, float %r5, float %r6, float undef, float undef, i1 true, i1 true)		call void @llvm.amdgcn.exp.f32(i32 0, i32 15, float %r5, float %r6, float undef, float undef, i1 true, i1 true)
ret void		ret void
}		}

;CHECK-LABEL: {{^}}raw_buffer_load_x1_offset_swizzled_not_merged:		;CHECK-LABEL: {{^}}raw_buffer_load_x1_offset_swizzled_not_merged:
;CHECK-NEXT: %bb.		;CHECK-NEXT: %bb.
		;GFX10-NEXT: s_clause
;CHECK-NEXT: buffer_load_dword v{{[0-9]}}, off, s[0:3], 0 offset:4		;CHECK-NEXT: buffer_load_dword v{{[0-9]}}, off, s[0:3], 0 offset:4
;CHECK-NEXT: buffer_load_dword v{{[0-9]}}, off, s[0:3], 0 offset:8		;CHECK-NEXT: buffer_load_dword v{{[0-9]}}, off, s[0:3], 0 offset:8
;CHECK-NEXT: buffer_load_dword v{{[0-9]}}, off, s[0:3], 0 offset:12		;CHECK-NEXT: buffer_load_dword v{{[0-9]}}, off, s[0:3], 0 offset:12
;CHECK-NEXT: buffer_load_dword v{{[0-9]}}, off, s[0:3], 0 offset:16		;CHECK-NEXT: buffer_load_dword v{{[0-9]}}, off, s[0:3], 0 offset:16
;CHECK-NEXT: buffer_load_dword v{{[0-9]}}, off, s[0:3], 0 offset:28		;CHECK-NEXT: buffer_load_dword v{{[0-9]}}, off, s[0:3], 0 offset:28
;CHECK-NEXT: buffer_load_dword v{{[0-9]}}, off, s[0:3], 0 offset:32		;CHECK-NEXT: buffer_load_dword v{{[0-9]}}, off, s[0:3], 0 offset:32
;CHECK: s_waitcnt		;CHECK: s_waitcnt
define amdgpu_ps void @raw_buffer_load_x1_offset_swizzled_not_merged(<4 x i32> inreg %rsrc) {		define amdgpu_ps void @raw_buffer_load_x1_offset_swizzled_not_merged(<4 x i32> inreg %rsrc) {
(28 lines not shown)

llvm/test/CodeGen/AMDGPU/shrink-add-sub-constant.ll

	(148 lines not shown)
	; GFX10-LABEL: v_test_i32_x_sub_64_multi_use:			; GFX10-LABEL: v_test_i32_x_sub_64_multi_use:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX10-NEXT: v_lshlrev_b32_e32 v2, 2, v0			; GFX10-NEXT: v_lshlrev_b32_e32 v2, 2, v0
	; GFX10-NEXT: ; implicit-def: $vcc_hi			; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: v_add_co_u32_e64 v0, s2, s2, v2			; GFX10-NEXT: v_add_co_u32_e64 v0, s2, s2, v2
	; GFX10-NEXT: v_add_co_ci_u32_e64 v1, s2, s3, 0, s2			; GFX10-NEXT: v_add_co_ci_u32_e64 v1, s2, s3, 0, s2
				; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: global_load_dword v3, v[0:1], off			; GFX10-NEXT: global_load_dword v3, v[0:1], off
	; GFX10-NEXT: global_load_dword v4, v[0:1], off			; GFX10-NEXT: global_load_dword v4, v[0:1], off
	; GFX10-NEXT: v_add_co_u32_e64 v0, s0, s0, v2			; GFX10-NEXT: v_add_co_u32_e64 v0, s0, s0, v2
	; GFX10-NEXT: v_add_co_ci_u32_e64 v1, s0, s1, 0, s0			; GFX10-NEXT: v_add_co_ci_u32_e64 v1, s0, s1, 0, s0
	; GFX10-NEXT: s_waitcnt vmcnt(1)			; GFX10-NEXT: s_waitcnt vmcnt(1)
	; GFX10-NEXT: v_subrev_nc_u32_e32 v2, 64, v3			; GFX10-NEXT: v_subrev_nc_u32_e32 v2, 64, v3
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_subrev_nc_u32_e32 v3, 64, v4			; GFX10-NEXT: v_subrev_nc_u32_e32 v3, 64, v4
	(816 lines not shown)
	; GFX10-LABEL: v_test_i16_x_sub_64_multi_use:			; GFX10-LABEL: v_test_i16_x_sub_64_multi_use:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx4 s[0:3], s[0:1], 0x24
	; GFX10-NEXT: v_lshlrev_b32_e32 v2, 1, v0			; GFX10-NEXT: v_lshlrev_b32_e32 v2, 1, v0
	; GFX10-NEXT: ; implicit-def: $vcc_hi			; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: v_add_co_u32_e64 v0, s2, s2, v2			; GFX10-NEXT: v_add_co_u32_e64 v0, s2, s2, v2
	; GFX10-NEXT: v_add_co_ci_u32_e64 v1, s2, s3, 0, s2			; GFX10-NEXT: v_add_co_ci_u32_e64 v1, s2, s3, 0, s2
				; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: global_load_ushort v3, v[0:1], off			; GFX10-NEXT: global_load_ushort v3, v[0:1], off
	; GFX10-NEXT: global_load_ushort v4, v[0:1], off			; GFX10-NEXT: global_load_ushort v4, v[0:1], off
	; GFX10-NEXT: v_add_co_u32_e64 v0, s0, s0, v2			; GFX10-NEXT: v_add_co_u32_e64 v0, s0, s0, v2
	; GFX10-NEXT: v_add_co_ci_u32_e64 v1, s0, s1, 0, s0			; GFX10-NEXT: v_add_co_ci_u32_e64 v1, s0, s1, 0, s0
	; GFX10-NEXT: s_waitcnt vmcnt(1)			; GFX10-NEXT: s_waitcnt vmcnt(1)
	; GFX10-NEXT: v_sub_nc_u16_e64 v2, v3, 64			; GFX10-NEXT: v_sub_nc_u16_e64 v2, v3, 64
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_sub_nc_u16_e64 v3, v4, 64			; GFX10-NEXT: v_sub_nc_u16_e64 v3, v4, 64
	(1,561 lines not shown)

llvm/test/CodeGen/AMDGPU/smrd.ll

(371 lines not shown)
%r = call float @llvm.amdgcn.s.buffer.load.f32(<4 x i32> %desc, i32 %off, i32 0)		%r = call float @llvm.amdgcn.s.buffer.load.f32(<4 x i32> %desc, i32 %off, i32 0)
ret float %r		ret float %r
}		}

; GCN-LABEL: {{^}}smrd_imm_merged:		; GCN-LABEL: {{^}}smrd_imm_merged:
; GCN-NEXT: %bb.		; GCN-NEXT: %bb.
; SICI-NEXT: s_buffer_load_dwordx4 s[{{[0-9]}}:{{[0-9]}}], s[0:3], 0x1		; SICI-NEXT: s_buffer_load_dwordx4 s[{{[0-9]}}:{{[0-9]}}], s[0:3], 0x1
; SICI-NEXT: s_buffer_load_dwordx2 s[{{[0-9]}}:{{[0-9]}}], s[0:3], 0x7		; SICI-NEXT: s_buffer_load_dwordx2 s[{{[0-9]}}:{{[0-9]}}], s[0:3], 0x7
		; GFX10-NEXT: s_clause
; VIGFX9_10-NEXT: s_buffer_load_dwordx4 s[{{[0-9]}}:{{[0-9]}}], s[0:3], 0x4		; VIGFX9_10-NEXT: s_buffer_load_dwordx4 s[{{[0-9]}}:{{[0-9]}}], s[0:3], 0x4
; VIGFX9_10-NEXT: s_buffer_load_dwordx2 s[{{[0-9]}}:{{[0-9]}}], s[0:3], 0x1c		; VIGFX9_10-NEXT: s_buffer_load_dwordx2 s[{{[0-9]}}:{{[0-9]}}], s[0:3], 0x1c
define amdgpu_ps void @smrd_imm_merged(<4 x i32> inreg %desc) #0 {		define amdgpu_ps void @smrd_imm_merged(<4 x i32> inreg %desc) #0 {
main_body:		main_body:
%r1 = call float @llvm.amdgcn.s.buffer.load.f32(<4 x i32> %desc, i32 4, i32 0)		%r1 = call float @llvm.amdgcn.s.buffer.load.f32(<4 x i32> %desc, i32 4, i32 0)
%r2 = call float @llvm.amdgcn.s.buffer.load.f32(<4 x i32> %desc, i32 8, i32 0)		%r2 = call float @llvm.amdgcn.s.buffer.load.f32(<4 x i32> %desc, i32 8, i32 0)
%r3 = call float @llvm.amdgcn.s.buffer.load.f32(<4 x i32> %desc, i32 12, i32 0)		%r3 = call float @llvm.amdgcn.s.buffer.load.f32(<4 x i32> %desc, i32 12, i32 0)
%r4 = call float @llvm.amdgcn.s.buffer.load.f32(<4 x i32> %desc, i32 16, i32 0)		%r4 = call float @llvm.amdgcn.s.buffer.load.f32(<4 x i32> %desc, i32 16, i32 0)
(54 lines not shown)

%res.tmp = fadd float %a, %b		%res.tmp = fadd float %a, %b
%res = fadd float %res.tmp, %c		%res = fadd float %res.tmp, %c
ret float %res		ret float %res
}		}

; GCN-LABEL: {{^}}smrd_vgpr_merged:		; GCN-LABEL: {{^}}smrd_vgpr_merged:
; GCN-NEXT: %bb.		; GCN-NEXT: %bb.
		; GFX10-NEXT: s_clause
; GCN-NEXT: buffer_load_dwordx4 v[{{[0-9]}}:{{[0-9]}}], v0, s[0:3], 0 offen offset:4		; GCN-NEXT: buffer_load_dwordx4 v[{{[0-9]}}:{{[0-9]}}], v0, s[0:3], 0 offen offset:4
; GCN-NEXT: buffer_load_dwordx2 v[{{[0-9]}}:{{[0-9]}}], v0, s[0:3], 0 offen offset:28		; GCN-NEXT: buffer_load_dwordx2 v[{{[0-9]}}:{{[0-9]}}], v0, s[0:3], 0 offen offset:28
define amdgpu_ps void @smrd_vgpr_merged(<4 x i32> inreg %desc, i32 %a) #0 {		define amdgpu_ps void @smrd_vgpr_merged(<4 x i32> inreg %desc, i32 %a) #0 {
main_body:		main_body:
%a1 = add i32 %a, 4		%a1 = add i32 %a, 4
%a2 = add i32 %a, 8		%a2 = add i32 %a, 8
%a3 = add i32 %a, 12		%a3 = add i32 %a, 12
%a4 = add i32 %a, 16		%a4 = add i32 %a, 16
(314 lines not shown)

llvm/test/CodeGen/AMDGPU/vgpr-descriptor-waterfall-loop-idom-update.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -march=amdgcn -mcpu=gfx1010 | FileCheck %s --check-prefix=GCN			; RUN: llc < %s -march=amdgcn -mcpu=gfx1010 | FileCheck %s --check-prefix=GCN

	define void @vgpr_descriptor_waterfall_loop_idom_update(<4 x i32>* %arg) {			define void @vgpr_descriptor_waterfall_loop_idom_update(<4 x i32>* %arg) {
	; GCN-LABEL: vgpr_descriptor_waterfall_loop_idom_update:			; GCN-LABEL: vgpr_descriptor_waterfall_loop_idom_update:
	; GCN: ; %bb.0: ; %entry			; GCN: ; %bb.0: ; %entry
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_waitcnt_vscnt null, 0x0			; GCN-NEXT: s_waitcnt_vscnt null, 0x0
	; GCN-NEXT: ; implicit-def: $vcc_hi			; GCN-NEXT: ; implicit-def: $vcc_hi
	; GCN-NEXT: BB0_1: ; %bb0			; GCN-NEXT: BB0_1: ; %bb0
	; GCN-NEXT: ; =>This Loop Header: Depth=1			; GCN-NEXT: ; =>This Loop Header: Depth=1
	; GCN-NEXT: ; Child Loop BB0_2 Depth 2			; GCN-NEXT: ; Child Loop BB0_2 Depth 2
	; GCN-NEXT: v_add_co_u32_e64 v2, vcc_lo, v0, 8			; GCN-NEXT: v_add_co_u32_e64 v2, vcc_lo, v0, 8
	; GCN-NEXT: s_mov_b32 s5, exec_lo			; GCN-NEXT: s_mov_b32 s5, exec_lo
	; GCN-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v1, vcc_lo			; GCN-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v1, vcc_lo
				; GCN-NEXT: s_clause 0x1
	; GCN-NEXT: flat_load_dwordx2 v[2:3], v[2:3]			; GCN-NEXT: flat_load_dwordx2 v[2:3], v[2:3]
	; GCN-NEXT: flat_load_dwordx2 v[4:5], v[0:1]			; GCN-NEXT: flat_load_dwordx2 v[4:5], v[0:1]
	; GCN-NEXT: BB0_2: ; Parent Loop BB0_1 Depth=1			; GCN-NEXT: BB0_2: ; Parent Loop BB0_1 Depth=1
	; GCN-NEXT: ; => This Inner Loop Header: Depth=2			; GCN-NEXT: ; => This Inner Loop Header: Depth=2
	; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GCN-NEXT: v_readfirstlane_b32 s8, v4			; GCN-NEXT: v_readfirstlane_b32 s8, v4
	; GCN-NEXT: v_readfirstlane_b32 s9, v5			; GCN-NEXT: v_readfirstlane_b32 s9, v5
	; GCN-NEXT: v_readfirstlane_b32 s10, v2			; GCN-NEXT: v_readfirstlane_b32 s10, v2
	(25 lines not shown)

llvm/test/CodeGen/AMDGPU/vgpr-tuple-allocation.ll

	(61 lines not shown)
	; GFX10-NEXT: v_nop			; GFX10-NEXT: v_nop
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, extern_func@gotpcrel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, extern_func@gotpcrel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, extern_func@gotpcrel32@hi+4			; GFX10-NEXT: s_addc_u32 s5, s5, extern_func@gotpcrel32@hi+4
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]

	; GFX10: buffer_load_dword v43, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10: buffer_load_dword v43, off, s[0:3], s33
	; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:4
	; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:8
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:12

	; GFX10: buffer_load_dword v44, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload			; GFX10: buffer_load_dword v44, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload
	; GFX10: s_setpc_b64 s[4:5]			; GFX10: s_setpc_b64 s[4:5]
	main_body:			main_body:
	call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}"() #0			call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}"() #0
	call void asm sideeffect "", "~{v8},~{v9},~{v10},~{v11},~{v12},~{v13},~{v14},~{v15}"() #0			call void asm sideeffect "", "~{v8},~{v9},~{v10},~{v11},~{v12},~{v13},~{v14},~{v15}"() #0
	call void asm sideeffect "", "~{v16},~{v17},~{v18},~{v19},~{v20},~{v21},~{v22},~{v23}"() #0			call void asm sideeffect "", "~{v16},~{v17},~{v18},~{v19},~{v20},~{v21},~{v22},~{v23}"() #0
	call void asm sideeffect "", "~{v24},~{v25},~{v26},~{v27},~{v28},~{v29},~{v30},~{v31}"() #0			call void asm sideeffect "", "~{v24},~{v25},~{v26},~{v27},~{v28},~{v29},~{v30},~{v31}"() #0
	(62 lines not shown)
	; GFX10-NEXT: v_mov_b32_e32 v44, v12			; GFX10-NEXT: v_mov_b32_e32 v44, v12
	; GFX10-NEXT: ; implicit-def: $vcc_hi			; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: global_store_dwordx4 v[0:1], v[0:3], off			; GFX10-NEXT: global_store_dwordx4 v[0:1], v[0:3], off
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: image_gather4_c_b_cl v[0:3], [v44, v43, v42, v41, v40], s[36:43], s[44:47] dmask:0x1			; GFX10-NEXT: image_gather4_c_b_cl v[0:3], [v44, v43, v42, v41, v40], s[36:43], s[44:47] dmask:0x1

	; GFX10: buffer_load_dword v44, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10: buffer_load_dword v44, off, s[0:3], s33
	; GFX10-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:4
	; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:8
	; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:12
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:16
	; GFX10: buffer_load_dword v45, off, s[0:3], s32 offset:20 ; 4-byte Folded Reload			; GFX10: buffer_load_dword v45, off, s[0:3], s32 offset:20
	; GFX10: s_setpc_b64 s[4:5]			; GFX10: s_setpc_b64 s[4:5]
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.2d.v4f32.f32.f32(i32 1, float %bias, float %zcompare, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.2d.v4f32.f32.f32(i32 1, float %bias, float %zcompare, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	store <4 x float> %v, <4 x float> addrspace(1)* undef			store <4 x float> %v, <4 x float> addrspace(1)* undef
	call void @extern_func()			call void @extern_func()
	%v1 = call <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.2d.v4f32.f32.f32(i32 1, float %bias, float %zcompare, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)			%v1 = call <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.2d.v4f32.f32.f32(i32 1, float %bias, float %zcompare, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 false, i32 0, i32 0)
	ret <4 x float> %v1			ret <4 x float> %v1
	}			}

	declare <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.2d.v4f32.f32.f32(i32 immarg, float, float, float, float, float, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #1			declare <4 x float> @llvm.amdgcn.image.gather4.c.b.cl.2d.v4f32.f32.f32(i32 immarg, float, float, float, float, float, <8 x i32>, <4 x i32>, i1 immarg, i32 immarg, i32 immarg) #1

	attributes #0 = { nounwind writeonly }			attributes #0 = { nounwind writeonly }
	attributes #1 = { nounwind readonly }			attributes #1 = { nounwind readonly }


[AMDGPU] New SIInsertHardClauses pass
Closed, Public

Revision Contents

Diff 264042

llvm/lib/Target/AMDGPU/AMDGPU.h

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

llvm/lib/Target/AMDGPU/CMakeLists.txt

llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.div.fmas.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.div.scale.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.end.cf.i32.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.if.break.i32.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.mov.dpp.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.update.dpp.ll

llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll

llvm/test/CodeGen/AMDGPU/hard-clauses.mir

llvm/test/CodeGen/AMDGPU/idot2.ll

llvm/test/CodeGen/AMDGPU/idot4s.ll

llvm/test/CodeGen/AMDGPU/idot4u.ll

llvm/test/CodeGen/AMDGPU/idot8s.ll

llvm/test/CodeGen/AMDGPU/idot8u.ll

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.raw.buffer.load.ll

llvm/test/CodeGen/AMDGPU/shrink-add-sub-constant.ll

llvm/test/CodeGen/AMDGPU/smrd.ll

llvm/test/CodeGen/AMDGPU/vgpr-descriptor-waterfall-loop-idom-update.ll

llvm/test/CodeGen/AMDGPU/vgpr-tuple-allocation.ll
