This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Bundle loads before post-RA scheduler
ClosedPublic

Authored by rampitec on Jan 14 2020, 2:49 PM.


Details

Reviewers

foad
kerbowa
vpykhtin

Commits

rG555d8f4ef5eb: [AMDGPU] Bundle loads before post-RA scheduler

Summary

We are relying on artificial DAG edges inserted by the
MemOpClusterMutation to keep loads and stores together in the
post-RA scheduler. This does not work all the time since it
allows a completely independent instruction to be scheduled in
the middle of the cluster.

Removed the DAG mutation and added a pass to bundle already
clustered instructions. These bundles are unpacked before the
memory legalizer because it does not work with bundles, but also
because unpacking allows waitcounts to be inserted in the middle
of a store cluster.

Removing artificial edges also allows a more relaxed scheduling.
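
As a rough illustration of the bundling idea in the summary, the standalone sketch below groups adjacent memory operations of the same kind and direction into runs. It is a toy model, not the pass itself: `ToyInst`, `MemKind`, and `findBundles` are hypothetical names, no LLVM API is used, and the dependency check added later in the review is left out.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Toy stand-ins for machine instructions; none of these are LLVM types.
enum class MemKind { None, VMEM, SMEM, DS, FLAT };

struct ToyInst {
  MemKind Kind; // MemKind::None means "not a memory operation"
  bool IsLoad;  // true for loads, false for stores (ignored for None)
};

// Returns half-open index ranges [Begin, End) of runs of at least two
// adjacent memory operations that share the same kind and direction.
std::vector<std::pair<std::size_t, std::size_t>>
findBundles(const std::vector<ToyInst> &Insts) {
  std::vector<std::pair<std::size_t, std::size_t>> Bundles;
  std::size_t Begin = 0;
  while (Begin < Insts.size()) {
    if (Insts[Begin].Kind == MemKind::None) {
      ++Begin;
      continue;
    }
    std::size_t End = Begin + 1;
    while (End < Insts.size() && Insts[End].Kind == Insts[Begin].Kind &&
           Insts[End].IsLoad == Insts[Begin].IsLoad)
      ++End;
    if (End - Begin >= 2) // a single instruction is left unbundled
      Bundles.emplace_back(Begin, End);
    Begin = End;
  }
  return Bundles;
}
```

Because a run becomes one unit, a scheduler that treats each range as a single instruction cannot place an unrelated instruction inside it, which is the property the artificial edges did not guarantee.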

Diff Detail

Event Timeline

rampitec created this revision. Jan 14 2020, 2:49 PM

Herald added a project: Restricted Project. Jan 14 2020, 2:49 PM

Herald added subscribers: Petar.Avramovic, hiraditya, t-tye and 9 others.

Is this just working around the post-RA scheduler? Can we finally replace the post-RA scheduler with misched?

In D72737#1820814, @arsenm wrote:

Is this just working around the post-RA scheduler? Can we finally replace the post-RA scheduler with misched?

The post-RA scheduler in fact works pretty well. It is not really clear that we should replace it, and in any case we would likely want a different set of heuristics than in the pre-RA scheduler. The pre-RA scheduler has to deal with RP, ILP, and clustering, whereas the post-RA scheduler needs to deal with more subtle aspects of SQ. We just cannot address all of these (often opposing) heuristics in the same algorithm. But for sure it should not undo clustering done pre-RA.

Like you say the machine scheduler can choose not to prioritize cluster edges, so this is much more restrictive. We should be really sure we always want this clustering. There are probably cases where the cost/benefit is not in favor of bundling, but it's not easy to predict.

By the way I think we actually do try to insert waitcnt into bundles. I wonder if it would be worth it to just update the memory legalizer to work with bundles as well.

I have found a bug in the logic. Will update the review.

Actually I misunderstood the change because of the title. I see now that the loads are bundled before post-RA scheduler.

In D72737#1820905, @kerbowa wrote:

Actually I misunderstood the change because of the title. I see now that the loads are bundled before post-RA scheduler.

Ough, thanks! Fixed the title.

In D72737#1820865, @kerbowa wrote:

By the way I think we actually do try to insert waitcnt into bundles. I wonder if it would be worth it to just update the memory legalizer to work with bundles as well.

I saw waitcount inserted after the bundle if I do not unpack them, so probably we do not do it always if we really do.

Fixed a bug in the bundling logic and added a test for the produced bundles and for when they break.

In D72737#1820946, @rampitec wrote:

In D72737#1820865, @kerbowa wrote:

By the way I think we actually do try to insert waitcnt into bundles. I wonder if it would be worth it to just update the memory legalizer to work with bundles as well.

I saw waitcount inserted after the bundle if I do not unpack them, so probably we do not do it always if we really do.

Yeah maybe. I remember it was added here: c04aab9c0646461bc187808920b3d5ee7f5cc5ab
Was the waitcnt after the bundle that you saw in the correct place?

Rebased.

In D72737#1820974, @kerbowa wrote:

In D72737#1820946, @rampitec wrote:

In D72737#1820865, @kerbowa wrote:

By the way I think we actually do try to insert waitcnt into bundles. I wonder if it would be worth it to just update the memory legalizer to work with bundles as well.

I saw waitcount inserted after the bundle if I do not unpack them, so probably we do not do it always if we really do.

Yeah maybe. I remember it was added here: c04aab9c0646461bc187808920b3d5ee7f5cc5ab
Was the waitcnt after the bundle that you saw in the correct place?

Yes, it was correct. It was just vmcnt(0) because we could not tear apart the load bundle and needed all the loaded values in a store bundle. When I unpacked it, each store was waiting for vmcnt(3), which is better.
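
The difference between vmcnt(0) and vmcnt(3) can be sketched with a toy counter model. The assumptions here are not taken from the review: a single counter tracks every outstanding memory operation, operations complete in issue order, and N loads are followed by N stores where store i uses the value of load i. `vmcntFor`, `waitsUnbundled`, and `waitBundled` are hypothetical helpers.

```cpp
#include <vector>

// Toy model, not LLVM code. Assumes one counter for all outstanding memory
// operations and in-order completion.
// Returns the largest counter value that still guarantees load `LoadIdx`
// (0-based, in issue order) has completed once `IssuedSoFar` operations
// have been issued in total.
unsigned vmcntFor(unsigned LoadIdx, unsigned IssuedSoFar) {
  return IssuedSoFar - (LoadIdx + 1);
}

// Wait values when the stores are separate instructions: N loads are issued
// first, then each store i waits only for load i.
std::vector<unsigned> waitsUnbundled(unsigned N) {
  std::vector<unsigned> Waits;
  unsigned Issued = N; // all loads are already in flight
  for (unsigned I = 0; I < N; ++I) {
    Waits.push_back(vmcntFor(I, Issued));
    ++Issued; // the store itself is now outstanding too
  }
  return Waits;
}

// With the stores kept as one bundle, a single wait placed before the bundle
// has to cover the last load, so every load must have completed.
unsigned waitBundled(unsigned N) { return vmcntFor(N - 1, N); }
```

Under these assumptions, four loads followed by four stores give a wait of 3 before every unpacked store and a wait of 0 before a store bundle, which matches the values mentioned in the comment.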

There are nice changes in a bunch of tests, where we're preserving clusters instead of breaking them apart.

But there are also strange changes in some other tests, where the clustering hasn't changed, but some instructions that use the result of a load have moved around. Does this mean we're getting the latency of the load wrong now? (Or were we getting it wrong before?) For example:
insert_vector_elt
llvm.maxnum.f16.ll
saddo.ll
sign_extend.ll

llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll
280	Nice.
llvm/test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll
188 ↗	(On Diff #238144)	What happened here? Has some cost estimate changed because of the bundling? Can we fix it?
llvm/test/CodeGen/AMDGPU/cvt_f32_ubyte.ll
278	Nice.
llvm/test/CodeGen/AMDGPU/idot2.ll
2677	Nice.
2797	Is this just coincidence, or are we actually trying to cluster a FLAT load with an SMEM load?
llvm/test/CodeGen/AMDGPU/insert_vector_elt.v2i16.ll
582–583	Nice.
llvm/test/CodeGen/AMDGPU/lshr.v2i16.ll
148–149	Nice.
llvm/test/CodeGen/AMDGPU/memory_clause.ll
8	Nice.
llvm/test/CodeGen/AMDGPU/shl.ll
166	Nice.
llvm/test/CodeGen/AMDGPU/shl.v2i16.ll
149–150	Nice.

In D72737#1821378, @foad wrote:

There are nice changes in a bunch of tests, where we're preserving clusters instead of breaking them apart.

But there are also strange changes in some other tests, where the clustering hasn't changed, but some instructions that use the result of a load have moved around. Does this mean we're getting the latency of the load wrong now? (Or were we getting it wrong before?) For example:
insert_vector_elt
llvm.maxnum.f16.ll
saddo.ll
sign_extend.ll

We have moved uses of loaded values further from their loads, which is good. As far as I understand, these changes are induced by the removal of the artificial edges that were created by MemOpClusterMutation. These edges linked the successors of any load to all the nodes in a cluster and restricted the scheduling.
In sign_extend.ll the change is caused by store clustering: we have moved the v_ashrrev_i32_e32 producing v2 past the v_ashrrev_i32_e32 producing v3 because the store cluster uses them in this order. Before, this was harder to do because of the artificial edges linking all predecessors to all stores.
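
To make the effect of those artificial edges concrete, here is a small standalone sketch that mimics the edge-adding step of the removed mutation on a toy graph. `Edge`, `Graph`, and `linkCluster` are hypothetical names, and the graph is a plain set of pairs rather than LLVM's `SUnit` structures.

```cpp
#include <set>
#include <utility>

// Toy dependency graph: an edge (From, To) means To is scheduled after From.
using Edge = std::pair<int, int>;
using Graph = std::set<Edge>;

// Mimics the edge-adding step of the removed mutation for one clustered
// pair (A, B): B is ordered after A, B's other predecessors also precede A,
// and A's other successors also follow B.
Graph linkCluster(const Graph &G, int A, int B) {
  Graph Out = G;
  Out.insert(Edge{A, B});
  for (const Edge &E : G) {
    if (E.second == B && E.first != A)
      Out.insert(Edge{E.first, A}); // predecessor of B now precedes A
    if (E.first == A && E.second != B)
      Out.insert(Edge{B, E.second}); // successor of A now follows B
  }
  return Out;
}
```

With node 0 as the first load, node 1 as the second load, node 2 as a user of the first load, and node 3 as an address computation for the second load, linking (0, 1) adds the edges 3 → 0 and 1 → 2. The user of the first load is then forced after the second load as well, which is the kind of restriction that disappears once the edges are removed.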

rampitec marked 2 inline comments as done. Jan 15 2020, 8:45 AM

rampitec added inline comments.

llvm/test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll
188 ↗	(On Diff #238144)	It was duplicated by Branch Probability Basic Block Placement immediately after the post-RA scheduler. It is now duplicated because the default value of -tail-dup-placement-threshold is 2. With a value of 3 it would be duplicated without bundling as well. That is because TailDuplicator::shouldTailDuplicate() simply counts instructions and compares the count against the threshold: https://llvm.org/doxygen/TailDuplicator_8cpp_source.html#l00622 It can be fixed in a separate follow-up patch that adds a bundle's size when the instruction is a bundle; I am not sure whether that would affect other targets.
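
The counting issue can be sketched with a toy model in which a block is a list of item sizes: 1 for a plain instruction and the number of bundled instructions for a bundle. `countItems` and `smallEnoughToDuplicate` are hypothetical helpers, and the exact comparison used by the real tail duplicator is an assumption here.

```cpp
#include <vector>

// Toy model: each entry is the number of real instructions an item stands
// for (1 for a plain instruction, more than 1 for a bundle). These are not
// LLVM's data structures.
unsigned countItems(const std::vector<unsigned> &Block, bool ExpandBundles) {
  unsigned Count = 0;
  for (unsigned Size : Block)
    Count += ExpandBundles ? Size : 1;
  return Count;
}

// Assumed rule: a block is a duplication candidate when its instruction
// count does not exceed the threshold.
bool smallEnoughToDuplicate(const std::vector<unsigned> &Block,
                            unsigned Threshold, bool ExpandBundles) {
  return countItems(Block, ExpandBundles) <= Threshold;
}
```

For a block holding one two-instruction bundle plus one other instruction, counting the bundle as a single item passes a threshold of 2, while counting its contents needs a threshold of 3. That is the behaviour change described in the comment.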
llvm/test/CodeGen/AMDGPU/idot2.ll
2797	No, we do not:
BUNDLE implicit-def $vgpr2, implicit-def $vgpr0, implicit killed $vgpr2_vgpr3, implicit $exec, implicit killed $vgpr0_vgpr1 {
  renamable $vgpr2 = GLOBAL_LOAD_USHORT killed renamable $vgpr2_vgpr3, 0, 0, 0, 0, implicit $exec :: (load 2 from %ir.2, addrspace 1)
  renamable $vgpr0 = GLOBAL_LOAD_USHORT killed renamable $vgpr0_vgpr1, 0, 0, 0, 0, implicit $exec :: (load 2 from %ir.3, addrspace 1)
}
renamable $sgpr0 = S_LOAD_DWORD_IMM renamable $sgpr4_sgpr5, 0, 0, 0 :: (load 4 from %ir.4, addrspace 1)
That is because of removed mutation again I guess. Anyway, this is a better schedule because global loads take longer than SMRD.

In D72737#1821937, @rampitec wrote:

We have moved uses of loaded values further from their loads, which is good. As far as I understand, these changes are induced by the removal of the artificial edges that were created by MemOpClusterMutation. These edges linked the successors of any load to all the nodes in a cluster and restricted the scheduling.
In sign_extend.ll the change is caused by store clustering: we have moved the v_ashrrev_i32_e32 producing v2 past the v_ashrrev_i32_e32 producing v3 because the store cluster uses them in this order. Before, this was harder to do because of the artificial edges linking all predecessors to all stores.

OK, that sounds plausible, thanks!

rampitec marked an inline comment as done. Jan 15 2020, 9:40 AM

rampitec added inline comments.

llvm/test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll
188 ↗	(On Diff #238144)	Apparently that does not affect any other target: D72783

Rebased on top of D72783 to prevent tail duplication of a bundle.

rampitec added a parent revision: D72783: Process BUNDLE in tail duplication. Jan 15 2020, 10:09 AM

Prevent bundling of loads with overlapping destinations. They are dependent.
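
The dependency test behind this update can be sketched as a register-overlap check, loosely following the `isDependentLoad` helper in the diff. This is a toy model: `RegRange`, `overlaps`, and `dependsOnEarlierDefs` are hypothetical, and registers are modelled as inclusive ranges of 32-bit VGPR indices instead of LLVM register units.

```cpp
#include <vector>

// Toy register: a contiguous range of 32-bit VGPR indices, for example
// {2, 3} for vgpr2_vgpr3. Hypothetical type, not LLVM's Register.
struct RegRange {
  unsigned First;
  unsigned Last; // inclusive
};

bool overlaps(const RegRange &A, const RegRange &B) {
  return A.First <= B.Last && B.First <= A.Last;
}

// True if any register read by a candidate load overlaps a register written
// by a load already collected for the current bundle.
bool dependsOnEarlierDefs(const std::vector<RegRange> &Uses,
                          const std::vector<RegRange> &Defs) {
  for (const RegRange &U : Uses)
    for (const RegRange &D : Defs)
      if (overlaps(U, D))
        return true;
  return false;
}
```

In this model, a load that reads vgpr2_vgpr3 after an earlier load wrote vgpr2 is treated as dependent and would start a new bundle, while a load that reads vgpr4_vgpr5 would not.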

LGTM

llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
1300	just out of curiosity: why is this needed?

This revision is now accepted and ready to land. Jan 21 2020, 9:12 AM

rampitec marked an inline comment as done. Jan 21 2020, 10:32 AM

rampitec added inline comments.

llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
1300	We need to set it when we are forming a bundle, to model the fact that a single bundle instruction reads a register which is defined inside the same instruction. Therefore, when we unpack the bundle this flag needs to be cleared again.

foad added inline comments. Jan 24 2020, 3:54 AM

llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
1300	You mean when the address of one load depends on the result of another load? Surely we shouldn't be bundling loads that are dependent like that?

rampitec marked an inline comment as done. Jan 24 2020, 8:00 AM

rampitec added inline comments.

llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
1300	Yes, we do not. But I really cannot rely on that when unpacking.

Closed by commit rG555d8f4ef5eb: [AMDGPU] Bundle loads before post-RA scheduler (authored by rampitec). Jan 24 2020, 11:38 AM

This revision was automatically updated to reflect the committed changes.

foad mentioned this in D98940: [AMDGPU] Allow index optimisation in SIPreEmitPeephole for bundles. Mar 19 2021, 7:11 AM

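Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/AMDGPU.h	4 lines
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp	48 lines
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp	2 lines
llvm/lib/Target/AMDGPU/CMakeLists.txt	1 line
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp	16 lines
llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp	15 lines
llvm/lib/Target/AMDGPU/SIPostRABundler.cpp	138 lines
llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll	4 lines
llvm/test/CodeGen/AMDGPU/byval-frame-setup.ll	4 lines
llvm/test/CodeGen/AMDGPU/call-argument-types.ll	13 lines
llvm/test/CodeGen/AMDGPU/callee-special-input-vgprs.ll	2 lines
(file name missing)	24 lines
(file name missing)	17 lines
(file name missing)	4 lines
(file name missing)	2 lines
(file name missing)	2 lines
(file name missing)	24 lines
(file name missing)	20 lines
(file name missing)	48 lines
(file name missing)	24 lines
(file name missing)	86 lines
(file name missing)	30 lines
llvm/test/CodeGen/AMDGPU/insert_vector_elt.v2i16.ll	64 lines
(file name missing)	14 lines
(file name missing)	14 lines
(file name missing)	2 lines
(file name missing)	10 lines
llvm/test/CodeGen/AMDGPU/local-memory.amdgcn.ll	2 lines
llvm/test/CodeGen/AMDGPU/lshr.v2i16.ll	8 lines
llvm/test/CodeGen/AMDGPU/memory-legalizer-load.ll	6 lines
llvm/test/CodeGen/AMDGPU/memory_clause.ll	18 lines
llvm/test/CodeGen/AMDGPU/merge-store-crash.ll	6 lines
llvm/test/CodeGen/AMDGPU/postra-bundle-memops.mir	108 lines
llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm.ll	6 lines
(file name missing)	4 lines
(file name missing)	5 lines
(file name missing)	12 lines
(file name missing)	135 lines
(file name missing)	4 lines
(file name missing)	8 lines
llvm/test/CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll	10 lines
(file name missing)	2 lines
(file name missing)	8 lines
(file name missing)	2 lines
(file name missing)	8 lines
(file name missing)	10 lines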

Diff 238308

llvm/lib/Target/AMDGPU/AMDGPU.h

	Show First 20 Lines • Show All 49 Lines • ▼ Show 20 Lines
	FunctionPass *createSIWholeQuadModePass();			FunctionPass *createSIWholeQuadModePass();
	FunctionPass *createSIFixControlFlowLiveIntervalsPass();			FunctionPass *createSIFixControlFlowLiveIntervalsPass();
	FunctionPass *createSIOptimizeExecMaskingPreRAPass();			FunctionPass *createSIOptimizeExecMaskingPreRAPass();
	FunctionPass *createSIFixSGPRCopiesPass();			FunctionPass *createSIFixSGPRCopiesPass();
	FunctionPass *createSIMemoryLegalizerPass();			FunctionPass *createSIMemoryLegalizerPass();
	FunctionPass *createSIInsertWaitcntsPass();			FunctionPass *createSIInsertWaitcntsPass();
	FunctionPass *createSIPreAllocateWWMRegsPass();			FunctionPass *createSIPreAllocateWWMRegsPass();
	FunctionPass *createSIFormMemoryClausesPass();			FunctionPass *createSIFormMemoryClausesPass();
				FunctionPass *createSIPostRABundlerPass();
	FunctionPass *createAMDGPUSimplifyLibCallsPass(const TargetOptions &,			FunctionPass *createAMDGPUSimplifyLibCallsPass(const TargetOptions &,
	const TargetMachine *);			const TargetMachine *);
	FunctionPass *createAMDGPUUseNativeCallsPass();			FunctionPass *createAMDGPUUseNativeCallsPass();
	FunctionPass *createAMDGPUCodeGenPreparePass();			FunctionPass *createAMDGPUCodeGenPreparePass();
	FunctionPass *createAMDGPUMachineCFGStructurizerPass();			FunctionPass *createAMDGPUMachineCFGStructurizerPass();
	FunctionPass *createAMDGPUPropagateAttributesEarlyPass(const TargetMachine *);			FunctionPass *createAMDGPUPropagateAttributesEarlyPass(const TargetMachine *);
	ModulePass *createAMDGPUPropagateAttributesLatePass(const TargetMachine *);			ModulePass *createAMDGPUPropagateAttributesLatePass(const TargetMachine *);
	FunctionPass *createAMDGPURewriteOutArgumentsPass();			FunctionPass *createAMDGPURewriteOutArgumentsPass();
	▲ Show 20 Lines • Show All 154 Lines • ▼ Show 20 Lines
	extern char &SIModeRegisterID;			extern char &SIModeRegisterID;

	void initializeSIInsertWaitcntsPass(PassRegistry&);			void initializeSIInsertWaitcntsPass(PassRegistry&);
	extern char &SIInsertWaitcntsID;			extern char &SIInsertWaitcntsID;

	void initializeSIFormMemoryClausesPass(PassRegistry&);			void initializeSIFormMemoryClausesPass(PassRegistry&);
	extern char &SIFormMemoryClausesID;			extern char &SIFormMemoryClausesID;

				void initializeSIPostRABundlerPass(PassRegistry&);
				extern char &SIPostRABundlerID;

	void initializeAMDGPUUnifyDivergentExitNodesPass(PassRegistry&);			void initializeAMDGPUUnifyDivergentExitNodesPass(PassRegistry&);
	extern char &AMDGPUUnifyDivergentExitNodesID;			extern char &AMDGPUUnifyDivergentExitNodesID;

	ImmutablePass *createAMDGPUAAWrapperPass();			ImmutablePass *createAMDGPUAAWrapperPass();
	void initializeAMDGPUAAWrapperPassPass(PassRegistry&);			void initializeAMDGPUAAWrapperPassPass(PassRegistry&);
	ImmutablePass *createAMDGPUExternalAAWrapperPass();			ImmutablePass *createAMDGPUExternalAAWrapperPass();
	void initializeAMDGPUExternalAAWrapperPass(PassRegistry&);			void initializeAMDGPUExternalAAWrapperPass(PassRegistry&);

	▲ Show 20 Lines • Show All 84 Lines • Show Last 20 Lines

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp

Show First 20 Lines • Show All 748 Lines • ▼ Show 20 Lines	for (++I; I != E && I->isBundledWithPred() && Lat; ++I) {
break;		break;
--Lat;		--Lat;
}		}
Dep.setLatency(Lat);		Dep.setLatency(Lat);
}		}
}		}

namespace {		namespace {
struct MemOpClusterMutation : ScheduleDAGMutation {
const SIInstrInfo *TII;

MemOpClusterMutation(const SIInstrInfo *tii) : TII(tii) {}

void apply(ScheduleDAGInstrs *DAG) override {
SUnit *SUa = nullptr;
// Search for two consequent memory operations and link them
// to prevent scheduler from moving them apart.
// In DAG pre-process SUnits are in the original order of
// the instructions before scheduling.
for (SUnit &SU : DAG->SUnits) {
MachineInstr &MI2 = *SU.getInstr();
if (!MI2.mayLoad() && !MI2.mayStore()) {
SUa = nullptr;
continue;
}
if (!SUa) {
SUa = &SU;
continue;
}

MachineInstr &MI1 = *SUa->getInstr();
if ((TII->isVMEM(MI1) && TII->isVMEM(MI2)) ||
(TII->isFLAT(MI1) && TII->isFLAT(MI2)) ||
(TII->isSMRD(MI1) && TII->isSMRD(MI2)) ||
(TII->isDS(MI1) && TII->isDS(MI2))) {
SU.addPredBarrier(SUa);

for (const SDep &SI : SU.Preds) {
if (SI.getSUnit() != SUa)
SUa->addPred(SDep(SI.getSUnit(), SDep::Artificial));
}

if (&SU != &DAG->ExitSU) {
for (const SDep &SI : SUa->Succs) {
if (SI.getSUnit() != &SU)
SI.getSUnit()->addPred(SDep(&SU, SDep::Artificial));
}
}
}

SUa = &SU;
}
}
};

struct FillMFMAShadowMutation : ScheduleDAGMutation {		struct FillMFMAShadowMutation : ScheduleDAGMutation {
const SIInstrInfo *TII;		const SIInstrInfo *TII;

ScheduleDAGMI *DAG;		ScheduleDAGMI *DAG;

FillMFMAShadowMutation(const SIInstrInfo *tii) : TII(tii) {}		FillMFMAShadowMutation(const SIInstrInfo *tii) : TII(tii) {}

bool isSALU(const SUnit *SU) const {		bool isSALU(const SUnit *SU) const {
▲ Show 20 Lines • Show All 110 Lines • ▼ Show 20 Lines	for (SUnit &SU : DAG->SUnits) {
}		}
}		}
}		}
};		};
} // namespace		} // namespace

void GCNSubtarget::getPostRAMutations(		void GCNSubtarget::getPostRAMutations(
std::vector<std::unique_ptr<ScheduleDAGMutation>> &Mutations) const {		std::vector<std::unique_ptr<ScheduleDAGMutation>> &Mutations) const {
Mutations.push_back(std::make_unique<MemOpClusterMutation>(&InstrInfo));
Mutations.push_back(std::make_unique<FillMFMAShadowMutation>(&InstrInfo));		Mutations.push_back(std::make_unique<FillMFMAShadowMutation>(&InstrInfo));
}		}

const AMDGPUSubtarget &AMDGPUSubtarget::get(const MachineFunction &MF) {		const AMDGPUSubtarget &AMDGPUSubtarget::get(const MachineFunction &MF) {
if (MF.getTarget().getTargetTriple().getArch() == Triple::amdgcn)		if (MF.getTarget().getTargetTriple().getArch() == Triple::amdgcn)
return static_cast<const AMDGPUSubtarget&>(MF.getSubtarget<GCNSubtarget>());		return static_cast<const AMDGPUSubtarget&>(MF.getSubtarget<GCNSubtarget>());
else		else
return static_cast<const AMDGPUSubtarget&>(MF.getSubtarget<R600Subtarget>());		return static_cast<const AMDGPUSubtarget&>(MF.getSubtarget<R600Subtarget>());
}		}

const AMDGPUSubtarget &AMDGPUSubtarget::get(const TargetMachine &TM, const Function &F) {		const AMDGPUSubtarget &AMDGPUSubtarget::get(const TargetMachine &TM, const Function &F) {
if (TM.getTargetTriple().getArch() == Triple::amdgcn)		if (TM.getTargetTriple().getArch() == Triple::amdgcn)
return static_cast<const AMDGPUSubtarget&>(TM.getSubtarget<GCNSubtarget>(F));		return static_cast<const AMDGPUSubtarget&>(TM.getSubtarget<GCNSubtarget>(F));
else		else
return static_cast<const AMDGPUSubtarget&>(TM.getSubtarget<R600Subtarget>(F));		return static_cast<const AMDGPUSubtarget&>(TM.getSubtarget<R600Subtarget>(F));
}		}

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

Show First 20 Lines • Show All 228 Lines • ▼ Show 20 Lines	extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAMDGPUTarget() {
initializeSIWholeQuadModePass(*PR);		initializeSIWholeQuadModePass(*PR);
initializeSILowerControlFlowPass(*PR);		initializeSILowerControlFlowPass(*PR);
initializeSIRemoveShortExecBranchesPass(*PR);		initializeSIRemoveShortExecBranchesPass(*PR);
initializeSIInsertSkipsPass(*PR);		initializeSIInsertSkipsPass(*PR);
initializeSIMemoryLegalizerPass(*PR);		initializeSIMemoryLegalizerPass(*PR);
initializeSIOptimizeExecMaskingPass(*PR);		initializeSIOptimizeExecMaskingPass(*PR);
initializeSIPreAllocateWWMRegsPass(*PR);		initializeSIPreAllocateWWMRegsPass(*PR);
initializeSIFormMemoryClausesPass(*PR);		initializeSIFormMemoryClausesPass(*PR);
		initializeSIPostRABundlerPass(*PR);
initializeAMDGPUUnifyDivergentExitNodesPass(*PR);		initializeAMDGPUUnifyDivergentExitNodesPass(*PR);
initializeAMDGPUAAWrapperPassPass(*PR);		initializeAMDGPUAAWrapperPassPass(*PR);
initializeAMDGPUExternalAAWrapperPass(*PR);		initializeAMDGPUExternalAAWrapperPass(*PR);
initializeAMDGPUUseNativeCallsPass(*PR);		initializeAMDGPUUseNativeCallsPass(*PR);
initializeAMDGPUSimplifyLibCallsPass(*PR);		initializeAMDGPUSimplifyLibCallsPass(*PR);
initializeAMDGPUInlinerPass(*PR);		initializeAMDGPUInlinerPass(*PR);
initializeAMDGPUPrintfRuntimeBindingPass(*PR);		initializeAMDGPUPrintfRuntimeBindingPass(*PR);
initializeGCNRegBankReassignPass(*PR);		initializeGCNRegBankReassignPass(*PR);
▲ Show 20 Lines • Show All 723 Lines • ▼ Show 20 Lines	if (getOptLevel() > CodeGenOpt::None)
addPass(&SIOptimizeExecMaskingID);		addPass(&SIOptimizeExecMaskingID);
TargetPassConfig::addPostRegAlloc();		TargetPassConfig::addPostRegAlloc();

// Equivalent of PEI for SGPRs.		// Equivalent of PEI for SGPRs.
addPass(&SILowerSGPRSpillsID);		addPass(&SILowerSGPRSpillsID);
}		}

void GCNPassConfig::addPreSched2() {		void GCNPassConfig::addPreSched2() {
		addPass(&SIPostRABundlerID);
}		}

void GCNPassConfig::addPreEmitPass() {		void GCNPassConfig::addPreEmitPass() {
addPass(createSIMemoryLegalizerPass());		addPass(createSIMemoryLegalizerPass());
addPass(createSIInsertWaitcntsPass());		addPass(createSIInsertWaitcntsPass());
addPass(createSIShrinkInstructionsPass());		addPass(createSIShrinkInstructionsPass());
addPass(createSIModeRegisterPass());		addPass(createSIModeRegisterPass());

▲ Show 20 Lines • Show All 178 Lines • Show Last 20 Lines

llvm/lib/Target/AMDGPU/CMakeLists.txt

Show First 20 Lines • Show All 109 Lines • ▼ Show 20 Lines	add_llvm_target(AMDGPUCodeGen
SILowerI1Copies.cpp		SILowerI1Copies.cpp
SILowerSGPRSpills.cpp		SILowerSGPRSpills.cpp
SIMachineFunctionInfo.cpp		SIMachineFunctionInfo.cpp
SIMachineScheduler.cpp		SIMachineScheduler.cpp
SIMemoryLegalizer.cpp		SIMemoryLegalizer.cpp
SIOptimizeExecMasking.cpp		SIOptimizeExecMasking.cpp
SIOptimizeExecMaskingPreRA.cpp		SIOptimizeExecMaskingPreRA.cpp
SIPeepholeSDWA.cpp		SIPeepholeSDWA.cpp
		SIPostRABundler.cpp
SIRegisterInfo.cpp		SIRegisterInfo.cpp
SIRemoveShortExecBranches.cpp		SIRemoveShortExecBranches.cpp
SIShrinkInstructions.cpp		SIShrinkInstructions.cpp
SIWholeQuadMode.cpp		SIWholeQuadMode.cpp
GCNILPSched.cpp		GCNILPSched.cpp
GCNRegBankReassign.cpp		GCNRegBankReassign.cpp
GCNNSAReassign.cpp		GCNNSAReassign.cpp
GCNDPPCombine.cpp		GCNDPPCombine.cpp
SIModeRegister.cpp		SIModeRegister.cpp
)		)

add_subdirectory(AsmParser)		add_subdirectory(AsmParser)
add_subdirectory(Disassembler)		add_subdirectory(Disassembler)
add_subdirectory(MCTargetDesc)		add_subdirectory(MCTargetDesc)
add_subdirectory(TargetInfo)		add_subdirectory(TargetInfo)
add_subdirectory(Utils)		add_subdirectory(Utils)

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp

Show First 20 Lines • Show All 1,543 Lines • ▼ Show 20 Lines	case AMDGPU::ENTER_WWM: {
break;		break;
}		}
case AMDGPU::EXIT_WWM: {		case AMDGPU::EXIT_WWM: {
// This only gets its own opcode so that SIPreAllocateWWMRegs can tell when		// This only gets its own opcode so that SIPreAllocateWWMRegs can tell when
// WWM is exited.		// WWM is exited.
MI.setDesc(get(ST.isWave32() ? AMDGPU::S_MOV_B32 : AMDGPU::S_MOV_B64));		MI.setDesc(get(ST.isWave32() ? AMDGPU::S_MOV_B32 : AMDGPU::S_MOV_B64));
break;		break;
}		}
case TargetOpcode::BUNDLE: {
if (!MI.mayLoad() || MI.hasUnmodeledSideEffects())
return false;

// If it is a load it must be a memory clause
for (MachineBasicBlock::instr_iterator I = MI.getIterator();
I->isBundledWithSucc(); ++I) {
I->unbundleFromSucc();
for (MachineOperand &MO : I->operands())
if (MO.isReg())
MO.setIsInternalRead(false);
}

MI.eraseFromParent();
break;
}
}		}
return true;		return true;
}		}

std::pair<MachineInstr*, MachineInstr*>		std::pair<MachineInstr*, MachineInstr*>
SIInstrInfo::expandMovDPP64(MachineInstr &MI) const {		SIInstrInfo::expandMovDPP64(MachineInstr &MI) const {
assert (MI.getOpcode() == AMDGPU::V_MOV_B64_DPP_PSEUDO);		assert (MI.getOpcode() == AMDGPU::V_MOV_B64_DPP_PSEUDO);

▲ Show 20 Lines • Show All 5,071 Lines • Show Last 20 Lines

llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp

	Show First 20 Lines • Show All 1,283 Lines • ▼ Show 20 Lines
	bool SIMemoryLegalizer::runOnMachineFunction(MachineFunction &MF) {			bool SIMemoryLegalizer::runOnMachineFunction(MachineFunction &MF) {
	bool Changed = false;			bool Changed = false;

	SIMemOpAccess MOA(MF);			SIMemOpAccess MOA(MF);
	CC = SICacheControl::create(MF.getSubtarget<GCNSubtarget>());			CC = SICacheControl::create(MF.getSubtarget<GCNSubtarget>());

	for (auto &MBB : MF) {			for (auto &MBB : MF) {
	for (auto MI = MBB.begin(); MI != MBB.end(); ++MI) {			for (auto MI = MBB.begin(); MI != MBB.end(); ++MI) {

				if (MI->getOpcode() == TargetOpcode::BUNDLE && MI->mayLoadOrStore()) {
				MachineBasicBlock::instr_iterator II(MI->getIterator());
				for (MachineBasicBlock::instr_iterator I = ++II, E = MBB.instr_end();
				I != E && I->isBundledWithPred(); ++I) {
				I->unbundleFromPred();
				for (MachineOperand &MO : I->operands())
				if (MO.isReg())
				MO.setIsInternalRead(false);
				}

				MI->eraseFromParent();
				MI = II->getIterator();
				}

	if (!(MI->getDesc().TSFlags & SIInstrFlags::maybeAtomic))			if (!(MI->getDesc().TSFlags & SIInstrFlags::maybeAtomic))
	continue;			continue;

	if (const auto &MOI = MOA.getLoadInfo(MI))			if (const auto &MOI = MOA.getLoadInfo(MI))
	Changed |= expandLoad(MOI.getValue(), MI);			Changed |= expandLoad(MOI.getValue(), MI);
	else if (const auto &MOI = MOA.getStoreInfo(MI))			else if (const auto &MOI = MOA.getStoreInfo(MI))
	Changed |= expandStore(MOI.getValue(), MI);			Changed |= expandStore(MOI.getValue(), MI);
	else if (const auto &MOI = MOA.getAtomicFenceInfo(MI))			else if (const auto &MOI = MOA.getAtomicFenceInfo(MI))
	Show All 18 Lines

llvm/lib/Target/AMDGPU/SIPostRABundler.cpp

This file was added.

				//===-- SIPostRABundler.cpp -----------------------------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				/// \file
				/// This pass creates bundles of memory instructions to protect adjacent loads
				/// and stores from being rescheduled apart from each other post-RA.
				///
				//===----------------------------------------------------------------------===//

				#include "AMDGPU.h"
				#include "AMDGPUSubtarget.h"
				#include "SIDefines.h"
				#include "SIInstrInfo.h"
				#include "llvm/ADT/SmallSet.h"
				#include "llvm/CodeGen/MachineFunctionPass.h"
				#include "llvm/CodeGen/MachineInstrBundle.h"
				#include "llvm/InitializePasses.h"

				using namespace llvm;

				#define DEBUG_TYPE "si-post-ra-bundler"

				namespace {

				class SIPostRABundler : public MachineFunctionPass {
				public:
				static char ID;

				public:
				SIPostRABundler() : MachineFunctionPass(ID) {
				initializeSIPostRABundlerPass(*PassRegistry::getPassRegistry());
				}

				bool runOnMachineFunction(MachineFunction &MF) override;

				StringRef getPassName() const override {
				return "SI post-RA bundler";
				}

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.setPreservesAll();
				MachineFunctionPass::getAnalysisUsage(AU);
				}

				private:
				const SIRegisterInfo *TRI;

				SmallSet<Register, 16> Defs;

				bool isDependentLoad(const MachineInstr &MI) const;

				};

				} // End anonymous namespace.

				INITIALIZE_PASS(SIPostRABundler, DEBUG_TYPE, "SI post-RA bundler", false, false)

				char SIPostRABundler::ID = 0;

				char &llvm::SIPostRABundlerID = SIPostRABundler::ID;

				FunctionPass *llvm::createSIPostRABundlerPass() {
				return new SIPostRABundler();
				}

				bool SIPostRABundler::isDependentLoad(const MachineInstr &MI) const {
				if (!MI.mayLoad())
				return false;

				for (const MachineOperand &Op : MI.explicit_uses()) {
				if (!Op.isReg())
				continue;
				Register Reg = Op.getReg();
				for (const Register Def : Defs)
				if (TRI->regsOverlap(Reg, Def))
				return true;
				}

				return false;
				}

				bool SIPostRABundler::runOnMachineFunction(MachineFunction &MF) {
				if (skipFunction(MF.getFunction()))
				return false;

				TRI = MF.getSubtarget<GCNSubtarget>().getRegisterInfo();
				bool Changed = false;
				const unsigned MemFlags = SIInstrFlags::MTBUF | SIInstrFlags::MUBUF |
				SIInstrFlags::SMRD | SIInstrFlags::DS |
				SIInstrFlags::FLAT | SIInstrFlags::MIMG;

				for (MachineBasicBlock &MBB : MF) {
				MachineBasicBlock::instr_iterator Next;
				MachineBasicBlock::instr_iterator B = MBB.instr_begin();
				MachineBasicBlock::instr_iterator E = MBB.instr_end();
				for (auto I = B; I != E; I = Next) {
				Next = std::next(I);

				if (I->isBundled() || !I->mayLoadOrStore() ||
				B->mayLoad() != I->mayLoad() || B->mayStore() != I->mayStore() ||
				(B->getDesc().TSFlags & MemFlags) !=
				(I->getDesc().TSFlags & MemFlags) ||
				isDependentLoad(*I)) {

				if (B != I) {
				if (std::next(B) != I) {
				finalizeBundle(MBB, B, I);
				Changed = true;
				}
				Next = I;
				}

				B = Next;
				Defs.clear();
				continue;
				}

				if (I->getNumExplicitDefs() == 0)
				continue;

				Defs.insert(I->defs().begin()->getReg());
				}

				if (B != E && std::next(B) != E) {
				finalizeBundle(MBB, B, E);
				Changed = true;
				}

				Defs.clear();
				}

				return Changed;
				}

llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll

	Show First 20 Lines • Show All 269 Lines • ▼ Show 20 Lines
	; GFX9-NEXT: s_mov_b32 s7, 0xf000			; GFX9-NEXT: s_mov_b32 s7, 0xf000
	; GFX9-NEXT: s_mov_b32 s6, -1			; GFX9-NEXT: s_mov_b32 s6, -1
	; GFX9-NEXT: v_add_u32_e32 v0, s0, v0			; GFX9-NEXT: v_add_u32_e32 v0, s0, v0
	; GFX9-NEXT: buffer_store_dword v0, off, s[4:7], 0			; GFX9-NEXT: buffer_store_dword v0, off, s[4:7], 0
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX1064-LABEL: add_i32_uniform:			; GFX1064-LABEL: add_i32_uniform:
	; GFX1064: ; %bb.0: ; %entry			; GFX1064: ; %bb.0: ; %entry
	; GFX1064-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24
	; GFX1064-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0			; GFX1064-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0
				; GFX1064-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24
	; GFX1064-NEXT: s_load_dword s0, s[0:1], 0x2c			; GFX1064-NEXT: s_load_dword s0, s[0:1], 0x2c
	; GFX1064-NEXT: ; implicit-def: $vgpr1			; GFX1064-NEXT: ; implicit-def: $vgpr1
	; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0			; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
	; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0			; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0
	; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX1064-NEXT: s_and_saveexec_b64 s[6:7], vcc			; GFX1064-NEXT: s_and_saveexec_b64 s[6:7], vcc
	; GFX1064-NEXT: s_cbranch_execz BB1_2			; GFX1064-NEXT: s_cbranch_execz BB1_2
	; GFX1064-NEXT: ; %bb.1:			; GFX1064-NEXT: ; %bb.1:
	; GFX1064-NEXT: s_bcnt1_i32_b64 s1, s[2:3]			; GFX1064-NEXT: s_bcnt1_i32_b64 s1, s[2:3]
	▲ Show 20 Lines • Show All 1,538 Lines • ▼ Show 20 Lines
	; GFX9-NEXT: s_mov_b32 s7, 0xf000			; GFX9-NEXT: s_mov_b32 s7, 0xf000
	; GFX9-NEXT: s_mov_b32 s6, -1			; GFX9-NEXT: s_mov_b32 s6, -1
	; GFX9-NEXT: v_sub_u32_e32 v0, s0, v0			; GFX9-NEXT: v_sub_u32_e32 v0, s0, v0
	; GFX9-NEXT: buffer_store_dword v0, off, s[4:7], 0			; GFX9-NEXT: buffer_store_dword v0, off, s[4:7], 0
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX1064-LABEL: sub_i32_uniform:			; GFX1064-LABEL: sub_i32_uniform:
	; GFX1064: ; %bb.0: ; %entry			; GFX1064: ; %bb.0: ; %entry
	; GFX1064-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24
	; GFX1064-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0			; GFX1064-NEXT: v_cmp_ne_u32_e64 s[2:3], 1, 0
				; GFX1064-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24
	; GFX1064-NEXT: s_load_dword s0, s[0:1], 0x2c			; GFX1064-NEXT: s_load_dword s0, s[0:1], 0x2c
	; GFX1064-NEXT: ; implicit-def: $vgpr1			; GFX1064-NEXT: ; implicit-def: $vgpr1
	; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0			; GFX1064-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s2, 0
	; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0			; GFX1064-NEXT: v_mbcnt_hi_u32_b32_e64 v0, s3, v0
	; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX1064-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX1064-NEXT: s_and_saveexec_b64 s[6:7], vcc			; GFX1064-NEXT: s_and_saveexec_b64 s[6:7], vcc
	; GFX1064-NEXT: s_cbranch_execz BB9_2			; GFX1064-NEXT: s_cbranch_execz BB9_2
	; GFX1064-NEXT: ; %bb.1:			; GFX1064-NEXT: ; %bb.1:

llvm/test/CodeGen/AMDGPU/byval-frame-setup.ll


	; GCN: v_mov_b32_e32 [[NINE:v[0-9]+]], 9			; GCN: v_mov_b32_e32 [[NINE:v[0-9]+]], 9
	; GCN: buffer_store_dword [[NINE]], off, s[0:3], s33 offset:8			; GCN: buffer_store_dword [[NINE]], off, s[0:3], s33 offset:8
	; GCN: v_mov_b32_e32 [[THIRTEEN:v[0-9]+]], 13			; GCN: v_mov_b32_e32 [[THIRTEEN:v[0-9]+]], 13
	; GCN: buffer_store_dword [[THIRTEEN]], off, s[0:3], s33 offset:24			; GCN: buffer_store_dword [[THIRTEEN]], off, s[0:3], s33 offset:24


	; GCN-NOT: s_add_u32 s32, s32, 0x800			; GCN-NOT: s_add_u32 s32, s32, 0x800
	; GCN-DAG: s_add_u32 s32, s33, 0xc00{{$}}

	; GCN: buffer_load_dword [[LOAD0:v[0-9]+]], off, s[0:3], s33 offset:8			; GCN: buffer_load_dword [[LOAD0:v[0-9]+]], off, s[0:3], s33 offset:8
	; GCN: buffer_load_dword [[LOAD1:v[0-9]+]], off, s[0:3], s33 offset:12			; GCN: buffer_load_dword [[LOAD1:v[0-9]+]], off, s[0:3], s33 offset:12
	; GCN: buffer_load_dword [[LOAD2:v[0-9]+]], off, s[0:3], s33 offset:16			; GCN: buffer_load_dword [[LOAD2:v[0-9]+]], off, s[0:3], s33 offset:16
	; GCN: buffer_load_dword [[LOAD3:v[0-9]+]], off, s[0:3], s33 offset:20			; GCN: buffer_load_dword [[LOAD3:v[0-9]+]], off, s[0:3], s33 offset:20

				; GCN-NOT: s_add_u32 s32, s32, 0x800
				; GCN-DAG: s_add_u32 s32, s33, 0xc00{{$}}

	; GCN: buffer_store_dword [[LOAD3]], off, s[0:3], s32 offset:12			; GCN: buffer_store_dword [[LOAD3]], off, s[0:3], s32 offset:12
	; GCN: buffer_store_dword [[LOAD2]], off, s[0:3], s32 offset:8			; GCN: buffer_store_dword [[LOAD2]], off, s[0:3], s32 offset:8
	; GCN: buffer_store_dword [[LOAD1]], off, s[0:3], s32 offset:4			; GCN: buffer_store_dword [[LOAD1]], off, s[0:3], s32 offset:4
	; GCN: buffer_store_dword [[LOAD0]], off, s[0:3], s32{{$}}			; GCN: buffer_store_dword [[LOAD0]], off, s[0:3], s32{{$}}


	; GCN-DAG: buffer_load_dword [[LOAD4:v[0-9]+]], off, s[0:3], s33 offset:24			; GCN-DAG: buffer_load_dword [[LOAD4:v[0-9]+]], off, s[0:3], s33 offset:24
	; GCN-DAG: buffer_load_dword [[LOAD5:v[0-9]+]], off, s[0:3], s33 offset:28			; GCN-DAG: buffer_load_dword [[LOAD5:v[0-9]+]], off, s[0:3], s33 offset:28

llvm/test/CodeGen/AMDGPU/call-argument-types.ll

	define amdgpu_kernel void @test_call_external_void_func_struct_i8_i32() #0 {			define amdgpu_kernel void @test_call_external_void_func_struct_i8_i32() #0 {
	%ptr0 = load { i8, i32 } addrspace(1), { i8, i32 } addrspace(1) addrspace(4)* undef			%ptr0 = load { i8, i32 } addrspace(1), { i8, i32 } addrspace(1) addrspace(4)* undef
	%val = load { i8, i32 }, { i8, i32 } addrspace(1)* %ptr0			%val = load { i8, i32 }, { i8, i32 } addrspace(1)* %ptr0
	call void @external_void_func_struct_i8_i32({ i8, i32 } %val)			call void @external_void_func_struct_i8_i32({ i8, i32 } %val)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}test_call_external_void_func_byval_struct_i8_i32:			; GCN-LABEL: {{^}}test_call_external_void_func_byval_struct_i8_i32:
	; GCN-DAG: s_add_u32 [[SP:s[0-9]+]], s33, 0x400{{$}}

	; GCN-DAG: v_mov_b32_e32 [[VAL0:v[0-9]+]], 3			; GCN-DAG: v_mov_b32_e32 [[VAL0:v[0-9]+]], 3
	; GCN-DAG: v_mov_b32_e32 [[VAL1:v[0-9]+]], 8			; GCN-DAG: v_mov_b32_e32 [[VAL1:v[0-9]+]], 8
	; MESA-DAG: buffer_store_byte [[VAL0]], off, s[36:39], s33 offset:8			; MESA-DAG: buffer_store_byte [[VAL0]], off, s[36:39], s33 offset:8
	; MESA-DAG: buffer_store_dword [[VAL1]], off, s[36:39], s33 offset:12			; MESA-DAG: buffer_store_dword [[VAL1]], off, s[36:39], s33 offset:12

	; HSA-DAG: buffer_store_byte [[VAL0]], off, s[0:3], s33 offset:8			; HSA-DAG: buffer_store_byte [[VAL0]], off, s[0:3], s33 offset:8
	; HSA-DAG: buffer_store_dword [[VAL1]], off, s[0:3], s33 offset:12			; HSA-DAG: buffer_store_dword [[VAL1]], off, s[0:3], s33 offset:12

	; GCN-NOT: s_add_u32 [[SP]],

	; HSA: buffer_load_dword [[RELOAD_VAL0:v[0-9]+]], off, s[0:3], s33 offset:8			; HSA: buffer_load_dword [[RELOAD_VAL0:v[0-9]+]], off, s[0:3], s33 offset:8
	; HSA: buffer_load_dword [[RELOAD_VAL1:v[0-9]+]], off, s[0:3], s33 offset:12			; HSA: buffer_load_dword [[RELOAD_VAL1:v[0-9]+]], off, s[0:3], s33 offset:12

	; HSA-DAG: buffer_store_dword [[RELOAD_VAL0]], off, s[0:3], [[SP]]{{$}}
	; HSA-DAG: buffer_store_dword [[RELOAD_VAL1]], off, s[0:3], [[SP]] offset:4


	; MESA: buffer_load_dword [[RELOAD_VAL0:v[0-9]+]], off, s[36:39], s33 offset:8			; MESA: buffer_load_dword [[RELOAD_VAL0:v[0-9]+]], off, s[36:39], s33 offset:8
	; MESA: buffer_load_dword [[RELOAD_VAL1:v[0-9]+]], off, s[36:39], s33 offset:12			; MESA: buffer_load_dword [[RELOAD_VAL1:v[0-9]+]], off, s[36:39], s33 offset:12

				; GCN-DAG: s_add_u32 [[SP:s[0-9]+]], s33, 0x400{{$}}

				; HSA-DAG: buffer_store_dword [[RELOAD_VAL0]], off, s[0:3], [[SP]]{{$}}
				; HSA-DAG: buffer_store_dword [[RELOAD_VAL1]], off, s[0:3], [[SP]] offset:4

	; MESA-DAG: buffer_store_dword [[RELOAD_VAL0]], off, s[36:39], [[SP]]{{$}}			; MESA-DAG: buffer_store_dword [[RELOAD_VAL0]], off, s[36:39], [[SP]]{{$}}
	; MESA-DAG: buffer_store_dword [[RELOAD_VAL1]], off, s[36:39], [[SP]] offset:4			; MESA-DAG: buffer_store_dword [[RELOAD_VAL1]], off, s[36:39], [[SP]] offset:4

	; GCN-NEXT: s_swappc_b64			; GCN-NEXT: s_swappc_b64
	; GCN-NOT: [[SP]]			; GCN-NOT: [[SP]]
	define amdgpu_kernel void @test_call_external_void_func_byval_struct_i8_i32() #0 {			define amdgpu_kernel void @test_call_external_void_func_byval_struct_i8_i32() #0 {
	%val = alloca { i8, i32 }, align 4, addrspace(5)			%val = alloca { i8, i32 }, align 4, addrspace(5)
	%gep0 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %val, i32 0, i32 0			%gep0 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %val, i32 0, i32 0

llvm/test/CodeGen/AMDGPU/callee-special-input-vgprs.ll

call void @too_many_args_use_workitem_id_x(		call void @too_many_args_use_workitem_id_x(
i32 290, i32 300, i32 310, i32 320)		i32 290, i32 300, i32 310, i32 320)
ret void		ret void
}		}

; Requires loading and storing to stack slot.		; Requires loading and storing to stack slot.
; GCN-LABEL: {{^}}too_many_args_call_too_many_args_use_workitem_id_x:		; GCN-LABEL: {{^}}too_many_args_call_too_many_args_use_workitem_id_x:
; GCN-DAG: s_add_u32 s32, s32, 0x400{{$}}		; GCN-DAG: s_add_u32 s32, s32, 0x400{{$}}
; GCN-DAG: buffer_store_dword v32, off, s[0:3], s34 offset:4 ; 4-byte Folded Spill		; GCN-DAG: buffer_store_dword v32, off, s[0:3], s34 offset:4 ; 4-byte Folded Spill
; GCN: buffer_load_dword v32, off, s[0:3], s34{{$}}		; GCN-DAG: buffer_load_dword v32, off, s[0:3], s34{{$}}

; GCN: buffer_store_dword v32, off, s[0:3], s32{{$}}		; GCN: buffer_store_dword v32, off, s[0:3], s32{{$}}

; GCN: s_swappc_b64		; GCN: s_swappc_b64

; GCN: buffer_load_dword v32, off, s[0:3], s34 offset:4 ; 4-byte Folded Reload		; GCN: buffer_load_dword v32, off, s[0:3], s34 offset:4 ; 4-byte Folded Reload
; GCN: s_sub_u32 s32, s32, 0x400{{$}}		; GCN: s_sub_u32 s32, s32, 0x400{{$}}
; GCN: s_setpc_b64		; GCN: s_setpc_b64

llvm/test/CodeGen/AMDGPU/copy-illegal-type.ll

	; SI-NEXT: s_mov_b64 s[12:13], s[6:7]			; SI-NEXT: s_mov_b64 s[12:13], s[6:7]
	; SI-NEXT: v_mov_b32_e32 v1, 0			; SI-NEXT: v_mov_b32_e32 v1, 0
	; SI-NEXT: buffer_load_dword v0, v[0:1], s[12:15], 0 addr64			; SI-NEXT: buffer_load_dword v0, v[0:1], s[12:15], 0 addr64
	; SI-NEXT: s_mov_b32 s10, -1			; SI-NEXT: s_mov_b32 s10, -1
	; SI-NEXT: s_mov_b32 s8, s4			; SI-NEXT: s_mov_b32 s8, s4
	; SI-NEXT: s_mov_b32 s9, s5			; SI-NEXT: s_mov_b32 s9, s5
	; SI-NEXT: s_mov_b32 s4, s2			; SI-NEXT: s_mov_b32 s4, s2
	; SI-NEXT: s_mov_b32 s5, s3			; SI-NEXT: s_mov_b32 s5, s3
	; SI-NEXT: s_mov_b32 s6, s10
	; SI-NEXT: s_mov_b32 s7, s11
	; SI-NEXT: s_mov_b32 s2, s10			; SI-NEXT: s_mov_b32 s2, s10
	; SI-NEXT: s_mov_b32 s3, s11			; SI-NEXT: s_mov_b32 s3, s11
				; SI-NEXT: s_mov_b32 s6, s10
				; SI-NEXT: s_mov_b32 s7, s11
	; SI-NEXT: s_waitcnt vmcnt(0)			; SI-NEXT: s_waitcnt vmcnt(0)
	; SI-NEXT: buffer_store_dword v0, off, s[0:3], 0			; SI-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; SI-NEXT: buffer_store_dword v0, off, s[4:7], 0			; SI-NEXT: buffer_store_dword v0, off, s[4:7], 0
	; SI-NEXT: buffer_store_dword v0, off, s[8:11], 0			; SI-NEXT: buffer_store_dword v0, off, s[8:11], 0
	; SI-NEXT: s_endpgm			; SI-NEXT: s_endpgm
	;			;
	; VI-LABEL: test_copy_v4i8_x3:			; VI-LABEL: test_copy_v4i8_x3:
	; VI: ; %bb.0:			; VI: ; %bb.0:
	; VI-NEXT: s_load_dwordx8 s[0:7], s[0:1], 0x24			; VI-NEXT: s_load_dwordx8 s[0:7], s[0:1], 0x24
	; VI-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; VI-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; VI-NEXT: s_mov_b32 s11, 0xf000			; VI-NEXT: s_mov_b32 s11, 0xf000
	; VI-NEXT: s_mov_b32 s10, -1			; VI-NEXT: s_mov_b32 s10, -1
	; VI-NEXT: s_mov_b32 s14, s10			; VI-NEXT: s_mov_b32 s14, s10
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: v_mov_b32_e32 v1, s7			; VI-NEXT: v_mov_b32_e32 v1, s7
	; VI-NEXT: v_add_u32_e32 v0, vcc, s6, v0			; VI-NEXT: v_add_u32_e32 v0, vcc, s6, v0
	; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc			; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
	; VI-NEXT: flat_load_dword v0, v[0:1]			; VI-NEXT: flat_load_dword v0, v[0:1]
	; VI-NEXT: s_mov_b32 s12, s2			; VI-NEXT: s_mov_b32 s12, s2
	; VI-NEXT: s_mov_b32 s13, s3			; VI-NEXT: s_mov_b32 s13, s3
				; VI-NEXT: s_mov_b32 s2, s10
				; VI-NEXT: s_mov_b32 s3, s11
	; VI-NEXT: s_mov_b32 s8, s4			; VI-NEXT: s_mov_b32 s8, s4
	; VI-NEXT: s_mov_b32 s9, s5			; VI-NEXT: s_mov_b32 s9, s5
	; VI-NEXT: s_mov_b32 s15, s11			; VI-NEXT: s_mov_b32 s15, s11
	; VI-NEXT: s_mov_b32 s2, s10
	; VI-NEXT: s_mov_b32 s3, s11
	; VI-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; VI-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; VI-NEXT: buffer_store_dword v0, off, s[0:3], 0			; VI-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; VI-NEXT: buffer_store_dword v0, off, s[12:15], 0			; VI-NEXT: buffer_store_dword v0, off, s[12:15], 0
	; VI-NEXT: buffer_store_dword v0, off, s[8:11], 0			; VI-NEXT: buffer_store_dword v0, off, s[8:11], 0
	; VI-NEXT: s_endpgm			; VI-NEXT: s_endpgm
	%tid.x = call i32 @llvm.amdgcn.workitem.id.x()			%tid.x = call i32 @llvm.amdgcn.workitem.id.x()
	%gep = getelementptr <4 x i8>, <4 x i8> addrspace(1)* %in, i32 %tid.x			%gep = getelementptr <4 x i8>, <4 x i8> addrspace(1)* %in, i32 %tid.x
	%val = load <4 x i8>, <4 x i8> addrspace(1)* %gep, align 4			%val = load <4 x i8>, <4 x i8> addrspace(1)* %gep, align 4
	; SI-NEXT: v_mov_b32_e32 v1, 0			; SI-NEXT: v_mov_b32_e32 v1, 0
	; SI-NEXT: s_waitcnt lgkmcnt(0)			; SI-NEXT: s_waitcnt lgkmcnt(0)
	; SI-NEXT: buffer_load_dword v0, v[0:1], s[8:11], 0 addr64			; SI-NEXT: buffer_load_dword v0, v[0:1], s[8:11], 0 addr64
	; SI-NEXT: s_load_dwordx8 s[0:7], s[0:1], 0x9			; SI-NEXT: s_load_dwordx8 s[0:7], s[0:1], 0x9
	; SI-NEXT: s_mov_b32 s14, -1			; SI-NEXT: s_mov_b32 s14, -1
	; SI-NEXT: s_mov_b32 s18, s14			; SI-NEXT: s_mov_b32 s18, s14
	; SI-NEXT: s_mov_b32 s19, s15			; SI-NEXT: s_mov_b32 s19, s15
	; SI-NEXT: s_waitcnt lgkmcnt(0)			; SI-NEXT: s_waitcnt lgkmcnt(0)
	; SI-NEXT: s_mov_b32 s12, s6
	; SI-NEXT: s_mov_b32 s13, s7
	; SI-NEXT: s_mov_b32 s16, s2			; SI-NEXT: s_mov_b32 s16, s2
	; SI-NEXT: s_mov_b32 s17, s3			; SI-NEXT: s_mov_b32 s17, s3
	; SI-NEXT: s_mov_b32 s6, s14
	; SI-NEXT: s_mov_b32 s7, s15
	; SI-NEXT: s_mov_b32 s2, s14			; SI-NEXT: s_mov_b32 s2, s14
	; SI-NEXT: s_mov_b32 s3, s15			; SI-NEXT: s_mov_b32 s3, s15
				; SI-NEXT: s_mov_b32 s12, s6
				; SI-NEXT: s_mov_b32 s13, s7
				; SI-NEXT: s_mov_b32 s6, s14
				; SI-NEXT: s_mov_b32 s7, s15
	; SI-NEXT: s_waitcnt vmcnt(0)			; SI-NEXT: s_waitcnt vmcnt(0)
	; SI-NEXT: buffer_store_dword v0, off, s[0:3], 0			; SI-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; SI-NEXT: buffer_store_dword v0, off, s[16:19], 0			; SI-NEXT: buffer_store_dword v0, off, s[16:19], 0
	; SI-NEXT: buffer_store_dword v0, off, s[4:7], 0			; SI-NEXT: buffer_store_dword v0, off, s[4:7], 0
	; SI-NEXT: buffer_store_dword v0, off, s[12:15], 0			; SI-NEXT: buffer_store_dword v0, off, s[12:15], 0
	; SI-NEXT: s_endpgm			; SI-NEXT: s_endpgm
	;			;
	; VI-LABEL: test_copy_v4i8_x4:			; VI-LABEL: test_copy_v4i8_x4:
	; VI: ; %bb.0:			; VI: ; %bb.0:
	; VI-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0x44			; VI-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0x44
	; VI-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; VI-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; VI-NEXT: s_load_dwordx8 s[0:7], s[0:1], 0x24			; VI-NEXT: s_load_dwordx8 s[0:7], s[0:1], 0x24
	; VI-NEXT: s_mov_b32 s11, 0xf000			; VI-NEXT: s_mov_b32 s11, 0xf000
	; VI-NEXT: s_mov_b32 s10, -1			; VI-NEXT: s_mov_b32 s10, -1
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: v_mov_b32_e32 v1, s9			; VI-NEXT: v_mov_b32_e32 v1, s9
	; VI-NEXT: v_add_u32_e32 v0, vcc, s8, v0			; VI-NEXT: v_add_u32_e32 v0, vcc, s8, v0
	; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc			; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
	; VI-NEXT: flat_load_dword v0, v[0:1]			; VI-NEXT: flat_load_dword v0, v[0:1]
	; VI-NEXT: s_mov_b32 s8, s6
	; VI-NEXT: s_mov_b32 s9, s7
	; VI-NEXT: s_mov_b32 s12, s2			; VI-NEXT: s_mov_b32 s12, s2
	; VI-NEXT: s_mov_b32 s13, s3			; VI-NEXT: s_mov_b32 s13, s3
				; VI-NEXT: s_mov_b32 s2, s10
				; VI-NEXT: s_mov_b32 s3, s11
				; VI-NEXT: s_mov_b32 s8, s6
				; VI-NEXT: s_mov_b32 s9, s7
	; VI-NEXT: s_mov_b32 s6, s10			; VI-NEXT: s_mov_b32 s6, s10
	; VI-NEXT: s_mov_b32 s7, s11			; VI-NEXT: s_mov_b32 s7, s11
	; VI-NEXT: s_mov_b32 s14, s10			; VI-NEXT: s_mov_b32 s14, s10
	; VI-NEXT: s_mov_b32 s15, s11			; VI-NEXT: s_mov_b32 s15, s11
	; VI-NEXT: s_mov_b32 s2, s10
	; VI-NEXT: s_mov_b32 s3, s11
	; VI-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; VI-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; VI-NEXT: buffer_store_dword v0, off, s[0:3], 0			; VI-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; VI-NEXT: buffer_store_dword v0, off, s[12:15], 0			; VI-NEXT: buffer_store_dword v0, off, s[12:15], 0
	; VI-NEXT: buffer_store_dword v0, off, s[4:7], 0			; VI-NEXT: buffer_store_dword v0, off, s[4:7], 0
	; VI-NEXT: buffer_store_dword v0, off, s[8:11], 0			; VI-NEXT: buffer_store_dword v0, off, s[8:11], 0
	; VI-NEXT: s_endpgm			; VI-NEXT: s_endpgm
	%tid.x = call i32 @llvm.amdgcn.workitem.id.x()			%tid.x = call i32 @llvm.amdgcn.workitem.id.x()
	%gep = getelementptr <4 x i8>, <4 x i8> addrspace(1)* %in, i32 %tid.x			%gep = getelementptr <4 x i8>, <4 x i8> addrspace(1)* %in, i32 %tid.x

llvm/test/CodeGen/AMDGPU/cvt_f32_ubyte.ll

	; SI-LABEL: load_v4i8_to_v4f32_2_uses:			; SI-LABEL: load_v4i8_to_v4f32_2_uses:
	; SI: ; %bb.0:			; SI: ; %bb.0:
	; SI-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd			; SI-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd
	; SI-NEXT: s_mov_b32 s3, 0xf000			; SI-NEXT: s_mov_b32 s3, 0xf000
	; SI-NEXT: s_mov_b32 s6, 0			; SI-NEXT: s_mov_b32 s6, 0
	; SI-NEXT: s_mov_b32 s7, s3			; SI-NEXT: s_mov_b32 s7, s3
	; SI-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; SI-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; SI-NEXT: v_mov_b32_e32 v1, 0			; SI-NEXT: v_mov_b32_e32 v1, 0
				; SI-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0x9
				; SI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xb
				foad: Nice.
	; SI-NEXT: s_waitcnt lgkmcnt(0)			; SI-NEXT: s_waitcnt lgkmcnt(0)
	; SI-NEXT: buffer_load_dword v1, v[0:1], s[4:7], 0 addr64			; SI-NEXT: buffer_load_dword v1, v[0:1], s[4:7], 0 addr64
	; SI-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0x9
	; SI-NEXT: s_mov_b32 s2, -1			; SI-NEXT: s_mov_b32 s2, -1
	; SI-NEXT: s_movk_i32 s12, 0xff			; SI-NEXT: s_movk_i32 s12, 0xff
	; SI-NEXT: s_mov_b32 s10, s2			; SI-NEXT: s_mov_b32 s10, s2
	; SI-NEXT: s_mov_b32 s11, s3			; SI-NEXT: s_mov_b32 s11, s3
	; SI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xb
	; SI-NEXT: s_waitcnt vmcnt(0)			; SI-NEXT: s_waitcnt vmcnt(0)
	; SI-NEXT: v_lshrrev_b32_e32 v4, 16, v1			; SI-NEXT: v_lshrrev_b32_e32 v4, 16, v1
	; SI-NEXT: v_add_i32_e32 v7, vcc, 9, v1			; SI-NEXT: v_add_i32_e32 v7, vcc, 9, v1
	; SI-NEXT: v_and_b32_e32 v6, 0xff00, v1			; SI-NEXT: v_and_b32_e32 v6, 0xff00, v1
	; SI-NEXT: v_lshrrev_b32_e32 v5, 24, v1			; SI-NEXT: v_lshrrev_b32_e32 v5, 24, v1
	; SI-NEXT: v_cvt_f32_ubyte3_e32 v3, v1			; SI-NEXT: v_cvt_f32_ubyte3_e32 v3, v1
	; SI-NEXT: v_cvt_f32_ubyte2_e32 v2, v1			; SI-NEXT: v_cvt_f32_ubyte2_e32 v2, v1
	; SI-NEXT: v_cvt_f32_ubyte0_e32 v0, v1			; SI-NEXT: v_cvt_f32_ubyte0_e32 v0, v1
	; SI-NEXT: v_cvt_f32_ubyte1_e32 v1, v6			; SI-NEXT: v_cvt_f32_ubyte1_e32 v1, v6
	; SI-NEXT: v_and_b32_e32 v7, s12, v7			; SI-NEXT: v_and_b32_e32 v7, s12, v7
	; SI-NEXT: v_add_i32_e32 v4, vcc, 9, v4			; SI-NEXT: v_add_i32_e32 v4, vcc, 9, v4
	; SI-NEXT: s_waitcnt lgkmcnt(0)
	; SI-NEXT: buffer_store_dwordx4 v[0:3], off, s[8:11], 0			; SI-NEXT: buffer_store_dwordx4 v[0:3], off, s[8:11], 0
	; SI-NEXT: s_waitcnt expcnt(0)			; SI-NEXT: s_waitcnt expcnt(0)
	; SI-NEXT: v_or_b32_e32 v0, v6, v7			; SI-NEXT: v_or_b32_e32 v0, v6, v7
	; SI-NEXT: v_lshlrev_b32_e32 v5, 8, v5			; SI-NEXT: v_lshlrev_b32_e32 v5, 8, v5
	; SI-NEXT: v_and_b32_e32 v1, s12, v4			; SI-NEXT: v_and_b32_e32 v1, s12, v4
	; SI-NEXT: v_add_i32_e32 v0, vcc, 0x900, v0			; SI-NEXT: v_add_i32_e32 v0, vcc, 0x900, v0
	; SI-NEXT: v_or_b32_e32 v1, v5, v1			; SI-NEXT: v_or_b32_e32 v1, v5, v1
	; SI-NEXT: v_and_b32_e32 v0, 0xffff, v0			; SI-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; SI-NEXT: v_lshlrev_b32_e32 v1, 16, v1			; SI-NEXT: v_lshlrev_b32_e32 v1, 16, v1
	; SI-NEXT: v_or_b32_e32 v0, v1, v0			; SI-NEXT: v_or_b32_e32 v0, v1, v0
	; SI-NEXT: v_add_i32_e32 v0, vcc, 0x9000000, v0			; SI-NEXT: v_add_i32_e32 v0, vcc, 0x9000000, v0
	; SI-NEXT: buffer_store_dword v0, off, s[0:3], 0			; SI-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; SI-NEXT: s_endpgm			; SI-NEXT: s_endpgm
	;			;
	; VI-LABEL: load_v4i8_to_v4f32_2_uses:			; VI-LABEL: load_v4i8_to_v4f32_2_uses:
	; VI: ; %bb.0:			; VI: ; %bb.0:
	; VI-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34			; VI-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x34
	; VI-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; VI-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; VI-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24			; VI-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24
	; VI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x2c			; VI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x2c
	; VI-NEXT: v_mov_b32_e32 v4, 9			; VI-NEXT: v_mov_b32_e32 v4, 9
				; VI-NEXT: s_movk_i32 s8, 0x900
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: v_mov_b32_e32 v1, s3			; VI-NEXT: v_mov_b32_e32 v1, s3
	; VI-NEXT: v_add_u32_e32 v0, vcc, s2, v0			; VI-NEXT: v_add_u32_e32 v0, vcc, s2, v0
	; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc			; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
	; VI-NEXT: flat_load_dword v5, v[0:1]			; VI-NEXT: flat_load_dword v5, v[0:1]
	; VI-NEXT: s_mov_b32 s3, 0xf000			; VI-NEXT: s_mov_b32 s3, 0xf000
	; VI-NEXT: s_mov_b32 s2, -1			; VI-NEXT: s_mov_b32 s2, -1
	; VI-NEXT: s_mov_b32 s6, s2			; VI-NEXT: s_mov_b32 s6, s2
	; VI-NEXT: s_mov_b32 s7, s3			; VI-NEXT: s_mov_b32 s7, s3
	; VI-NEXT: s_movk_i32 s8, 0x900
	; VI-NEXT: v_mov_b32_e32 v6, s8			; VI-NEXT: v_mov_b32_e32 v6, s8
	; VI-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; VI-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; VI-NEXT: v_lshrrev_b32_e32 v7, 24, v5			; VI-NEXT: v_lshrrev_b32_e32 v7, 24, v5
	; VI-NEXT: v_cvt_f32_ubyte3_e32 v3, v5			; VI-NEXT: v_cvt_f32_ubyte3_e32 v3, v5
	; VI-NEXT: v_cvt_f32_ubyte2_e32 v2, v5			; VI-NEXT: v_cvt_f32_ubyte2_e32 v2, v5
	; VI-NEXT: v_cvt_f32_ubyte1_e32 v1, v5			; VI-NEXT: v_cvt_f32_ubyte1_e32 v1, v5
	; VI-NEXT: v_cvt_f32_ubyte0_e32 v0, v5			; VI-NEXT: v_cvt_f32_ubyte0_e32 v0, v5
	; VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[4:7], 0			; VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[4:7], 0
	; SI-NEXT: v_cvt_f32_ubyte0_e32 v0, v0			; SI-NEXT: v_cvt_f32_ubyte0_e32 v0, v0
	; SI-NEXT: buffer_store_dword v0, off, s[4:7], 0 offset:24			; SI-NEXT: buffer_store_dword v0, off, s[4:7], 0 offset:24
	; SI-NEXT: s_waitcnt expcnt(0)			; SI-NEXT: s_waitcnt expcnt(0)
	; SI-NEXT: v_lshlrev_b32_e32 v0, 16, v2			; SI-NEXT: v_lshlrev_b32_e32 v0, 16, v2
	; SI-NEXT: v_or_b32_e32 v0, v0, v1			; SI-NEXT: v_or_b32_e32 v0, v0, v1
	; SI-NEXT: v_or_b32_e32 v4, v3, v6			; SI-NEXT: v_or_b32_e32 v4, v3, v6
	; SI-NEXT: v_and_b32_e32 v5, 0xffff0000, v0			; SI-NEXT: v_and_b32_e32 v5, 0xffff0000, v0
	; SI-NEXT: v_or_b32_e32 v4, v4, v5			; SI-NEXT: v_or_b32_e32 v4, v4, v5
	; SI-NEXT: v_cvt_f32_ubyte1_e32 v5, v4
	; SI-NEXT: v_cvt_f32_ubyte3_e32 v3, v0			; SI-NEXT: v_cvt_f32_ubyte3_e32 v3, v0
	; SI-NEXT: v_cvt_f32_ubyte2_e32 v2, v0			; SI-NEXT: v_cvt_f32_ubyte2_e32 v2, v0
	; SI-NEXT: v_cvt_f32_ubyte1_e32 v1, v0			; SI-NEXT: v_cvt_f32_ubyte1_e32 v1, v0
	; SI-NEXT: v_cvt_f32_ubyte0_e32 v0, v0			; SI-NEXT: v_cvt_f32_ubyte0_e32 v0, v0
				; SI-NEXT: v_cvt_f32_ubyte1_e32 v5, v4
	; SI-NEXT: v_cvt_f32_ubyte0_e32 v4, v4			; SI-NEXT: v_cvt_f32_ubyte0_e32 v4, v4
	; SI-NEXT: buffer_store_dwordx4 v[0:3], off, s[4:7], 0			; SI-NEXT: buffer_store_dwordx4 v[0:3], off, s[4:7], 0
	; SI-NEXT: buffer_store_dwordx2 v[4:5], off, s[4:7], 0 offset:16			; SI-NEXT: buffer_store_dwordx2 v[4:5], off, s[4:7], 0 offset:16
	; SI-NEXT: s_endpgm			; SI-NEXT: s_endpgm
	;			;
	; VI-LABEL: load_v7i8_to_v7f32:			; VI-LABEL: load_v7i8_to_v7f32:
	; VI: ; %bb.0:			; VI: ; %bb.0:
	; VI-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24			; VI-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24
	; VI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x2c			; VI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x2c
	; VI-NEXT: v_lshlrev_b32_e32 v0, 3, v0			; VI-NEXT: v_lshlrev_b32_e32 v0, 3, v0
	; VI-NEXT: s_mov_b32 s7, 0xf000			; VI-NEXT: s_mov_b32 s7, 0xf000
	; VI-NEXT: s_mov_b32 s6, -1			; VI-NEXT: s_mov_b32 s6, -1
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: v_mov_b32_e32 v1, s1			; VI-NEXT: v_mov_b32_e32 v1, s1
	; VI-NEXT: v_add_u32_e32 v0, vcc, s0, v0			; VI-NEXT: v_add_u32_e32 v0, vcc, s0, v0
	; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc			; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
	; VI-NEXT: v_add_u32_e32 v2, vcc, 1, v0			; VI-NEXT: v_add_u32_e32 v2, vcc, 1, v0
	; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v1, vcc			; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v1, vcc
	; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v0			; VI-NEXT: v_add_u32_e32 v4, vcc, 3, v0
	; VI-NEXT: v_addc_u32_e32 v5, vcc, 0, v1, vcc			; VI-NEXT: v_addc_u32_e32 v5, vcc, 0, v1, vcc
				; VI-NEXT: flat_load_ubyte v8, v[0:1]
	; VI-NEXT: flat_load_ubyte v9, v[2:3]			; VI-NEXT: flat_load_ubyte v9, v[2:3]
	; VI-NEXT: flat_load_ubyte v10, v[4:5]			; VI-NEXT: flat_load_ubyte v10, v[4:5]
	; VI-NEXT: v_add_u32_e32 v2, vcc, 2, v0			; VI-NEXT: v_add_u32_e32 v2, vcc, 2, v0
	; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v1, vcc			; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v1, vcc
	; VI-NEXT: v_add_u32_e32 v4, vcc, 5, v0			; VI-NEXT: v_add_u32_e32 v4, vcc, 5, v0
	; VI-NEXT: v_addc_u32_e32 v5, vcc, 0, v1, vcc			; VI-NEXT: v_addc_u32_e32 v5, vcc, 0, v1, vcc
	; VI-NEXT: v_add_u32_e32 v6, vcc, 4, v0			; VI-NEXT: v_add_u32_e32 v6, vcc, 4, v0
	; VI-NEXT: v_addc_u32_e32 v7, vcc, 0, v1, vcc			; VI-NEXT: v_addc_u32_e32 v7, vcc, 0, v1, vcc
	; VI-NEXT: flat_load_ubyte v8, v[0:1]
	; VI-NEXT: v_add_u32_e32 v0, vcc, 6, v0			; VI-NEXT: v_add_u32_e32 v0, vcc, 6, v0
	; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc			; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
	; VI-NEXT: flat_load_ubyte v2, v[2:3]			; VI-NEXT: flat_load_ubyte v2, v[2:3]
	; VI-NEXT: flat_load_ubyte v3, v[4:5]			; VI-NEXT: flat_load_ubyte v3, v[4:5]
	; VI-NEXT: flat_load_ubyte v4, v[6:7]			; VI-NEXT: flat_load_ubyte v4, v[6:7]
	; VI-NEXT: flat_load_ubyte v0, v[0:1]			; VI-NEXT: flat_load_ubyte v0, v[0:1]
	; VI-NEXT: s_waitcnt vmcnt(6) lgkmcnt(6)
	; VI-NEXT: v_lshlrev_b32_e32 v1, 8, v9
	; VI-NEXT: s_waitcnt vmcnt(5) lgkmcnt(5)			; VI-NEXT: s_waitcnt vmcnt(5) lgkmcnt(5)
				; VI-NEXT: v_lshlrev_b32_e32 v1, 8, v9
				; VI-NEXT: s_waitcnt vmcnt(4) lgkmcnt(4)
	; VI-NEXT: v_lshlrev_b32_e32 v5, 8, v10			; VI-NEXT: v_lshlrev_b32_e32 v5, 8, v10
	; VI-NEXT: s_waitcnt vmcnt(2) lgkmcnt(2)			; VI-NEXT: s_waitcnt vmcnt(2) lgkmcnt(2)
	; VI-NEXT: v_lshlrev_b32_e32 v3, 8, v3			; VI-NEXT: v_lshlrev_b32_e32 v3, 8, v3
	; VI-NEXT: s_waitcnt vmcnt(1) lgkmcnt(1)			; VI-NEXT: s_waitcnt vmcnt(1) lgkmcnt(1)
	; VI-NEXT: v_or_b32_e32 v4, v3, v4			; VI-NEXT: v_or_b32_e32 v4, v3, v4
	; VI-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; VI-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; VI-NEXT: v_cvt_f32_ubyte0_e32 v6, v0			; VI-NEXT: v_cvt_f32_ubyte0_e32 v6, v0
	; VI-NEXT: v_or_b32_e32 v0, v1, v8			; VI-NEXT: v_or_b32_e32 v0, v1, v8
	; VI-NEXT: v_or_b32_sdwa v1, v5, v2 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD			; VI-NEXT: v_or_b32_sdwa v1, v5, v2 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
	; VI-NEXT: v_or_b32_e32 v0, v1, v0			; VI-NEXT: v_or_b32_e32 v0, v1, v0
	; VI-NEXT: v_and_b32_e32 v5, 0xffff0000, v0			; VI-NEXT: v_and_b32_e32 v5, 0xffff0000, v0
	; VI-NEXT: v_or_b32_e32 v4, v4, v5			; VI-NEXT: v_or_b32_e32 v4, v4, v5
	; VI-NEXT: v_cvt_f32_ubyte1_e32 v5, v4
	; VI-NEXT: v_cvt_f32_ubyte3_e32 v3, v0			; VI-NEXT: v_cvt_f32_ubyte3_e32 v3, v0
	; VI-NEXT: v_cvt_f32_ubyte2_e32 v2, v0			; VI-NEXT: v_cvt_f32_ubyte2_e32 v2, v0
	; VI-NEXT: v_cvt_f32_ubyte1_e32 v1, v0			; VI-NEXT: v_cvt_f32_ubyte1_e32 v1, v0
	; VI-NEXT: v_cvt_f32_ubyte0_e32 v0, v0			; VI-NEXT: v_cvt_f32_ubyte0_e32 v0, v0
				; VI-NEXT: v_cvt_f32_ubyte1_e32 v5, v4
	; VI-NEXT: v_cvt_f32_ubyte0_e32 v4, v4			; VI-NEXT: v_cvt_f32_ubyte0_e32 v4, v4
	; VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[4:7], 0			; VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[4:7], 0
	; VI-NEXT: buffer_store_dwordx3 v[4:6], off, s[4:7], 0 offset:16			; VI-NEXT: buffer_store_dwordx3 v[4:6], off, s[4:7], 0 offset:16
	; VI-NEXT: s_endpgm			; VI-NEXT: s_endpgm
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%gep = getelementptr <7 x i8>, <7 x i8> addrspace(1)* %in, i32 %tid			%gep = getelementptr <7 x i8>, <7 x i8> addrspace(1)* %in, i32 %tid
	%load = load <7 x i8>, <7 x i8> addrspace(1)* %gep, align 1			%load = load <7 x i8>, <7 x i8> addrspace(1)* %gep, align 1
	%cvt = uitofp <7 x i8> %load to <7 x float>			%cvt = uitofp <7 x i8> %load to <7 x float>

llvm/test/CodeGen/AMDGPU/ds_write2st64.ll


	; CI-DAG: buffer_load_dword [[VAL0:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64{{$}}			; CI-DAG: buffer_load_dword [[VAL0:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64{{$}}
	; CI-DAG: buffer_load_dword [[VAL1:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:4			; CI-DAG: buffer_load_dword [[VAL1:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:4

	; GFX9-DAG: global_load_dword [[VAL0:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}{{$}}			; GFX9-DAG: global_load_dword [[VAL0:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}{{$}}
	; GFX9-DAG: global_load_dword [[VAL1:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} offset:4			; GFX9-DAG: global_load_dword [[VAL1:v[0-9]+]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} offset:4

	; GCN-DAG: v_lshlrev_b32_e32 [[SHL:v[0-9]+]], 2, v{{[0-9]+}}			; GCN-DAG: v_lshlrev_b32_e32 [[SHL:v[0-9]+]], 2, v{{[0-9]+}}
	; GCN: v_add_{{i|u}}32_e32 [[VPTR:v[0-9]+]], {{(vcc, )?}}s{{[0-9]+}}, [[SHL]]			; GCN-DAG: v_add_{{i|u}}32_e32 [[VPTR:v[0-9]+]], {{(vcc, )?}}s{{[0-9]+}}, [[SHL]]
	; GCN: ds_write2st64_b32 [[VPTR]], [[VAL0]], [[VAL1]] offset1:255			; GCN: ds_write2st64_b32 [[VPTR]], [[VAL0]], [[VAL1]] offset1:255
	; GCN: s_endpgm			; GCN: s_endpgm
	define amdgpu_kernel void @simple_write2st64_two_val_max_offset_f32(float addrspace(1)* %C, float addrspace(1)* %in, float addrspace(3)* %lds) #0 {			define amdgpu_kernel void @simple_write2st64_two_val_max_offset_f32(float addrspace(1)* %C, float addrspace(1)* %in, float addrspace(3)* %lds) #0 {
	%x.i = tail call i32 @llvm.amdgcn.workitem.id.x() #1			%x.i = tail call i32 @llvm.amdgcn.workitem.id.x() #1
	%in.gep.0 = getelementptr float, float addrspace(1)* %in, i32 %x.i			%in.gep.0 = getelementptr float, float addrspace(1)* %in, i32 %x.i
	%in.gep.1 = getelementptr float, float addrspace(1)* %in.gep.0, i32 1			%in.gep.1 = getelementptr float, float addrspace(1)* %in.gep.0, i32 1
	%val0 = load volatile float, float addrspace(1)* %in.gep.0, align 4			%val0 = load volatile float, float addrspace(1)* %in.gep.0, align 4
	%val1 = load volatile float, float addrspace(1)* %in.gep.1, align 4			%val1 = load volatile float, float addrspace(1)* %in.gep.1, align 4

	; CI-DAG: buffer_load_dwordx2 [[VAL0:v\[[0-9]+:[0-9]+\]]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64{{$}}			; CI-DAG: buffer_load_dwordx2 [[VAL0:v\[[0-9]+:[0-9]+\]]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64{{$}}
	; CI-DAG: buffer_load_dwordx2 [[VAL1:v\[[0-9]+:[0-9]+\]]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:8			; CI-DAG: buffer_load_dwordx2 [[VAL1:v\[[0-9]+:[0-9]+\]]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:8

	; GFX9-DAG: global_load_dwordx2 [[VAL0:v\[[0-9]+:[0-9]+\]]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}{{$}}			; GFX9-DAG: global_load_dwordx2 [[VAL0:v\[[0-9]+:[0-9]+\]]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}{{$}}
	; GFX9-DAG: global_load_dwordx2 [[VAL1:v\[[0-9]+:[0-9]+\]]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} offset:8			; GFX9-DAG: global_load_dwordx2 [[VAL1:v\[[0-9]+:[0-9]+\]]], {{v\[[0-9]+:[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}} offset:8

	; GCN-DAG: v_lshlrev_b32_e32 [[SHL:v[0-9]+]], 3, v{{[0-9]+}}			; GCN-DAG: v_lshlrev_b32_e32 [[SHL:v[0-9]+]], 3, v{{[0-9]+}}
	; GCN: v_add_{{i|u}}32_e32 [[VPTR:v[0-9]+]], {{(vcc, )?}}s{{[0-9]+}}, [[SHL]]			; GCN-DAG: v_add_{{i|u}}32_e32 [[VPTR:v[0-9]+]], {{(vcc, )?}}s{{[0-9]+}}, [[SHL]]
	; GCN: ds_write2st64_b64 [[VPTR]], [[VAL0]], [[VAL1]] offset0:4 offset1:127			; GCN: ds_write2st64_b64 [[VPTR]], [[VAL0]], [[VAL1]] offset0:4 offset1:127
	; GCN: s_endpgm			; GCN: s_endpgm
	define amdgpu_kernel void @simple_write2st64_two_val_max_offset_f64(double addrspace(1)* %C, double addrspace(1)* %in, double addrspace(3)* %lds) #0 {			define amdgpu_kernel void @simple_write2st64_two_val_max_offset_f64(double addrspace(1)* %C, double addrspace(1)* %in, double addrspace(3)* %lds) #0 {
	%x.i = tail call i32 @llvm.amdgcn.workitem.id.x() #1			%x.i = tail call i32 @llvm.amdgcn.workitem.id.x() #1
	%in.gep.0 = getelementptr double, double addrspace(1)* %in, i32 %x.i			%in.gep.0 = getelementptr double, double addrspace(1)* %in, i32 %x.i
	%in.gep.1 = getelementptr double, double addrspace(1)* %in.gep.0, i32 1			%in.gep.1 = getelementptr double, double addrspace(1)* %in.gep.0, i32 1
	%val0 = load volatile double, double addrspace(1)* %in.gep.0, align 8			%val0 = load volatile double, double addrspace(1)* %in.gep.0, align 8
	%val1 = load volatile double, double addrspace(1)* %in.gep.1, align 8			%val1 = load volatile double, double addrspace(1)* %in.gep.1, align 8

llvm/test/CodeGen/AMDGPU/global-saddr.ll

	; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs -amdgpu-enable-global-sgpr-addr < %s \| FileCheck -check-prefix=GFX9 %s			; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs -amdgpu-enable-global-sgpr-addr < %s \| FileCheck -check-prefix=GFX9 %s

	; Test for a conv2d like sequence of loads.			; Test for a conv2d like sequence of loads.

	; GFX9: global_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}} offset:16{{$}}			; GFX9: global_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}} offset:16{{$}}
	; GFX9: global_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}}{{$}}			; GFX9: global_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}}{{$}}
	; GFX9: global_load_dwordx2 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}} offset:32{{$}}
	; GFX9: global_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}} offset:-16{{$}}			; GFX9: global_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}} offset:-16{{$}}
				; GFX9: global_load_dwordx2 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}} offset:32{{$}}
	; GFX9: global_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}} offset:-32{{$}}			; GFX9: global_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}} offset:-32{{$}}
	; GFX9: global_store_dwordx2 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}} offset:8{{$}}			; GFX9: global_store_dwordx2 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}} offset:8{{$}}

	define hidden amdgpu_kernel void @simpleSaddrs(i64 addrspace(1)* %dst_image, i64 addrspace(1)* %src_image ) {			define hidden amdgpu_kernel void @simpleSaddrs(i64 addrspace(1)* %dst_image, i64 addrspace(1)* %src_image ) {
	entry:			entry:
	%id = call i32 @llvm.amdgcn.workitem.id.x()			%id = call i32 @llvm.amdgcn.workitem.id.x()
	%idx = zext i32 %id to i64			%idx = zext i32 %id to i64
	%gep = getelementptr i64, i64 addrspace(1)* %src_image, i64 %idx			%gep = getelementptr i64, i64 addrspace(1)* %src_image, i64 %idx

llvm/test/CodeGen/AMDGPU/half.ll

	; GCN: flat_load_dwordx4			; GCN: flat_load_dwordx4
	; GCN: flat_load_dwordx4			; GCN: flat_load_dwordx4

	; SI: v_cvt_f32_f16_e32			; SI: v_cvt_f32_f16_e32
	; SI: v_cvt_f32_f16_e32			; SI: v_cvt_f32_f16_e32
	; SI: v_cvt_f32_f16_e32			; SI: v_cvt_f32_f16_e32
	; SI: v_cvt_f32_f16_e32			; SI: v_cvt_f32_f16_e32
	; SI: v_cvt_f32_f16_e32			; SI: v_cvt_f32_f16_e32
	; SI: v_cvt_f32_f16_e32

	; GCN: flat_store_dwordx4			; GCN: flat_store_dwordx4

	; SI: v_cvt_f32_f16_e32			; SI: v_cvt_f32_f16_e32
	; SI: v_cvt_f32_f16_e32			; SI: v_cvt_f32_f16_e32
	; SI: v_cvt_f32_f16_e32			; SI: v_cvt_f32_f16_e32
	; SI: v_cvt_f32_f16_e32			; SI: v_cvt_f32_f16_e32
	; SI: v_cvt_f32_f16_e32			; SI: v_cvt_f32_f16_e32
	; SI: v_cvt_f32_f16_e32			; SI: v_cvt_f32_f16_e32
	; SI: v_cvt_f32_f16_e32			; SI: v_cvt_f32_f16_e32
	; SI: v_cvt_f32_f16_e32			; SI: v_cvt_f32_f16_e32
	; SI: v_cvt_f32_f16_e32			; SI: v_cvt_f32_f16_e32
	; SI: v_cvt_f32_f16_e32			; SI: v_cvt_f32_f16_e32
				; SI: v_cvt_f32_f16_e32

	; VI: v_cvt_f32_f16_e32			; VI: v_cvt_f32_f16_e32
	; VI: v_cvt_f32_f16_sdwa			; VI: v_cvt_f32_f16_sdwa


	; GCN: flat_store_dwordx4			; GCN: flat_store_dwordx4
	; GCN: flat_store_dwordx4			; GCN: flat_store_dwordx4
	; GCN: flat_store_dwordx4			; GCN: flat_store_dwordx4

llvm/test/CodeGen/AMDGPU/idot2.ll

; GFX7-LABEL: udot2_acc16:		; GFX7-LABEL: udot2_acc16:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s3, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s2, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_mov_b32 s8, 0xffff		; GFX7-NEXT: s_mov_b32 s8, 0xffff
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX7-NEXT: buffer_load_ushort v0, off, s[0:3], 0		; GFX7-NEXT: buffer_load_ushort v0, off, s[0:3], 0
		; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0		; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_lshr_b32 s6, s4, 16		; GFX7-NEXT: s_lshr_b32 s6, s4, 16
; GFX7-NEXT: s_and_b32 s4, s4, s8
; GFX7-NEXT: s_lshr_b32 s7, s5, 16		; GFX7-NEXT: s_lshr_b32 s7, s5, 16
; GFX7-NEXT: v_mov_b32_e32 v1, s7		; GFX7-NEXT: v_mov_b32_e32 v1, s7
; GFX7-NEXT: s_and_b32 s5, s5, s8		; GFX7-NEXT: s_and_b32 s5, s5, s8
		; GFX7-NEXT: s_and_b32 s4, s4, s8
; GFX7-NEXT: s_waitcnt vmcnt(0)		; GFX7-NEXT: s_waitcnt vmcnt(0)
; GFX7-NEXT: v_mad_u32_u24 v0, s6, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s6, v1, v0
; GFX7-NEXT: v_mov_b32_e32 v1, s5		; GFX7-NEXT: v_mov_b32_e32 v1, s5
; GFX7-NEXT: v_mad_u32_u24 v0, s4, v1, v0		; GFX7-NEXT: v_mad_u32_u24 v0, s4, v1, v0
; GFX7-NEXT: buffer_store_short v0, off, s[0:3], 0		; GFX7-NEXT: buffer_store_short v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: udot2_acc16:		; GFX8-LABEL: udot2_acc16:
	entry:
store i16 %add2, i16 addrspace(1)* %dst, align 2		store i16 %add2, i16 addrspace(1)* %dst, align 2
ret void		ret void
}		}

define amdgpu_kernel void @notsdot2_sext8(<2 x i8> addrspace(1)* %src1,		define amdgpu_kernel void @notsdot2_sext8(<2 x i8> addrspace(1)* %src1,
; GFX7-LABEL: notsdot2_sext8:		; GFX7-LABEL: notsdot2_sext8:
; GFX7: ; %bb.0: ; %entry		; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9		; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
		; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
		foad: Nice.
; GFX7-NEXT: s_mov_b32 s3, 0xf000		; GFX7-NEXT: s_mov_b32 s3, 0xf000
; GFX7-NEXT: s_mov_b32 s2, -1		; GFX7-NEXT: s_mov_b32 s2, -1
; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
; GFX7-NEXT: s_mov_b32 s10, s2		; GFX7-NEXT: s_mov_b32 s10, s2
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: s_mov_b32 s8, s6		; GFX7-NEXT: s_mov_b32 s8, s6
; GFX7-NEXT: s_mov_b32 s9, s7		; GFX7-NEXT: s_mov_b32 s9, s7
; GFX7-NEXT: s_mov_b32 s11, s3
; GFX7-NEXT: s_mov_b32 s6, s2		; GFX7-NEXT: s_mov_b32 s6, s2
; GFX7-NEXT: s_mov_b32 s7, s3		; GFX7-NEXT: s_mov_b32 s7, s3
		; GFX7-NEXT: s_mov_b32 s11, s3
; GFX7-NEXT: buffer_load_ushort v0, off, s[4:7], 0		; GFX7-NEXT: buffer_load_ushort v0, off, s[4:7], 0
; GFX7-NEXT: buffer_load_ushort v1, off, s[8:11], 0		; GFX7-NEXT: buffer_load_ushort v1, off, s[8:11], 0
; GFX7-NEXT: s_load_dword s4, s[0:1], 0x0		; GFX7-NEXT: s_load_dword s4, s[0:1], 0x0
; GFX7-NEXT: s_waitcnt vmcnt(1)		; GFX7-NEXT: s_waitcnt vmcnt(1)
; GFX7-NEXT: v_bfe_i32 v2, v0, 0, 8		; GFX7-NEXT: v_bfe_i32 v2, v0, 0, 8
; GFX7-NEXT: s_waitcnt vmcnt(0)		; GFX7-NEXT: s_waitcnt vmcnt(0)
; GFX7-NEXT: v_bfe_i32 v3, v1, 0, 8		; GFX7-NEXT: v_bfe_i32 v3, v1, 0, 8
; GFX7-NEXT: v_bfe_i32 v0, v0, 8, 8		; GFX7-NEXT: v_bfe_i32 v0, v0, 8, 8
; GFX7-NEXT: v_bfe_i32 v1, v1, 8, 8		; GFX7-NEXT: v_bfe_i32 v1, v1, 8, 8
; GFX7-NEXT: s_waitcnt lgkmcnt(0)		; GFX7-NEXT: s_waitcnt lgkmcnt(0)
; GFX7-NEXT: v_mad_i32_i24 v0, v1, v0, s4		; GFX7-NEXT: v_mad_i32_i24 v0, v1, v0, s4
; GFX7-NEXT: v_mad_i32_i24 v0, v3, v2, v0		; GFX7-NEXT: v_mad_i32_i24 v0, v3, v2, v0
; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0		; GFX7-NEXT: buffer_store_dword v0, off, s[0:3], 0
; GFX7-NEXT: s_endpgm		; GFX7-NEXT: s_endpgm
;		;
; GFX8-LABEL: notsdot2_sext8:		; GFX8-LABEL: notsdot2_sext8:
; GFX8: ; %bb.0: ; %entry		; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX8-NEXT: s_waitcnt lgkmcnt(0)		; GFX8-NEXT: s_waitcnt lgkmcnt(0)
; GFX8-NEXT: s_load_dword s2, s[0:1], 0x0
; GFX8-NEXT: v_mov_b32_e32 v0, s6		; GFX8-NEXT: v_mov_b32_e32 v0, s6
; GFX8-NEXT: v_mov_b32_e32 v1, s7
; GFX8-NEXT: v_mov_b32_e32 v2, s4		; GFX8-NEXT: v_mov_b32_e32 v2, s4
; GFX8-NEXT: v_mov_b32_e32 v3, s5		; GFX8-NEXT: v_mov_b32_e32 v3, s5
		; GFX8-NEXT: s_load_dword s2, s[0:1], 0x0
		; GFX8-NEXT: v_mov_b32_e32 v1, s7
; GFX8-NEXT: flat_load_ushort v2, v[2:3]		; GFX8-NEXT: flat_load_ushort v2, v[2:3]
; GFX8-NEXT: flat_load_ushort v0, v[0:1]		; GFX8-NEXT: flat_load_ushort v0, v[0:1]
; GFX8-NEXT: s_waitcnt vmcnt(1) lgkmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(1) lgkmcnt(0)
; GFX8-NEXT: v_bfe_i32 v1, v2, 0, 8		; GFX8-NEXT: v_bfe_i32 v1, v2, 0, 8
; GFX8-NEXT: v_lshrrev_b16_e32 v2, 8, v2		; GFX8-NEXT: v_lshrrev_b16_e32 v2, 8, v2
; GFX8-NEXT: s_waitcnt vmcnt(0)		; GFX8-NEXT: s_waitcnt vmcnt(0)
; GFX8-NEXT: v_bfe_i32 v3, v0, 0, 8		; GFX8-NEXT: v_bfe_i32 v3, v0, 0, 8
; GFX8-NEXT: v_lshrrev_b16_e32 v0, 8, v0		; GFX8-NEXT: v_lshrrev_b16_e32 v0, 8, v0
; GFX8-NEXT: v_bfe_i32 v2, v2, 0, 8		; GFX8-NEXT: v_bfe_i32 v2, v2, 0, 8
; GFX8-NEXT: v_bfe_i32 v0, v0, 0, 8		; GFX8-NEXT: v_bfe_i32 v0, v0, 0, 8
; GFX8-NEXT: v_mad_i32_i24 v0, v0, v2, s2		; GFX8-NEXT: v_mad_i32_i24 v0, v0, v2, s2
; GFX8-NEXT: v_mad_i32_i24 v2, v3, v1, v0		; GFX8-NEXT: v_mad_i32_i24 v2, v3, v1, v0
; GFX8-NEXT: v_mov_b32_e32 v0, s0		; GFX8-NEXT: v_mov_b32_e32 v0, s0
; GFX8-NEXT: v_mov_b32_e32 v1, s1		; GFX8-NEXT: v_mov_b32_e32 v1, s1
; GFX8-NEXT: flat_store_dword v[0:1], v2		; GFX8-NEXT: flat_store_dword v[0:1], v2
; GFX8-NEXT: s_endpgm		; GFX8-NEXT: s_endpgm
;		;
; GFX9-NODL-LABEL: notsdot2_sext8:		; GFX9-NODL-LABEL: notsdot2_sext8:
; GFX9-NODL: ; %bb.0: ; %entry		; GFX9-NODL: ; %bb.0: ; %entry
; GFX9-NODL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NODL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NODL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-NODL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-NODL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NODL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NODL-NEXT: s_load_dword s2, s[0:1], 0x0
; GFX9-NODL-NEXT: v_mov_b32_e32 v0, s6		; GFX9-NODL-NEXT: v_mov_b32_e32 v0, s6
; GFX9-NODL-NEXT: v_mov_b32_e32 v1, s7
; GFX9-NODL-NEXT: v_mov_b32_e32 v2, s4		; GFX9-NODL-NEXT: v_mov_b32_e32 v2, s4
; GFX9-NODL-NEXT: v_mov_b32_e32 v3, s5		; GFX9-NODL-NEXT: v_mov_b32_e32 v3, s5
		; GFX9-NODL-NEXT: s_load_dword s2, s[0:1], 0x0
		; GFX9-NODL-NEXT: v_mov_b32_e32 v1, s7
; GFX9-NODL-NEXT: global_load_ushort v2, v[2:3], off		; GFX9-NODL-NEXT: global_load_ushort v2, v[2:3], off
; GFX9-NODL-NEXT: global_load_ushort v0, v[0:1], off		; GFX9-NODL-NEXT: global_load_ushort v0, v[0:1], off
; GFX9-NODL-NEXT: s_waitcnt vmcnt(1)		; GFX9-NODL-NEXT: s_waitcnt vmcnt(1)
; GFX9-NODL-NEXT: v_bfe_i32 v1, v2, 0, 8		; GFX9-NODL-NEXT: v_bfe_i32 v1, v2, 0, 8
; GFX9-NODL-NEXT: v_lshrrev_b16_e32 v2, 8, v2		; GFX9-NODL-NEXT: v_lshrrev_b16_e32 v2, 8, v2
; GFX9-NODL-NEXT: s_waitcnt vmcnt(0)		; GFX9-NODL-NEXT: s_waitcnt vmcnt(0)
; GFX9-NODL-NEXT: v_bfe_i32 v3, v0, 0, 8		; GFX9-NODL-NEXT: v_bfe_i32 v3, v0, 0, 8
; GFX9-NODL-NEXT: v_lshrrev_b16_e32 v0, 8, v0		; GFX9-NODL-NEXT: v_lshrrev_b16_e32 v0, 8, v0
; GFX9-NODL-NEXT: v_bfe_i32 v2, v2, 0, 8		; GFX9-NODL-NEXT: v_bfe_i32 v2, v2, 0, 8
; GFX9-NODL-NEXT: v_bfe_i32 v0, v0, 0, 8		; GFX9-NODL-NEXT: v_bfe_i32 v0, v0, 0, 8
; GFX9-NODL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NODL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NODL-NEXT: v_mad_i32_i24 v0, v0, v2, s2		; GFX9-NODL-NEXT: v_mad_i32_i24 v0, v0, v2, s2
; GFX9-NODL-NEXT: v_mad_i32_i24 v2, v3, v1, v0		; GFX9-NODL-NEXT: v_mad_i32_i24 v2, v3, v1, v0
; GFX9-NODL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-NODL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-NODL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NODL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NODL-NEXT: global_store_dword v[0:1], v2, off		; GFX9-NODL-NEXT: global_store_dword v[0:1], v2, off
; GFX9-NODL-NEXT: s_endpgm		; GFX9-NODL-NEXT: s_endpgm
;		;
; GFX9-DL-LABEL: notsdot2_sext8:		; GFX9-DL-LABEL: notsdot2_sext8:
; GFX9-DL: ; %bb.0: ; %entry		; GFX9-DL: ; %bb.0: ; %entry
; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: s_load_dword s2, s[0:1], 0x0
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s6		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s6
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s7
; GFX9-DL-NEXT: v_mov_b32_e32 v2, s4		; GFX9-DL-NEXT: v_mov_b32_e32 v2, s4
; GFX9-DL-NEXT: v_mov_b32_e32 v3, s5		; GFX9-DL-NEXT: v_mov_b32_e32 v3, s5
		; GFX9-DL-NEXT: s_load_dword s2, s[0:1], 0x0
		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s7
; GFX9-DL-NEXT: global_load_ushort v2, v[2:3], off		; GFX9-DL-NEXT: global_load_ushort v2, v[2:3], off
; GFX9-DL-NEXT: global_load_ushort v0, v[0:1], off		; GFX9-DL-NEXT: global_load_ushort v0, v[0:1], off
; GFX9-DL-NEXT: s_waitcnt vmcnt(1)		; GFX9-DL-NEXT: s_waitcnt vmcnt(1)
; GFX9-DL-NEXT: v_bfe_i32 v1, v2, 0, 8		; GFX9-DL-NEXT: v_bfe_i32 v1, v2, 0, 8
; GFX9-DL-NEXT: v_lshrrev_b16_e32 v2, 8, v2		; GFX9-DL-NEXT: v_lshrrev_b16_e32 v2, 8, v2
; GFX9-DL-NEXT: s_waitcnt vmcnt(0)		; GFX9-DL-NEXT: s_waitcnt vmcnt(0)
; GFX9-DL-NEXT: v_bfe_i32 v3, v0, 0, 8		; GFX9-DL-NEXT: v_bfe_i32 v3, v0, 0, 8
; GFX9-DL-NEXT: v_lshrrev_b16_e32 v0, 8, v0		; GFX9-DL-NEXT: v_lshrrev_b16_e32 v0, 8, v0
; GFX9-DL-NEXT: v_bfe_i32 v2, v2, 0, 8		; GFX9-DL-NEXT: v_bfe_i32 v2, v2, 0, 8
; GFX9-DL-NEXT: v_bfe_i32 v0, v0, 0, 8		; GFX9-DL-NEXT: v_bfe_i32 v0, v0, 0, 8
; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-DL-NEXT: v_mad_i32_i24 v0, v0, v2, s2		; GFX9-DL-NEXT: v_mad_i32_i24 v0, v0, v2, s2
; GFX9-DL-NEXT: v_mad_i32_i24 v2, v3, v1, v0		; GFX9-DL-NEXT: v_mad_i32_i24 v2, v3, v1, v0
; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0		; GFX9-DL-NEXT: v_mov_b32_e32 v0, s0
; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1		; GFX9-DL-NEXT: v_mov_b32_e32 v1, s1
; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off		; GFX9-DL-NEXT: global_store_dword v[0:1], v2, off
; GFX9-DL-NEXT: s_endpgm		; GFX9-DL-NEXT: s_endpgm
;		;
; GFX10-DL-LABEL: notsdot2_sext8:		; GFX10-DL-LABEL: notsdot2_sext8:
; GFX10-DL: ; %bb.0: ; %entry		; GFX10-DL: ; %bb.0: ; %entry
; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34		; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
; GFX10-DL-NEXT: ; implicit-def: $vcc_hi		; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-DL-NEXT: v_mov_b32_e32 v0, s6
; GFX10-DL-NEXT: v_mov_b32_e32 v2, s4		; GFX10-DL-NEXT: v_mov_b32_e32 v2, s4
; GFX10-DL-NEXT: v_mov_b32_e32 v3, s5		; GFX10-DL-NEXT: v_mov_b32_e32 v3, s5
		; GFX10-DL-NEXT: v_mov_b32_e32 v0, s6
; GFX10-DL-NEXT: v_mov_b32_e32 v1, s7		; GFX10-DL-NEXT: v_mov_b32_e32 v1, s7
; GFX10-DL-NEXT: s_load_dword s2, s[0:1], 0x0
; GFX10-DL-NEXT: global_load_ushort v2, v[2:3], off		; GFX10-DL-NEXT: global_load_ushort v2, v[2:3], off
; GFX10-DL-NEXT: global_load_ushort v0, v[0:1], off		; GFX10-DL-NEXT: global_load_ushort v0, v[0:1], off
		; GFX10-DL-NEXT: s_load_dword s2, s[0:1], 0x0
		foad: Is this just coincidence, or are we actually trying to cluster a FLAT load with an SMEM load?
		rampitec: No, we do not:
		    BUNDLE implicit-def $vgpr2, implicit-def $vgpr0, implicit killed $vgpr2_vgpr3, implicit $exec, implicit killed $vgpr0_vgpr1 {
		      renamable $vgpr2 = GLOBAL_LOAD_USHORT killed renamable $vgpr2_vgpr3, 0, 0, 0, 0, implicit $exec :: (load 2 from %ir.2, addrspace 1)
		      renamable $vgpr0 = GLOBAL_LOAD_USHORT killed renamable $vgpr0_vgpr1, 0, 0, 0, 0, implicit $exec :: (load 2 from %ir.3, addrspace 1)
		    }
		    renamable $sgpr0 = S_LOAD_DWORD_IMM renamable $sgpr4_sgpr5, 0, 0, 0 :: (load 4 from %ir.4, addrspace 1)
		That is because of removed mutation again I guess. Anyway, this is a better schedule because global loads take longer than SMRD.
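The latency argument in the reply above can be illustrated with a toy model. This is only a sketch: the latency numbers, the `ready_cycle` helper, and the one-issue-per-cycle assumption are all hypothetical and are not taken from the patch or from any hardware documentation. It only shows why issuing the longer-latency loads first tends to make all results available sooner.

```python
# Toy model only: the latencies below are made-up placeholders, not hardware figures.
ASSUMED_LATENCY = {"global_load": 100, "s_load": 20}


def ready_cycle(schedule, latency=ASSUMED_LATENCY):
    """Return the cycle at which every load in `schedule` has completed.

    Assumes one instruction issues per cycle, in order, and that each load
    completes `latency[kind]` cycles after it issues.
    """
    return max(slot + latency[kind] for slot, kind in enumerate(schedule))


# Order before the patch: the scalar load is issued ahead of the global loads.
before = ["s_load", "global_load", "global_load"]
# Order after the patch: the bundled global loads go first.
after = ["global_load", "global_load", "s_load"]

print(ready_cycle(before), ready_cycle(after))  # prints "102 101"
```

Under these assumed numbers the gain is a single cycle; the real effect depends on actual memory latencies and on what else the scheduler can overlap with the loads.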
; GFX10-DL-NEXT: s_waitcnt vmcnt(1)		; GFX10-DL-NEXT: s_waitcnt vmcnt(1)
; GFX10-DL-NEXT: v_lshrrev_b16_e64 v1, 8, v2		; GFX10-DL-NEXT: v_lshrrev_b16_e64 v1, 8, v2
; GFX10-DL-NEXT: s_waitcnt vmcnt(0)		; GFX10-DL-NEXT: s_waitcnt vmcnt(0)
; GFX10-DL-NEXT: v_lshrrev_b16_e64 v3, 8, v0		; GFX10-DL-NEXT: v_lshrrev_b16_e64 v3, 8, v0
; GFX10-DL-NEXT: v_bfe_i32 v2, v2, 0, 8		; GFX10-DL-NEXT: v_bfe_i32 v2, v2, 0, 8
; GFX10-DL-NEXT: v_bfe_i32 v0, v0, 0, 8		; GFX10-DL-NEXT: v_bfe_i32 v0, v0, 0, 8
; GFX10-DL-NEXT: v_bfe_i32 v1, v1, 0, 8		; GFX10-DL-NEXT: v_bfe_i32 v1, v1, 0, 8
; GFX10-DL-NEXT: v_bfe_i32 v3, v3, 0, 8		; GFX10-DL-NEXT: v_bfe_i32 v3, v3, 0, 8

llvm/test/CodeGen/AMDGPU/idot4s.ll

	; GFX7-LABEL: idot4_acc16:			; GFX7-LABEL: idot4_acc16:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9			; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
	; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd			; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
	; GFX7-NEXT: s_mov_b32 s3, 0xf000			; GFX7-NEXT: s_mov_b32 s3, 0xf000
	; GFX7-NEXT: s_mov_b32 s2, -1			; GFX7-NEXT: s_mov_b32 s2, -1
	; GFX7-NEXT: s_mov_b32 s8, 0xffff			; GFX7-NEXT: s_mov_b32 s8, 0xffff
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
	; GFX7-NEXT: buffer_load_ushort v0, off, s[0:3], 0			; GFX7-NEXT: buffer_load_ushort v0, off, s[0:3], 0
				; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
	; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0			; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_sext_i32_i8 s6, s4			; GFX7-NEXT: s_sext_i32_i8 s6, s4
	; GFX7-NEXT: s_bfe_i32 s9, s4, 0x80008
	; GFX7-NEXT: s_sext_i32_i8 s7, s5			; GFX7-NEXT: s_sext_i32_i8 s7, s5
	; GFX7-NEXT: s_bfe_i32 s10, s5, 0x80008			; GFX7-NEXT: s_bfe_i32 s10, s5, 0x80008
	; GFX7-NEXT: s_and_b32 s7, s7, s8			; GFX7-NEXT: s_and_b32 s7, s7, s8
	; GFX7-NEXT: s_bfe_i32 s12, s5, 0x80010			; GFX7-NEXT: s_bfe_i32 s12, s5, 0x80010
				; GFX7-NEXT: s_bfe_i32 s9, s4, 0x80008
	; GFX7-NEXT: s_and_b32 s10, s10, s8			; GFX7-NEXT: s_and_b32 s10, s10, s8
	; GFX7-NEXT: s_and_b32 s6, s6, s8			; GFX7-NEXT: s_and_b32 s6, s6, s8
	; GFX7-NEXT: v_mov_b32_e32 v1, s7			; GFX7-NEXT: v_mov_b32_e32 v1, s7
	; GFX7-NEXT: s_bfe_i32 s11, s4, 0x80010			; GFX7-NEXT: s_bfe_i32 s11, s4, 0x80010
	; GFX7-NEXT: s_ashr_i32 s5, s5, 24			; GFX7-NEXT: s_ashr_i32 s5, s5, 24
	; GFX7-NEXT: s_and_b32 s12, s12, s8			; GFX7-NEXT: s_and_b32 s12, s12, s8
	; GFX7-NEXT: s_and_b32 s9, s9, s8			; GFX7-NEXT: s_and_b32 s9, s9, s8
	; GFX7-NEXT: v_mov_b32_e32 v2, s10			; GFX7-NEXT: v_mov_b32_e32 v2, s10
	define amdgpu_kernel void @idot4_acc8(<4 x i8> addrspace(1)* %src1,			define amdgpu_kernel void @idot4_acc8(<4 x i8> addrspace(1)* %src1,
	; GFX7-LABEL: idot4_acc8:			; GFX7-LABEL: idot4_acc8:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9			; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
	; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd			; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
	; GFX7-NEXT: s_mov_b32 s3, 0xf000			; GFX7-NEXT: s_mov_b32 s3, 0xf000
	; GFX7-NEXT: s_mov_b32 s2, -1			; GFX7-NEXT: s_mov_b32 s2, -1
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
	; GFX7-NEXT: buffer_load_ubyte v0, off, s[0:3], 0
	; GFX7-NEXT: s_load_dword s6, s[6:7], 0x0			; GFX7-NEXT: s_load_dword s6, s[6:7], 0x0
				; GFX7-NEXT: buffer_load_ubyte v0, off, s[0:3], 0
				; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
	; GFX7-NEXT: s_movk_i32 s5, 0xff			; GFX7-NEXT: s_movk_i32 s5, 0xff
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_bfe_u32 s9, s4, 0x80008
	; GFX7-NEXT: s_bfe_u32 s11, s4, 0x80010
	; GFX7-NEXT: s_and_b32 s7, s6, s5			; GFX7-NEXT: s_and_b32 s7, s6, s5
	; GFX7-NEXT: s_and_b32 s5, s4, s5
	; GFX7-NEXT: s_bfe_u32 s8, s6, 0x80008			; GFX7-NEXT: s_bfe_u32 s8, s6, 0x80008
				; GFX7-NEXT: s_and_b32 s5, s4, s5
	; GFX7-NEXT: v_mov_b32_e32 v1, s7			; GFX7-NEXT: v_mov_b32_e32 v1, s7
	; GFX7-NEXT: s_bfe_u32 s10, s6, 0x80010			; GFX7-NEXT: s_bfe_u32 s10, s6, 0x80010
				; GFX7-NEXT: s_bfe_u32 s9, s4, 0x80008
	; GFX7-NEXT: v_mov_b32_e32 v2, s8			; GFX7-NEXT: v_mov_b32_e32 v2, s8
				; GFX7-NEXT: s_bfe_u32 s11, s4, 0x80010
	; GFX7-NEXT: s_lshr_b32 s6, s6, 24			; GFX7-NEXT: s_lshr_b32 s6, s6, 24
	; GFX7-NEXT: v_mov_b32_e32 v3, s10			; GFX7-NEXT: v_mov_b32_e32 v3, s10
	; GFX7-NEXT: s_lshr_b32 s4, s4, 24			; GFX7-NEXT: s_lshr_b32 s4, s4, 24
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: v_mad_u32_u24 v0, s5, v1, v0			; GFX7-NEXT: v_mad_u32_u24 v0, s5, v1, v0
	; GFX7-NEXT: v_mad_u32_u24 v0, s9, v2, v0			; GFX7-NEXT: v_mad_u32_u24 v0, s9, v2, v0
	; GFX7-NEXT: v_mad_u32_u24 v0, s11, v3, v0			; GFX7-NEXT: v_mad_u32_u24 v0, s11, v3, v0
	; GFX7-NEXT: v_mov_b32_e32 v1, s6			; GFX7-NEXT: v_mov_b32_e32 v1, s6
	define amdgpu_kernel void @idot4_acc16_vecMul(<4 x i8> addrspace(1)* %src1,			define amdgpu_kernel void @idot4_acc16_vecMul(<4 x i8> addrspace(1)* %src1,
	; GFX7-LABEL: idot4_acc16_vecMul:			; GFX7-LABEL: idot4_acc16_vecMul:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9			; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
	; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd			; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
	; GFX7-NEXT: s_mov_b32 s3, 0xf000			; GFX7-NEXT: s_mov_b32 s3, 0xf000
	; GFX7-NEXT: s_mov_b32 s2, -1			; GFX7-NEXT: s_mov_b32 s2, -1
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
	; GFX7-NEXT: buffer_load_ushort v0, off, s[0:3], 0			; GFX7-NEXT: buffer_load_ushort v0, off, s[0:3], 0
				; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
	; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0			; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_ashr_i32 s6, s4, 24			; GFX7-NEXT: s_ashr_i32 s6, s4, 24
	; GFX7-NEXT: s_bfe_i32 s7, s4, 0x80010
	; GFX7-NEXT: s_bfe_i32 s10, s5, 0x80010			; GFX7-NEXT: s_bfe_i32 s10, s5, 0x80010
	; GFX7-NEXT: s_bfe_i32 s11, s5, 0x80008			; GFX7-NEXT: s_bfe_i32 s11, s5, 0x80008
	; GFX7-NEXT: s_ashr_i32 s9, s5, 24			; GFX7-NEXT: s_ashr_i32 s9, s5, 24
	; GFX7-NEXT: s_sext_i32_i8 s5, s5			; GFX7-NEXT: s_sext_i32_i8 s5, s5
				; GFX7-NEXT: s_bfe_i32 s7, s4, 0x80010
	; GFX7-NEXT: s_bfe_i32 s8, s4, 0x80008			; GFX7-NEXT: s_bfe_i32 s8, s4, 0x80008
	; GFX7-NEXT: s_sext_i32_i8 s4, s4			; GFX7-NEXT: s_sext_i32_i8 s4, s4
	; GFX7-NEXT: v_mov_b32_e32 v1, s5			; GFX7-NEXT: v_mov_b32_e32 v1, s5
	; GFX7-NEXT: v_mov_b32_e32 v2, s11			; GFX7-NEXT: v_mov_b32_e32 v2, s11
	; GFX7-NEXT: v_mov_b32_e32 v3, s10			; GFX7-NEXT: v_mov_b32_e32 v3, s10
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: v_mad_i32_i24 v0, s4, v1, v0			; GFX7-NEXT: v_mad_i32_i24 v0, s4, v1, v0
	; GFX7-NEXT: v_mad_i32_i24 v0, s8, v2, v0			; GFX7-NEXT: v_mad_i32_i24 v0, s8, v2, v0
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: v_mov_b32_e32 v2, 0xffff			; GFX10-DL-NEXT: v_mov_b32_e32 v2, 0xffff
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1
				; GFX10-DL-NEXT: global_load_ushort v3, v[0:1], off
	; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0
	; GFX10-DL-NEXT: global_load_ushort v3, v[0:1], off
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_bfe_i32 s4, s0, 0x80000			; GFX10-DL-NEXT: s_bfe_i32 s4, s0, 0x80000
	; GFX10-DL-NEXT: s_bfe_i32 s3, s1, 0x80000			; GFX10-DL-NEXT: s_bfe_i32 s3, s1, 0x80000
	; GFX10-DL-NEXT: s_lshr_b32 s2, s0, 16			; GFX10-DL-NEXT: s_lshr_b32 s2, s0, 16
	; GFX10-DL-NEXT: v_ashrrev_i16_e64 v4, 8, s0			; GFX10-DL-NEXT: v_ashrrev_i16_e64 v4, 8, s0
	; GFX10-DL-NEXT: s_lshr_b32 s5, s1, 16			; GFX10-DL-NEXT: s_lshr_b32 s5, s1, 16
	; GFX10-DL-NEXT: v_ashrrev_i16_e64 v5, 8, s1			; GFX10-DL-NEXT: v_ashrrev_i16_e64 v5, 8, s1
	; GFX10-DL-NEXT: v_and_b32_e32 v6, s3, v2			; GFX10-DL-NEXT: v_and_b32_e32 v6, s3, v2

llvm/test/CodeGen/AMDGPU/idot4u.ll

	define amdgpu_kernel void @udot4_acc16(<4 x i8> addrspace(1)* %src1,			define amdgpu_kernel void @udot4_acc16(<4 x i8> addrspace(1)* %src1,
	; GFX7-LABEL: udot4_acc16:			; GFX7-LABEL: udot4_acc16:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9			; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
	; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd			; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
	; GFX7-NEXT: s_mov_b32 s3, 0xf000			; GFX7-NEXT: s_mov_b32 s3, 0xf000
	; GFX7-NEXT: s_mov_b32 s2, -1			; GFX7-NEXT: s_mov_b32 s2, -1
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
	; GFX7-NEXT: buffer_load_ushort v0, off, s[0:3], 0
	; GFX7-NEXT: s_load_dword s6, s[6:7], 0x0			; GFX7-NEXT: s_load_dword s6, s[6:7], 0x0
				; GFX7-NEXT: buffer_load_ushort v0, off, s[0:3], 0
				; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
	; GFX7-NEXT: s_movk_i32 s5, 0xff			; GFX7-NEXT: s_movk_i32 s5, 0xff
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_bfe_u32 s9, s4, 0x80008
	; GFX7-NEXT: s_bfe_u32 s11, s4, 0x80010
	; GFX7-NEXT: s_and_b32 s7, s6, s5			; GFX7-NEXT: s_and_b32 s7, s6, s5
	; GFX7-NEXT: s_and_b32 s5, s4, s5
	; GFX7-NEXT: s_bfe_u32 s8, s6, 0x80008			; GFX7-NEXT: s_bfe_u32 s8, s6, 0x80008
				; GFX7-NEXT: s_and_b32 s5, s4, s5
	; GFX7-NEXT: v_mov_b32_e32 v1, s7			; GFX7-NEXT: v_mov_b32_e32 v1, s7
	; GFX7-NEXT: s_bfe_u32 s10, s6, 0x80010			; GFX7-NEXT: s_bfe_u32 s10, s6, 0x80010
				; GFX7-NEXT: s_bfe_u32 s9, s4, 0x80008
	; GFX7-NEXT: v_mov_b32_e32 v2, s8			; GFX7-NEXT: v_mov_b32_e32 v2, s8
				; GFX7-NEXT: s_bfe_u32 s11, s4, 0x80010
	; GFX7-NEXT: s_lshr_b32 s6, s6, 24			; GFX7-NEXT: s_lshr_b32 s6, s6, 24
	; GFX7-NEXT: v_mov_b32_e32 v3, s10			; GFX7-NEXT: v_mov_b32_e32 v3, s10
	; GFX7-NEXT: s_lshr_b32 s4, s4, 24			; GFX7-NEXT: s_lshr_b32 s4, s4, 24
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: v_mad_u32_u24 v0, s5, v1, v0			; GFX7-NEXT: v_mad_u32_u24 v0, s5, v1, v0
	; GFX7-NEXT: v_mad_u32_u24 v0, s9, v2, v0			; GFX7-NEXT: v_mad_u32_u24 v0, s9, v2, v0
	; GFX7-NEXT: v_mad_u32_u24 v0, s11, v3, v0			; GFX7-NEXT: v_mad_u32_u24 v0, s11, v3, v0
	; GFX7-NEXT: v_mov_b32_e32 v1, s6			; GFX7-NEXT: v_mov_b32_e32 v1, s6
	define amdgpu_kernel void @udot4_acc8(<4 x i8> addrspace(1)* %src1,			define amdgpu_kernel void @udot4_acc8(<4 x i8> addrspace(1)* %src1,
	; GFX7-LABEL: udot4_acc8:			; GFX7-LABEL: udot4_acc8:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9			; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
	; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd			; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
	; GFX7-NEXT: s_mov_b32 s3, 0xf000			; GFX7-NEXT: s_mov_b32 s3, 0xf000
	; GFX7-NEXT: s_mov_b32 s2, -1			; GFX7-NEXT: s_mov_b32 s2, -1
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
	; GFX7-NEXT: buffer_load_ubyte v0, off, s[0:3], 0
	; GFX7-NEXT: s_load_dword s6, s[6:7], 0x0			; GFX7-NEXT: s_load_dword s6, s[6:7], 0x0
				; GFX7-NEXT: buffer_load_ubyte v0, off, s[0:3], 0
				; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
	; GFX7-NEXT: s_movk_i32 s5, 0xff			; GFX7-NEXT: s_movk_i32 s5, 0xff
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_bfe_u32 s9, s4, 0x80008
	; GFX7-NEXT: s_bfe_u32 s11, s4, 0x80010
	; GFX7-NEXT: s_and_b32 s7, s6, s5			; GFX7-NEXT: s_and_b32 s7, s6, s5
	; GFX7-NEXT: s_and_b32 s5, s4, s5
	; GFX7-NEXT: s_bfe_u32 s8, s6, 0x80008			; GFX7-NEXT: s_bfe_u32 s8, s6, 0x80008
				; GFX7-NEXT: s_and_b32 s5, s4, s5
	; GFX7-NEXT: v_mov_b32_e32 v1, s7			; GFX7-NEXT: v_mov_b32_e32 v1, s7
	; GFX7-NEXT: s_bfe_u32 s10, s6, 0x80010			; GFX7-NEXT: s_bfe_u32 s10, s6, 0x80010
				; GFX7-NEXT: s_bfe_u32 s9, s4, 0x80008
	; GFX7-NEXT: v_mov_b32_e32 v2, s8			; GFX7-NEXT: v_mov_b32_e32 v2, s8
				; GFX7-NEXT: s_bfe_u32 s11, s4, 0x80010
	; GFX7-NEXT: s_lshr_b32 s6, s6, 24			; GFX7-NEXT: s_lshr_b32 s6, s6, 24
	; GFX7-NEXT: v_mov_b32_e32 v3, s10			; GFX7-NEXT: v_mov_b32_e32 v3, s10
	; GFX7-NEXT: s_lshr_b32 s4, s4, 24			; GFX7-NEXT: s_lshr_b32 s4, s4, 24
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: v_mad_u32_u24 v0, s5, v1, v0			; GFX7-NEXT: v_mad_u32_u24 v0, s5, v1, v0
	; GFX7-NEXT: v_mad_u32_u24 v0, s9, v2, v0			; GFX7-NEXT: v_mad_u32_u24 v0, s9, v2, v0
	; GFX7-NEXT: v_mad_u32_u24 v0, s11, v3, v0			; GFX7-NEXT: v_mad_u32_u24 v0, s11, v3, v0
	; GFX7-NEXT: v_mov_b32_e32 v1, s6			; GFX7-NEXT: v_mov_b32_e32 v1, s6
	; GFX7-LABEL: udot2_8:			; GFX7-LABEL: udot2_8:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9			; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
	; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd			; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
	; GFX7-NEXT: s_mov_b32 s3, 0xf000			; GFX7-NEXT: s_mov_b32 s3, 0xf000
	; GFX7-NEXT: s_mov_b32 s2, -1			; GFX7-NEXT: s_mov_b32 s2, -1
	; GFX7-NEXT: s_movk_i32 s8, 0xff			; GFX7-NEXT: s_movk_i32 s8, 0xff
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
	; GFX7-NEXT: buffer_load_ubyte v0, off, s[0:3], 0			; GFX7-NEXT: buffer_load_ubyte v0, off, s[0:3], 0
				; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
	; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0			; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_and_b32 s7, s4, s8			; GFX7-NEXT: s_and_b32 s7, s4, s8
	; GFX7-NEXT: s_bfe_u32 s4, s4, 0x80008
	; GFX7-NEXT: s_and_b32 s6, s5, s8			; GFX7-NEXT: s_and_b32 s6, s5, s8
	; GFX7-NEXT: v_mov_b32_e32 v1, s6			; GFX7-NEXT: v_mov_b32_e32 v1, s6
	; GFX7-NEXT: s_bfe_u32 s5, s5, 0x80008			; GFX7-NEXT: s_bfe_u32 s5, s5, 0x80008
				; GFX7-NEXT: s_bfe_u32 s4, s4, 0x80008
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: v_mad_u32_u24 v0, s7, v1, v0			; GFX7-NEXT: v_mad_u32_u24 v0, s7, v1, v0
	; GFX7-NEXT: v_mov_b32_e32 v1, s5			; GFX7-NEXT: v_mov_b32_e32 v1, s5
	; GFX7-NEXT: v_mad_u32_u24 v0, s4, v1, v0			; GFX7-NEXT: v_mad_u32_u24 v0, s4, v1, v0
	; GFX7-NEXT: buffer_store_byte v0, off, s[0:3], 0			; GFX7-NEXT: buffer_store_byte v0, off, s[0:3], 0
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: udot2_8:			; GFX8-LABEL: udot2_8:
	[72 lines not shown]
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: s_movk_i32 s2, 0xff			; GFX10-DL-NEXT: s_movk_i32 s2, 0xff
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1
				; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off
	; GFX10-DL-NEXT: s_load_dword s0, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s0, s[6:7], 0x0
	; GFX10-DL-NEXT: s_load_dword s1, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s1, s[4:5], 0x0
	; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_and_b32 s3, s0, s2			; GFX10-DL-NEXT: s_and_b32 s3, s0, s2
	; GFX10-DL-NEXT: s_and_b32 s2, s1, s2			; GFX10-DL-NEXT: s_and_b32 s2, s1, s2
	; GFX10-DL-NEXT: s_bfe_u32 s0, s0, 0x80008			; GFX10-DL-NEXT: s_bfe_u32 s0, s0, 0x80008
	; GFX10-DL-NEXT: s_bfe_u32 s1, s1, 0x80008			; GFX10-DL-NEXT: s_bfe_u32 s1, s1, 0x80008
	; GFX10-DL-NEXT: s_waitcnt vmcnt(0)			; GFX10-DL-NEXT: s_waitcnt vmcnt(0)
	; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2			; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
	; GFX10-DL-NEXT: v_mad_u32_u24 v2, s1, s0, v2			; GFX10-DL-NEXT: v_mad_u32_u24 v2, s1, s0, v2
	[24 lines not shown]
	; GFX7-LABEL: udot4_CommutationInsideMAD:			; GFX7-LABEL: udot4_CommutationInsideMAD:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9			; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
	; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd			; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
	; GFX7-NEXT: s_mov_b32 s3, 0xf000			; GFX7-NEXT: s_mov_b32 s3, 0xf000
	; GFX7-NEXT: s_mov_b32 s2, -1			; GFX7-NEXT: s_mov_b32 s2, -1
	; GFX7-NEXT: s_movk_i32 s8, 0xff			; GFX7-NEXT: s_movk_i32 s8, 0xff
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
	; GFX7-NEXT: buffer_load_ubyte v0, off, s[0:3], 0			; GFX7-NEXT: buffer_load_ubyte v0, off, s[0:3], 0
				; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
	; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0			; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_and_b32 s6, s4, s8			; GFX7-NEXT: s_and_b32 s6, s4, s8
	; GFX7-NEXT: v_mov_b32_e32 v1, s6
	; GFX7-NEXT: s_and_b32 s7, s5, s8			; GFX7-NEXT: s_and_b32 s7, s5, s8
	; GFX7-NEXT: s_bfe_u32 s8, s4, 0x80008			; GFX7-NEXT: s_bfe_u32 s8, s4, 0x80008
				; GFX7-NEXT: v_mov_b32_e32 v1, s6
	; GFX7-NEXT: s_bfe_u32 s10, s4, 0x80010			; GFX7-NEXT: s_bfe_u32 s10, s4, 0x80010
	; GFX7-NEXT: s_bfe_u32 s9, s5, 0x80008			; GFX7-NEXT: s_bfe_u32 s9, s5, 0x80008
	; GFX7-NEXT: v_mov_b32_e32 v2, s8			; GFX7-NEXT: v_mov_b32_e32 v2, s8
	; GFX7-NEXT: s_bfe_u32 s11, s5, 0x80010			; GFX7-NEXT: s_bfe_u32 s11, s5, 0x80010
	; GFX7-NEXT: s_lshr_b32 s4, s4, 24			; GFX7-NEXT: s_lshr_b32 s4, s4, 24
	; GFX7-NEXT: v_mov_b32_e32 v3, s10			; GFX7-NEXT: v_mov_b32_e32 v3, s10
	; GFX7-NEXT: s_lshr_b32 s5, s5, 24			; GFX7-NEXT: s_lshr_b32 s5, s5, 24
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	[139 lines not shown]
	; GFX7-LABEL: udot4_CommutationAccrossMADs:			; GFX7-LABEL: udot4_CommutationAccrossMADs:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9			; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
	; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd			; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
	; GFX7-NEXT: s_mov_b32 s3, 0xf000			; GFX7-NEXT: s_mov_b32 s3, 0xf000
	; GFX7-NEXT: s_mov_b32 s2, -1			; GFX7-NEXT: s_mov_b32 s2, -1
	; GFX7-NEXT: s_movk_i32 s8, 0xff			; GFX7-NEXT: s_movk_i32 s8, 0xff
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
	; GFX7-NEXT: buffer_load_ubyte v0, off, s[0:3], 0			; GFX7-NEXT: buffer_load_ubyte v0, off, s[0:3], 0
				; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
	; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0			; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_and_b32 s6, s4, s8			; GFX7-NEXT: s_and_b32 s6, s4, s8
	; GFX7-NEXT: s_bfe_u32 s10, s4, 0x80010
	; GFX7-NEXT: s_and_b32 s7, s5, s8			; GFX7-NEXT: s_and_b32 s7, s5, s8
	; GFX7-NEXT: s_bfe_u32 s8, s4, 0x80008			; GFX7-NEXT: s_bfe_u32 s8, s4, 0x80008
	; GFX7-NEXT: s_bfe_u32 s9, s5, 0x80008			; GFX7-NEXT: s_bfe_u32 s9, s5, 0x80008
	; GFX7-NEXT: v_mov_b32_e32 v1, s8			; GFX7-NEXT: v_mov_b32_e32 v1, s8
				; GFX7-NEXT: s_bfe_u32 s10, s4, 0x80010
	; GFX7-NEXT: v_mov_b32_e32 v2, s6			; GFX7-NEXT: v_mov_b32_e32 v2, s6
	; GFX7-NEXT: s_bfe_u32 s11, s5, 0x80010			; GFX7-NEXT: s_bfe_u32 s11, s5, 0x80010
	; GFX7-NEXT: s_lshr_b32 s4, s4, 24			; GFX7-NEXT: s_lshr_b32 s4, s4, 24
	; GFX7-NEXT: v_mov_b32_e32 v3, s10			; GFX7-NEXT: v_mov_b32_e32 v3, s10
	; GFX7-NEXT: s_lshr_b32 s5, s5, 24			; GFX7-NEXT: s_lshr_b32 s5, s5, 24
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: v_mad_u32_u24 v0, s9, v1, v0			; GFX7-NEXT: v_mad_u32_u24 v0, s9, v1, v0
	; GFX7-NEXT: v_mad_u32_u24 v0, s7, v2, v0			; GFX7-NEXT: v_mad_u32_u24 v0, s7, v2, v0
	[103 lines not shown]
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: s_movk_i32 s2, 0xff			; GFX10-DL-NEXT: s_movk_i32 s2, 0xff
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1
				; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off
	; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0
	; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_bfe_u32 s3, s0, 0x80008			; GFX10-DL-NEXT: s_bfe_u32 s3, s0, 0x80008
	; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x80008			; GFX10-DL-NEXT: s_bfe_u32 s4, s1, 0x80008
	; GFX10-DL-NEXT: s_and_b32 s5, s0, s2			; GFX10-DL-NEXT: s_and_b32 s5, s0, s2
	; GFX10-DL-NEXT: s_and_b32 s2, s1, s2			; GFX10-DL-NEXT: s_and_b32 s2, s1, s2
	; GFX10-DL-NEXT: s_waitcnt vmcnt(0)			; GFX10-DL-NEXT: s_waitcnt vmcnt(0)
	; GFX10-DL-NEXT: v_mad_u32_u24 v2, s4, s3, v2			; GFX10-DL-NEXT: v_mad_u32_u24 v2, s4, s3, v2
	; GFX10-DL-NEXT: s_bfe_u32 s3, s0, 0x80010			; GFX10-DL-NEXT: s_bfe_u32 s3, s0, 0x80010
	[579 lines not shown]
	; GFX10-DL-LABEL: notdot4_mixedtypes:			; GFX10-DL-LABEL: notdot4_mixedtypes:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1
				; GFX10-DL-NEXT: global_load_ushort v2, v[0:1], off
	; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0
	; GFX10-DL-NEXT: global_load_ushort v2, v[0:1], off
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x80008			; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x80008
	; GFX10-DL-NEXT: s_bfe_u32 s3, s1, 0x80008			; GFX10-DL-NEXT: s_bfe_u32 s3, s1, 0x80008
	; GFX10-DL-NEXT: s_sext_i32_i8 s4, s0			; GFX10-DL-NEXT: s_sext_i32_i8 s4, s0
	; GFX10-DL-NEXT: s_sext_i32_i8 s5, s1			; GFX10-DL-NEXT: s_sext_i32_i8 s5, s1
	; GFX10-DL-NEXT: s_waitcnt vmcnt(0)			; GFX10-DL-NEXT: s_waitcnt vmcnt(0)
	; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2			; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s3, v2
	; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x80010			; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x80010
	[373 lines not shown]
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: v_mov_b32_e32 v2, 0xffff			; GFX10-DL-NEXT: v_mov_b32_e32 v2, 0xffff
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1
				; GFX10-DL-NEXT: global_load_ushort v3, v[0:1], off
	; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0
	; GFX10-DL-NEXT: global_load_ushort v3, v[0:1], off
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: v_lshrrev_b16_e64 v4, 8, s0			; GFX10-DL-NEXT: v_lshrrev_b16_e64 v4, 8, s0
	; GFX10-DL-NEXT: v_and_b32_sdwa v7, v2, s0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_0			; GFX10-DL-NEXT: v_and_b32_sdwa v7, v2, s0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_0
	; GFX10-DL-NEXT: v_lshrrev_b16_e64 v5, 8, s1			; GFX10-DL-NEXT: v_lshrrev_b16_e64 v5, 8, s1
	; GFX10-DL-NEXT: v_and_b32_sdwa v6, v2, s1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_0			; GFX10-DL-NEXT: v_and_b32_sdwa v6, v2, s1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_0
	; GFX10-DL-NEXT: s_lshr_b32 s2, s0, 16			; GFX10-DL-NEXT: s_lshr_b32 s2, s0, 16
	; GFX10-DL-NEXT: s_lshr_b32 s3, s1, 16			; GFX10-DL-NEXT: s_lshr_b32 s3, s1, 16
	; GFX10-DL-NEXT: v_lshl_or_b32 v4, v4, 16, v7			; GFX10-DL-NEXT: v_lshl_or_b32 v4, v4, 16, v7
	[43 lines not shown]
	; GFX7-LABEL: udot4_acc8_vecMul:			; GFX7-LABEL: udot4_acc8_vecMul:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9			; GFX7-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
	; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd			; GFX7-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
	; GFX7-NEXT: s_mov_b32 s3, 0xf000			; GFX7-NEXT: s_mov_b32 s3, 0xf000
	; GFX7-NEXT: s_mov_b32 s2, -1			; GFX7-NEXT: s_mov_b32 s2, -1
	; GFX7-NEXT: s_movk_i32 s8, 0xff			; GFX7-NEXT: s_movk_i32 s8, 0xff
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
	; GFX7-NEXT: buffer_load_ubyte v0, off, s[0:3], 0			; GFX7-NEXT: buffer_load_ubyte v0, off, s[0:3], 0
				; GFX7-NEXT: s_load_dword s4, s[4:5], 0x0
	; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0			; GFX7-NEXT: s_load_dword s5, s[6:7], 0x0
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_bfe_u32 s6, s4, 0x80008			; GFX7-NEXT: s_bfe_u32 s6, s4, 0x80008
	; GFX7-NEXT: s_lshr_b32 s7, s4, 16
	; GFX7-NEXT: s_bfe_u32 s10, s5, 0x80008			; GFX7-NEXT: s_bfe_u32 s10, s5, 0x80008
	; GFX7-NEXT: s_lshr_b32 s11, s5, 16			; GFX7-NEXT: s_lshr_b32 s11, s5, 16
	; GFX7-NEXT: s_lshr_b32 s12, s5, 24			; GFX7-NEXT: s_lshr_b32 s12, s5, 24
	; GFX7-NEXT: v_mov_b32_e32 v2, s11
	; GFX7-NEXT: v_mov_b32_e32 v3, s10			; GFX7-NEXT: v_mov_b32_e32 v3, s10
				; GFX7-NEXT: s_lshr_b32 s7, s4, 16
				; GFX7-NEXT: v_mov_b32_e32 v2, s11
	; GFX7-NEXT: s_lshr_b32 s9, s4, 24			; GFX7-NEXT: s_lshr_b32 s9, s4, 24
	; GFX7-NEXT: v_mov_b32_e32 v1, s12			; GFX7-NEXT: v_mov_b32_e32 v1, s12
	; GFX7-NEXT: s_mul_i32 s4, s4, s5			; GFX7-NEXT: s_mul_i32 s4, s4, s5
	; GFX7-NEXT: v_mul_u32_u24_e32 v1, s9, v1			; GFX7-NEXT: v_mul_u32_u24_e32 v1, s9, v1
	; GFX7-NEXT: v_mul_u32_u24_e32 v2, s7, v2			; GFX7-NEXT: v_mul_u32_u24_e32 v2, s7, v2
	; GFX7-NEXT: v_mul_u32_u24_e32 v3, s6, v3			; GFX7-NEXT: v_mul_u32_u24_e32 v3, s6, v3
	; GFX7-NEXT: s_and_b32 s5, s4, s8			; GFX7-NEXT: s_and_b32 s5, s4, s8
	; GFX7-NEXT: v_lshlrev_b32_e32 v1, 8, v1			; GFX7-NEXT: v_lshlrev_b32_e32 v1, 8, v1
	[131 lines not shown]
	; GFX10-DL-LABEL: udot4_acc8_vecMul:			; GFX10-DL-LABEL: udot4_acc8_vecMul:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1
				; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off
	; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0
	; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: v_lshrrev_b16_e64 v3, 8, s0			; GFX10-DL-NEXT: v_lshrrev_b16_e64 v3, 8, s0
	; GFX10-DL-NEXT: v_lshrrev_b16_e64 v4, 8, s1			; GFX10-DL-NEXT: v_lshrrev_b16_e64 v4, 8, s1
	; GFX10-DL-NEXT: s_lshr_b32 s2, s0, 24			; GFX10-DL-NEXT: s_lshr_b32 s2, s0, 24
	; GFX10-DL-NEXT: s_lshr_b32 s3, s1, 24			; GFX10-DL-NEXT: s_lshr_b32 s3, s1, 24
	; GFX10-DL-NEXT: s_lshr_b32 s4, s0, 16			; GFX10-DL-NEXT: s_lshr_b32 s4, s0, 16
	; GFX10-DL-NEXT: v_mul_lo_u16_e64 v3, v3, v4			; GFX10-DL-NEXT: v_mul_lo_u16_e64 v3, v3, v4
	; GFX10-DL-NEXT: v_mul_lo_u16_e64 v4, s0, s1			; GFX10-DL-NEXT: v_mul_lo_u16_e64 v4, s0, s1
	[38 lines not shown]

llvm/test/CodeGen/AMDGPU/idot8s.ll

	[254 lines not shown]
	; GFX7-LABEL: idot8_acc16:			; GFX7-LABEL: idot8_acc16:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9			; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9
	; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd			; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd
	; GFX7-NEXT: s_mov_b32 s7, 0xf000			; GFX7-NEXT: s_mov_b32 s7, 0xf000
	; GFX7-NEXT: s_mov_b32 s6, -1			; GFX7-NEXT: s_mov_b32 s6, -1
	; GFX7-NEXT: s_mov_b32 s0, 0xffff			; GFX7-NEXT: s_mov_b32 s0, 0xffff
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_load_dword s1, s[8:9], 0x0
	; GFX7-NEXT: buffer_load_ushort v0, off, s[4:7], 0			; GFX7-NEXT: buffer_load_ushort v0, off, s[4:7], 0
				; GFX7-NEXT: s_load_dword s1, s[8:9], 0x0
	; GFX7-NEXT: s_load_dword s2, s[10:11], 0x0			; GFX7-NEXT: s_load_dword s2, s[10:11], 0x0
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_bfe_i32 s8, s1, 0x40000			; GFX7-NEXT: s_bfe_i32 s8, s1, 0x40000
	; GFX7-NEXT: s_bfe_i32 s10, s1, 0x40004
	; GFX7-NEXT: s_bfe_i32 s9, s2, 0x40000			; GFX7-NEXT: s_bfe_i32 s9, s2, 0x40000
	; GFX7-NEXT: s_bfe_i32 s11, s2, 0x40004			; GFX7-NEXT: s_bfe_i32 s11, s2, 0x40004
	; GFX7-NEXT: s_and_b32 s9, s9, s0			; GFX7-NEXT: s_and_b32 s9, s9, s0
				; GFX7-NEXT: s_bfe_i32 s10, s1, 0x40004
	; GFX7-NEXT: s_bfe_i32 s13, s2, 0x40008			; GFX7-NEXT: s_bfe_i32 s13, s2, 0x40008
	; GFX7-NEXT: s_and_b32 s11, s11, s0			; GFX7-NEXT: s_and_b32 s11, s11, s0
	; GFX7-NEXT: s_and_b32 s8, s8, s0			; GFX7-NEXT: s_and_b32 s8, s8, s0
	; GFX7-NEXT: v_mov_b32_e32 v1, s9			; GFX7-NEXT: v_mov_b32_e32 v1, s9
	; GFX7-NEXT: s_bfe_i32 s12, s1, 0x40008			; GFX7-NEXT: s_bfe_i32 s12, s1, 0x40008
	; GFX7-NEXT: s_bfe_i32 s15, s2, 0x4000c			; GFX7-NEXT: s_bfe_i32 s15, s2, 0x4000c
	; GFX7-NEXT: s_and_b32 s13, s13, s0			; GFX7-NEXT: s_and_b32 s13, s13, s0
	; GFX7-NEXT: s_and_b32 s10, s10, s0			; GFX7-NEXT: s_and_b32 s10, s10, s0
	[193 lines not shown]
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: s_mov_b32 s2, 0xffff			; GFX10-DL-NEXT: s_mov_b32 s2, 0xffff
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1
				; GFX10-DL-NEXT: global_load_ushort v2, v[0:1], off
	; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0
	; GFX10-DL-NEXT: global_load_ushort v2, v[0:1], off
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_lshr_b32 s4, s0, 12			; GFX10-DL-NEXT: s_lshr_b32 s4, s0, 12
	; GFX10-DL-NEXT: s_lshr_b32 s5, s1, 12			; GFX10-DL-NEXT: s_lshr_b32 s5, s1, 12
	; GFX10-DL-NEXT: s_bfe_i32 s6, s0, 0x40000			; GFX10-DL-NEXT: s_bfe_i32 s6, s0, 0x40000
	; GFX10-DL-NEXT: s_bfe_i32 s7, s1, 0x40000			; GFX10-DL-NEXT: s_bfe_i32 s7, s1, 0x40000
	; GFX10-DL-NEXT: s_bfe_i32 s8, s0, 0x40004			; GFX10-DL-NEXT: s_bfe_i32 s8, s0, 0x40004
	; GFX10-DL-NEXT: v_lshlrev_b16_e64 v3, 12, s4			; GFX10-DL-NEXT: v_lshlrev_b16_e64 v3, 12, s4
	; GFX10-DL-NEXT: v_lshlrev_b16_e64 v4, 12, s5			; GFX10-DL-NEXT: v_lshlrev_b16_e64 v4, 12, s5
	[97 lines not shown]
	; GFX7-LABEL: idot8_acc8:			; GFX7-LABEL: idot8_acc8:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9			; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9
	; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd			; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd
	; GFX7-NEXT: s_mov_b32 s7, 0xf000			; GFX7-NEXT: s_mov_b32 s7, 0xf000
	; GFX7-NEXT: s_mov_b32 s6, -1			; GFX7-NEXT: s_mov_b32 s6, -1
	; GFX7-NEXT: s_movk_i32 s0, 0xff			; GFX7-NEXT: s_movk_i32 s0, 0xff
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_load_dword s1, s[8:9], 0x0
	; GFX7-NEXT: buffer_load_ubyte v0, off, s[4:7], 0			; GFX7-NEXT: buffer_load_ubyte v0, off, s[4:7], 0
				; GFX7-NEXT: s_load_dword s1, s[8:9], 0x0
	; GFX7-NEXT: s_load_dword s2, s[10:11], 0x0			; GFX7-NEXT: s_load_dword s2, s[10:11], 0x0
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_bfe_i32 s8, s1, 0x40000			; GFX7-NEXT: s_bfe_i32 s8, s1, 0x40000
	; GFX7-NEXT: s_bfe_i32 s10, s1, 0x40004
	; GFX7-NEXT: s_bfe_i32 s9, s2, 0x40000			; GFX7-NEXT: s_bfe_i32 s9, s2, 0x40000
	; GFX7-NEXT: s_bfe_i32 s11, s2, 0x40004			; GFX7-NEXT: s_bfe_i32 s11, s2, 0x40004
	; GFX7-NEXT: s_and_b32 s9, s9, s0			; GFX7-NEXT: s_and_b32 s9, s9, s0
				; GFX7-NEXT: s_bfe_i32 s10, s1, 0x40004
	; GFX7-NEXT: s_bfe_i32 s13, s2, 0x40008			; GFX7-NEXT: s_bfe_i32 s13, s2, 0x40008
	; GFX7-NEXT: s_and_b32 s11, s11, s0			; GFX7-NEXT: s_and_b32 s11, s11, s0
	; GFX7-NEXT: s_and_b32 s8, s8, s0			; GFX7-NEXT: s_and_b32 s8, s8, s0
	; GFX7-NEXT: v_mov_b32_e32 v1, s9			; GFX7-NEXT: v_mov_b32_e32 v1, s9
	; GFX7-NEXT: s_bfe_i32 s12, s1, 0x40008			; GFX7-NEXT: s_bfe_i32 s12, s1, 0x40008
	; GFX7-NEXT: s_bfe_i32 s15, s2, 0x4000c			; GFX7-NEXT: s_bfe_i32 s15, s2, 0x4000c
	; GFX7-NEXT: s_and_b32 s13, s13, s0			; GFX7-NEXT: s_and_b32 s13, s13, s0
	; GFX7-NEXT: s_and_b32 s10, s10, s0			; GFX7-NEXT: s_and_b32 s10, s10, s0
	[202 lines not shown]
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: s_movk_i32 s2, 0xff			; GFX10-DL-NEXT: s_movk_i32 s2, 0xff
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1
				; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off
	; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0
	; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_lshr_b32 s4, s0, 12			; GFX10-DL-NEXT: s_lshr_b32 s4, s0, 12
	; GFX10-DL-NEXT: s_lshr_b32 s5, s1, 12			; GFX10-DL-NEXT: s_lshr_b32 s5, s1, 12
	; GFX10-DL-NEXT: s_bfe_i32 s6, s0, 0x40000			; GFX10-DL-NEXT: s_bfe_i32 s6, s0, 0x40000
	; GFX10-DL-NEXT: s_bfe_i32 s7, s1, 0x40000			; GFX10-DL-NEXT: s_bfe_i32 s7, s1, 0x40000
	; GFX10-DL-NEXT: s_bfe_i32 s8, s0, 0x40004			; GFX10-DL-NEXT: s_bfe_i32 s8, s0, 0x40004
	; GFX10-DL-NEXT: v_lshlrev_b16_e64 v3, 12, s4			; GFX10-DL-NEXT: v_lshlrev_b16_e64 v3, 12, s4
	; GFX10-DL-NEXT: v_lshlrev_b16_e64 v4, 12, s5			; GFX10-DL-NEXT: v_lshlrev_b16_e64 v4, 12, s5
	[1,004 lines not shown]
	; GFX10-DL-LABEL: idot8_acc16_vecMul:			; GFX10-DL-LABEL: idot8_acc16_vecMul:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1
				; GFX10-DL-NEXT: global_load_ushort v2, v[0:1], off
	; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0
	; GFX10-DL-NEXT: global_load_ushort v2, v[0:1], off
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_and_b32 s5, s0, 15			; GFX10-DL-NEXT: s_and_b32 s5, s0, 15
	; GFX10-DL-NEXT: s_bfe_u32 s6, s0, 0x40004			; GFX10-DL-NEXT: s_bfe_u32 s6, s0, 0x40004
	; GFX10-DL-NEXT: s_and_b32 s7, s1, 15			; GFX10-DL-NEXT: s_and_b32 s7, s1, 15
	; GFX10-DL-NEXT: s_bfe_u32 s8, s1, 0x40004			; GFX10-DL-NEXT: s_bfe_u32 s8, s1, 0x40004
	; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40018			; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40018
	; GFX10-DL-NEXT: s_lshr_b32 s4, s0, 28			; GFX10-DL-NEXT: s_lshr_b32 s4, s0, 28
	; GFX10-DL-NEXT: s_pack_ll_b32_b16 s5, s5, s6			; GFX10-DL-NEXT: s_pack_ll_b32_b16 s5, s5, s6
	[82 lines not shown]
	define amdgpu_kernel void @idot8_acc8_vecMul(<8 x i4> addrspace(1)* %src1,			define amdgpu_kernel void @idot8_acc8_vecMul(<8 x i4> addrspace(1)* %src1,
	; GFX7-LABEL: idot8_acc8_vecMul:			; GFX7-LABEL: idot8_acc8_vecMul:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9			; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9
	; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd			; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd
	; GFX7-NEXT: s_mov_b32 s7, 0xf000			; GFX7-NEXT: s_mov_b32 s7, 0xf000
	; GFX7-NEXT: s_mov_b32 s6, -1			; GFX7-NEXT: s_mov_b32 s6, -1
	; GFX7-NEXT: s_movk_i32 s0, 0xff			; GFX7-NEXT: s_movk_i32 s0, 0xff
				; GFX7-NEXT: s_mov_b32 s1, 0xffff
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_load_dword s2, s[8:9], 0x0
	; GFX7-NEXT: buffer_load_ubyte v0, off, s[4:7], 0			; GFX7-NEXT: buffer_load_ubyte v0, off, s[4:7], 0
				; GFX7-NEXT: s_load_dword s2, s[8:9], 0x0
	; GFX7-NEXT: s_load_dword s8, s[10:11], 0x0			; GFX7-NEXT: s_load_dword s8, s[10:11], 0x0
	; GFX7-NEXT: s_mov_b32 s1, 0xffff
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_bfe_i32 s9, s2, 0x40000			; GFX7-NEXT: s_bfe_i32 s9, s2, 0x40000
	; GFX7-NEXT: s_bfe_i32 s10, s2, 0x40004
	; GFX7-NEXT: s_bfe_i32 s16, s8, 0x40000			; GFX7-NEXT: s_bfe_i32 s16, s8, 0x40000
	; GFX7-NEXT: s_bfe_i32 s17, s8, 0x40004			; GFX7-NEXT: s_bfe_i32 s17, s8, 0x40004
	; GFX7-NEXT: s_bfe_i32 s18, s8, 0x40008			; GFX7-NEXT: s_bfe_i32 s18, s8, 0x40008
	; GFX7-NEXT: s_bfe_i32 s19, s8, 0x4000c			; GFX7-NEXT: s_bfe_i32 s19, s8, 0x4000c
	; GFX7-NEXT: s_bfe_i32 s20, s8, 0x40010			; GFX7-NEXT: s_bfe_i32 s20, s8, 0x40010
	; GFX7-NEXT: s_bfe_i32 s21, s8, 0x40014			; GFX7-NEXT: s_bfe_i32 s21, s8, 0x40014
	; GFX7-NEXT: s_bfe_i32 s22, s8, 0x40018			; GFX7-NEXT: s_bfe_i32 s22, s8, 0x40018
	; GFX7-NEXT: s_ashr_i32 s8, s8, 28			; GFX7-NEXT: s_ashr_i32 s8, s8, 28
	; GFX7-NEXT: v_mov_b32_e32 v7, s17
	; GFX7-NEXT: v_mov_b32_e32 v8, s16			; GFX7-NEXT: v_mov_b32_e32 v8, s16
				; GFX7-NEXT: s_bfe_i32 s10, s2, 0x40004
				; GFX7-NEXT: v_mov_b32_e32 v7, s17
	; GFX7-NEXT: s_bfe_i32 s11, s2, 0x40008			; GFX7-NEXT: s_bfe_i32 s11, s2, 0x40008
	; GFX7-NEXT: v_mov_b32_e32 v6, s18			; GFX7-NEXT: v_mov_b32_e32 v6, s18
	; GFX7-NEXT: s_bfe_i32 s12, s2, 0x4000c			; GFX7-NEXT: s_bfe_i32 s12, s2, 0x4000c
	; GFX7-NEXT: v_mov_b32_e32 v5, s19			; GFX7-NEXT: v_mov_b32_e32 v5, s19
	; GFX7-NEXT: s_bfe_i32 s13, s2, 0x40010			; GFX7-NEXT: s_bfe_i32 s13, s2, 0x40010
	; GFX7-NEXT: v_mov_b32_e32 v4, s20			; GFX7-NEXT: v_mov_b32_e32 v4, s20
	; GFX7-NEXT: s_bfe_i32 s14, s2, 0x40014			; GFX7-NEXT: s_bfe_i32 s14, s2, 0x40014
	; GFX7-NEXT: v_mov_b32_e32 v3, s21			; GFX7-NEXT: v_mov_b32_e32 v3, s21
	[313 lines not shown]
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: s_mov_b32 s2, 0xffff			; GFX10-DL-NEXT: s_mov_b32 s2, 0xffff
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1
				; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off
	; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0
	; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_lshr_b32 s8, s0, 4			; GFX10-DL-NEXT: s_lshr_b32 s8, s0, 4
	; GFX10-DL-NEXT: s_lshr_b32 s15, s1, 4			; GFX10-DL-NEXT: s_lshr_b32 s15, s1, 4
	; GFX10-DL-NEXT: s_lshr_b32 s9, s0, 12			; GFX10-DL-NEXT: s_lshr_b32 s9, s0, 12
	; GFX10-DL-NEXT: s_lshr_b32 s16, s1, 12			; GFX10-DL-NEXT: s_lshr_b32 s16, s1, 12
	; GFX10-DL-NEXT: v_lshlrev_b16_e64 v3, 12, s0			; GFX10-DL-NEXT: v_lshlrev_b16_e64 v3, 12, s0
	; GFX10-DL-NEXT: v_lshlrev_b16_e64 v7, 12, s8			; GFX10-DL-NEXT: v_lshlrev_b16_e64 v7, 12, s8
	; GFX10-DL-NEXT: v_lshlrev_b16_e64 v12, 12, s15			; GFX10-DL-NEXT: v_lshlrev_b16_e64 v12, 12, s15
	[104 lines not shown]

llvm/test/CodeGen/AMDGPU/idot8u.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=amdgcn -mcpu=gfx700 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GFX7 %s			; RUN: llc -mtriple=amdgcn -mcpu=gfx700 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GFX7 %s
	; RUN: llc -mtriple=amdgcn -mcpu=gfx803 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GFX8 %s			; RUN: llc -mtriple=amdgcn -mcpu=gfx803 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GFX8 %s
	; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GFX9 %s			; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GFX9 %s
	; RUN: llc -mtriple=amdgcn -mcpu=gfx906 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GFX9-DL %s			; RUN: llc -mtriple=amdgcn -mcpu=gfx906 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GFX9-DL %s
	; RUN: llc -mtriple=amdgcn -mcpu=gfx1011 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GFX10-DL %s			; RUN: llc -mtriple=amdgcn -mcpu=gfx1011 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GFX10-DL %s
	; RUN: llc -mtriple=amdgcn -mcpu=gfx1012 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GFX10-DL %s			; RUN: llc -mtriple=amdgcn -mcpu=gfx1012 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GFX10-DL %s

	define amdgpu_kernel void @udot8_acc32(<8 x i4> addrspace(1)* %src1,			define amdgpu_kernel void @udot8_acc32(<8 x i4> addrspace(1)* %src1,
	; GFX7-LABEL: udot8_acc32:			; GFX7-LABEL: udot8_acc32:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9			; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9
	; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd			; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd
	; GFX7-NEXT: s_mov_b32 s7, 0xf000			; GFX7-NEXT: s_mov_b32 s7, 0xf000
	; GFX7-NEXT: s_mov_b32 s6, -1			; GFX7-NEXT: s_mov_b32 s6, -1
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0
	; GFX7-NEXT: s_load_dword s10, s[10:11], 0x0			; GFX7-NEXT: s_load_dword s10, s[10:11], 0x0
				; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0
	; GFX7-NEXT: s_load_dword s21, s[4:5], 0x0			; GFX7-NEXT: s_load_dword s21, s[4:5], 0x0
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_lshr_b32 s1, s0, 28
	; GFX7-NEXT: s_lshr_b32 s11, s10, 28			; GFX7-NEXT: s_lshr_b32 s11, s10, 28
	; GFX7-NEXT: s_bfe_u32 s15, s10, 0x40018			; GFX7-NEXT: s_bfe_u32 s15, s10, 0x40018
	; GFX7-NEXT: s_bfe_u32 s16, s10, 0x40014			; GFX7-NEXT: s_bfe_u32 s16, s10, 0x40014
	; GFX7-NEXT: s_bfe_u32 s17, s10, 0x40010			; GFX7-NEXT: s_bfe_u32 s17, s10, 0x40010
	; GFX7-NEXT: s_bfe_u32 s18, s10, 0x4000c			; GFX7-NEXT: s_bfe_u32 s18, s10, 0x4000c
	; GFX7-NEXT: s_bfe_u32 s19, s10, 0x40008			; GFX7-NEXT: s_bfe_u32 s19, s10, 0x40008
	; GFX7-NEXT: s_bfe_u32 s20, s10, 0x40004			; GFX7-NEXT: s_bfe_u32 s20, s10, 0x40004
	; GFX7-NEXT: s_and_b32 s10, s10, 15			; GFX7-NEXT: s_and_b32 s10, s10, 15
				; GFX7-NEXT: s_lshr_b32 s1, s0, 28
	; GFX7-NEXT: s_bfe_u32 s2, s0, 0x40018			; GFX7-NEXT: s_bfe_u32 s2, s0, 0x40018
	; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40014			; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40014
	; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40010			; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40010
	; GFX7-NEXT: s_bfe_u32 s12, s0, 0x4000c			; GFX7-NEXT: s_bfe_u32 s12, s0, 0x4000c
	; GFX7-NEXT: s_bfe_u32 s13, s0, 0x40008			; GFX7-NEXT: s_bfe_u32 s13, s0, 0x40008
	; GFX7-NEXT: s_bfe_u32 s14, s0, 0x40004			; GFX7-NEXT: s_bfe_u32 s14, s0, 0x40004
	; GFX7-NEXT: s_and_b32 s0, s0, 15			; GFX7-NEXT: s_and_b32 s0, s0, 15
	; GFX7-NEXT: v_mov_b32_e32 v0, s10			; GFX7-NEXT: v_mov_b32_e32 v0, s10
	[16 lines not shown]
	; GFX7-NEXT: buffer_store_dword v0, off, s[4:7], 0			; GFX7-NEXT: buffer_store_dword v0, off, s[4:7], 0
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: udot8_acc32:			; GFX8-LABEL: udot8_acc32:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: s_load_dword s2, s[4:5], 0x0
	; GFX8-NEXT: s_load_dword s6, s[6:7], 0x0			; GFX8-NEXT: s_load_dword s6, s[6:7], 0x0
				; GFX8-NEXT: s_load_dword s2, s[4:5], 0x0
	; GFX8-NEXT: s_load_dword s19, s[0:1], 0x0			; GFX8-NEXT: s_load_dword s19, s[0:1], 0x0
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: s_lshr_b32 s4, s2, 28
	; GFX8-NEXT: s_lshr_b32 s7, s6, 28			; GFX8-NEXT: s_lshr_b32 s7, s6, 28
	; GFX8-NEXT: s_bfe_u32 s13, s6, 0x40018			; GFX8-NEXT: s_bfe_u32 s13, s6, 0x40018
	; GFX8-NEXT: s_bfe_u32 s14, s6, 0x40014			; GFX8-NEXT: s_bfe_u32 s14, s6, 0x40014
	; GFX8-NEXT: s_bfe_u32 s15, s6, 0x40010			; GFX8-NEXT: s_bfe_u32 s15, s6, 0x40010
	; GFX8-NEXT: s_bfe_u32 s16, s6, 0x4000c			; GFX8-NEXT: s_bfe_u32 s16, s6, 0x4000c
	; GFX8-NEXT: s_bfe_u32 s17, s6, 0x40008			; GFX8-NEXT: s_bfe_u32 s17, s6, 0x40008
	; GFX8-NEXT: s_bfe_u32 s18, s6, 0x40004			; GFX8-NEXT: s_bfe_u32 s18, s6, 0x40004
	; GFX8-NEXT: s_and_b32 s6, s6, 15			; GFX8-NEXT: s_and_b32 s6, s6, 15
				; GFX8-NEXT: s_lshr_b32 s4, s2, 28
	; GFX8-NEXT: s_bfe_u32 s5, s2, 0x40018			; GFX8-NEXT: s_bfe_u32 s5, s2, 0x40018
	; GFX8-NEXT: s_bfe_u32 s8, s2, 0x40014			; GFX8-NEXT: s_bfe_u32 s8, s2, 0x40014
	; GFX8-NEXT: s_bfe_u32 s9, s2, 0x40010			; GFX8-NEXT: s_bfe_u32 s9, s2, 0x40010
	; GFX8-NEXT: s_bfe_u32 s10, s2, 0x4000c			; GFX8-NEXT: s_bfe_u32 s10, s2, 0x4000c
	; GFX8-NEXT: s_bfe_u32 s11, s2, 0x40008			; GFX8-NEXT: s_bfe_u32 s11, s2, 0x40008
	; GFX8-NEXT: s_bfe_u32 s12, s2, 0x40004			; GFX8-NEXT: s_bfe_u32 s12, s2, 0x40004
	; GFX8-NEXT: s_and_b32 s2, s2, 15			; GFX8-NEXT: s_and_b32 s2, s2, 15
	; GFX8-NEXT: v_mov_b32_e32 v0, s6			; GFX8-NEXT: v_mov_b32_e32 v0, s6
	[18 lines not shown]
	; GFX8-NEXT: flat_store_dword v[0:1], v2			; GFX8-NEXT: flat_store_dword v[0:1], v2
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX9-LABEL: udot8_acc32:			; GFX9-LABEL: udot8_acc32:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_load_dword s2, s[4:5], 0x0
	; GFX9-NEXT: s_load_dword s6, s[6:7], 0x0			; GFX9-NEXT: s_load_dword s6, s[6:7], 0x0
				; GFX9-NEXT: s_load_dword s2, s[4:5], 0x0
	; GFX9-NEXT: s_load_dword s19, s[0:1], 0x0			; GFX9-NEXT: s_load_dword s19, s[0:1], 0x0
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_lshr_b32 s4, s2, 28
	; GFX9-NEXT: s_lshr_b32 s7, s6, 28			; GFX9-NEXT: s_lshr_b32 s7, s6, 28
	; GFX9-NEXT: s_bfe_u32 s13, s6, 0x40018			; GFX9-NEXT: s_bfe_u32 s13, s6, 0x40018
	; GFX9-NEXT: s_bfe_u32 s14, s6, 0x40014			; GFX9-NEXT: s_bfe_u32 s14, s6, 0x40014
	; GFX9-NEXT: s_bfe_u32 s15, s6, 0x40010			; GFX9-NEXT: s_bfe_u32 s15, s6, 0x40010
	; GFX9-NEXT: s_bfe_u32 s16, s6, 0x4000c			; GFX9-NEXT: s_bfe_u32 s16, s6, 0x4000c
	; GFX9-NEXT: s_bfe_u32 s17, s6, 0x40008			; GFX9-NEXT: s_bfe_u32 s17, s6, 0x40008
	; GFX9-NEXT: s_bfe_u32 s18, s6, 0x40004			; GFX9-NEXT: s_bfe_u32 s18, s6, 0x40004
	; GFX9-NEXT: s_and_b32 s6, s6, 15			; GFX9-NEXT: s_and_b32 s6, s6, 15
				; GFX9-NEXT: s_lshr_b32 s4, s2, 28
	; GFX9-NEXT: s_bfe_u32 s5, s2, 0x40018			; GFX9-NEXT: s_bfe_u32 s5, s2, 0x40018
	; GFX9-NEXT: s_bfe_u32 s8, s2, 0x40014			; GFX9-NEXT: s_bfe_u32 s8, s2, 0x40014
	; GFX9-NEXT: s_bfe_u32 s9, s2, 0x40010			; GFX9-NEXT: s_bfe_u32 s9, s2, 0x40010
	; GFX9-NEXT: s_bfe_u32 s10, s2, 0x4000c			; GFX9-NEXT: s_bfe_u32 s10, s2, 0x4000c
	; GFX9-NEXT: s_bfe_u32 s11, s2, 0x40008			; GFX9-NEXT: s_bfe_u32 s11, s2, 0x40008
	; GFX9-NEXT: s_bfe_u32 s12, s2, 0x40004			; GFX9-NEXT: s_bfe_u32 s12, s2, 0x40004
	; GFX9-NEXT: s_and_b32 s2, s2, 15			; GFX9-NEXT: s_and_b32 s2, s2, 15
	; GFX9-NEXT: v_mov_b32_e32 v0, s6			; GFX9-NEXT: v_mov_b32_e32 v0, s6
	[124 lines not shown]
	define amdgpu_kernel void @udot8_acc16(<8 x i4> addrspace(1)* %src1,			define amdgpu_kernel void @udot8_acc16(<8 x i4> addrspace(1)* %src1,
	; GFX7-LABEL: udot8_acc16:			; GFX7-LABEL: udot8_acc16:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9			; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9
	; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd			; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd
	; GFX7-NEXT: s_mov_b32 s7, 0xf000			; GFX7-NEXT: s_mov_b32 s7, 0xf000
	; GFX7-NEXT: s_mov_b32 s6, -1			; GFX7-NEXT: s_mov_b32 s6, -1
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0
	; GFX7-NEXT: buffer_load_ushort v0, off, s[4:7], 0			; GFX7-NEXT: buffer_load_ushort v0, off, s[4:7], 0
				; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0
	; GFX7-NEXT: s_load_dword s1, s[10:11], 0x0			; GFX7-NEXT: s_load_dword s1, s[10:11], 0x0
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_lshr_b32 s2, s0, 28			; GFX7-NEXT: s_lshr_b32 s2, s0, 28
	; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40018
	; GFX7-NEXT: s_bfe_u32 s15, s1, 0x40018			; GFX7-NEXT: s_bfe_u32 s15, s1, 0x40018
	; GFX7-NEXT: s_bfe_u32 s16, s1, 0x40014			; GFX7-NEXT: s_bfe_u32 s16, s1, 0x40014
	; GFX7-NEXT: s_bfe_u32 s17, s1, 0x40010			; GFX7-NEXT: s_bfe_u32 s17, s1, 0x40010
	; GFX7-NEXT: s_bfe_u32 s18, s1, 0x4000c			; GFX7-NEXT: s_bfe_u32 s18, s1, 0x4000c
	; GFX7-NEXT: s_bfe_u32 s19, s1, 0x40008			; GFX7-NEXT: s_bfe_u32 s19, s1, 0x40008
	; GFX7-NEXT: s_bfe_u32 s20, s1, 0x40004			; GFX7-NEXT: s_bfe_u32 s20, s1, 0x40004
	; GFX7-NEXT: s_lshr_b32 s14, s1, 28			; GFX7-NEXT: s_lshr_b32 s14, s1, 28
	; GFX7-NEXT: s_and_b32 s1, s1, 15			; GFX7-NEXT: s_and_b32 s1, s1, 15
				; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40018
	; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40014			; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40014
	; GFX7-NEXT: s_bfe_u32 s10, s0, 0x40010			; GFX7-NEXT: s_bfe_u32 s10, s0, 0x40010
	; GFX7-NEXT: s_bfe_u32 s11, s0, 0x4000c			; GFX7-NEXT: s_bfe_u32 s11, s0, 0x4000c
	; GFX7-NEXT: s_bfe_u32 s12, s0, 0x40008			; GFX7-NEXT: s_bfe_u32 s12, s0, 0x40008
	; GFX7-NEXT: s_bfe_u32 s13, s0, 0x40004			; GFX7-NEXT: s_bfe_u32 s13, s0, 0x40004
	; GFX7-NEXT: s_and_b32 s0, s0, 15			; GFX7-NEXT: s_and_b32 s0, s0, 15
	; GFX7-NEXT: v_mov_b32_e32 v1, s1			; GFX7-NEXT: v_mov_b32_e32 v1, s1
	; GFX7-NEXT: v_mov_b32_e32 v2, s20			; GFX7-NEXT: v_mov_b32_e32 v2, s20
	[162 lines not shown]
	; GFX10-DL-LABEL: udot8_acc16:			; GFX10-DL-LABEL: udot8_acc16:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1
				; GFX10-DL-NEXT: global_load_ushort v2, v[0:1], off
	; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0
	; GFX10-DL-NEXT: global_load_ushort v2, v[0:1], off
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_and_b32 s2, s0, 15			; GFX10-DL-NEXT: s_and_b32 s2, s0, 15
	; GFX10-DL-NEXT: s_and_b32 s4, s1, 15			; GFX10-DL-NEXT: s_and_b32 s4, s1, 15
	; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40004			; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40004
	; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40004			; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40004
	; GFX10-DL-NEXT: s_waitcnt vmcnt(0)			; GFX10-DL-NEXT: s_waitcnt vmcnt(0)
	; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2			; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2
	; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40008			; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40008
	[91 lines not shown]
	define amdgpu_kernel void @udot8_acc8(<8 x i4> addrspace(1)* %src1,			define amdgpu_kernel void @udot8_acc8(<8 x i4> addrspace(1)* %src1,
	; GFX7-LABEL: udot8_acc8:			; GFX7-LABEL: udot8_acc8:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9			; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9
	; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd			; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd
	; GFX7-NEXT: s_mov_b32 s7, 0xf000			; GFX7-NEXT: s_mov_b32 s7, 0xf000
	; GFX7-NEXT: s_mov_b32 s6, -1			; GFX7-NEXT: s_mov_b32 s6, -1
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0
	; GFX7-NEXT: buffer_load_ubyte v0, off, s[4:7], 0			; GFX7-NEXT: buffer_load_ubyte v0, off, s[4:7], 0
				; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0
	; GFX7-NEXT: s_load_dword s1, s[10:11], 0x0			; GFX7-NEXT: s_load_dword s1, s[10:11], 0x0
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_lshr_b32 s2, s0, 28			; GFX7-NEXT: s_lshr_b32 s2, s0, 28
	; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40018
	; GFX7-NEXT: s_bfe_u32 s15, s1, 0x40018			; GFX7-NEXT: s_bfe_u32 s15, s1, 0x40018
	; GFX7-NEXT: s_bfe_u32 s16, s1, 0x40014			; GFX7-NEXT: s_bfe_u32 s16, s1, 0x40014
	; GFX7-NEXT: s_bfe_u32 s17, s1, 0x40010			; GFX7-NEXT: s_bfe_u32 s17, s1, 0x40010
	; GFX7-NEXT: s_bfe_u32 s18, s1, 0x4000c			; GFX7-NEXT: s_bfe_u32 s18, s1, 0x4000c
	; GFX7-NEXT: s_bfe_u32 s19, s1, 0x40008			; GFX7-NEXT: s_bfe_u32 s19, s1, 0x40008
	; GFX7-NEXT: s_bfe_u32 s20, s1, 0x40004			; GFX7-NEXT: s_bfe_u32 s20, s1, 0x40004
	; GFX7-NEXT: s_lshr_b32 s14, s1, 28			; GFX7-NEXT: s_lshr_b32 s14, s1, 28
	; GFX7-NEXT: s_and_b32 s1, s1, 15			; GFX7-NEXT: s_and_b32 s1, s1, 15
				; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40018
	; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40014			; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40014
	; GFX7-NEXT: s_bfe_u32 s10, s0, 0x40010			; GFX7-NEXT: s_bfe_u32 s10, s0, 0x40010
	; GFX7-NEXT: s_bfe_u32 s11, s0, 0x4000c			; GFX7-NEXT: s_bfe_u32 s11, s0, 0x4000c
	; GFX7-NEXT: s_bfe_u32 s12, s0, 0x40008			; GFX7-NEXT: s_bfe_u32 s12, s0, 0x40008
	; GFX7-NEXT: s_bfe_u32 s13, s0, 0x40004			; GFX7-NEXT: s_bfe_u32 s13, s0, 0x40004
	; GFX7-NEXT: s_and_b32 s0, s0, 15			; GFX7-NEXT: s_and_b32 s0, s0, 15
	; GFX7-NEXT: v_mov_b32_e32 v1, s1			; GFX7-NEXT: v_mov_b32_e32 v1, s1
	; GFX7-NEXT: v_mov_b32_e32 v2, s20			; GFX7-NEXT: v_mov_b32_e32 v2, s20
	[162 lines not shown]
	; GFX10-DL-LABEL: udot8_acc8:			; GFX10-DL-LABEL: udot8_acc8:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1
				; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off
	; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0
	; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_and_b32 s2, s0, 15			; GFX10-DL-NEXT: s_and_b32 s2, s0, 15
	; GFX10-DL-NEXT: s_and_b32 s4, s1, 15			; GFX10-DL-NEXT: s_and_b32 s4, s1, 15
	; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40004			; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40004
	; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40004			; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40004
	; GFX10-DL-NEXT: s_waitcnt vmcnt(0)			; GFX10-DL-NEXT: s_waitcnt vmcnt(0)
	; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2			; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2
	; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40008			; GFX10-DL-NEXT: s_bfe_u32 s2, s0, 0x40008
	[91 lines not shown]
	define amdgpu_kernel void @udot8_acc4(<8 x i4> addrspace(1)* %src1,			define amdgpu_kernel void @udot8_acc4(<8 x i4> addrspace(1)* %src1,
	; GFX7-LABEL: udot8_acc4:			; GFX7-LABEL: udot8_acc4:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9			; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9
	; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd			; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd
	; GFX7-NEXT: s_mov_b32 s7, 0xf000			; GFX7-NEXT: s_mov_b32 s7, 0xf000
	; GFX7-NEXT: s_mov_b32 s6, -1			; GFX7-NEXT: s_mov_b32 s6, -1
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0
	; GFX7-NEXT: buffer_load_ubyte v0, off, s[4:7], 0			; GFX7-NEXT: buffer_load_ubyte v0, off, s[4:7], 0
				; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0
	; GFX7-NEXT: s_load_dword s1, s[10:11], 0x0			; GFX7-NEXT: s_load_dword s1, s[10:11], 0x0
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_lshr_b32 s2, s0, 28			; GFX7-NEXT: s_lshr_b32 s2, s0, 28
	; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40018
	; GFX7-NEXT: s_bfe_u32 s15, s1, 0x40018			; GFX7-NEXT: s_bfe_u32 s15, s1, 0x40018
	; GFX7-NEXT: s_bfe_u32 s16, s1, 0x40014			; GFX7-NEXT: s_bfe_u32 s16, s1, 0x40014
	; GFX7-NEXT: s_bfe_u32 s17, s1, 0x40010			; GFX7-NEXT: s_bfe_u32 s17, s1, 0x40010
	; GFX7-NEXT: s_bfe_u32 s18, s1, 0x4000c			; GFX7-NEXT: s_bfe_u32 s18, s1, 0x4000c
	; GFX7-NEXT: s_bfe_u32 s19, s1, 0x40008			; GFX7-NEXT: s_bfe_u32 s19, s1, 0x40008
	; GFX7-NEXT: s_bfe_u32 s20, s1, 0x40004			; GFX7-NEXT: s_bfe_u32 s20, s1, 0x40004
	; GFX7-NEXT: s_lshr_b32 s14, s1, 28			; GFX7-NEXT: s_lshr_b32 s14, s1, 28
	; GFX7-NEXT: s_and_b32 s1, s1, 15			; GFX7-NEXT: s_and_b32 s1, s1, 15
				; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40018
	; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40014			; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40014
	; GFX7-NEXT: s_bfe_u32 s10, s0, 0x40010			; GFX7-NEXT: s_bfe_u32 s10, s0, 0x40010
	; GFX7-NEXT: s_bfe_u32 s11, s0, 0x4000c			; GFX7-NEXT: s_bfe_u32 s11, s0, 0x4000c
	; GFX7-NEXT: s_bfe_u32 s12, s0, 0x40008			; GFX7-NEXT: s_bfe_u32 s12, s0, 0x40008
	; GFX7-NEXT: s_bfe_u32 s13, s0, 0x40004			; GFX7-NEXT: s_bfe_u32 s13, s0, 0x40004
	; GFX7-NEXT: s_and_b32 s0, s0, 15			; GFX7-NEXT: s_and_b32 s0, s0, 15
	; GFX7-NEXT: v_mov_b32_e32 v1, s1			; GFX7-NEXT: v_mov_b32_e32 v1, s1
	; GFX7-NEXT: v_mov_b32_e32 v2, s20			; GFX7-NEXT: v_mov_b32_e32 v2, s20
	[172 lines not shown]
	; GFX10-DL-LABEL: udot8_acc4:			; GFX10-DL-LABEL: udot8_acc4:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1
				; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off
	; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0
	; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_and_b32 s2, s0, 15			; GFX10-DL-NEXT: s_and_b32 s2, s0, 15
	; GFX10-DL-NEXT: s_and_b32 s4, s1, 15			; GFX10-DL-NEXT: s_and_b32 s4, s1, 15
	; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40004			; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40004
	; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40004			; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40004
	; GFX10-DL-NEXT: s_bfe_u32 s7, s1, 0x40008			; GFX10-DL-NEXT: s_bfe_u32 s7, s1, 0x40008
	; GFX10-DL-NEXT: s_waitcnt vmcnt(0)			; GFX10-DL-NEXT: s_waitcnt vmcnt(0)
	; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2			; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2
	[78 lines not shown]
	define amdgpu_kernel void @udot8_CommutationInsideMAD(<8 x i4> addrspace(1)* %src1,			define amdgpu_kernel void @udot8_CommutationInsideMAD(<8 x i4> addrspace(1)* %src1,
	; GFX7-LABEL: udot8_CommutationInsideMAD:			; GFX7-LABEL: udot8_CommutationInsideMAD:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9			; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9
	; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd			; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd
	; GFX7-NEXT: s_mov_b32 s7, 0xf000			; GFX7-NEXT: s_mov_b32 s7, 0xf000
	; GFX7-NEXT: s_mov_b32 s6, -1			; GFX7-NEXT: s_mov_b32 s6, -1
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0
	; GFX7-NEXT: buffer_load_ubyte v0, off, s[4:7], 0			; GFX7-NEXT: buffer_load_ubyte v0, off, s[4:7], 0
				; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0
	; GFX7-NEXT: s_load_dword s1, s[10:11], 0x0			; GFX7-NEXT: s_load_dword s1, s[10:11], 0x0
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_lshr_b32 s2, s0, 28			; GFX7-NEXT: s_lshr_b32 s2, s0, 28
	; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40018
	; GFX7-NEXT: s_bfe_u32 s15, s1, 0x40018			; GFX7-NEXT: s_bfe_u32 s15, s1, 0x40018
	; GFX7-NEXT: s_bfe_u32 s16, s1, 0x40014			; GFX7-NEXT: s_bfe_u32 s16, s1, 0x40014
	; GFX7-NEXT: s_bfe_u32 s17, s1, 0x40010			; GFX7-NEXT: s_bfe_u32 s17, s1, 0x40010
	; GFX7-NEXT: s_bfe_u32 s18, s1, 0x4000c			; GFX7-NEXT: s_bfe_u32 s18, s1, 0x4000c
	; GFX7-NEXT: s_bfe_u32 s19, s1, 0x40008			; GFX7-NEXT: s_bfe_u32 s19, s1, 0x40008
	; GFX7-NEXT: s_bfe_u32 s20, s1, 0x40004			; GFX7-NEXT: s_bfe_u32 s20, s1, 0x40004
	; GFX7-NEXT: s_lshr_b32 s14, s1, 28			; GFX7-NEXT: s_lshr_b32 s14, s1, 28
	; GFX7-NEXT: s_and_b32 s1, s1, 15			; GFX7-NEXT: s_and_b32 s1, s1, 15
				; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40018
	; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40014			; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40014
	; GFX7-NEXT: s_bfe_u32 s10, s0, 0x40010			; GFX7-NEXT: s_bfe_u32 s10, s0, 0x40010
	; GFX7-NEXT: s_bfe_u32 s11, s0, 0x4000c			; GFX7-NEXT: s_bfe_u32 s11, s0, 0x4000c
	; GFX7-NEXT: s_bfe_u32 s12, s0, 0x40008			; GFX7-NEXT: s_bfe_u32 s12, s0, 0x40008
	; GFX7-NEXT: s_bfe_u32 s13, s0, 0x40004			; GFX7-NEXT: s_bfe_u32 s13, s0, 0x40004
	; GFX7-NEXT: s_and_b32 s0, s0, 15			; GFX7-NEXT: s_and_b32 s0, s0, 15
	; GFX7-NEXT: v_mov_b32_e32 v1, s1			; GFX7-NEXT: v_mov_b32_e32 v1, s1
	; GFX7-NEXT: v_mov_b32_e32 v2, s20			; GFX7-NEXT: v_mov_b32_e32 v2, s20
	[172 lines not shown]
	; GFX10-DL-LABEL: udot8_CommutationInsideMAD:			; GFX10-DL-LABEL: udot8_CommutationInsideMAD:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1
				; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off
	; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0
	; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_and_b32 s2, s0, 15			; GFX10-DL-NEXT: s_and_b32 s2, s0, 15
	; GFX10-DL-NEXT: s_and_b32 s4, s1, 15			; GFX10-DL-NEXT: s_and_b32 s4, s1, 15
	; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40004			; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40004
	; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40004			; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40004
	; GFX10-DL-NEXT: s_bfe_u32 s7, s1, 0x40008			; GFX10-DL-NEXT: s_bfe_u32 s7, s1, 0x40008
	; GFX10-DL-NEXT: s_bfe_u32 s8, s1, 0x4000c			; GFX10-DL-NEXT: s_bfe_u32 s8, s1, 0x4000c
	; GFX10-DL-NEXT: s_waitcnt vmcnt(0)			; GFX10-DL-NEXT: s_waitcnt vmcnt(0)
	[76 lines not shown]
	define amdgpu_kernel void @udot8_multiuses_mul1(<8 x i4> addrspace(1)* %src1,			define amdgpu_kernel void @udot8_multiuses_mul1(<8 x i4> addrspace(1)* %src1,
	; GFX7-LABEL: udot8_multiuses_mul1:			; GFX7-LABEL: udot8_multiuses_mul1:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9			; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9
	; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd			; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd
	; GFX7-NEXT: s_mov_b32 s7, 0xf000			; GFX7-NEXT: s_mov_b32 s7, 0xf000
	; GFX7-NEXT: s_mov_b32 s6, -1			; GFX7-NEXT: s_mov_b32 s6, -1
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0
	; GFX7-NEXT: s_load_dword s10, s[10:11], 0x0			; GFX7-NEXT: s_load_dword s10, s[10:11], 0x0
				; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0
	; GFX7-NEXT: s_load_dword s21, s[4:5], 0x0			; GFX7-NEXT: s_load_dword s21, s[4:5], 0x0
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_lshr_b32 s1, s0, 28
	; GFX7-NEXT: s_bfe_u32 s20, s10, 0x40004			; GFX7-NEXT: s_bfe_u32 s20, s10, 0x40004
	; GFX7-NEXT: s_lshr_b32 s11, s10, 28			; GFX7-NEXT: s_lshr_b32 s11, s10, 28
	; GFX7-NEXT: s_bfe_u32 s15, s10, 0x40018			; GFX7-NEXT: s_bfe_u32 s15, s10, 0x40018
	; GFX7-NEXT: s_bfe_u32 s16, s10, 0x40014			; GFX7-NEXT: s_bfe_u32 s16, s10, 0x40014
	; GFX7-NEXT: s_bfe_u32 s17, s10, 0x40010			; GFX7-NEXT: s_bfe_u32 s17, s10, 0x40010
	; GFX7-NEXT: s_bfe_u32 s18, s10, 0x4000c			; GFX7-NEXT: s_bfe_u32 s18, s10, 0x4000c
	; GFX7-NEXT: s_bfe_u32 s19, s10, 0x40008			; GFX7-NEXT: s_bfe_u32 s19, s10, 0x40008
	; GFX7-NEXT: s_and_b32 s10, s10, 15			; GFX7-NEXT: s_and_b32 s10, s10, 15
				; GFX7-NEXT: s_lshr_b32 s1, s0, 28
	; GFX7-NEXT: s_bfe_u32 s2, s0, 0x40018			; GFX7-NEXT: s_bfe_u32 s2, s0, 0x40018
	; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40014			; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40014
	; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40010			; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40010
	; GFX7-NEXT: s_bfe_u32 s12, s0, 0x4000c			; GFX7-NEXT: s_bfe_u32 s12, s0, 0x4000c
	; GFX7-NEXT: s_bfe_u32 s13, s0, 0x40008			; GFX7-NEXT: s_bfe_u32 s13, s0, 0x40008
	; GFX7-NEXT: s_bfe_u32 s14, s0, 0x40004			; GFX7-NEXT: s_bfe_u32 s14, s0, 0x40004
	; GFX7-NEXT: s_and_b32 s0, s0, 15			; GFX7-NEXT: s_and_b32 s0, s0, 15
	; GFX7-NEXT: v_mov_b32_e32 v0, s10			; GFX7-NEXT: v_mov_b32_e32 v0, s10
	[18 lines not shown]
	; GFX7-NEXT: buffer_store_dword v0, off, s[4:7], 0			; GFX7-NEXT: buffer_store_dword v0, off, s[4:7], 0
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: udot8_multiuses_mul1:			; GFX8-LABEL: udot8_multiuses_mul1:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: s_load_dword s2, s[4:5], 0x0
	; GFX8-NEXT: s_load_dword s6, s[6:7], 0x0			; GFX8-NEXT: s_load_dword s6, s[6:7], 0x0
				; GFX8-NEXT: s_load_dword s2, s[4:5], 0x0
	; GFX8-NEXT: s_load_dword s19, s[0:1], 0x0			; GFX8-NEXT: s_load_dword s19, s[0:1], 0x0
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: s_lshr_b32 s4, s2, 28
	; GFX8-NEXT: s_bfe_u32 s18, s6, 0x40004			; GFX8-NEXT: s_bfe_u32 s18, s6, 0x40004
	; GFX8-NEXT: s_lshr_b32 s7, s6, 28			; GFX8-NEXT: s_lshr_b32 s7, s6, 28
	; GFX8-NEXT: s_bfe_u32 s13, s6, 0x40018			; GFX8-NEXT: s_bfe_u32 s13, s6, 0x40018
	; GFX8-NEXT: s_bfe_u32 s14, s6, 0x40014			; GFX8-NEXT: s_bfe_u32 s14, s6, 0x40014
	; GFX8-NEXT: s_bfe_u32 s15, s6, 0x40010			; GFX8-NEXT: s_bfe_u32 s15, s6, 0x40010
	; GFX8-NEXT: s_bfe_u32 s16, s6, 0x4000c			; GFX8-NEXT: s_bfe_u32 s16, s6, 0x4000c
	; GFX8-NEXT: s_bfe_u32 s17, s6, 0x40008			; GFX8-NEXT: s_bfe_u32 s17, s6, 0x40008
	; GFX8-NEXT: s_and_b32 s6, s6, 15			; GFX8-NEXT: s_and_b32 s6, s6, 15
				; GFX8-NEXT: s_lshr_b32 s4, s2, 28
	; GFX8-NEXT: s_bfe_u32 s5, s2, 0x40018			; GFX8-NEXT: s_bfe_u32 s5, s2, 0x40018
	; GFX8-NEXT: s_bfe_u32 s8, s2, 0x40014			; GFX8-NEXT: s_bfe_u32 s8, s2, 0x40014
	; GFX8-NEXT: s_bfe_u32 s9, s2, 0x40010			; GFX8-NEXT: s_bfe_u32 s9, s2, 0x40010
	; GFX8-NEXT: s_bfe_u32 s10, s2, 0x4000c			; GFX8-NEXT: s_bfe_u32 s10, s2, 0x4000c
	; GFX8-NEXT: s_bfe_u32 s11, s2, 0x40008			; GFX8-NEXT: s_bfe_u32 s11, s2, 0x40008
	; GFX8-NEXT: s_bfe_u32 s12, s2, 0x40004			; GFX8-NEXT: s_bfe_u32 s12, s2, 0x40004
	; GFX8-NEXT: s_and_b32 s2, s2, 15			; GFX8-NEXT: s_and_b32 s2, s2, 15
	; GFX8-NEXT: v_mov_b32_e32 v0, s6			; GFX8-NEXT: v_mov_b32_e32 v0, s6
	[20 lines not shown]
	; GFX8-NEXT: flat_store_dword v[0:1], v2			; GFX8-NEXT: flat_store_dword v[0:1], v2
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX9-LABEL: udot8_multiuses_mul1:			; GFX9-LABEL: udot8_multiuses_mul1:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_load_dword s2, s[4:5], 0x0
	; GFX9-NEXT: s_load_dword s6, s[6:7], 0x0			; GFX9-NEXT: s_load_dword s6, s[6:7], 0x0
				; GFX9-NEXT: s_load_dword s2, s[4:5], 0x0
	; GFX9-NEXT: s_load_dword s19, s[0:1], 0x0			; GFX9-NEXT: s_load_dword s19, s[0:1], 0x0
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_lshr_b32 s4, s2, 28
	; GFX9-NEXT: s_bfe_u32 s18, s6, 0x40004			; GFX9-NEXT: s_bfe_u32 s18, s6, 0x40004
	; GFX9-NEXT: s_lshr_b32 s7, s6, 28			; GFX9-NEXT: s_lshr_b32 s7, s6, 28
	; GFX9-NEXT: s_bfe_u32 s13, s6, 0x40018			; GFX9-NEXT: s_bfe_u32 s13, s6, 0x40018
	; GFX9-NEXT: s_bfe_u32 s14, s6, 0x40014			; GFX9-NEXT: s_bfe_u32 s14, s6, 0x40014
	; GFX9-NEXT: s_bfe_u32 s15, s6, 0x40010			; GFX9-NEXT: s_bfe_u32 s15, s6, 0x40010
	; GFX9-NEXT: s_bfe_u32 s16, s6, 0x4000c			; GFX9-NEXT: s_bfe_u32 s16, s6, 0x4000c
	; GFX9-NEXT: s_bfe_u32 s17, s6, 0x40008			; GFX9-NEXT: s_bfe_u32 s17, s6, 0x40008
	; GFX9-NEXT: s_and_b32 s6, s6, 15			; GFX9-NEXT: s_and_b32 s6, s6, 15
				; GFX9-NEXT: s_lshr_b32 s4, s2, 28
	; GFX9-NEXT: s_bfe_u32 s5, s2, 0x40018			; GFX9-NEXT: s_bfe_u32 s5, s2, 0x40018
	; GFX9-NEXT: s_bfe_u32 s8, s2, 0x40014			; GFX9-NEXT: s_bfe_u32 s8, s2, 0x40014
	; GFX9-NEXT: s_bfe_u32 s9, s2, 0x40010			; GFX9-NEXT: s_bfe_u32 s9, s2, 0x40010
	; GFX9-NEXT: s_bfe_u32 s10, s2, 0x4000c			; GFX9-NEXT: s_bfe_u32 s10, s2, 0x4000c
	; GFX9-NEXT: s_bfe_u32 s11, s2, 0x40008			; GFX9-NEXT: s_bfe_u32 s11, s2, 0x40008
	; GFX9-NEXT: s_bfe_u32 s12, s2, 0x40004			; GFX9-NEXT: s_bfe_u32 s12, s2, 0x40004
	; GFX9-NEXT: s_and_b32 s2, s2, 15			; GFX9-NEXT: s_and_b32 s2, s2, 15
	; GFX9-NEXT: v_mov_b32_e32 v0, s6			; GFX9-NEXT: v_mov_b32_e32 v0, s6
	[20 lines not shown]
	; GFX9-NEXT: global_store_dword v[0:1], v2, off			; GFX9-NEXT: global_store_dword v[0:1], v2, off
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX9-DL-LABEL: udot8_multiuses_mul1:			; GFX9-DL-LABEL: udot8_multiuses_mul1:
	; GFX9-DL: ; %bb.0: ; %entry			; GFX9-DL: ; %bb.0: ; %entry
	; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX9-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX9-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-DL-NEXT: s_load_dword s2, s[4:5], 0x0
	; GFX9-DL-NEXT: s_load_dword s6, s[6:7], 0x0			; GFX9-DL-NEXT: s_load_dword s6, s[6:7], 0x0
				; GFX9-DL-NEXT: s_load_dword s2, s[4:5], 0x0
	; GFX9-DL-NEXT: s_load_dword s19, s[0:1], 0x0			; GFX9-DL-NEXT: s_load_dword s19, s[0:1], 0x0
	; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-DL-NEXT: s_lshr_b32 s4, s2, 28
	; GFX9-DL-NEXT: s_bfe_u32 s18, s6, 0x40004			; GFX9-DL-NEXT: s_bfe_u32 s18, s6, 0x40004
	; GFX9-DL-NEXT: s_lshr_b32 s7, s6, 28			; GFX9-DL-NEXT: s_lshr_b32 s7, s6, 28
	; GFX9-DL-NEXT: s_bfe_u32 s13, s6, 0x40018			; GFX9-DL-NEXT: s_bfe_u32 s13, s6, 0x40018
	; GFX9-DL-NEXT: s_bfe_u32 s14, s6, 0x40014			; GFX9-DL-NEXT: s_bfe_u32 s14, s6, 0x40014
	; GFX9-DL-NEXT: s_bfe_u32 s15, s6, 0x40010			; GFX9-DL-NEXT: s_bfe_u32 s15, s6, 0x40010
	; GFX9-DL-NEXT: s_bfe_u32 s16, s6, 0x4000c			; GFX9-DL-NEXT: s_bfe_u32 s16, s6, 0x4000c
	; GFX9-DL-NEXT: s_bfe_u32 s17, s6, 0x40008			; GFX9-DL-NEXT: s_bfe_u32 s17, s6, 0x40008
	; GFX9-DL-NEXT: s_and_b32 s6, s6, 15			; GFX9-DL-NEXT: s_and_b32 s6, s6, 15
				; GFX9-DL-NEXT: s_lshr_b32 s4, s2, 28
	; GFX9-DL-NEXT: s_bfe_u32 s5, s2, 0x40018			; GFX9-DL-NEXT: s_bfe_u32 s5, s2, 0x40018
	; GFX9-DL-NEXT: s_bfe_u32 s8, s2, 0x40014			; GFX9-DL-NEXT: s_bfe_u32 s8, s2, 0x40014
	; GFX9-DL-NEXT: s_bfe_u32 s9, s2, 0x40010			; GFX9-DL-NEXT: s_bfe_u32 s9, s2, 0x40010
	; GFX9-DL-NEXT: s_bfe_u32 s10, s2, 0x4000c			; GFX9-DL-NEXT: s_bfe_u32 s10, s2, 0x4000c
	; GFX9-DL-NEXT: s_bfe_u32 s11, s2, 0x40008			; GFX9-DL-NEXT: s_bfe_u32 s11, s2, 0x40008
	; GFX9-DL-NEXT: s_bfe_u32 s12, s2, 0x40004			; GFX9-DL-NEXT: s_bfe_u32 s12, s2, 0x40004
	; GFX9-DL-NEXT: s_and_b32 s2, s2, 15			; GFX9-DL-NEXT: s_and_b32 s2, s2, 15
	; GFX9-DL-NEXT: v_mov_b32_e32 v0, s6			; GFX9-DL-NEXT: v_mov_b32_e32 v0, s6
	[134 lines not shown]
	define amdgpu_kernel void @udot8_acc32_vecMul(<8 x i4> addrspace(1)* %src1,			define amdgpu_kernel void @udot8_acc32_vecMul(<8 x i4> addrspace(1)* %src1,
	; GFX7-LABEL: udot8_acc32_vecMul:			; GFX7-LABEL: udot8_acc32_vecMul:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9			; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9
	; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd			; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd
	; GFX7-NEXT: s_mov_b32 s7, 0xf000			; GFX7-NEXT: s_mov_b32 s7, 0xf000
	; GFX7-NEXT: s_mov_b32 s6, -1			; GFX7-NEXT: s_mov_b32 s6, -1
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0
	; GFX7-NEXT: s_load_dword s10, s[10:11], 0x0			; GFX7-NEXT: s_load_dword s10, s[10:11], 0x0
				; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0
	; GFX7-NEXT: s_load_dword s21, s[4:5], 0x0			; GFX7-NEXT: s_load_dword s21, s[4:5], 0x0
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_lshr_b32 s1, s0, 28
	; GFX7-NEXT: s_lshr_b32 s11, s10, 28			; GFX7-NEXT: s_lshr_b32 s11, s10, 28
	; GFX7-NEXT: s_bfe_u32 s15, s10, 0x40018			; GFX7-NEXT: s_bfe_u32 s15, s10, 0x40018
	; GFX7-NEXT: s_bfe_u32 s16, s10, 0x40014			; GFX7-NEXT: s_bfe_u32 s16, s10, 0x40014
	; GFX7-NEXT: s_bfe_u32 s17, s10, 0x40010			; GFX7-NEXT: s_bfe_u32 s17, s10, 0x40010
	; GFX7-NEXT: s_bfe_u32 s18, s10, 0x4000c			; GFX7-NEXT: s_bfe_u32 s18, s10, 0x4000c
	; GFX7-NEXT: s_bfe_u32 s19, s10, 0x40008			; GFX7-NEXT: s_bfe_u32 s19, s10, 0x40008
	; GFX7-NEXT: s_bfe_u32 s20, s10, 0x40004			; GFX7-NEXT: s_bfe_u32 s20, s10, 0x40004
	; GFX7-NEXT: s_and_b32 s10, s10, 15			; GFX7-NEXT: s_and_b32 s10, s10, 15
				; GFX7-NEXT: s_lshr_b32 s1, s0, 28
	; GFX7-NEXT: s_bfe_u32 s2, s0, 0x40018			; GFX7-NEXT: s_bfe_u32 s2, s0, 0x40018
	; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40014			; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40014
	; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40010			; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40010
	; GFX7-NEXT: s_bfe_u32 s12, s0, 0x4000c			; GFX7-NEXT: s_bfe_u32 s12, s0, 0x4000c
	; GFX7-NEXT: s_bfe_u32 s13, s0, 0x40008			; GFX7-NEXT: s_bfe_u32 s13, s0, 0x40008
	; GFX7-NEXT: s_bfe_u32 s14, s0, 0x40004			; GFX7-NEXT: s_bfe_u32 s14, s0, 0x40004
	; GFX7-NEXT: s_and_b32 s0, s0, 15			; GFX7-NEXT: s_and_b32 s0, s0, 15
	; GFX7-NEXT: v_mov_b32_e32 v0, s10			; GFX7-NEXT: v_mov_b32_e32 v0, s10
	[16 lines not shown]
	; GFX7-NEXT: buffer_store_dword v0, off, s[4:7], 0			; GFX7-NEXT: buffer_store_dword v0, off, s[4:7], 0
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GFX8-LABEL: udot8_acc32_vecMul:			; GFX8-LABEL: udot8_acc32_vecMul:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX8-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX8-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: s_load_dword s2, s[4:5], 0x0
	; GFX8-NEXT: s_load_dword s6, s[6:7], 0x0			; GFX8-NEXT: s_load_dword s6, s[6:7], 0x0
				; GFX8-NEXT: s_load_dword s2, s[4:5], 0x0
	; GFX8-NEXT: s_load_dword s19, s[0:1], 0x0			; GFX8-NEXT: s_load_dword s19, s[0:1], 0x0
	; GFX8-NEXT: s_waitcnt lgkmcnt(0)			; GFX8-NEXT: s_waitcnt lgkmcnt(0)
	; GFX8-NEXT: s_lshr_b32 s4, s2, 28
	; GFX8-NEXT: s_lshr_b32 s7, s6, 28			; GFX8-NEXT: s_lshr_b32 s7, s6, 28
	; GFX8-NEXT: s_bfe_u32 s13, s6, 0x40018			; GFX8-NEXT: s_bfe_u32 s13, s6, 0x40018
	; GFX8-NEXT: s_bfe_u32 s14, s6, 0x40014			; GFX8-NEXT: s_bfe_u32 s14, s6, 0x40014
	; GFX8-NEXT: s_bfe_u32 s15, s6, 0x40010			; GFX8-NEXT: s_bfe_u32 s15, s6, 0x40010
	; GFX8-NEXT: s_bfe_u32 s16, s6, 0x4000c			; GFX8-NEXT: s_bfe_u32 s16, s6, 0x4000c
	; GFX8-NEXT: s_bfe_u32 s17, s6, 0x40008			; GFX8-NEXT: s_bfe_u32 s17, s6, 0x40008
	; GFX8-NEXT: s_bfe_u32 s18, s6, 0x40004			; GFX8-NEXT: s_bfe_u32 s18, s6, 0x40004
	; GFX8-NEXT: s_and_b32 s6, s6, 15			; GFX8-NEXT: s_and_b32 s6, s6, 15
				; GFX8-NEXT: s_lshr_b32 s4, s2, 28
	; GFX8-NEXT: s_bfe_u32 s5, s2, 0x40018			; GFX8-NEXT: s_bfe_u32 s5, s2, 0x40018
	; GFX8-NEXT: s_bfe_u32 s8, s2, 0x40014			; GFX8-NEXT: s_bfe_u32 s8, s2, 0x40014
	; GFX8-NEXT: s_bfe_u32 s9, s2, 0x40010			; GFX8-NEXT: s_bfe_u32 s9, s2, 0x40010
	; GFX8-NEXT: s_bfe_u32 s10, s2, 0x4000c			; GFX8-NEXT: s_bfe_u32 s10, s2, 0x4000c
	; GFX8-NEXT: s_bfe_u32 s11, s2, 0x40008			; GFX8-NEXT: s_bfe_u32 s11, s2, 0x40008
	; GFX8-NEXT: s_bfe_u32 s12, s2, 0x40004			; GFX8-NEXT: s_bfe_u32 s12, s2, 0x40004
	; GFX8-NEXT: s_and_b32 s2, s2, 15			; GFX8-NEXT: s_and_b32 s2, s2, 15
	; GFX8-NEXT: v_mov_b32_e32 v0, s6			; GFX8-NEXT: v_mov_b32_e32 v0, s6
	[18 lines not shown]
	; GFX8-NEXT: flat_store_dword v[0:1], v2			; GFX8-NEXT: flat_store_dword v[0:1], v2
	; GFX8-NEXT: s_endpgm			; GFX8-NEXT: s_endpgm
	;			;
	; GFX9-LABEL: udot8_acc32_vecMul:			; GFX9-LABEL: udot8_acc32_vecMul:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX9-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_load_dword s2, s[4:5], 0x0
	; GFX9-NEXT: s_load_dword s6, s[6:7], 0x0			; GFX9-NEXT: s_load_dword s6, s[6:7], 0x0
				; GFX9-NEXT: s_load_dword s2, s[4:5], 0x0
	; GFX9-NEXT: s_load_dword s19, s[0:1], 0x0			; GFX9-NEXT: s_load_dword s19, s[0:1], 0x0
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_lshr_b32 s4, s2, 28
	; GFX9-NEXT: s_lshr_b32 s7, s6, 28			; GFX9-NEXT: s_lshr_b32 s7, s6, 28
	; GFX9-NEXT: s_bfe_u32 s13, s6, 0x40018			; GFX9-NEXT: s_bfe_u32 s13, s6, 0x40018
	; GFX9-NEXT: s_bfe_u32 s14, s6, 0x40014			; GFX9-NEXT: s_bfe_u32 s14, s6, 0x40014
	; GFX9-NEXT: s_bfe_u32 s15, s6, 0x40010			; GFX9-NEXT: s_bfe_u32 s15, s6, 0x40010
	; GFX9-NEXT: s_bfe_u32 s16, s6, 0x4000c			; GFX9-NEXT: s_bfe_u32 s16, s6, 0x4000c
	; GFX9-NEXT: s_bfe_u32 s17, s6, 0x40008			; GFX9-NEXT: s_bfe_u32 s17, s6, 0x40008
	; GFX9-NEXT: s_bfe_u32 s18, s6, 0x40004			; GFX9-NEXT: s_bfe_u32 s18, s6, 0x40004
	; GFX9-NEXT: s_and_b32 s6, s6, 15			; GFX9-NEXT: s_and_b32 s6, s6, 15
				; GFX9-NEXT: s_lshr_b32 s4, s2, 28
	; GFX9-NEXT: s_bfe_u32 s5, s2, 0x40018			; GFX9-NEXT: s_bfe_u32 s5, s2, 0x40018
	; GFX9-NEXT: s_bfe_u32 s8, s2, 0x40014			; GFX9-NEXT: s_bfe_u32 s8, s2, 0x40014
	; GFX9-NEXT: s_bfe_u32 s9, s2, 0x40010			; GFX9-NEXT: s_bfe_u32 s9, s2, 0x40010
	; GFX9-NEXT: s_bfe_u32 s10, s2, 0x4000c			; GFX9-NEXT: s_bfe_u32 s10, s2, 0x4000c
	; GFX9-NEXT: s_bfe_u32 s11, s2, 0x40008			; GFX9-NEXT: s_bfe_u32 s11, s2, 0x40008
	; GFX9-NEXT: s_bfe_u32 s12, s2, 0x40004			; GFX9-NEXT: s_bfe_u32 s12, s2, 0x40004
	; GFX9-NEXT: s_and_b32 s2, s2, 15			; GFX9-NEXT: s_and_b32 s2, s2, 15
	; GFX9-NEXT: v_mov_b32_e32 v0, s6			; GFX9-NEXT: v_mov_b32_e32 v0, s6
	[89 lines not shown]
	define amdgpu_kernel void @udot8_acc16_vecMul(<8 x i4> addrspace(1)* %src1,			define amdgpu_kernel void @udot8_acc16_vecMul(<8 x i4> addrspace(1)* %src1,
	; GFX7-LABEL: udot8_acc16_vecMul:			; GFX7-LABEL: udot8_acc16_vecMul:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9			; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9
	; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd			; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd
	; GFX7-NEXT: s_mov_b32 s7, 0xf000			; GFX7-NEXT: s_mov_b32 s7, 0xf000
	; GFX7-NEXT: s_mov_b32 s6, -1			; GFX7-NEXT: s_mov_b32 s6, -1
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0
	; GFX7-NEXT: buffer_load_ushort v0, off, s[4:7], 0			; GFX7-NEXT: buffer_load_ushort v0, off, s[4:7], 0
				; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0
	; GFX7-NEXT: s_load_dword s1, s[10:11], 0x0			; GFX7-NEXT: s_load_dword s1, s[10:11], 0x0
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_bfe_u32 s11, s0, 0x40004			; GFX7-NEXT: s_bfe_u32 s11, s0, 0x40004
	; GFX7-NEXT: s_bfe_u32 s13, s0, 0x4000c
	; GFX7-NEXT: s_bfe_u32 s18, s1, 0x40004			; GFX7-NEXT: s_bfe_u32 s18, s1, 0x40004
	; GFX7-NEXT: s_bfe_u32 s20, s1, 0x4000c			; GFX7-NEXT: s_bfe_u32 s20, s1, 0x4000c
	; GFX7-NEXT: v_mov_b32_e32 v2, s20
	; GFX7-NEXT: v_mov_b32_e32 v4, s18			; GFX7-NEXT: v_mov_b32_e32 v4, s18
	; GFX7-NEXT: s_bfe_u32 s15, s1, 0x40018			; GFX7-NEXT: s_bfe_u32 s15, s1, 0x40018
	; GFX7-NEXT: s_bfe_u32 s16, s1, 0x40014			; GFX7-NEXT: s_bfe_u32 s16, s1, 0x40014
	; GFX7-NEXT: s_bfe_u32 s17, s1, 0x40010			; GFX7-NEXT: s_bfe_u32 s17, s1, 0x40010
	; GFX7-NEXT: s_and_b32 s19, s1, 15			; GFX7-NEXT: s_and_b32 s19, s1, 15
	; GFX7-NEXT: s_lshr_b32 s14, s1, 28			; GFX7-NEXT: s_lshr_b32 s14, s1, 28
	; GFX7-NEXT: s_bfe_u32 s1, s1, 0x40008			; GFX7-NEXT: s_bfe_u32 s1, s1, 0x40008
				; GFX7-NEXT: s_bfe_u32 s13, s0, 0x4000c
				; GFX7-NEXT: v_mov_b32_e32 v2, s20
	; GFX7-NEXT: v_mul_u32_u24_e32 v2, s13, v2			; GFX7-NEXT: v_mul_u32_u24_e32 v2, s13, v2
	; GFX7-NEXT: v_mul_u32_u24_e32 v4, s11, v4			; GFX7-NEXT: v_mul_u32_u24_e32 v4, s11, v4
	; GFX7-NEXT: s_lshr_b32 s2, s0, 28			; GFX7-NEXT: s_lshr_b32 s2, s0, 28
	; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40018			; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40018
	; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40014			; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40014
	; GFX7-NEXT: s_bfe_u32 s10, s0, 0x40010			; GFX7-NEXT: s_bfe_u32 s10, s0, 0x40010
	; GFX7-NEXT: s_and_b32 s12, s0, 15			; GFX7-NEXT: s_and_b32 s12, s0, 15
	; GFX7-NEXT: v_mov_b32_e32 v3, s19			; GFX7-NEXT: v_mov_b32_e32 v3, s19
	[184 lines not shown]
	; GFX10-DL-LABEL: udot8_acc16_vecMul:			; GFX10-DL-LABEL: udot8_acc16_vecMul:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1
				; GFX10-DL-NEXT: global_load_ushort v2, v[0:1], off
	; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0
	; GFX10-DL-NEXT: global_load_ushort v2, v[0:1], off
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_and_b32 s2, s0, 15			; GFX10-DL-NEXT: s_and_b32 s2, s0, 15
	; GFX10-DL-NEXT: s_bfe_u32 s6, s0, 0x40004			; GFX10-DL-NEXT: s_bfe_u32 s6, s0, 0x40004
	; GFX10-DL-NEXT: s_and_b32 s4, s1, 15			; GFX10-DL-NEXT: s_and_b32 s4, s1, 15
	; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x40004			; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x40004
	; GFX10-DL-NEXT: s_bfe_u32 s7, s1, 0x4000c			; GFX10-DL-NEXT: s_bfe_u32 s7, s1, 0x4000c
	; GFX10-DL-NEXT: s_bfe_u32 s8, s0, 0x4000c			; GFX10-DL-NEXT: s_bfe_u32 s8, s0, 0x4000c
	; GFX10-DL-NEXT: s_pack_ll_b32_b16 s2, s2, s6			; GFX10-DL-NEXT: s_pack_ll_b32_b16 s2, s2, s6
	[66 lines not shown]
	define amdgpu_kernel void @udot8_acc8_vecMul(<8 x i4> addrspace(1)* %src1,			define amdgpu_kernel void @udot8_acc8_vecMul(<8 x i4> addrspace(1)* %src1,
	; GFX7-LABEL: udot8_acc8_vecMul:			; GFX7-LABEL: udot8_acc8_vecMul:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9			; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9
	; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd			; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd
	; GFX7-NEXT: s_mov_b32 s7, 0xf000			; GFX7-NEXT: s_mov_b32 s7, 0xf000
	; GFX7-NEXT: s_mov_b32 s6, -1			; GFX7-NEXT: s_mov_b32 s6, -1
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0
	; GFX7-NEXT: buffer_load_ubyte v0, off, s[4:7], 0			; GFX7-NEXT: buffer_load_ubyte v0, off, s[4:7], 0
				; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0
	; GFX7-NEXT: s_load_dword s1, s[10:11], 0x0			; GFX7-NEXT: s_load_dword s1, s[10:11], 0x0
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_bfe_u32 s2, s0, 0x4000c			; GFX7-NEXT: s_bfe_u32 s2, s0, 0x4000c
	; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40004
	; GFX7-NEXT: s_bfe_u32 s14, s1, 0x4000c			; GFX7-NEXT: s_bfe_u32 s14, s1, 0x4000c
	; GFX7-NEXT: s_bfe_u32 s16, s1, 0x40004			; GFX7-NEXT: s_bfe_u32 s16, s1, 0x40004
	; GFX7-NEXT: s_lshr_b32 s18, s1, 28			; GFX7-NEXT: s_lshr_b32 s18, s1, 28
	; GFX7-NEXT: v_mov_b32_e32 v6, s16
	; GFX7-NEXT: v_mov_b32_e32 v8, s14			; GFX7-NEXT: v_mov_b32_e32 v8, s14
	; GFX7-NEXT: s_bfe_u32 s15, s1, 0x40008			; GFX7-NEXT: s_bfe_u32 s15, s1, 0x40008
	; GFX7-NEXT: s_and_b32 s17, s1, 15			; GFX7-NEXT: s_and_b32 s17, s1, 15
	; GFX7-NEXT: s_bfe_u32 s19, s1, 0x40018			; GFX7-NEXT: s_bfe_u32 s19, s1, 0x40018
	; GFX7-NEXT: s_bfe_u32 s20, s1, 0x40014			; GFX7-NEXT: s_bfe_u32 s20, s1, 0x40014
				; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40004
				; GFX7-NEXT: v_mov_b32_e32 v6, s16
	; GFX7-NEXT: s_lshr_b32 s11, s0, 28			; GFX7-NEXT: s_lshr_b32 s11, s0, 28
	; GFX7-NEXT: v_mov_b32_e32 v4, s18			; GFX7-NEXT: v_mov_b32_e32 v4, s18
	; GFX7-NEXT: v_mul_u32_u24_e32 v4, s11, v4			; GFX7-NEXT: v_mul_u32_u24_e32 v4, s11, v4
	; GFX7-NEXT: v_mul_u32_u24_e32 v6, s9, v6			; GFX7-NEXT: v_mul_u32_u24_e32 v6, s9, v6
	; GFX7-NEXT: v_mul_u32_u24_e32 v8, s2, v8			; GFX7-NEXT: v_mul_u32_u24_e32 v8, s2, v8
	; GFX7-NEXT: s_bfe_u32 s1, s1, 0x40010			; GFX7-NEXT: s_bfe_u32 s1, s1, 0x40010
	; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40008			; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40008
	; GFX7-NEXT: v_mov_b32_e32 v7, s15			; GFX7-NEXT: v_mov_b32_e32 v7, s15
	[246 lines not shown]
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: s_mov_b32 s2, 0xffff			; GFX10-DL-NEXT: s_mov_b32 s2, 0xffff
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1
				; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off
	; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0
	; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x40004			; GFX10-DL-NEXT: s_bfe_u32 s4, s0, 0x40004
	; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x40004			; GFX10-DL-NEXT: s_bfe_u32 s5, s1, 0x40004
	; GFX10-DL-NEXT: s_and_b32 s6, s0, 15			; GFX10-DL-NEXT: s_and_b32 s6, s0, 15
	; GFX10-DL-NEXT: s_and_b32 s8, s1, 15			; GFX10-DL-NEXT: s_and_b32 s8, s1, 15
	; GFX10-DL-NEXT: s_bfe_u32 s7, s0, 0x4000c			; GFX10-DL-NEXT: s_bfe_u32 s7, s0, 0x4000c
	; GFX10-DL-NEXT: s_bfe_u32 s9, s1, 0x4000c			; GFX10-DL-NEXT: s_bfe_u32 s9, s1, 0x4000c
	; GFX10-DL-NEXT: v_mul_lo_u16_e64 v3, s4, s5			; GFX10-DL-NEXT: v_mul_lo_u16_e64 v3, s4, s5
	[76 lines not shown]
	define amdgpu_kernel void @udot8_acc4_vecMul(<8 x i4> addrspace(1)* %src1,			define amdgpu_kernel void @udot8_acc4_vecMul(<8 x i4> addrspace(1)* %src1,
	; GFX7-LABEL: udot8_acc4_vecMul:			; GFX7-LABEL: udot8_acc4_vecMul:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9			; GFX7-NEXT: s_load_dwordx4 s[8:11], s[0:1], 0x9
	; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd			; GFX7-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0xd
	; GFX7-NEXT: s_mov_b32 s7, 0xf000			; GFX7-NEXT: s_mov_b32 s7, 0xf000
	; GFX7-NEXT: s_mov_b32 s6, -1			; GFX7-NEXT: s_mov_b32 s6, -1
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0
	; GFX7-NEXT: buffer_load_ubyte v0, off, s[4:7], 0			; GFX7-NEXT: buffer_load_ubyte v0, off, s[4:7], 0
				; GFX7-NEXT: s_load_dword s0, s[8:9], 0x0
	; GFX7-NEXT: s_load_dword s1, s[10:11], 0x0			; GFX7-NEXT: s_load_dword s1, s[10:11], 0x0
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: s_lshr_b32 s2, s0, 28			; GFX7-NEXT: s_lshr_b32 s2, s0, 28
	; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40018
	; GFX7-NEXT: s_bfe_u32 s15, s1, 0x40018			; GFX7-NEXT: s_bfe_u32 s15, s1, 0x40018
	; GFX7-NEXT: s_bfe_u32 s16, s1, 0x40014			; GFX7-NEXT: s_bfe_u32 s16, s1, 0x40014
	; GFX7-NEXT: s_bfe_u32 s17, s1, 0x40010			; GFX7-NEXT: s_bfe_u32 s17, s1, 0x40010
	; GFX7-NEXT: s_bfe_u32 s18, s1, 0x4000c			; GFX7-NEXT: s_bfe_u32 s18, s1, 0x4000c
	; GFX7-NEXT: s_bfe_u32 s19, s1, 0x40008			; GFX7-NEXT: s_bfe_u32 s19, s1, 0x40008
	; GFX7-NEXT: s_bfe_u32 s20, s1, 0x40004			; GFX7-NEXT: s_bfe_u32 s20, s1, 0x40004
	; GFX7-NEXT: s_lshr_b32 s14, s1, 28			; GFX7-NEXT: s_lshr_b32 s14, s1, 28
	; GFX7-NEXT: s_and_b32 s1, s1, 15			; GFX7-NEXT: s_and_b32 s1, s1, 15
				; GFX7-NEXT: s_bfe_u32 s8, s0, 0x40018
	; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40014			; GFX7-NEXT: s_bfe_u32 s9, s0, 0x40014
	; GFX7-NEXT: s_bfe_u32 s10, s0, 0x40010			; GFX7-NEXT: s_bfe_u32 s10, s0, 0x40010
	; GFX7-NEXT: s_bfe_u32 s11, s0, 0x4000c			; GFX7-NEXT: s_bfe_u32 s11, s0, 0x4000c
	; GFX7-NEXT: s_bfe_u32 s12, s0, 0x40008			; GFX7-NEXT: s_bfe_u32 s12, s0, 0x40008
	; GFX7-NEXT: s_bfe_u32 s13, s0, 0x40004			; GFX7-NEXT: s_bfe_u32 s13, s0, 0x40004
	; GFX7-NEXT: s_and_b32 s0, s0, 15			; GFX7-NEXT: s_and_b32 s0, s0, 15
	; GFX7-NEXT: v_mov_b32_e32 v1, s1			; GFX7-NEXT: v_mov_b32_e32 v1, s1
	; GFX7-NEXT: v_mov_b32_e32 v2, s20			; GFX7-NEXT: v_mov_b32_e32 v2, s20
	[172 lines not shown]
	; GFX10-DL-LABEL: udot8_acc4_vecMul:			; GFX10-DL-LABEL: udot8_acc4_vecMul:
	; GFX10-DL: ; %bb.0: ; %entry			; GFX10-DL: ; %bb.0: ; %entry
	; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-DL-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34			; GFX10-DL-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x34
	; GFX10-DL-NEXT: ; implicit-def: $vcc_hi			; GFX10-DL-NEXT: ; implicit-def: $vcc_hi
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0			; GFX10-DL-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1			; GFX10-DL-NEXT: v_mov_b32_e32 v1, s1
				; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off
	; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0			; GFX10-DL-NEXT: s_load_dword s0, s[4:5], 0x0
	; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0			; GFX10-DL-NEXT: s_load_dword s1, s[6:7], 0x0
	; GFX10-DL-NEXT: global_load_ubyte v2, v[0:1], off
	; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-DL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-DL-NEXT: s_and_b32 s2, s0, 15			; GFX10-DL-NEXT: s_and_b32 s2, s0, 15
	; GFX10-DL-NEXT: s_and_b32 s4, s1, 15			; GFX10-DL-NEXT: s_and_b32 s4, s1, 15
	; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40004			; GFX10-DL-NEXT: s_bfe_u32 s5, s0, 0x40004
	; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40004			; GFX10-DL-NEXT: s_bfe_u32 s6, s1, 0x40004
	; GFX10-DL-NEXT: s_bfe_u32 s7, s1, 0x40008			; GFX10-DL-NEXT: s_bfe_u32 s7, s1, 0x40008
	; GFX10-DL-NEXT: s_waitcnt vmcnt(0)			; GFX10-DL-NEXT: s_waitcnt vmcnt(0)
	; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2			; GFX10-DL-NEXT: v_mad_u32_u24 v2, s2, s4, v2
	[290 lines not shown]

llvm/test/CodeGen/AMDGPU/insert_vector_elt.ll

	[10 lines not shown]
	; not just directly into the vector component?			; not just directly into the vector component?
	define amdgpu_kernel void @insertelement_v4f32_0(<4 x float> addrspace(1)* %out, <4 x float> %a) nounwind {			define amdgpu_kernel void @insertelement_v4f32_0(<4 x float> addrspace(1)* %out, <4 x float> %a) nounwind {
	; SI-LABEL: insertelement_v4f32_0:			; SI-LABEL: insertelement_v4f32_0:
	; SI: ; %bb.0:			; SI: ; %bb.0:
	; SI-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; SI-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; SI-NEXT: s_load_dwordx4 s[4:7], s[4:5], 0x4			; SI-NEXT: s_load_dwordx4 s[4:7], s[4:5], 0x4
	; SI-NEXT: s_waitcnt lgkmcnt(0)			; SI-NEXT: s_waitcnt lgkmcnt(0)
	; SI-NEXT: s_mov_b32 s4, 0x40a00000			; SI-NEXT: s_mov_b32 s4, 0x40a00000
	; SI-NEXT: v_mov_b32_e32 v0, s4
	; SI-NEXT: s_mov_b32 s3, 0x100f000			; SI-NEXT: s_mov_b32 s3, 0x100f000
	; SI-NEXT: s_mov_b32 s2, -1			; SI-NEXT: s_mov_b32 s2, -1
				; SI-NEXT: v_mov_b32_e32 v0, s4
	; SI-NEXT: v_mov_b32_e32 v1, s5			; SI-NEXT: v_mov_b32_e32 v1, s5
	; SI-NEXT: v_mov_b32_e32 v2, s6			; SI-NEXT: v_mov_b32_e32 v2, s6
	; SI-NEXT: v_mov_b32_e32 v3, s7			; SI-NEXT: v_mov_b32_e32 v3, s7
	; SI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0			; SI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0
	; SI-NEXT: s_endpgm			; SI-NEXT: s_endpgm
	;			;
	; VI-LABEL: insertelement_v4f32_0:			; VI-LABEL: insertelement_v4f32_0:
	; VI: ; %bb.0:			; VI: ; %bb.0:
	; VI-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; VI-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; VI-NEXT: s_load_dwordx4 s[4:7], s[4:5], 0x10			; VI-NEXT: s_load_dwordx4 s[4:7], s[4:5], 0x10
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: s_mov_b32 s4, 0x40a00000			; VI-NEXT: s_mov_b32 s4, 0x40a00000
	; VI-NEXT: v_mov_b32_e32 v0, s4
	; VI-NEXT: s_mov_b32 s3, 0x1100f000			; VI-NEXT: s_mov_b32 s3, 0x1100f000
	; VI-NEXT: s_mov_b32 s2, -1			; VI-NEXT: s_mov_b32 s2, -1
				; VI-NEXT: v_mov_b32_e32 v0, s4
	; VI-NEXT: v_mov_b32_e32 v1, s5			; VI-NEXT: v_mov_b32_e32 v1, s5
	; VI-NEXT: v_mov_b32_e32 v2, s6			; VI-NEXT: v_mov_b32_e32 v2, s6
	; VI-NEXT: v_mov_b32_e32 v3, s7			; VI-NEXT: v_mov_b32_e32 v3, s7
	; VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0			; VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0
	; VI-NEXT: s_endpgm			; VI-NEXT: s_endpgm
	%vecins = insertelement <4 x float> %a, float 5.000000e+00, i32 0			%vecins = insertelement <4 x float> %a, float 5.000000e+00, i32 0
	store <4 x float> %vecins, <4 x float> addrspace(1)* %out, align 16			store <4 x float> %vecins, <4 x float> addrspace(1)* %out, align 16
	ret void			ret void
	[106 lines not shown]

	define amdgpu_kernel void @insertelement_v4i32_0(<4 x i32> addrspace(1)* %out, <4 x i32> %a) nounwind {			define amdgpu_kernel void @insertelement_v4i32_0(<4 x i32> addrspace(1)* %out, <4 x i32> %a) nounwind {
	; SI-LABEL: insertelement_v4i32_0:			; SI-LABEL: insertelement_v4i32_0:
	; SI: ; %bb.0:			; SI: ; %bb.0:
	; SI-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; SI-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; SI-NEXT: s_load_dwordx4 s[4:7], s[4:5], 0x4			; SI-NEXT: s_load_dwordx4 s[4:7], s[4:5], 0x4
	; SI-NEXT: s_waitcnt lgkmcnt(0)			; SI-NEXT: s_waitcnt lgkmcnt(0)
	; SI-NEXT: s_movk_i32 s4, 0x3e7			; SI-NEXT: s_movk_i32 s4, 0x3e7
	; SI-NEXT: v_mov_b32_e32 v0, s4
	; SI-NEXT: s_mov_b32 s3, 0x100f000			; SI-NEXT: s_mov_b32 s3, 0x100f000
	; SI-NEXT: s_mov_b32 s2, -1			; SI-NEXT: s_mov_b32 s2, -1
				; SI-NEXT: v_mov_b32_e32 v0, s4
	; SI-NEXT: v_mov_b32_e32 v1, s5			; SI-NEXT: v_mov_b32_e32 v1, s5
	; SI-NEXT: v_mov_b32_e32 v2, s6			; SI-NEXT: v_mov_b32_e32 v2, s6
	; SI-NEXT: v_mov_b32_e32 v3, s7			; SI-NEXT: v_mov_b32_e32 v3, s7
	; SI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0			; SI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0
	; SI-NEXT: s_endpgm			; SI-NEXT: s_endpgm
	;			;
	; VI-LABEL: insertelement_v4i32_0:			; VI-LABEL: insertelement_v4i32_0:
	; VI: ; %bb.0:			; VI: ; %bb.0:
	; VI-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; VI-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; VI-NEXT: s_load_dwordx4 s[4:7], s[4:5], 0x10			; VI-NEXT: s_load_dwordx4 s[4:7], s[4:5], 0x10
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: s_movk_i32 s4, 0x3e7			; VI-NEXT: s_movk_i32 s4, 0x3e7
	; VI-NEXT: v_mov_b32_e32 v0, s4
	; VI-NEXT: s_mov_b32 s3, 0x1100f000			; VI-NEXT: s_mov_b32 s3, 0x1100f000
	; VI-NEXT: s_mov_b32 s2, -1			; VI-NEXT: s_mov_b32 s2, -1
				; VI-NEXT: v_mov_b32_e32 v0, s4
	; VI-NEXT: v_mov_b32_e32 v1, s5			; VI-NEXT: v_mov_b32_e32 v1, s5
	; VI-NEXT: v_mov_b32_e32 v2, s6			; VI-NEXT: v_mov_b32_e32 v2, s6
	; VI-NEXT: v_mov_b32_e32 v3, s7			; VI-NEXT: v_mov_b32_e32 v3, s7
	; VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0			; VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0
	; VI-NEXT: s_endpgm			; VI-NEXT: s_endpgm
	%vecins = insertelement <4 x i32> %a, i32 999, i32 0			%vecins = insertelement <4 x i32> %a, i32 999, i32 0
	store <4 x i32> %vecins, <4 x i32> addrspace(1)* %out, align 16			store <4 x i32> %vecins, <4 x i32> addrspace(1)* %out, align 16
	ret void			ret void
	[315 lines not shown]
	}			}

	define amdgpu_kernel void @dynamic_insertelement_v16f32(<16 x float> addrspace(1)* %out, <16 x float> %a, i32 %b) nounwind {			define amdgpu_kernel void @dynamic_insertelement_v16f32(<16 x float> addrspace(1)* %out, <16 x float> %a, i32 %b) nounwind {
	; SI-LABEL: dynamic_insertelement_v16f32:			; SI-LABEL: dynamic_insertelement_v16f32:
	; SI: ; %bb.0:			; SI: ; %bb.0:
	; SI-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; SI-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; SI-NEXT: s_load_dwordx16 s[8:23], s[4:5], 0x10			; SI-NEXT: s_load_dwordx16 s[8:23], s[4:5], 0x10
	; SI-NEXT: s_load_dword s4, s[4:5], 0x20			; SI-NEXT: s_load_dword s4, s[4:5], 0x20
				; SI-NEXT: v_mov_b32_e32 v16, 0x40a00000
	; SI-NEXT: s_mov_b32 s3, 0x100f000			; SI-NEXT: s_mov_b32 s3, 0x100f000
	; SI-NEXT: s_mov_b32 s2, -1			; SI-NEXT: s_mov_b32 s2, -1
	; SI-NEXT: v_mov_b32_e32 v16, 0x40a00000
	; SI-NEXT: s_waitcnt lgkmcnt(0)			; SI-NEXT: s_waitcnt lgkmcnt(0)
	; SI-NEXT: v_mov_b32_e32 v0, s8			; SI-NEXT: v_mov_b32_e32 v0, s8
	; SI-NEXT: v_mov_b32_e32 v1, s9			; SI-NEXT: v_mov_b32_e32 v1, s9
	; SI-NEXT: v_mov_b32_e32 v2, s10			; SI-NEXT: v_mov_b32_e32 v2, s10
	; SI-NEXT: v_mov_b32_e32 v3, s11			; SI-NEXT: v_mov_b32_e32 v3, s11
	; SI-NEXT: v_mov_b32_e32 v4, s12			; SI-NEXT: v_mov_b32_e32 v4, s12
	; SI-NEXT: v_mov_b32_e32 v5, s13			; SI-NEXT: v_mov_b32_e32 v5, s13
	; SI-NEXT: v_mov_b32_e32 v6, s14			; SI-NEXT: v_mov_b32_e32 v6, s14
	[14 lines not shown]
	; SI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0			; SI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0
	; SI-NEXT: s_endpgm			; SI-NEXT: s_endpgm
	;			;
	; VI-LABEL: dynamic_insertelement_v16f32:			; VI-LABEL: dynamic_insertelement_v16f32:
	; VI: ; %bb.0:			; VI: ; %bb.0:
	; VI-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; VI-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; VI-NEXT: s_load_dwordx16 s[8:23], s[4:5], 0x40			; VI-NEXT: s_load_dwordx16 s[8:23], s[4:5], 0x40
	; VI-NEXT: s_load_dword s4, s[4:5], 0x80			; VI-NEXT: s_load_dword s4, s[4:5], 0x80
				; VI-NEXT: v_mov_b32_e32 v16, 0x40a00000
	; VI-NEXT: s_mov_b32 s3, 0x1100f000			; VI-NEXT: s_mov_b32 s3, 0x1100f000
	; VI-NEXT: s_mov_b32 s2, -1			; VI-NEXT: s_mov_b32 s2, -1
	; VI-NEXT: v_mov_b32_e32 v16, 0x40a00000
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: v_mov_b32_e32 v0, s8			; VI-NEXT: v_mov_b32_e32 v0, s8
	; VI-NEXT: v_mov_b32_e32 v1, s9			; VI-NEXT: v_mov_b32_e32 v1, s9
	; VI-NEXT: v_mov_b32_e32 v2, s10			; VI-NEXT: v_mov_b32_e32 v2, s10
	; VI-NEXT: v_mov_b32_e32 v3, s11			; VI-NEXT: v_mov_b32_e32 v3, s11
	; VI-NEXT: v_mov_b32_e32 v4, s12			; VI-NEXT: v_mov_b32_e32 v4, s12
	; VI-NEXT: v_mov_b32_e32 v5, s13			; VI-NEXT: v_mov_b32_e32 v5, s13
	; VI-NEXT: v_mov_b32_e32 v6, s14			; VI-NEXT: v_mov_b32_e32 v6, s14
	[133 lines not shown]
	; VI-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0			; VI-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x0
	; VI-NEXT: s_load_dwordx4 s[8:11], s[4:5], 0x10			; VI-NEXT: s_load_dwordx4 s[8:11], s[4:5], 0x10
	; VI-NEXT: s_load_dword s6, s[4:5], 0x20			; VI-NEXT: s_load_dword s6, s[4:5], 0x20
	; VI-NEXT: s_load_dword s4, s[4:5], 0x44			; VI-NEXT: s_load_dword s4, s[4:5], 0x44
	; VI-NEXT: s_mov_b32 s3, 0x1100f000			; VI-NEXT: s_mov_b32 s3, 0x1100f000
	; VI-NEXT: s_mov_b32 s2, -1			; VI-NEXT: s_mov_b32 s2, -1
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: v_mov_b32_e32 v0, s11			; VI-NEXT: v_mov_b32_e32 v0, s11
	; VI-NEXT: v_cmp_eq_u32_e64 vcc, s6, 3
	; VI-NEXT: v_mov_b32_e32 v4, s4			; VI-NEXT: v_mov_b32_e32 v4, s4
				; VI-NEXT: v_cmp_eq_u32_e64 vcc, s6, 3
	; VI-NEXT: v_cndmask_b32_e32 v3, v0, v4, vcc			; VI-NEXT: v_cndmask_b32_e32 v3, v0, v4, vcc
	; VI-NEXT: v_mov_b32_e32 v0, s10			; VI-NEXT: v_mov_b32_e32 v0, s10
	; VI-NEXT: v_cmp_eq_u32_e64 vcc, s6, 2			; VI-NEXT: v_cmp_eq_u32_e64 vcc, s6, 2
	; VI-NEXT: v_cndmask_b32_e32 v2, v0, v4, vcc			; VI-NEXT: v_cndmask_b32_e32 v2, v0, v4, vcc
	; VI-NEXT: v_mov_b32_e32 v0, s9			; VI-NEXT: v_mov_b32_e32 v0, s9
	; VI-NEXT: v_cmp_eq_u32_e64 vcc, s6, 1			; VI-NEXT: v_cmp_eq_u32_e64 vcc, s6, 1
	; VI-NEXT: v_cndmask_b32_e32 v1, v0, v4, vcc			; VI-NEXT: v_cndmask_b32_e32 v1, v0, v4, vcc
	; VI-NEXT: v_mov_b32_e32 v0, s8			; VI-NEXT: v_mov_b32_e32 v0, s8
	[920 lines not shown]
	; SI-NEXT: v_mov_b32_e32 v16, 64			; SI-NEXT: v_mov_b32_e32 v16, 64
	; SI-NEXT: s_mov_b32 s11, 0x100f000			; SI-NEXT: s_mov_b32 s11, 0x100f000
	; SI-NEXT: s_mov_b32 s10, -1			; SI-NEXT: s_mov_b32 s10, -1
	; SI-NEXT: s_waitcnt lgkmcnt(0)			; SI-NEXT: s_waitcnt lgkmcnt(0)
	; SI-NEXT: v_mov_b32_e32 v0, s12			; SI-NEXT: v_mov_b32_e32 v0, s12
	; SI-NEXT: s_and_b32 s4, s4, 7			; SI-NEXT: s_and_b32 s4, s4, 7
	; SI-NEXT: s_lshl_b32 s4, s4, 3			; SI-NEXT: s_lshl_b32 s4, s4, 3
	; SI-NEXT: v_mov_b32_e32 v1, s13			; SI-NEXT: v_mov_b32_e32 v1, s13
				; SI-NEXT: v_mov_b32_e32 v12, s24
				; SI-NEXT: v_mov_b32_e32 v13, s25
				; SI-NEXT: v_mov_b32_e32 v14, s26
				; SI-NEXT: v_mov_b32_e32 v15, s27
	; SI-NEXT: v_mov_b32_e32 v2, s14			; SI-NEXT: v_mov_b32_e32 v2, s14
	; SI-NEXT: v_mov_b32_e32 v3, s15			; SI-NEXT: v_mov_b32_e32 v3, s15
	; SI-NEXT: v_mov_b32_e32 v4, s16			; SI-NEXT: v_mov_b32_e32 v4, s16
	; SI-NEXT: v_mov_b32_e32 v5, s17			; SI-NEXT: v_mov_b32_e32 v5, s17
	; SI-NEXT: v_mov_b32_e32 v6, s18			; SI-NEXT: v_mov_b32_e32 v6, s18
	; SI-NEXT: v_mov_b32_e32 v7, s19			; SI-NEXT: v_mov_b32_e32 v7, s19
	; SI-NEXT: v_mov_b32_e32 v8, s20			; SI-NEXT: v_mov_b32_e32 v8, s20
	; SI-NEXT: v_mov_b32_e32 v9, s21			; SI-NEXT: v_mov_b32_e32 v9, s21
	; SI-NEXT: v_mov_b32_e32 v10, s22			; SI-NEXT: v_mov_b32_e32 v10, s22
	; SI-NEXT: v_mov_b32_e32 v11, s23			; SI-NEXT: v_mov_b32_e32 v11, s23
	; SI-NEXT: v_mov_b32_e32 v12, s24
	; SI-NEXT: v_mov_b32_e32 v13, s25
	; SI-NEXT: v_mov_b32_e32 v14, s26
	; SI-NEXT: v_mov_b32_e32 v15, s27
	; SI-NEXT: buffer_store_dwordx4 v[12:15], off, s[0:3], s7 offset:112			; SI-NEXT: buffer_store_dwordx4 v[12:15], off, s[0:3], s7 offset:112
	; SI-NEXT: buffer_store_dwordx4 v[8:11], off, s[0:3], s7 offset:96			; SI-NEXT: buffer_store_dwordx4 v[8:11], off, s[0:3], s7 offset:96
	; SI-NEXT: buffer_store_dwordx4 v[4:7], off, s[0:3], s7 offset:80			; SI-NEXT: buffer_store_dwordx4 v[4:7], off, s[0:3], s7 offset:80
	; SI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], s7 offset:64			; SI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], s7 offset:64
	; SI-NEXT: v_or_b32_e32 v16, s4, v16			; SI-NEXT: v_or_b32_e32 v16, s4, v16
	; SI-NEXT: v_mov_b32_e32 v0, 0			; SI-NEXT: v_mov_b32_e32 v0, 0
	; SI-NEXT: v_mov_b32_e32 v1, 0x40200000			; SI-NEXT: v_mov_b32_e32 v1, 0x40200000
	; SI-NEXT: buffer_store_dwordx2 v[0:1], v16, s[0:3], s7 offen			; SI-NEXT: buffer_store_dwordx2 v[0:1], v16, s[0:3], s7 offen
	[16 lines not shown]
	; VI-NEXT: v_mov_b32_e32 v16, 64			; VI-NEXT: v_mov_b32_e32 v16, 64
	; VI-NEXT: s_mov_b32 s11, 0x1100f000			; VI-NEXT: s_mov_b32 s11, 0x1100f000
	; VI-NEXT: s_mov_b32 s10, -1			; VI-NEXT: s_mov_b32 s10, -1
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: v_mov_b32_e32 v0, s12			; VI-NEXT: v_mov_b32_e32 v0, s12
	; VI-NEXT: s_and_b32 s4, s4, 7			; VI-NEXT: s_and_b32 s4, s4, 7
	; VI-NEXT: s_lshl_b32 s4, s4, 3			; VI-NEXT: s_lshl_b32 s4, s4, 3
	; VI-NEXT: v_mov_b32_e32 v1, s13			; VI-NEXT: v_mov_b32_e32 v1, s13
				; VI-NEXT: v_mov_b32_e32 v12, s24
				; VI-NEXT: v_mov_b32_e32 v13, s25
				; VI-NEXT: v_mov_b32_e32 v14, s26
				; VI-NEXT: v_mov_b32_e32 v15, s27
	; VI-NEXT: v_mov_b32_e32 v2, s14			; VI-NEXT: v_mov_b32_e32 v2, s14
	; VI-NEXT: v_mov_b32_e32 v3, s15			; VI-NEXT: v_mov_b32_e32 v3, s15
	; VI-NEXT: v_mov_b32_e32 v4, s16			; VI-NEXT: v_mov_b32_e32 v4, s16
	; VI-NEXT: v_mov_b32_e32 v5, s17			; VI-NEXT: v_mov_b32_e32 v5, s17
	; VI-NEXT: v_mov_b32_e32 v6, s18			; VI-NEXT: v_mov_b32_e32 v6, s18
	; VI-NEXT: v_mov_b32_e32 v7, s19			; VI-NEXT: v_mov_b32_e32 v7, s19
	; VI-NEXT: v_mov_b32_e32 v8, s20			; VI-NEXT: v_mov_b32_e32 v8, s20
	; VI-NEXT: v_mov_b32_e32 v9, s21			; VI-NEXT: v_mov_b32_e32 v9, s21
	; VI-NEXT: v_mov_b32_e32 v10, s22			; VI-NEXT: v_mov_b32_e32 v10, s22
	; VI-NEXT: v_mov_b32_e32 v11, s23			; VI-NEXT: v_mov_b32_e32 v11, s23
	; VI-NEXT: v_mov_b32_e32 v12, s24
	; VI-NEXT: v_mov_b32_e32 v13, s25
	; VI-NEXT: v_mov_b32_e32 v14, s26
	; VI-NEXT: v_mov_b32_e32 v15, s27
	; VI-NEXT: buffer_store_dwordx4 v[12:15], off, s[0:3], s7 offset:112			; VI-NEXT: buffer_store_dwordx4 v[12:15], off, s[0:3], s7 offset:112
	; VI-NEXT: buffer_store_dwordx4 v[8:11], off, s[0:3], s7 offset:96			; VI-NEXT: buffer_store_dwordx4 v[8:11], off, s[0:3], s7 offset:96
	; VI-NEXT: buffer_store_dwordx4 v[4:7], off, s[0:3], s7 offset:80			; VI-NEXT: buffer_store_dwordx4 v[4:7], off, s[0:3], s7 offset:80
	; VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], s7 offset:64			; VI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], s7 offset:64
	; VI-NEXT: v_or_b32_e32 v16, s4, v16			; VI-NEXT: v_or_b32_e32 v16, s4, v16
	; VI-NEXT: v_mov_b32_e32 v0, 0			; VI-NEXT: v_mov_b32_e32 v0, 0
	; VI-NEXT: v_mov_b32_e32 v1, 0x40200000			; VI-NEXT: v_mov_b32_e32 v1, 0x40200000
	; VI-NEXT: buffer_store_dwordx2 v[0:1], v16, s[0:3], s7 offen			; VI-NEXT: buffer_store_dwordx2 v[0:1], v16, s[0:3], s7 offen
	[19 lines not shown]

llvm/test/CodeGen/AMDGPU/insert_vector_elt.v2i16.ll

[573 lines not shown]
store <2 x i16> %vecins, <2 x i16> addrspace(1)* %out.gep		store <2 x i16> %vecins, <2 x i16> addrspace(1)* %out.gep
ret void		ret void
}		}

define amdgpu_kernel void @v_insertelement_v2i16_0_reghi(<2 x i16> addrspace(1)* %out, <2 x i16> addrspace(1)* %in, i32 %elt.arg) #0 {		define amdgpu_kernel void @v_insertelement_v2i16_0_reghi(<2 x i16> addrspace(1)* %out, <2 x i16> addrspace(1)* %in, i32 %elt.arg) #0 {
; GFX9-LABEL: v_insertelement_v2i16_0_reghi:		; GFX9-LABEL: v_insertelement_v2i16_0_reghi:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0		; GFX9-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
; GFX9-NEXT: v_lshlrev_b32_e32 v2, 2, v0
; GFX9-NEXT: s_load_dword s4, s[4:5], 0x10		; GFX9-NEXT: s_load_dword s4, s[4:5], 0x10
		; GFX9-NEXT: v_lshlrev_b32_e32 v2, 2, v0
		foad: Nice.
; GFX9-NEXT: v_mov_b32_e32 v3, 0xffff0000		; GFX9-NEXT: v_mov_b32_e32 v3, 0xffff0000
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v1, s3		; GFX9-NEXT: v_mov_b32_e32 v1, s3
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s2, v2		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s2, v2
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
; GFX9-NEXT: global_load_dword v4, v[0:1], off		; GFX9-NEXT: global_load_dword v4, v[0:1], off
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s0, v2		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s0, v2
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: v_lshrrev_b32_e64 v2, 16, s4		; GFX9-NEXT: v_lshrrev_b32_e64 v2, 16, s4
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: v_and_or_b32 v2, v4, v3, v2		; GFX9-NEXT: v_and_or_b32 v2, v4, v3, v2
; GFX9-NEXT: global_store_dword v[0:1], v2, off		; GFX9-NEXT: global_store_dword v[0:1], v2, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; VI-LABEL: v_insertelement_v2i16_0_reghi:		; VI-LABEL: v_insertelement_v2i16_0_reghi:
; VI: ; %bb.0:		; VI: ; %bb.0:
; VI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0		; VI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
; VI-NEXT: v_lshlrev_b32_e32 v2, 2, v0
; VI-NEXT: s_load_dword s4, s[4:5], 0x10		; VI-NEXT: s_load_dword s4, s[4:5], 0x10
		; VI-NEXT: v_lshlrev_b32_e32 v2, 2, v0
; VI-NEXT: s_waitcnt lgkmcnt(0)		; VI-NEXT: s_waitcnt lgkmcnt(0)
; VI-NEXT: v_mov_b32_e32 v1, s3		; VI-NEXT: v_mov_b32_e32 v1, s3
; VI-NEXT: v_add_u32_e32 v0, vcc, s2, v2		; VI-NEXT: v_add_u32_e32 v0, vcc, s2, v2
; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc		; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; VI-NEXT: flat_load_dword v3, v[0:1]		; VI-NEXT: flat_load_dword v3, v[0:1]
; VI-NEXT: v_mov_b32_e32 v1, s1		; VI-NEXT: v_mov_b32_e32 v1, s1
; VI-NEXT: v_add_u32_e32 v0, vcc, s0, v2		; VI-NEXT: v_add_u32_e32 v0, vcc, s0, v2
; VI-NEXT: s_lshr_b32 s1, s4, 16		; VI-NEXT: s_lshr_b32 s1, s4, 16
; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc		; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; VI-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; VI-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; VI-NEXT: v_and_b32_e32 v2, 0xffff0000, v3		; VI-NEXT: v_and_b32_e32 v2, 0xffff0000, v3
; VI-NEXT: v_or_b32_e32 v2, s1, v2		; VI-NEXT: v_or_b32_e32 v2, s1, v2
; VI-NEXT: flat_store_dword v[0:1], v2		; VI-NEXT: flat_store_dword v[0:1], v2
; VI-NEXT: s_endpgm		; VI-NEXT: s_endpgm
;		;
; CI-LABEL: v_insertelement_v2i16_0_reghi:		; CI-LABEL: v_insertelement_v2i16_0_reghi:
; CI: ; %bb.0:		; CI: ; %bb.0:
; CI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0		; CI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
; CI-NEXT: v_lshlrev_b32_e32 v2, 2, v0
; CI-NEXT: s_load_dword s4, s[4:5], 0x4		; CI-NEXT: s_load_dword s4, s[4:5], 0x4
		; CI-NEXT: v_lshlrev_b32_e32 v2, 2, v0
; CI-NEXT: s_waitcnt lgkmcnt(0)		; CI-NEXT: s_waitcnt lgkmcnt(0)
; CI-NEXT: v_mov_b32_e32 v1, s3		; CI-NEXT: v_mov_b32_e32 v1, s3
; CI-NEXT: v_add_i32_e32 v0, vcc, s2, v2		; CI-NEXT: v_add_i32_e32 v0, vcc, s2, v2
; CI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc		; CI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; CI-NEXT: flat_load_dword v3, v[0:1]		; CI-NEXT: flat_load_dword v3, v[0:1]
; CI-NEXT: v_mov_b32_e32 v1, s1		; CI-NEXT: v_mov_b32_e32 v1, s1
; CI-NEXT: v_add_i32_e32 v0, vcc, s0, v2		; CI-NEXT: v_add_i32_e32 v0, vcc, s0, v2
; CI-NEXT: s_lshr_b32 s1, s4, 16		; CI-NEXT: s_lshr_b32 s1, s4, 16
[528 lines not shown]
store <2 x i16> %vecins, <2 x i16> addrspace(1)* %out		store <2 x i16> %vecins, <2 x i16> addrspace(1)* %out
ret void		ret void
}		}

define amdgpu_kernel void @v_insertelement_v2i16_dynamic_sgpr(<2 x i16> addrspace(1)* %out, <2 x i16> addrspace(1)* %in, i32 %idx) #0 {		define amdgpu_kernel void @v_insertelement_v2i16_dynamic_sgpr(<2 x i16> addrspace(1)* %out, <2 x i16> addrspace(1)* %in, i32 %idx) #0 {
; GFX9-LABEL: v_insertelement_v2i16_dynamic_sgpr:		; GFX9-LABEL: v_insertelement_v2i16_dynamic_sgpr:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0		; GFX9-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
; GFX9-NEXT: v_lshlrev_b32_e32 v2, 2, v0
; GFX9-NEXT: s_load_dword s4, s[4:5], 0x10		; GFX9-NEXT: s_load_dword s4, s[4:5], 0x10
		; GFX9-NEXT: v_lshlrev_b32_e32 v2, 2, v0
; GFX9-NEXT: v_mov_b32_e32 v3, 0x3e703e7		; GFX9-NEXT: v_mov_b32_e32 v3, 0x3e703e7
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s2, v2		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s2, v2
; GFX9-NEXT: v_mov_b32_e32 v1, s3		; GFX9-NEXT: v_mov_b32_e32 v1, s3
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
; GFX9-NEXT: global_load_dword v4, v[0:1], off		; GFX9-NEXT: global_load_dword v4, v[0:1], off
; GFX9-NEXT: s_lshl_b32 s2, s4, 4		; GFX9-NEXT: s_lshl_b32 s2, s4, 4
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s0, v2		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s0, v2
; GFX9-NEXT: v_mov_b32_e32 v1, s1		; GFX9-NEXT: v_mov_b32_e32 v1, s1
; GFX9-NEXT: s_lshl_b32 s0, 0xffff, s2		; GFX9-NEXT: s_lshl_b32 s0, 0xffff, s2
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: v_bfi_b32 v2, s0, v3, v4		; GFX9-NEXT: v_bfi_b32 v2, s0, v3, v4
; GFX9-NEXT: global_store_dword v[0:1], v2, off		; GFX9-NEXT: global_store_dword v[0:1], v2, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; VI-LABEL: v_insertelement_v2i16_dynamic_sgpr:		; VI-LABEL: v_insertelement_v2i16_dynamic_sgpr:
; VI: ; %bb.0:		; VI: ; %bb.0:
; VI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0		; VI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
; VI-NEXT: v_lshlrev_b32_e32 v2, 2, v0
; VI-NEXT: s_load_dword s4, s[4:5], 0x10		; VI-NEXT: s_load_dword s4, s[4:5], 0x10
		; VI-NEXT: v_lshlrev_b32_e32 v2, 2, v0
; VI-NEXT: v_mov_b32_e32 v3, 0x3e703e7		; VI-NEXT: v_mov_b32_e32 v3, 0x3e703e7
; VI-NEXT: s_waitcnt lgkmcnt(0)		; VI-NEXT: s_waitcnt lgkmcnt(0)
; VI-NEXT: v_add_u32_e32 v0, vcc, s2, v2		; VI-NEXT: v_add_u32_e32 v0, vcc, s2, v2
; VI-NEXT: v_mov_b32_e32 v1, s3		; VI-NEXT: v_mov_b32_e32 v1, s3
; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc		; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; VI-NEXT: flat_load_dword v4, v[0:1]		; VI-NEXT: flat_load_dword v4, v[0:1]
; VI-NEXT: s_lshl_b32 s2, s4, 4		; VI-NEXT: s_lshl_b32 s2, s4, 4
; VI-NEXT: v_add_u32_e32 v0, vcc, s0, v2		; VI-NEXT: v_add_u32_e32 v0, vcc, s0, v2
; VI-NEXT: v_mov_b32_e32 v1, s1		; VI-NEXT: v_mov_b32_e32 v1, s1
; VI-NEXT: s_lshl_b32 s0, 0xffff, s2		; VI-NEXT: s_lshl_b32 s0, 0xffff, s2
; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc		; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; VI-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; VI-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; VI-NEXT: v_bfi_b32 v2, s0, v3, v4		; VI-NEXT: v_bfi_b32 v2, s0, v3, v4
; VI-NEXT: flat_store_dword v[0:1], v2		; VI-NEXT: flat_store_dword v[0:1], v2
; VI-NEXT: s_endpgm		; VI-NEXT: s_endpgm
;		;
; CI-LABEL: v_insertelement_v2i16_dynamic_sgpr:		; CI-LABEL: v_insertelement_v2i16_dynamic_sgpr:
; CI: ; %bb.0:		; CI: ; %bb.0:
; CI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0		; CI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
; CI-NEXT: v_lshlrev_b32_e32 v2, 2, v0
; CI-NEXT: s_load_dword s4, s[4:5], 0x4		; CI-NEXT: s_load_dword s4, s[4:5], 0x4
		; CI-NEXT: v_lshlrev_b32_e32 v2, 2, v0
; CI-NEXT: v_mov_b32_e32 v3, 0x3e703e7		; CI-NEXT: v_mov_b32_e32 v3, 0x3e703e7
; CI-NEXT: s_waitcnt lgkmcnt(0)		; CI-NEXT: s_waitcnt lgkmcnt(0)
; CI-NEXT: v_add_i32_e32 v0, vcc, s2, v2		; CI-NEXT: v_add_i32_e32 v0, vcc, s2, v2
; CI-NEXT: v_mov_b32_e32 v1, s3		; CI-NEXT: v_mov_b32_e32 v1, s3
; CI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc		; CI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; CI-NEXT: flat_load_dword v4, v[0:1]		; CI-NEXT: flat_load_dword v4, v[0:1]
; CI-NEXT: s_lshl_b32 s2, s4, 4		; CI-NEXT: s_lshl_b32 s2, s4, 4
; CI-NEXT: v_add_i32_e32 v0, vcc, s0, v2		; CI-NEXT: v_add_i32_e32 v0, vcc, s0, v2
[105 lines not shown]
store <2 x half> %vecins, <2 x half> addrspace(1)* %out.gep		store <2 x half> %vecins, <2 x half> addrspace(1)* %out.gep
ret void		ret void
}		}

define amdgpu_kernel void @v_insertelement_v4f16_0(<4 x half> addrspace(1)* %out, <4 x half> addrspace(1)* %in, [8 x i32], i32 %val) #0 {		define amdgpu_kernel void @v_insertelement_v4f16_0(<4 x half> addrspace(1)* %out, <4 x half> addrspace(1)* %in, [8 x i32], i32 %val) #0 {
; GFX9-LABEL: v_insertelement_v4f16_0:		; GFX9-LABEL: v_insertelement_v4f16_0:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0		; GFX9-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
; GFX9-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; GFX9-NEXT: s_load_dword s4, s[4:5], 0x30		; GFX9-NEXT: s_load_dword s4, s[4:5], 0x30
		; GFX9-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; GFX9-NEXT: v_mov_b32_e32 v4, 0xffff		; GFX9-NEXT: v_mov_b32_e32 v4, 0xffff
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v1, s3		; GFX9-NEXT: v_mov_b32_e32 v1, s3
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s2, v2		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s2, v2
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off		; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
; GFX9-NEXT: v_mov_b32_e32 v3, s1		; GFX9-NEXT: v_mov_b32_e32 v3, s1
; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, s0, v2		; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, s0, v2
; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: v_bfi_b32 v0, v4, s4, v0		; GFX9-NEXT: v_bfi_b32 v0, v4, s4, v0
; GFX9-NEXT: global_store_dwordx2 v[2:3], v[0:1], off		; GFX9-NEXT: global_store_dwordx2 v[2:3], v[0:1], off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; VI-LABEL: v_insertelement_v4f16_0:		; VI-LABEL: v_insertelement_v4f16_0:
; VI: ; %bb.0:		; VI: ; %bb.0:
; VI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0		; VI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
; VI-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; VI-NEXT: s_load_dword s4, s[4:5], 0x30		; VI-NEXT: s_load_dword s4, s[4:5], 0x30
		; VI-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; VI-NEXT: s_waitcnt lgkmcnt(0)		; VI-NEXT: s_waitcnt lgkmcnt(0)
; VI-NEXT: v_mov_b32_e32 v1, s3		; VI-NEXT: v_mov_b32_e32 v1, s3
; VI-NEXT: v_add_u32_e32 v0, vcc, s2, v2		; VI-NEXT: v_add_u32_e32 v0, vcc, s2, v2
; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc		; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; VI-NEXT: flat_load_dwordx2 v[0:1], v[0:1]		; VI-NEXT: flat_load_dwordx2 v[0:1], v[0:1]
; VI-NEXT: v_mov_b32_e32 v3, s1		; VI-NEXT: v_mov_b32_e32 v3, s1
; VI-NEXT: v_add_u32_e32 v2, vcc, s0, v2		; VI-NEXT: v_add_u32_e32 v2, vcc, s0, v2
; VI-NEXT: s_and_b32 s1, s4, 0xffff		; VI-NEXT: s_and_b32 s1, s4, 0xffff
; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc		; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
; VI-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; VI-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; VI-NEXT: v_and_b32_e32 v0, 0xffff0000, v0		; VI-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
; VI-NEXT: v_or_b32_e32 v0, s1, v0		; VI-NEXT: v_or_b32_e32 v0, s1, v0
; VI-NEXT: flat_store_dwordx2 v[2:3], v[0:1]		; VI-NEXT: flat_store_dwordx2 v[2:3], v[0:1]
; VI-NEXT: s_endpgm		; VI-NEXT: s_endpgm
;		;
; CI-LABEL: v_insertelement_v4f16_0:		; CI-LABEL: v_insertelement_v4f16_0:
; CI: ; %bb.0:		; CI: ; %bb.0:
; CI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0		; CI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
; CI-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; CI-NEXT: s_load_dword s4, s[4:5], 0xc		; CI-NEXT: s_load_dword s4, s[4:5], 0xc
		; CI-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; CI-NEXT: s_waitcnt lgkmcnt(0)		; CI-NEXT: s_waitcnt lgkmcnt(0)
; CI-NEXT: v_mov_b32_e32 v1, s3		; CI-NEXT: v_mov_b32_e32 v1, s3
; CI-NEXT: v_add_i32_e32 v0, vcc, s2, v2		; CI-NEXT: v_add_i32_e32 v0, vcc, s2, v2
; CI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc		; CI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; CI-NEXT: flat_load_dwordx2 v[0:1], v[0:1]		; CI-NEXT: flat_load_dwordx2 v[0:1], v[0:1]
; CI-NEXT: v_mov_b32_e32 v3, s1		; CI-NEXT: v_mov_b32_e32 v3, s1
; CI-NEXT: v_add_i32_e32 v2, vcc, s0, v2		; CI-NEXT: v_add_i32_e32 v2, vcc, s0, v2
; CI-NEXT: s_and_b32 s1, s4, 0xffff		; CI-NEXT: s_and_b32 s1, s4, 0xffff
[14 lines not shown]
store <4 x half> %vecins, <4 x half> addrspace(1)* %out.gep		store <4 x half> %vecins, <4 x half> addrspace(1)* %out.gep
ret void		ret void
}		}

define amdgpu_kernel void @v_insertelement_v4f16_1(<4 x half> addrspace(1)* %out, <4 x half> addrspace(1)* %in, i32 %val) #0 {		define amdgpu_kernel void @v_insertelement_v4f16_1(<4 x half> addrspace(1)* %out, <4 x half> addrspace(1)* %in, i32 %val) #0 {
; GFX9-LABEL: v_insertelement_v4f16_1:		; GFX9-LABEL: v_insertelement_v4f16_1:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0		; GFX9-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
; GFX9-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; GFX9-NEXT: s_load_dword s4, s[4:5], 0x10		; GFX9-NEXT: s_load_dword s4, s[4:5], 0x10
		; GFX9-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v1, s3		; GFX9-NEXT: v_mov_b32_e32 v1, s3
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s2, v2		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s2, v2
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off		; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
; GFX9-NEXT: v_mov_b32_e32 v3, s1		; GFX9-NEXT: v_mov_b32_e32 v3, s1
; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, s0, v2		; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, s0, v2
; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0		; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0
; GFX9-NEXT: v_lshl_or_b32 v0, s4, 16, v0		; GFX9-NEXT: v_lshl_or_b32 v0, s4, 16, v0
; GFX9-NEXT: global_store_dwordx2 v[2:3], v[0:1], off		; GFX9-NEXT: global_store_dwordx2 v[2:3], v[0:1], off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; VI-LABEL: v_insertelement_v4f16_1:		; VI-LABEL: v_insertelement_v4f16_1:
; VI: ; %bb.0:		; VI: ; %bb.0:
; VI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0		; VI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
; VI-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; VI-NEXT: s_load_dword s4, s[4:5], 0x10		; VI-NEXT: s_load_dword s4, s[4:5], 0x10
		; VI-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; VI-NEXT: s_waitcnt lgkmcnt(0)		; VI-NEXT: s_waitcnt lgkmcnt(0)
; VI-NEXT: v_add_u32_e32 v0, vcc, s2, v2		; VI-NEXT: v_add_u32_e32 v0, vcc, s2, v2
; VI-NEXT: v_mov_b32_e32 v1, s3		; VI-NEXT: v_mov_b32_e32 v1, s3
; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc		; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; VI-NEXT: flat_load_dwordx2 v[0:1], v[0:1]		; VI-NEXT: flat_load_dwordx2 v[0:1], v[0:1]
; VI-NEXT: s_lshl_b32 s2, s4, 16		; VI-NEXT: s_lshl_b32 s2, s4, 16
; VI-NEXT: v_mov_b32_e32 v4, s2		; VI-NEXT: v_mov_b32_e32 v4, s2
; VI-NEXT: v_mov_b32_e32 v3, s1		; VI-NEXT: v_mov_b32_e32 v3, s1
; VI-NEXT: v_add_u32_e32 v2, vcc, s0, v2		; VI-NEXT: v_add_u32_e32 v2, vcc, s0, v2
; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc		; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
; VI-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; VI-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; VI-NEXT: v_or_b32_sdwa v0, v4, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_0		; VI-NEXT: v_or_b32_sdwa v0, v4, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_0
; VI-NEXT: flat_store_dwordx2 v[2:3], v[0:1]		; VI-NEXT: flat_store_dwordx2 v[2:3], v[0:1]
; VI-NEXT: s_endpgm		; VI-NEXT: s_endpgm
;		;
; CI-LABEL: v_insertelement_v4f16_1:		; CI-LABEL: v_insertelement_v4f16_1:
; CI: ; %bb.0:		; CI: ; %bb.0:
; CI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0		; CI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
; CI-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; CI-NEXT: s_load_dword s4, s[4:5], 0x4		; CI-NEXT: s_load_dword s4, s[4:5], 0x4
		; CI-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; CI-NEXT: s_waitcnt lgkmcnt(0)		; CI-NEXT: s_waitcnt lgkmcnt(0)
; CI-NEXT: v_mov_b32_e32 v1, s3		; CI-NEXT: v_mov_b32_e32 v1, s3
; CI-NEXT: v_add_i32_e32 v0, vcc, s2, v2		; CI-NEXT: v_add_i32_e32 v0, vcc, s2, v2
; CI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc		; CI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; CI-NEXT: flat_load_dwordx2 v[0:1], v[0:1]		; CI-NEXT: flat_load_dwordx2 v[0:1], v[0:1]
; CI-NEXT: v_mov_b32_e32 v3, s1		; CI-NEXT: v_mov_b32_e32 v3, s1
; CI-NEXT: v_add_i32_e32 v2, vcc, s0, v2		; CI-NEXT: v_add_i32_e32 v2, vcc, s0, v2
; CI-NEXT: s_lshl_b32 s1, s4, 16		; CI-NEXT: s_lshl_b32 s1, s4, 16
[14 lines not shown]
store <4 x half> %vecins, <4 x half> addrspace(1)* %out.gep		store <4 x half> %vecins, <4 x half> addrspace(1)* %out.gep
ret void		ret void
}		}

define amdgpu_kernel void @v_insertelement_v4f16_2(<4 x half> addrspace(1)* %out, <4 x half> addrspace(1)* %in, [8 x i32], i32 %val) #0 {		define amdgpu_kernel void @v_insertelement_v4f16_2(<4 x half> addrspace(1)* %out, <4 x half> addrspace(1)* %in, [8 x i32], i32 %val) #0 {
; GFX9-LABEL: v_insertelement_v4f16_2:		; GFX9-LABEL: v_insertelement_v4f16_2:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0		; GFX9-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
; GFX9-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; GFX9-NEXT: s_load_dword s4, s[4:5], 0x30		; GFX9-NEXT: s_load_dword s4, s[4:5], 0x30
		; GFX9-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; GFX9-NEXT: v_mov_b32_e32 v4, 0xffff		; GFX9-NEXT: v_mov_b32_e32 v4, 0xffff
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v1, s3		; GFX9-NEXT: v_mov_b32_e32 v1, s3
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s2, v2		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s2, v2
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off		; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
; GFX9-NEXT: v_mov_b32_e32 v3, s1		; GFX9-NEXT: v_mov_b32_e32 v3, s1
; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, s0, v2		; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, s0, v2
; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: v_bfi_b32 v1, v4, s4, v1		; GFX9-NEXT: v_bfi_b32 v1, v4, s4, v1
; GFX9-NEXT: global_store_dwordx2 v[2:3], v[0:1], off		; GFX9-NEXT: global_store_dwordx2 v[2:3], v[0:1], off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; VI-LABEL: v_insertelement_v4f16_2:		; VI-LABEL: v_insertelement_v4f16_2:
; VI: ; %bb.0:		; VI: ; %bb.0:
; VI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0		; VI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
; VI-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; VI-NEXT: s_load_dword s4, s[4:5], 0x30		; VI-NEXT: s_load_dword s4, s[4:5], 0x30
		; VI-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; VI-NEXT: s_waitcnt lgkmcnt(0)		; VI-NEXT: s_waitcnt lgkmcnt(0)
; VI-NEXT: v_mov_b32_e32 v1, s3		; VI-NEXT: v_mov_b32_e32 v1, s3
; VI-NEXT: v_add_u32_e32 v0, vcc, s2, v2		; VI-NEXT: v_add_u32_e32 v0, vcc, s2, v2
; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc		; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; VI-NEXT: flat_load_dwordx2 v[0:1], v[0:1]		; VI-NEXT: flat_load_dwordx2 v[0:1], v[0:1]
; VI-NEXT: v_mov_b32_e32 v3, s1		; VI-NEXT: v_mov_b32_e32 v3, s1
; VI-NEXT: v_add_u32_e32 v2, vcc, s0, v2		; VI-NEXT: v_add_u32_e32 v2, vcc, s0, v2
; VI-NEXT: s_and_b32 s1, s4, 0xffff		; VI-NEXT: s_and_b32 s1, s4, 0xffff
; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc		; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
; VI-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; VI-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; VI-NEXT: v_and_b32_e32 v1, 0xffff0000, v1		; VI-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
; VI-NEXT: v_or_b32_e32 v1, s1, v1		; VI-NEXT: v_or_b32_e32 v1, s1, v1
; VI-NEXT: flat_store_dwordx2 v[2:3], v[0:1]		; VI-NEXT: flat_store_dwordx2 v[2:3], v[0:1]
; VI-NEXT: s_endpgm		; VI-NEXT: s_endpgm
;		;
; CI-LABEL: v_insertelement_v4f16_2:		; CI-LABEL: v_insertelement_v4f16_2:
; CI: ; %bb.0:		; CI: ; %bb.0:
; CI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0		; CI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
; CI-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; CI-NEXT: s_load_dword s4, s[4:5], 0xc		; CI-NEXT: s_load_dword s4, s[4:5], 0xc
		; CI-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; CI-NEXT: s_waitcnt lgkmcnt(0)		; CI-NEXT: s_waitcnt lgkmcnt(0)
; CI-NEXT: v_mov_b32_e32 v1, s3		; CI-NEXT: v_mov_b32_e32 v1, s3
; CI-NEXT: v_add_i32_e32 v0, vcc, s2, v2		; CI-NEXT: v_add_i32_e32 v0, vcc, s2, v2
; CI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc		; CI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; CI-NEXT: flat_load_dwordx2 v[0:1], v[0:1]		; CI-NEXT: flat_load_dwordx2 v[0:1], v[0:1]
; CI-NEXT: v_mov_b32_e32 v3, s1		; CI-NEXT: v_mov_b32_e32 v3, s1
; CI-NEXT: v_add_i32_e32 v2, vcc, s0, v2		; CI-NEXT: v_add_i32_e32 v2, vcc, s0, v2
; CI-NEXT: s_and_b32 s1, s4, 0xffff		; CI-NEXT: s_and_b32 s1, s4, 0xffff
[14 lines not shown]
store <4 x half> %vecins, <4 x half> addrspace(1)* %out.gep		store <4 x half> %vecins, <4 x half> addrspace(1)* %out.gep
ret void		ret void
}		}

define amdgpu_kernel void @v_insertelement_v4f16_3(<4 x half> addrspace(1)* %out, <4 x half> addrspace(1)* %in, i32 %val) #0 {		define amdgpu_kernel void @v_insertelement_v4f16_3(<4 x half> addrspace(1)* %out, <4 x half> addrspace(1)* %in, i32 %val) #0 {
; GFX9-LABEL: v_insertelement_v4f16_3:		; GFX9-LABEL: v_insertelement_v4f16_3:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0		; GFX9-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
; GFX9-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; GFX9-NEXT: s_load_dword s4, s[4:5], 0x10		; GFX9-NEXT: s_load_dword s4, s[4:5], 0x10
		; GFX9-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v1, s3		; GFX9-NEXT: v_mov_b32_e32 v1, s3
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s2, v2		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s2, v2
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off		; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
; GFX9-NEXT: v_mov_b32_e32 v3, s1		; GFX9-NEXT: v_mov_b32_e32 v3, s1
; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, s0, v2		; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, s0, v2
; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: v_and_b32_e32 v1, 0xffff, v1		; GFX9-NEXT: v_and_b32_e32 v1, 0xffff, v1
; GFX9-NEXT: v_lshl_or_b32 v1, s4, 16, v1		; GFX9-NEXT: v_lshl_or_b32 v1, s4, 16, v1
; GFX9-NEXT: global_store_dwordx2 v[2:3], v[0:1], off		; GFX9-NEXT: global_store_dwordx2 v[2:3], v[0:1], off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; VI-LABEL: v_insertelement_v4f16_3:		; VI-LABEL: v_insertelement_v4f16_3:
; VI: ; %bb.0:		; VI: ; %bb.0:
; VI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0		; VI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
; VI-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; VI-NEXT: s_load_dword s4, s[4:5], 0x10		; VI-NEXT: s_load_dword s4, s[4:5], 0x10
		; VI-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; VI-NEXT: s_waitcnt lgkmcnt(0)		; VI-NEXT: s_waitcnt lgkmcnt(0)
; VI-NEXT: v_add_u32_e32 v0, vcc, s2, v2		; VI-NEXT: v_add_u32_e32 v0, vcc, s2, v2
; VI-NEXT: v_mov_b32_e32 v1, s3		; VI-NEXT: v_mov_b32_e32 v1, s3
; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc		; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; VI-NEXT: flat_load_dwordx2 v[0:1], v[0:1]		; VI-NEXT: flat_load_dwordx2 v[0:1], v[0:1]
; VI-NEXT: s_lshl_b32 s2, s4, 16		; VI-NEXT: s_lshl_b32 s2, s4, 16
; VI-NEXT: v_mov_b32_e32 v4, s2		; VI-NEXT: v_mov_b32_e32 v4, s2
; VI-NEXT: v_mov_b32_e32 v3, s1		; VI-NEXT: v_mov_b32_e32 v3, s1
; VI-NEXT: v_add_u32_e32 v2, vcc, s0, v2		; VI-NEXT: v_add_u32_e32 v2, vcc, s0, v2
; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc		; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
; VI-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; VI-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; VI-NEXT: v_or_b32_sdwa v1, v4, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_0		; VI-NEXT: v_or_b32_sdwa v1, v4, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_0
; VI-NEXT: flat_store_dwordx2 v[2:3], v[0:1]		; VI-NEXT: flat_store_dwordx2 v[2:3], v[0:1]
; VI-NEXT: s_endpgm		; VI-NEXT: s_endpgm
;		;
; CI-LABEL: v_insertelement_v4f16_3:		; CI-LABEL: v_insertelement_v4f16_3:
; CI: ; %bb.0:		; CI: ; %bb.0:
; CI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0		; CI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
; CI-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; CI-NEXT: s_load_dword s4, s[4:5], 0x4		; CI-NEXT: s_load_dword s4, s[4:5], 0x4
		; CI-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; CI-NEXT: s_waitcnt lgkmcnt(0)		; CI-NEXT: s_waitcnt lgkmcnt(0)
; CI-NEXT: v_mov_b32_e32 v1, s3		; CI-NEXT: v_mov_b32_e32 v1, s3
; CI-NEXT: v_add_i32_e32 v0, vcc, s2, v2		; CI-NEXT: v_add_i32_e32 v0, vcc, s2, v2
; CI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc		; CI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; CI-NEXT: flat_load_dwordx2 v[0:1], v[0:1]		; CI-NEXT: flat_load_dwordx2 v[0:1], v[0:1]
; CI-NEXT: v_mov_b32_e32 v3, s1		; CI-NEXT: v_mov_b32_e32 v3, s1
; CI-NEXT: v_add_i32_e32 v2, vcc, s0, v2		; CI-NEXT: v_add_i32_e32 v2, vcc, s0, v2
; CI-NEXT: s_lshl_b32 s1, s4, 16		; CI-NEXT: s_lshl_b32 s1, s4, 16
[14 lines not shown]
store <4 x half> %vecins, <4 x half> addrspace(1)* %out.gep		store <4 x half> %vecins, <4 x half> addrspace(1)* %out.gep
ret void		ret void
}		}

define amdgpu_kernel void @v_insertelement_v4i16_2(<4 x i16> addrspace(1)* %out, <4 x i16> addrspace(1)* %in, i32 %val) #0 {		define amdgpu_kernel void @v_insertelement_v4i16_2(<4 x i16> addrspace(1)* %out, <4 x i16> addrspace(1)* %in, i32 %val) #0 {
; GFX9-LABEL: v_insertelement_v4i16_2:		; GFX9-LABEL: v_insertelement_v4i16_2:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0		; GFX9-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
; GFX9-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; GFX9-NEXT: s_load_dword s4, s[4:5], 0x10		; GFX9-NEXT: s_load_dword s4, s[4:5], 0x10
		; GFX9-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; GFX9-NEXT: v_mov_b32_e32 v4, 0xffff		; GFX9-NEXT: v_mov_b32_e32 v4, 0xffff
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v1, s3		; GFX9-NEXT: v_mov_b32_e32 v1, s3
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s2, v2		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s2, v2
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off		; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
; GFX9-NEXT: v_mov_b32_e32 v3, s1		; GFX9-NEXT: v_mov_b32_e32 v3, s1
; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, s0, v2		; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, s0, v2
; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: v_bfi_b32 v1, v4, s4, v1		; GFX9-NEXT: v_bfi_b32 v1, v4, s4, v1
; GFX9-NEXT: global_store_dwordx2 v[2:3], v[0:1], off		; GFX9-NEXT: global_store_dwordx2 v[2:3], v[0:1], off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; VI-LABEL: v_insertelement_v4i16_2:		; VI-LABEL: v_insertelement_v4i16_2:
; VI: ; %bb.0:		; VI: ; %bb.0:
; VI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0		; VI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
; VI-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; VI-NEXT: s_load_dword s4, s[4:5], 0x10		; VI-NEXT: s_load_dword s4, s[4:5], 0x10
		; VI-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; VI-NEXT: s_waitcnt lgkmcnt(0)		; VI-NEXT: s_waitcnt lgkmcnt(0)
; VI-NEXT: v_mov_b32_e32 v1, s3		; VI-NEXT: v_mov_b32_e32 v1, s3
; VI-NEXT: v_add_u32_e32 v0, vcc, s2, v2		; VI-NEXT: v_add_u32_e32 v0, vcc, s2, v2
; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc		; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; VI-NEXT: flat_load_dwordx2 v[0:1], v[0:1]		; VI-NEXT: flat_load_dwordx2 v[0:1], v[0:1]
; VI-NEXT: v_mov_b32_e32 v3, s1		; VI-NEXT: v_mov_b32_e32 v3, s1
; VI-NEXT: v_add_u32_e32 v2, vcc, s0, v2		; VI-NEXT: v_add_u32_e32 v2, vcc, s0, v2
; VI-NEXT: s_and_b32 s1, s4, 0xffff		; VI-NEXT: s_and_b32 s1, s4, 0xffff
; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc		; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
; VI-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)		; VI-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; VI-NEXT: v_and_b32_e32 v1, 0xffff0000, v1		; VI-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
; VI-NEXT: v_or_b32_e32 v1, s1, v1		; VI-NEXT: v_or_b32_e32 v1, s1, v1
; VI-NEXT: flat_store_dwordx2 v[2:3], v[0:1]		; VI-NEXT: flat_store_dwordx2 v[2:3], v[0:1]
; VI-NEXT: s_endpgm		; VI-NEXT: s_endpgm
;		;
; CI-LABEL: v_insertelement_v4i16_2:		; CI-LABEL: v_insertelement_v4i16_2:
; CI: ; %bb.0:		; CI: ; %bb.0:
; CI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0		; CI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
; CI-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; CI-NEXT: s_load_dword s4, s[4:5], 0x4		; CI-NEXT: s_load_dword s4, s[4:5], 0x4
		; CI-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; CI-NEXT: s_waitcnt lgkmcnt(0)		; CI-NEXT: s_waitcnt lgkmcnt(0)
; CI-NEXT: v_mov_b32_e32 v1, s3		; CI-NEXT: v_mov_b32_e32 v1, s3
; CI-NEXT: v_add_i32_e32 v0, vcc, s2, v2		; CI-NEXT: v_add_i32_e32 v0, vcc, s2, v2
; CI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc		; CI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; CI-NEXT: flat_load_dwordx2 v[0:1], v[0:1]		; CI-NEXT: flat_load_dwordx2 v[0:1], v[0:1]
; CI-NEXT: v_mov_b32_e32 v3, s1		; CI-NEXT: v_mov_b32_e32 v3, s1
; CI-NEXT: v_add_i32_e32 v2, vcc, s0, v2		; CI-NEXT: v_add_i32_e32 v2, vcc, s0, v2
; CI-NEXT: s_and_b32 s1, s4, 0xffff		; CI-NEXT: s_and_b32 s1, s4, 0xffff
[... 15 lines not shown ...]
; CI-NEXT: s_endpgm		; CI-NEXT: s_endpgm
ret void		ret void
}		}

; FIXME: Better code on CI?		; FIXME: Better code on CI?
define amdgpu_kernel void @v_insertelement_v4i16_dynamic_vgpr(<4 x i16> addrspace(1)* %out, <4 x i16> addrspace(1)* %in, i32 %val) #0 {		define amdgpu_kernel void @v_insertelement_v4i16_dynamic_vgpr(<4 x i16> addrspace(1)* %out, <4 x i16> addrspace(1)* %in, i32 %val) #0 {
; GFX9-LABEL: v_insertelement_v4i16_dynamic_vgpr:		; GFX9-LABEL: v_insertelement_v4i16_dynamic_vgpr:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0		; GFX9-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
		; GFX9-NEXT: s_load_dword s6, s[4:5], 0x10
; GFX9-NEXT: global_load_dword v4, v[0:1], off		; GFX9-NEXT: global_load_dword v4, v[0:1], off
; GFX9-NEXT: v_lshlrev_b32_e32 v2, 3, v0		; GFX9-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; GFX9-NEXT: s_load_dword s6, s[4:5], 0x10
; GFX9-NEXT: s_mov_b32 s5, 0		; GFX9-NEXT: s_mov_b32 s5, 0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v1, s3		; GFX9-NEXT: v_mov_b32_e32 v1, s3
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s2, v2		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s2, v2
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off		; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
; GFX9-NEXT: s_mov_b32 s4, 0xffff		; GFX9-NEXT: s_mov_b32 s4, 0xffff
; GFX9-NEXT: v_mov_b32_e32 v3, s1		; GFX9-NEXT: v_mov_b32_e32 v3, s1
; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, s0, v2		; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, s0, v2
; GFX9-NEXT: s_pack_ll_b32_b16 s1, s6, s6		; GFX9-NEXT: s_pack_ll_b32_b16 s1, s6, s6
; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc
; GFX9-NEXT: s_waitcnt vmcnt(1)		; GFX9-NEXT: s_waitcnt vmcnt(1)
; GFX9-NEXT: v_lshlrev_b32_e32 v4, 4, v4		; GFX9-NEXT: v_lshlrev_b32_e32 v4, 4, v4
; GFX9-NEXT: v_lshlrev_b64 v[4:5], v4, s[4:5]		; GFX9-NEXT: v_lshlrev_b64 v[4:5], v4, s[4:5]
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: v_bfi_b32 v1, v5, s1, v1		; GFX9-NEXT: v_bfi_b32 v1, v5, s1, v1
; GFX9-NEXT: v_bfi_b32 v0, v4, s1, v0		; GFX9-NEXT: v_bfi_b32 v0, v4, s1, v0
; GFX9-NEXT: global_store_dwordx2 v[2:3], v[0:1], off		; GFX9-NEXT: global_store_dwordx2 v[2:3], v[0:1], off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; VI-LABEL: v_insertelement_v4i16_dynamic_vgpr:		; VI-LABEL: v_insertelement_v4i16_dynamic_vgpr:
; VI: ; %bb.0:		; VI: ; %bb.0:
; VI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0		; VI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
		; VI-NEXT: s_load_dword s6, s[4:5], 0x10
; VI-NEXT: s_waitcnt lgkmcnt(0)		; VI-NEXT: s_waitcnt lgkmcnt(0)
; VI-NEXT: flat_load_dword v4, v[0:1]		; VI-NEXT: flat_load_dword v4, v[0:1]
; VI-NEXT: v_lshlrev_b32_e32 v2, 3, v0		; VI-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; VI-NEXT: s_load_dword s6, s[4:5], 0x10
; VI-NEXT: s_mov_b32 s4, 0xffff		; VI-NEXT: s_mov_b32 s4, 0xffff
; VI-NEXT: v_mov_b32_e32 v1, s3		; VI-NEXT: v_mov_b32_e32 v1, s3
; VI-NEXT: v_add_u32_e32 v0, vcc, s2, v2		; VI-NEXT: v_add_u32_e32 v0, vcc, s2, v2
; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc		; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; VI-NEXT: flat_load_dwordx2 v[0:1], v[0:1]		; VI-NEXT: flat_load_dwordx2 v[0:1], v[0:1]
; VI-NEXT: v_mov_b32_e32 v3, s1		; VI-NEXT: v_mov_b32_e32 v3, s1
; VI-NEXT: s_mov_b32 s5, 0		; VI-NEXT: s_mov_b32 s5, 0
; VI-NEXT: s_waitcnt lgkmcnt(0)
; VI-NEXT: s_and_b32 s1, s6, s4		; VI-NEXT: s_and_b32 s1, s6, s4
; VI-NEXT: v_add_u32_e32 v2, vcc, s0, v2		; VI-NEXT: v_add_u32_e32 v2, vcc, s0, v2
; VI-NEXT: s_lshl_b32 s0, s1, 16		; VI-NEXT: s_lshl_b32 s0, s1, 16
; VI-NEXT: s_or_b32 s0, s1, s0		; VI-NEXT: s_or_b32 s0, s1, s0
; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc		; VI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
; VI-NEXT: s_waitcnt vmcnt(1)		; VI-NEXT: s_waitcnt vmcnt(1) lgkmcnt(1)
; VI-NEXT: v_lshlrev_b32_e32 v4, 4, v4		; VI-NEXT: v_lshlrev_b32_e32 v4, 4, v4
; VI-NEXT: v_lshlrev_b64 v[4:5], v4, s[4:5]		; VI-NEXT: v_lshlrev_b64 v[4:5], v4, s[4:5]
; VI-NEXT: s_waitcnt vmcnt(0)		; VI-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; VI-NEXT: v_bfi_b32 v1, v5, s0, v1		; VI-NEXT: v_bfi_b32 v1, v5, s0, v1
; VI-NEXT: v_bfi_b32 v0, v4, s0, v0		; VI-NEXT: v_bfi_b32 v0, v4, s0, v0
; VI-NEXT: flat_store_dwordx2 v[2:3], v[0:1]		; VI-NEXT: flat_store_dwordx2 v[2:3], v[0:1]
; VI-NEXT: s_endpgm		; VI-NEXT: s_endpgm
;		;
; CI-LABEL: v_insertelement_v4i16_dynamic_vgpr:		; CI-LABEL: v_insertelement_v4i16_dynamic_vgpr:
; CI: ; %bb.0:		; CI: ; %bb.0:
; CI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0		; CI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
		; CI-NEXT: s_load_dword s6, s[4:5], 0x4
; CI-NEXT: s_waitcnt lgkmcnt(0)		; CI-NEXT: s_waitcnt lgkmcnt(0)
; CI-NEXT: flat_load_dword v4, v[0:1]		; CI-NEXT: flat_load_dword v4, v[0:1]
; CI-NEXT: v_lshlrev_b32_e32 v2, 3, v0		; CI-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; CI-NEXT: s_load_dword s6, s[4:5], 0x4
; CI-NEXT: s_mov_b32 s4, 0xffff		; CI-NEXT: s_mov_b32 s4, 0xffff
; CI-NEXT: v_mov_b32_e32 v1, s3		; CI-NEXT: v_mov_b32_e32 v1, s3
; CI-NEXT: v_add_i32_e32 v0, vcc, s2, v2		; CI-NEXT: v_add_i32_e32 v0, vcc, s2, v2
; CI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc		; CI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; CI-NEXT: flat_load_dwordx2 v[0:1], v[0:1]		; CI-NEXT: flat_load_dwordx2 v[0:1], v[0:1]
; CI-NEXT: s_mov_b32 s5, 0		; CI-NEXT: s_mov_b32 s5, 0
; CI-NEXT: s_waitcnt lgkmcnt(0)
; CI-NEXT: s_lshl_b32 s2, s6, 16		; CI-NEXT: s_lshl_b32 s2, s6, 16
; CI-NEXT: s_and_b32 s3, s6, s4		; CI-NEXT: s_and_b32 s3, s6, s4
; CI-NEXT: v_mov_b32_e32 v3, s1		; CI-NEXT: v_mov_b32_e32 v3, s1
; CI-NEXT: v_add_i32_e32 v2, vcc, s0, v2		; CI-NEXT: v_add_i32_e32 v2, vcc, s0, v2
; CI-NEXT: s_or_b32 s1, s3, s2		; CI-NEXT: s_or_b32 s1, s3, s2
; CI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc		; CI-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
; CI-NEXT: s_waitcnt vmcnt(1)		; CI-NEXT: s_waitcnt vmcnt(1) lgkmcnt(1)
; CI-NEXT: v_lshlrev_b32_e32 v4, 4, v4		; CI-NEXT: v_lshlrev_b32_e32 v4, 4, v4
; CI-NEXT: v_lshl_b64 v[4:5], s[4:5], v4		; CI-NEXT: v_lshl_b64 v[4:5], s[4:5], v4
; CI-NEXT: s_waitcnt vmcnt(0)		; CI-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; CI-NEXT: v_bfi_b32 v1, v5, s1, v1		; CI-NEXT: v_bfi_b32 v1, v5, s1, v1
; CI-NEXT: v_bfi_b32 v0, v4, s1, v0		; CI-NEXT: v_bfi_b32 v0, v4, s1, v0
; CI-NEXT: flat_store_dwordx2 v[2:3], v[0:1]		; CI-NEXT: flat_store_dwordx2 v[2:3], v[0:1]
; CI-NEXT: s_endpgm		; CI-NEXT: s_endpgm
%tid = call i32 @llvm.amdgcn.workitem.id.x() #1		%tid = call i32 @llvm.amdgcn.workitem.id.x() #1
%tid.ext = sext i32 %tid to i64		%tid.ext = sext i32 %tid to i64
%in.gep = getelementptr inbounds <4 x i16>, <4 x i16> addrspace(1)* %in, i64 %tid.ext		%in.gep = getelementptr inbounds <4 x i16>, <4 x i16> addrspace(1)* %in, i64 %tid.ext
%out.gep = getelementptr inbounds <4 x i16>, <4 x i16> addrspace(1)* %out, i64 %tid.ext		%out.gep = getelementptr inbounds <4 x i16>, <4 x i16> addrspace(1)* %out, i64 %tid.ext
%idx.val = load volatile i32, i32 addrspace(1)* undef		%idx.val = load volatile i32, i32 addrspace(1)* undef
%vec = load <4 x i16>, <4 x i16> addrspace(1)* %in.gep		%vec = load <4 x i16>, <4 x i16> addrspace(1)* %in.gep
%val.trunc = trunc i32 %val to i16		%val.trunc = trunc i32 %val to i16
%val.cvt = bitcast i16 %val.trunc to i16		%val.cvt = bitcast i16 %val.trunc to i16
%vecins = insertelement <4 x i16> %vec, i16 %val.cvt, i32 %idx.val		%vecins = insertelement <4 x i16> %vec, i16 %val.cvt, i32 %idx.val
store <4 x i16> %vecins, <4 x i16> addrspace(1)* %out.gep		store <4 x i16> %vecins, <4 x i16> addrspace(1)* %out.gep
ret void		ret void
}		}

define amdgpu_kernel void @v_insertelement_v4f16_dynamic_sgpr(<4 x half> addrspace(1)* %out, <4 x half> addrspace(1)* %in, i32 %val, i32 %idxval) #0 {		define amdgpu_kernel void @v_insertelement_v4f16_dynamic_sgpr(<4 x half> addrspace(1)* %out, <4 x half> addrspace(1)* %in, i32 %val, i32 %idxval) #0 {
; GFX9-LABEL: v_insertelement_v4f16_dynamic_sgpr:		; GFX9-LABEL: v_insertelement_v4f16_dynamic_sgpr:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0		; GFX9-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
; GFX9-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x10		; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x10
		; GFX9-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; GFX9-NEXT: s_mov_b32 s7, 0		; GFX9-NEXT: s_mov_b32 s7, 0
; GFX9-NEXT: s_mov_b32 s6, 0xffff		; GFX9-NEXT: s_mov_b32 s6, 0xffff
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v1, s3		; GFX9-NEXT: v_mov_b32_e32 v1, s3
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s2, v2		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s2, v2
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off		; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
; GFX9-NEXT: s_pack_ll_b32_b16 s3, s4, s4		; GFX9-NEXT: s_pack_ll_b32_b16 s3, s4, s4
; GFX9-NEXT: s_lshl_b32 s2, s5, 4		; GFX9-NEXT: s_lshl_b32 s2, s5, 4
; GFX9-NEXT: v_mov_b32_e32 v3, s1		; GFX9-NEXT: v_mov_b32_e32 v3, s1
; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, s0, v2		; GFX9-NEXT: v_add_co_u32_e32 v2, vcc, s0, v2
; GFX9-NEXT: s_lshl_b64 s[0:1], s[6:7], s2		; GFX9-NEXT: s_lshl_b64 s[0:1], s[6:7], s2
; GFX9-NEXT: v_mov_b32_e32 v4, s3		; GFX9-NEXT: v_mov_b32_e32 v4, s3
; GFX9-NEXT: v_mov_b32_e32 v5, s3		; GFX9-NEXT: v_mov_b32_e32 v5, s3
; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: v_bfi_b32 v1, s1, v4, v1		; GFX9-NEXT: v_bfi_b32 v1, s1, v4, v1
; GFX9-NEXT: v_bfi_b32 v0, s0, v5, v0		; GFX9-NEXT: v_bfi_b32 v0, s0, v5, v0
; GFX9-NEXT: global_store_dwordx2 v[2:3], v[0:1], off		; GFX9-NEXT: global_store_dwordx2 v[2:3], v[0:1], off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; VI-LABEL: v_insertelement_v4f16_dynamic_sgpr:		; VI-LABEL: v_insertelement_v4f16_dynamic_sgpr:
; VI: ; %bb.0:		; VI: ; %bb.0:
; VI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0		; VI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
; VI-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; VI-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x10		; VI-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x10
		; VI-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; VI-NEXT: s_mov_b32 s6, 0xffff		; VI-NEXT: s_mov_b32 s6, 0xffff
; VI-NEXT: s_mov_b32 s7, 0		; VI-NEXT: s_mov_b32 s7, 0
; VI-NEXT: s_waitcnt lgkmcnt(0)		; VI-NEXT: s_waitcnt lgkmcnt(0)
; VI-NEXT: v_add_u32_e32 v0, vcc, s2, v2		; VI-NEXT: v_add_u32_e32 v0, vcc, s2, v2
; VI-NEXT: v_mov_b32_e32 v1, s3		; VI-NEXT: v_mov_b32_e32 v1, s3
; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc		; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; VI-NEXT: flat_load_dwordx2 v[0:1], v[0:1]		; VI-NEXT: flat_load_dwordx2 v[0:1], v[0:1]
; VI-NEXT: s_and_b32 s2, s4, s6		; VI-NEXT: s_and_b32 s2, s4, s6
[... 10 lines not shown ...]
; VI-NEXT: v_bfi_b32 v1, s1, v4, v1		; VI-NEXT: v_bfi_b32 v1, s1, v4, v1
; VI-NEXT: v_bfi_b32 v0, s0, v5, v0		; VI-NEXT: v_bfi_b32 v0, s0, v5, v0
; VI-NEXT: flat_store_dwordx2 v[2:3], v[0:1]		; VI-NEXT: flat_store_dwordx2 v[2:3], v[0:1]
; VI-NEXT: s_endpgm		; VI-NEXT: s_endpgm
;		;
; CI-LABEL: v_insertelement_v4f16_dynamic_sgpr:		; CI-LABEL: v_insertelement_v4f16_dynamic_sgpr:
; CI: ; %bb.0:		; CI: ; %bb.0:
; CI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0		; CI-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
; CI-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; CI-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x4		; CI-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x4
		; CI-NEXT: v_lshlrev_b32_e32 v2, 3, v0
; CI-NEXT: s_mov_b32 s6, 0xffff		; CI-NEXT: s_mov_b32 s6, 0xffff
; CI-NEXT: s_mov_b32 s7, 0		; CI-NEXT: s_mov_b32 s7, 0
; CI-NEXT: s_waitcnt lgkmcnt(0)		; CI-NEXT: s_waitcnt lgkmcnt(0)
; CI-NEXT: v_mov_b32_e32 v1, s3		; CI-NEXT: v_mov_b32_e32 v1, s3
; CI-NEXT: v_add_i32_e32 v0, vcc, s2, v2		; CI-NEXT: v_add_i32_e32 v0, vcc, s2, v2
; CI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc		; CI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; CI-NEXT: flat_load_dwordx2 v[0:1], v[0:1]		; CI-NEXT: flat_load_dwordx2 v[0:1], v[0:1]
; CI-NEXT: s_and_b32 s2, s4, s6		; CI-NEXT: s_and_b32 s2, s4, s6
[... 30 lines not shown ...]

llvm/test/CodeGen/AMDGPU/llvm.maxnum.f16.ll

	[... 9 lines not shown ...]

	define amdgpu_kernel void @maxnum_f16(			define amdgpu_kernel void @maxnum_f16(
	; SI-LABEL: maxnum_f16:			; SI-LABEL: maxnum_f16:
	; SI: ; %bb.0: ; %entry			; SI: ; %bb.0: ; %entry
	; SI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9			; SI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
	; SI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd			; SI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
	; SI-NEXT: s_mov_b32 s11, 0xf000			; SI-NEXT: s_mov_b32 s11, 0xf000
	; SI-NEXT: s_mov_b32 s10, -1			; SI-NEXT: s_mov_b32 s10, -1
	; SI-NEXT: s_mov_b32 s2, s10			; SI-NEXT: s_mov_b32 s14, s10
	; SI-NEXT: s_mov_b32 s3, s11
	; SI-NEXT: s_waitcnt lgkmcnt(0)			; SI-NEXT: s_waitcnt lgkmcnt(0)
	; SI-NEXT: s_mov_b32 s12, s6			; SI-NEXT: s_mov_b32 s12, s6
	; SI-NEXT: s_mov_b32 s13, s7			; SI-NEXT: s_mov_b32 s13, s7
	; SI-NEXT: s_mov_b32 s14, s10
	; SI-NEXT: s_mov_b32 s15, s11			; SI-NEXT: s_mov_b32 s15, s11
				; SI-NEXT: s_mov_b32 s2, s10
				; SI-NEXT: s_mov_b32 s3, s11
	; SI-NEXT: buffer_load_ushort v0, off, s[12:15], 0			; SI-NEXT: buffer_load_ushort v0, off, s[12:15], 0
	; SI-NEXT: buffer_load_ushort v1, off, s[0:3], 0			; SI-NEXT: buffer_load_ushort v1, off, s[0:3], 0
	; SI-NEXT: s_mov_b32 s8, s4			; SI-NEXT: s_mov_b32 s8, s4
	; SI-NEXT: s_mov_b32 s9, s5			; SI-NEXT: s_mov_b32 s9, s5
	; SI-NEXT: s_waitcnt vmcnt(1)			; SI-NEXT: s_waitcnt vmcnt(1)
	; SI-NEXT: v_cvt_f32_f16_e32 v0, v0			; SI-NEXT: v_cvt_f32_f16_e32 v0, v0
	; SI-NEXT: s_waitcnt vmcnt(0)			; SI-NEXT: s_waitcnt vmcnt(0)
	; SI-NEXT: v_cvt_f32_f16_e32 v1, v1			; SI-NEXT: v_cvt_f32_f16_e32 v1, v1
	[... 11 lines not shown ...]
	; VI-NEXT: s_mov_b32 s3, 0xf000			; VI-NEXT: s_mov_b32 s3, 0xf000
	; VI-NEXT: s_mov_b32 s2, -1			; VI-NEXT: s_mov_b32 s2, -1
	; VI-NEXT: s_mov_b32 s10, s2			; VI-NEXT: s_mov_b32 s10, s2
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: s_mov_b32 s0, s4			; VI-NEXT: s_mov_b32 s0, s4
	; VI-NEXT: s_mov_b32 s1, s5			; VI-NEXT: s_mov_b32 s1, s5
	; VI-NEXT: s_mov_b32 s4, s6			; VI-NEXT: s_mov_b32 s4, s6
	; VI-NEXT: s_mov_b32 s5, s7			; VI-NEXT: s_mov_b32 s5, s7
	; VI-NEXT: s_mov_b32 s11, s3
	; VI-NEXT: s_mov_b32 s6, s2			; VI-NEXT: s_mov_b32 s6, s2
	; VI-NEXT: s_mov_b32 s7, s3			; VI-NEXT: s_mov_b32 s7, s3
				; VI-NEXT: s_mov_b32 s11, s3
	; VI-NEXT: buffer_load_ushort v0, off, s[4:7], 0			; VI-NEXT: buffer_load_ushort v0, off, s[4:7], 0
	; VI-NEXT: buffer_load_ushort v1, off, s[8:11], 0			; VI-NEXT: buffer_load_ushort v1, off, s[8:11], 0
	; VI-NEXT: s_waitcnt vmcnt(1)			; VI-NEXT: s_waitcnt vmcnt(1)
	; VI-NEXT: v_max_f16_e32 v0, v0, v0			; VI-NEXT: v_max_f16_e32 v0, v0, v0
	; VI-NEXT: s_waitcnt vmcnt(0)			; VI-NEXT: s_waitcnt vmcnt(0)
	; VI-NEXT: v_max_f16_e32 v1, v1, v1			; VI-NEXT: v_max_f16_e32 v1, v1, v1
	; VI-NEXT: v_max_f16_e32 v0, v0, v1			; VI-NEXT: v_max_f16_e32 v0, v0, v1
	; VI-NEXT: buffer_store_short v0, off, s[0:3], 0			; VI-NEXT: buffer_store_short v0, off, s[0:3], 0
	; VI-NEXT: s_endpgm			; VI-NEXT: s_endpgm
	;			;
	; GFX9-LABEL: maxnum_f16:			; GFX9-LABEL: maxnum_f16:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX9-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0x34			; GFX9-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0x34
	; GFX9-NEXT: s_mov_b32 s3, 0xf000			; GFX9-NEXT: s_mov_b32 s3, 0xf000
	; GFX9-NEXT: s_mov_b32 s2, -1			; GFX9-NEXT: s_mov_b32 s2, -1
	; GFX9-NEXT: s_mov_b32 s10, s2			; GFX9-NEXT: s_mov_b32 s10, s2
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_mov_b32 s0, s4			; GFX9-NEXT: s_mov_b32 s0, s4
	; GFX9-NEXT: s_mov_b32 s1, s5			; GFX9-NEXT: s_mov_b32 s1, s5
	; GFX9-NEXT: s_mov_b32 s4, s6			; GFX9-NEXT: s_mov_b32 s4, s6
	; GFX9-NEXT: s_mov_b32 s5, s7			; GFX9-NEXT: s_mov_b32 s5, s7
	; GFX9-NEXT: s_mov_b32 s11, s3
	; GFX9-NEXT: s_mov_b32 s6, s2			; GFX9-NEXT: s_mov_b32 s6, s2
	; GFX9-NEXT: s_mov_b32 s7, s3			; GFX9-NEXT: s_mov_b32 s7, s3
				; GFX9-NEXT: s_mov_b32 s11, s3
	; GFX9-NEXT: buffer_load_ushort v0, off, s[4:7], 0			; GFX9-NEXT: buffer_load_ushort v0, off, s[4:7], 0
	; GFX9-NEXT: buffer_load_ushort v1, off, s[8:11], 0			; GFX9-NEXT: buffer_load_ushort v1, off, s[8:11], 0
	; GFX9-NEXT: s_waitcnt vmcnt(1)			; GFX9-NEXT: s_waitcnt vmcnt(1)
	; GFX9-NEXT: v_max_f16_e32 v0, v0, v0			; GFX9-NEXT: v_max_f16_e32 v0, v0, v0
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_max_f16_e32 v1, v1, v1			; GFX9-NEXT: v_max_f16_e32 v1, v1, v1
	; GFX9-NEXT: v_max_f16_e32 v0, v0, v1			; GFX9-NEXT: v_max_f16_e32 v0, v0, v1
	; GFX9-NEXT: buffer_store_short v0, off, s[0:3], 0			; GFX9-NEXT: buffer_store_short v0, off, s[0:3], 0
	[... 367 lines not shown ...]
	define amdgpu_kernel void @maxnum_v3f16(			define amdgpu_kernel void @maxnum_v3f16(
	; SI-LABEL: maxnum_v3f16:			; SI-LABEL: maxnum_v3f16:
	; SI: ; %bb.0: ; %entry			; SI: ; %bb.0: ; %entry
	; SI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9			; SI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
	; SI-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0xd			; SI-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0xd
	; SI-NEXT: s_mov_b32 s3, 0xf000			; SI-NEXT: s_mov_b32 s3, 0xf000
	; SI-NEXT: s_mov_b32 s2, -1			; SI-NEXT: s_mov_b32 s2, -1
	; SI-NEXT: s_waitcnt lgkmcnt(0)			; SI-NEXT: s_waitcnt lgkmcnt(0)
				; SI-NEXT: s_mov_b32 s0, s4
	; SI-NEXT: s_load_dwordx2 s[6:7], s[6:7], 0x0			; SI-NEXT: s_load_dwordx2 s[6:7], s[6:7], 0x0
	; SI-NEXT: s_load_dwordx2 s[8:9], s[8:9], 0x0			; SI-NEXT: s_load_dwordx2 s[8:9], s[8:9], 0x0
	; SI-NEXT: s_mov_b32 s0, s4
	; SI-NEXT: s_waitcnt lgkmcnt(0)			; SI-NEXT: s_waitcnt lgkmcnt(0)
	; SI-NEXT: s_lshr_b32 s1, s6, 16			; SI-NEXT: s_lshr_b32 s1, s6, 16
	; SI-NEXT: s_lshr_b32 s4, s8, 16			; SI-NEXT: s_lshr_b32 s4, s8, 16
	; SI-NEXT: v_cvt_f32_f16_e32 v3, s1			; SI-NEXT: v_cvt_f32_f16_e32 v3, s1
	; SI-NEXT: v_cvt_f32_f16_e32 v2, s4			; SI-NEXT: v_cvt_f32_f16_e32 v2, s4
	; SI-NEXT: v_cvt_f32_f16_e32 v1, s6			; SI-NEXT: v_cvt_f32_f16_e32 v1, s6
	; SI-NEXT: v_cvt_f32_f16_e32 v5, s8			; SI-NEXT: v_cvt_f32_f16_e32 v5, s8
	; SI-NEXT: v_cvt_f32_f16_e32 v0, s7			; SI-NEXT: v_cvt_f32_f16_e32 v0, s7
	; SI-NEXT: v_cvt_f32_f16_e32 v4, s9			; SI-NEXT: v_cvt_f32_f16_e32 v4, s9
	; SI-NEXT: v_mul_f32_e32 v2, 1.0, v2			; SI-NEXT: v_mul_f32_e32 v2, 1.0, v2
	; SI-NEXT: v_mul_f32_e32 v3, 1.0, v3			; SI-NEXT: v_mul_f32_e32 v3, 1.0, v3
	; SI-NEXT: v_max_f32_e32 v2, v3, v2			; SI-NEXT: v_max_f32_e32 v2, v3, v2
	; SI-NEXT: v_mul_f32_e32 v3, 1.0, v5			; SI-NEXT: v_mul_f32_e32 v3, 1.0, v5
	; SI-NEXT: v_mul_f32_e32 v1, 1.0, v1			; SI-NEXT: v_mul_f32_e32 v1, 1.0, v1
	; SI-NEXT: v_max_f32_e32 v1, v1, v3			; SI-NEXT: v_max_f32_e32 v1, v1, v3
	; SI-NEXT: v_cvt_f16_f32_e32 v2, v2
	; SI-NEXT: v_mul_f32_e32 v3, 1.0, v4			; SI-NEXT: v_mul_f32_e32 v3, 1.0, v4
	; SI-NEXT: v_mul_f32_e32 v0, 1.0, v0			; SI-NEXT: v_mul_f32_e32 v0, 1.0, v0
				; SI-NEXT: v_cvt_f16_f32_e32 v2, v2
	; SI-NEXT: v_max_f32_e32 v0, v0, v3			; SI-NEXT: v_max_f32_e32 v0, v0, v3
	; SI-NEXT: v_cvt_f16_f32_e32 v1, v1			; SI-NEXT: v_cvt_f16_f32_e32 v1, v1
	; SI-NEXT: v_cvt_f16_f32_e32 v0, v0			; SI-NEXT: v_cvt_f16_f32_e32 v0, v0
	; SI-NEXT: v_lshlrev_b32_e32 v2, 16, v2			; SI-NEXT: v_lshlrev_b32_e32 v2, 16, v2
	; SI-NEXT: s_mov_b32 s1, s5			; SI-NEXT: s_mov_b32 s1, s5
	; SI-NEXT: v_or_b32_e32 v1, v1, v2			; SI-NEXT: v_or_b32_e32 v1, v1, v2
	; SI-NEXT: buffer_store_short v0, off, s[0:3], 0 offset:4			; SI-NEXT: buffer_store_short v0, off, s[0:3], 0 offset:4
	; SI-NEXT: buffer_store_dword v1, off, s[0:3], 0			; SI-NEXT: buffer_store_dword v1, off, s[0:3], 0
	[... 264 lines not shown ...]

llvm/test/CodeGen/AMDGPU/llvm.minnum.f16.ll

	[... 9 lines not shown ...]

	define amdgpu_kernel void @minnum_f16_ieee(			define amdgpu_kernel void @minnum_f16_ieee(
	; SI-LABEL: minnum_f16_ieee:			; SI-LABEL: minnum_f16_ieee:
	; SI: ; %bb.0: ; %entry			; SI: ; %bb.0: ; %entry
	; SI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9			; SI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
	; SI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd			; SI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xd
	; SI-NEXT: s_mov_b32 s11, 0xf000			; SI-NEXT: s_mov_b32 s11, 0xf000
	; SI-NEXT: s_mov_b32 s10, -1			; SI-NEXT: s_mov_b32 s10, -1
	; SI-NEXT: s_mov_b32 s2, s10			; SI-NEXT: s_mov_b32 s14, s10
	; SI-NEXT: s_mov_b32 s3, s11
	; SI-NEXT: s_waitcnt lgkmcnt(0)			; SI-NEXT: s_waitcnt lgkmcnt(0)
	; SI-NEXT: s_mov_b32 s12, s6			; SI-NEXT: s_mov_b32 s12, s6
	; SI-NEXT: s_mov_b32 s13, s7			; SI-NEXT: s_mov_b32 s13, s7
	; SI-NEXT: s_mov_b32 s14, s10
	; SI-NEXT: s_mov_b32 s15, s11			; SI-NEXT: s_mov_b32 s15, s11
				; SI-NEXT: s_mov_b32 s2, s10
				; SI-NEXT: s_mov_b32 s3, s11
	; SI-NEXT: buffer_load_ushort v0, off, s[12:15], 0			; SI-NEXT: buffer_load_ushort v0, off, s[12:15], 0
	; SI-NEXT: buffer_load_ushort v1, off, s[0:3], 0			; SI-NEXT: buffer_load_ushort v1, off, s[0:3], 0
	; SI-NEXT: s_mov_b32 s8, s4			; SI-NEXT: s_mov_b32 s8, s4
	; SI-NEXT: s_mov_b32 s9, s5			; SI-NEXT: s_mov_b32 s9, s5
	; SI-NEXT: s_waitcnt vmcnt(1)			; SI-NEXT: s_waitcnt vmcnt(1)
	; SI-NEXT: v_cvt_f32_f16_e32 v0, v0			; SI-NEXT: v_cvt_f32_f16_e32 v0, v0
	; SI-NEXT: s_waitcnt vmcnt(0)			; SI-NEXT: s_waitcnt vmcnt(0)
	; SI-NEXT: v_cvt_f32_f16_e32 v1, v1			; SI-NEXT: v_cvt_f32_f16_e32 v1, v1
	[... 11 lines not shown ...]
	; VI-NEXT: s_mov_b32 s3, 0xf000			; VI-NEXT: s_mov_b32 s3, 0xf000
	; VI-NEXT: s_mov_b32 s2, -1			; VI-NEXT: s_mov_b32 s2, -1
	; VI-NEXT: s_mov_b32 s10, s2			; VI-NEXT: s_mov_b32 s10, s2
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: s_mov_b32 s0, s4			; VI-NEXT: s_mov_b32 s0, s4
	; VI-NEXT: s_mov_b32 s1, s5			; VI-NEXT: s_mov_b32 s1, s5
	; VI-NEXT: s_mov_b32 s4, s6			; VI-NEXT: s_mov_b32 s4, s6
	; VI-NEXT: s_mov_b32 s5, s7			; VI-NEXT: s_mov_b32 s5, s7
	; VI-NEXT: s_mov_b32 s11, s3
	; VI-NEXT: s_mov_b32 s6, s2			; VI-NEXT: s_mov_b32 s6, s2
	; VI-NEXT: s_mov_b32 s7, s3			; VI-NEXT: s_mov_b32 s7, s3
				; VI-NEXT: s_mov_b32 s11, s3
	; VI-NEXT: buffer_load_ushort v0, off, s[4:7], 0			; VI-NEXT: buffer_load_ushort v0, off, s[4:7], 0
	; VI-NEXT: buffer_load_ushort v1, off, s[8:11], 0			; VI-NEXT: buffer_load_ushort v1, off, s[8:11], 0
	; VI-NEXT: s_waitcnt vmcnt(1)			; VI-NEXT: s_waitcnt vmcnt(1)
	; VI-NEXT: v_max_f16_e32 v0, v0, v0			; VI-NEXT: v_max_f16_e32 v0, v0, v0
	; VI-NEXT: s_waitcnt vmcnt(0)			; VI-NEXT: s_waitcnt vmcnt(0)
	; VI-NEXT: v_max_f16_e32 v1, v1, v1			; VI-NEXT: v_max_f16_e32 v1, v1, v1
	; VI-NEXT: v_min_f16_e32 v0, v0, v1			; VI-NEXT: v_min_f16_e32 v0, v0, v1
	; VI-NEXT: buffer_store_short v0, off, s[0:3], 0			; VI-NEXT: buffer_store_short v0, off, s[0:3], 0
	; VI-NEXT: s_endpgm			; VI-NEXT: s_endpgm
	;			;
	; GFX9-LABEL: minnum_f16_ieee:			; GFX9-LABEL: minnum_f16_ieee:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	; GFX9-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0x34			; GFX9-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0x34
	; GFX9-NEXT: s_mov_b32 s3, 0xf000			; GFX9-NEXT: s_mov_b32 s3, 0xf000
	; GFX9-NEXT: s_mov_b32 s2, -1			; GFX9-NEXT: s_mov_b32 s2, -1
	; GFX9-NEXT: s_mov_b32 s10, s2			; GFX9-NEXT: s_mov_b32 s10, s2
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_mov_b32 s0, s4			; GFX9-NEXT: s_mov_b32 s0, s4
	; GFX9-NEXT: s_mov_b32 s1, s5			; GFX9-NEXT: s_mov_b32 s1, s5
	; GFX9-NEXT: s_mov_b32 s4, s6			; GFX9-NEXT: s_mov_b32 s4, s6
	; GFX9-NEXT: s_mov_b32 s5, s7			; GFX9-NEXT: s_mov_b32 s5, s7
	; GFX9-NEXT: s_mov_b32 s11, s3
	; GFX9-NEXT: s_mov_b32 s6, s2			; GFX9-NEXT: s_mov_b32 s6, s2
	; GFX9-NEXT: s_mov_b32 s7, s3			; GFX9-NEXT: s_mov_b32 s7, s3
				; GFX9-NEXT: s_mov_b32 s11, s3
	; GFX9-NEXT: buffer_load_ushort v0, off, s[4:7], 0			; GFX9-NEXT: buffer_load_ushort v0, off, s[4:7], 0
	; GFX9-NEXT: buffer_load_ushort v1, off, s[8:11], 0			; GFX9-NEXT: buffer_load_ushort v1, off, s[8:11], 0
	; GFX9-NEXT: s_waitcnt vmcnt(1)			; GFX9-NEXT: s_waitcnt vmcnt(1)
	; GFX9-NEXT: v_max_f16_e32 v0, v0, v0			; GFX9-NEXT: v_max_f16_e32 v0, v0, v0
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_max_f16_e32 v1, v1, v1			; GFX9-NEXT: v_max_f16_e32 v1, v1, v1
	; GFX9-NEXT: v_min_f16_e32 v0, v0, v1			; GFX9-NEXT: v_min_f16_e32 v0, v0, v1
	; GFX9-NEXT: buffer_store_short v0, off, s[0:3], 0			; GFX9-NEXT: buffer_store_short v0, off, s[0:3], 0
	[... 420 lines not shown ...]
	define amdgpu_kernel void @minnum_v3f16(			define amdgpu_kernel void @minnum_v3f16(
	; SI-LABEL: minnum_v3f16:			; SI-LABEL: minnum_v3f16:
	; SI: ; %bb.0: ; %entry			; SI: ; %bb.0: ; %entry
	; SI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9			; SI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
	; SI-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0xd			; SI-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0xd
	; SI-NEXT: s_mov_b32 s3, 0xf000			; SI-NEXT: s_mov_b32 s3, 0xf000
	; SI-NEXT: s_mov_b32 s2, -1			; SI-NEXT: s_mov_b32 s2, -1
	; SI-NEXT: s_waitcnt lgkmcnt(0)			; SI-NEXT: s_waitcnt lgkmcnt(0)
				; SI-NEXT: s_mov_b32 s0, s4
	; SI-NEXT: s_load_dwordx2 s[6:7], s[6:7], 0x0			; SI-NEXT: s_load_dwordx2 s[6:7], s[6:7], 0x0
	; SI-NEXT: s_load_dwordx2 s[8:9], s[8:9], 0x0			; SI-NEXT: s_load_dwordx2 s[8:9], s[8:9], 0x0
	; SI-NEXT: s_mov_b32 s0, s4
	; SI-NEXT: s_waitcnt lgkmcnt(0)			; SI-NEXT: s_waitcnt lgkmcnt(0)
	; SI-NEXT: s_lshr_b32 s1, s6, 16			; SI-NEXT: s_lshr_b32 s1, s6, 16
	; SI-NEXT: s_lshr_b32 s4, s8, 16			; SI-NEXT: s_lshr_b32 s4, s8, 16
	; SI-NEXT: v_cvt_f32_f16_e32 v3, s1			; SI-NEXT: v_cvt_f32_f16_e32 v3, s1
	; SI-NEXT: v_cvt_f32_f16_e32 v2, s4			; SI-NEXT: v_cvt_f32_f16_e32 v2, s4
	; SI-NEXT: v_cvt_f32_f16_e32 v1, s6			; SI-NEXT: v_cvt_f32_f16_e32 v1, s6
	; SI-NEXT: v_cvt_f32_f16_e32 v5, s8			; SI-NEXT: v_cvt_f32_f16_e32 v5, s8
	; SI-NEXT: v_cvt_f32_f16_e32 v0, s7			; SI-NEXT: v_cvt_f32_f16_e32 v0, s7
	; SI-NEXT: v_cvt_f32_f16_e32 v4, s9			; SI-NEXT: v_cvt_f32_f16_e32 v4, s9
	; SI-NEXT: v_mul_f32_e32 v2, 1.0, v2			; SI-NEXT: v_mul_f32_e32 v2, 1.0, v2
	; SI-NEXT: v_mul_f32_e32 v3, 1.0, v3			; SI-NEXT: v_mul_f32_e32 v3, 1.0, v3
	; SI-NEXT: v_min_f32_e32 v2, v3, v2			; SI-NEXT: v_min_f32_e32 v2, v3, v2
	; SI-NEXT: v_mul_f32_e32 v3, 1.0, v5			; SI-NEXT: v_mul_f32_e32 v3, 1.0, v5
	; SI-NEXT: v_mul_f32_e32 v1, 1.0, v1			; SI-NEXT: v_mul_f32_e32 v1, 1.0, v1
	; SI-NEXT: v_min_f32_e32 v1, v1, v3			; SI-NEXT: v_min_f32_e32 v1, v1, v3
	; SI-NEXT: v_cvt_f16_f32_e32 v2, v2
	; SI-NEXT: v_mul_f32_e32 v3, 1.0, v4			; SI-NEXT: v_mul_f32_e32 v3, 1.0, v4
	; SI-NEXT: v_mul_f32_e32 v0, 1.0, v0			; SI-NEXT: v_mul_f32_e32 v0, 1.0, v0
				; SI-NEXT: v_cvt_f16_f32_e32 v2, v2
	; SI-NEXT: v_min_f32_e32 v0, v0, v3			; SI-NEXT: v_min_f32_e32 v0, v0, v3
	; SI-NEXT: v_cvt_f16_f32_e32 v1, v1			; SI-NEXT: v_cvt_f16_f32_e32 v1, v1
	; SI-NEXT: v_cvt_f16_f32_e32 v0, v0			; SI-NEXT: v_cvt_f16_f32_e32 v0, v0
	; SI-NEXT: v_lshlrev_b32_e32 v2, 16, v2			; SI-NEXT: v_lshlrev_b32_e32 v2, 16, v2
	; SI-NEXT: s_mov_b32 s1, s5			; SI-NEXT: s_mov_b32 s1, s5
	; SI-NEXT: v_or_b32_e32 v1, v1, v2			; SI-NEXT: v_or_b32_e32 v1, v1, v2
	; SI-NEXT: buffer_store_short v0, off, s[0:3], 0 offset:4			; SI-NEXT: buffer_store_short v0, off, s[0:3], 0 offset:4
	; SI-NEXT: buffer_store_dword v1, off, s[0:3], 0			; SI-NEXT: buffer_store_dword v1, off, s[0:3], 0
	[... 264 lines not shown ...]

llvm/test/CodeGen/AMDGPU/llvm.round.f64.ll

	[... 421 lines not shown ...]
	; CI-NEXT: v_mov_b32_e32 v6, 0			; CI-NEXT: v_mov_b32_e32 v6, 0
	; CI-NEXT: v_cndmask_b32_e32 v7, 0, v10, vcc			; CI-NEXT: v_cndmask_b32_e32 v7, 0, v10, vcc
	; CI-NEXT: v_trunc_f64_e32 v[10:11], s[12:13]			; CI-NEXT: v_trunc_f64_e32 v[10:11], s[12:13]
	; CI-NEXT: v_add_f64 v[6:7], v[4:5], v[6:7]			; CI-NEXT: v_add_f64 v[6:7], v[4:5], v[6:7]
	; CI-NEXT: v_add_f64 v[4:5], s[12:13], -v[10:11]			; CI-NEXT: v_add_f64 v[4:5], s[12:13], -v[10:11]
	; CI-NEXT: v_mov_b32_e32 v13, s13			; CI-NEXT: v_mov_b32_e32 v13, s13
	; CI-NEXT: v_cmp_ge_f64_e64 vcc, \|v[4:5]\|, 0.5			; CI-NEXT: v_cmp_ge_f64_e64 vcc, \|v[4:5]\|, 0.5
	; CI-NEXT: v_bfi_b32 v12, s2, v12, v13			; CI-NEXT: v_bfi_b32 v12, s2, v12, v13
	; CI-NEXT: v_mov_b32_e32 v0, 0
	; CI-NEXT: v_cndmask_b32_e32 v5, 0, v12, vcc			; CI-NEXT: v_cndmask_b32_e32 v5, 0, v12, vcc
	; CI-NEXT: v_mov_b32_e32 v4, 0			; CI-NEXT: v_mov_b32_e32 v4, 0
				; CI-NEXT: v_mov_b32_e32 v0, 0
	; CI-NEXT: v_add_f64 v[4:5], v[10:11], v[4:5]			; CI-NEXT: v_add_f64 v[4:5], v[10:11], v[4:5]
	; CI-NEXT: v_add_f64 v[0:1], v[8:9], v[0:1]			; CI-NEXT: v_add_f64 v[0:1], v[8:9], v[0:1]
	; CI-NEXT: buffer_store_dwordx4 v[4:7], off, s[4:7], 0 offset:16			; CI-NEXT: buffer_store_dwordx4 v[4:7], off, s[4:7], 0 offset:16
	; CI-NEXT: buffer_store_dwordx4 v[0:3], off, s[4:7], 0			; CI-NEXT: buffer_store_dwordx4 v[0:3], off, s[4:7], 0
	; CI-NEXT: s_endpgm			; CI-NEXT: s_endpgm
	%result = call <4 x double> @llvm.round.v4f64(<4 x double> %in) #1			%result = call <4 x double> @llvm.round.v4f64(<4 x double> %in) #1
	store <4 x double> %result, <4 x double> addrspace(1)* %out			store <4 x double> %result, <4 x double> addrspace(1)* %out
	ret void			ret void
	[... 339 lines not shown ...]

llvm/test/CodeGen/AMDGPU/load-lo16.ll

	[... 539 lines not shown ...]
	; GFX906-NEXT: global_store_dword v[0:1], v0, off			; GFX906-NEXT: global_store_dword v[0:1], v0, off
	; GFX906-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX906-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX906-NEXT: s_setpc_b64 s[30:31]			; GFX906-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX803-LABEL: load_local_lo_v2i16_reghi_vreg_multi_use_lo:			; GFX803-LABEL: load_local_lo_v2i16_reghi_vreg_multi_use_lo:
	; GFX803: ; %bb.0: ; %entry			; GFX803: ; %bb.0: ; %entry
	; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX803-NEXT: s_mov_b32 m0, -1			; GFX803-NEXT: s_mov_b32 m0, -1
	; GFX803-NEXT: v_mov_b32_e32 v2, 0
	; GFX803-NEXT: ds_read_u16 v0, v0			; GFX803-NEXT: ds_read_u16 v0, v0
				; GFX803-NEXT: v_mov_b32_e32 v2, 0
	; GFX803-NEXT: v_and_b32_e32 v1, 0xffff0000, v1			; GFX803-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
	; GFX803-NEXT: s_waitcnt lgkmcnt(0)			; GFX803-NEXT: s_waitcnt lgkmcnt(0)
	; GFX803-NEXT: ds_write_b16 v2, v0			; GFX803-NEXT: ds_write_b16 v2, v0
	; GFX803-NEXT: v_or_b32_sdwa v0, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD			; GFX803-NEXT: v_or_b32_sdwa v0, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD
	; GFX803-NEXT: flat_store_dword v[0:1], v0			; GFX803-NEXT: flat_store_dword v[0:1], v0
	; GFX803-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX803-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX803-NEXT: s_setpc_b64 s[30:31]			; GFX803-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	; GFX906-NEXT: global_store_dword v[0:1], v0, off			; GFX906-NEXT: global_store_dword v[0:1], v0, off
	; GFX906-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX906-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX906-NEXT: s_setpc_b64 s[30:31]			; GFX906-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX803-LABEL: load_local_lo_v2i16_reghi_vreg_multi_use_hi:			; GFX803-LABEL: load_local_lo_v2i16_reghi_vreg_multi_use_hi:
	; GFX803: ; %bb.0: ; %entry			; GFX803: ; %bb.0: ; %entry
	; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX803-NEXT: s_mov_b32 m0, -1			; GFX803-NEXT: s_mov_b32 m0, -1
	; GFX803-NEXT: v_lshrrev_b32_e32 v1, 16, v1
	; GFX803-NEXT: v_mov_b32_e32 v3, 0
	; GFX803-NEXT: ds_read_u16 v0, v0			; GFX803-NEXT: ds_read_u16 v0, v0
				; GFX803-NEXT: v_lshrrev_b32_e32 v1, 16, v1
	; GFX803-NEXT: v_lshlrev_b32_e32 v2, 16, v1			; GFX803-NEXT: v_lshlrev_b32_e32 v2, 16, v1
				; GFX803-NEXT: v_mov_b32_e32 v3, 0
	; GFX803-NEXT: ds_write_b16 v3, v1			; GFX803-NEXT: ds_write_b16 v3, v1
	; GFX803-NEXT: s_waitcnt lgkmcnt(1)			; GFX803-NEXT: s_waitcnt lgkmcnt(1)
	; GFX803-NEXT: v_or_b32_e32 v0, v0, v2			; GFX803-NEXT: v_or_b32_e32 v0, v0, v2
	; GFX803-NEXT: flat_store_dword v[0:1], v0			; GFX803-NEXT: flat_store_dword v[0:1], v0
	; GFX803-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX803-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX803-NEXT: s_setpc_b64 s[30:31]			; GFX803-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	%load = load i16, i16 addrspace(3)* %in			%load = load i16, i16 addrspace(3)* %in
	%elt1 = extractelement <2 x i16> %reg, i32 1			%elt1 = extractelement <2 x i16> %reg, i32 1
	store i16 %elt1, i16 addrspace(3)* null			store i16 %elt1, i16 addrspace(3)* null
	%build1 = insertelement <2 x i16> %reg, i16 %load, i32 0			%build1 = insertelement <2 x i16> %reg, i16 %load, i32 0
	store <2 x i16> %build1, <2 x i16> addrspace(1)* undef			store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
	ret void			ret void
	}			}

	define void @load_local_lo_v2i16_reghi_vreg_multi_use_lohi(i16 addrspace(3)* noalias %in, <2 x i16> %reg, i16 addrspace(3)* noalias %out0, i16 addrspace(3)* noalias %out1) #0 {			define void @load_local_lo_v2i16_reghi_vreg_multi_use_lohi(i16 addrspace(3)* noalias %in, <2 x i16> %reg, i16 addrspace(3)* noalias %out0, i16 addrspace(3)* noalias %out1) #0 {
	; GFX900-LABEL: load_local_lo_v2i16_reghi_vreg_multi_use_lohi:			; GFX900-LABEL: load_local_lo_v2i16_reghi_vreg_multi_use_lohi:
	; GFX900: ; %bb.0: ; %entry			; GFX900: ; %bb.0: ; %entry
	; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX900-NEXT: ds_read_u16 v0, v0			; GFX900-NEXT: ds_read_u16 v0, v0
	; GFX900-NEXT: v_lshrrev_b32_e32 v5, 16, v1
	; GFX900-NEXT: v_mov_b32_e32 v4, 0xffff			; GFX900-NEXT: v_mov_b32_e32 v4, 0xffff
				; GFX900-NEXT: v_lshrrev_b32_e32 v5, 16, v1
	; GFX900-NEXT: s_waitcnt lgkmcnt(0)			; GFX900-NEXT: s_waitcnt lgkmcnt(0)
	; GFX900-NEXT: ds_write_b16 v2, v0			; GFX900-NEXT: ds_write_b16 v2, v0
	; GFX900-NEXT: ds_write_b16 v3, v5			; GFX900-NEXT: ds_write_b16 v3, v5
	; GFX900-NEXT: v_bfi_b32 v0, v4, v0, v1			; GFX900-NEXT: v_bfi_b32 v0, v4, v0, v1
	; GFX900-NEXT: global_store_dword v[0:1], v0, off			; GFX900-NEXT: global_store_dword v[0:1], v0, off
	; GFX900-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX900-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX900-NEXT: s_setpc_b64 s[30:31]			; GFX900-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX906-LABEL: load_local_lo_v2i16_reghi_vreg_multi_use_lohi:			; GFX906-LABEL: load_local_lo_v2i16_reghi_vreg_multi_use_lohi:
	; GFX906: ; %bb.0: ; %entry			; GFX906: ; %bb.0: ; %entry
	; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX906-NEXT: ds_read_u16 v0, v0			; GFX906-NEXT: ds_read_u16 v0, v0
	; GFX906-NEXT: v_lshrrev_b32_e32 v5, 16, v1
	; GFX906-NEXT: v_mov_b32_e32 v4, 0xffff			; GFX906-NEXT: v_mov_b32_e32 v4, 0xffff
				; GFX906-NEXT: v_lshrrev_b32_e32 v5, 16, v1
	; GFX906-NEXT: s_waitcnt lgkmcnt(0)			; GFX906-NEXT: s_waitcnt lgkmcnt(0)
	; GFX906-NEXT: ds_write_b16 v2, v0			; GFX906-NEXT: ds_write_b16 v2, v0
	; GFX906-NEXT: ds_write_b16 v3, v5			; GFX906-NEXT: ds_write_b16 v3, v5
	; GFX906-NEXT: v_bfi_b32 v0, v4, v0, v1			; GFX906-NEXT: v_bfi_b32 v0, v4, v0, v1
	; GFX906-NEXT: global_store_dword v[0:1], v0, off			; GFX906-NEXT: global_store_dword v[0:1], v0, off
	; GFX906-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX906-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX906-NEXT: s_setpc_b64 s[30:31]			; GFX906-NEXT: s_setpc_b64 s[30:31]
	;			;

llvm/test/CodeGen/AMDGPU/local-memory.amdgcn.ll

	; GCN-NEXT: .long 32900			; GCN-NEXT: .long 32900

	; GCN-LABEL: {{^}}local_memory_two_objects:			; GCN-LABEL: {{^}}local_memory_two_objects:
	; GCN: v_lshlrev_b32_e32 [[ADDRW:v[0-9]+]], 2, v0			; GCN: v_lshlrev_b32_e32 [[ADDRW:v[0-9]+]], 2, v0
	; CI-DAG: v_sub_i32_e32 [[SUB:v[0-9]+]], vcc, 0, [[ADDRW]]			; CI-DAG: v_sub_i32_e32 [[SUB:v[0-9]+]], vcc, 0, [[ADDRW]]
	; CI-DAG: ds_write2_b32 [[ADDRW]], {{v[0-9]+}}, {{v[0-9]+}} offset1:4			; CI-DAG: ds_write2_b32 [[ADDRW]], {{v[0-9]+}}, {{v[0-9]+}} offset1:4
	; SI-DAG: ds_write2_b32 [[ADDRW]], {{v[0-9]+}}, {{v[0-9]+}} offset1:4			; SI-DAG: ds_write2_b32 [[ADDRW]], {{v[0-9]+}}, {{v[0-9]+}} offset1:4
	; SI-DAG: v_sub_i32_e32 [[SUB0:v[0-9]+]], vcc, 28, [[ADDRW]]			; SI-DAG: v_sub_i32_e32 [[SUB0:v[0-9]+]], vcc, 28, [[ADDRW]]
	; SI-DAG: v_sub_i32_e32 [[SUB1:v[0-9]+]], vcc, 12, [[ADDRW]]

	; GCN: s_barrier			; GCN: s_barrier

				; SI: v_sub_i32_e32 [[SUB1:v[0-9]+]], vcc, 12, [[ADDRW]]
	; SI-DAG: ds_read_b32 v{{[0-9]+}}, [[SUB0]]			; SI-DAG: ds_read_b32 v{{[0-9]+}}, [[SUB0]]
	; SI-DAG: ds_read_b32 v{{[0-9]+}}, [[SUB1]]			; SI-DAG: ds_read_b32 v{{[0-9]+}}, [[SUB1]]
	; CI: ds_read2_b32 {{v\[[0-9]+:[0-9]+\]}}, [[SUB]] offset0:3 offset1:7			; CI: ds_read2_b32 {{v\[[0-9]+:[0-9]+\]}}, [[SUB]] offset0:3 offset1:7

	define amdgpu_kernel void @local_memory_two_objects(i32 addrspace(1)* %out) #0 {			define amdgpu_kernel void @local_memory_two_objects(i32 addrspace(1)* %out) #0 {
	entry:			entry:
	%x.i = call i32 @llvm.amdgcn.workitem.id.x()			%x.i = call i32 @llvm.amdgcn.workitem.id.x()
	%arrayidx = getelementptr inbounds [4 x i32], [4 x i32] addrspace(3)* @local_memory_two_objects.local_mem0, i32 0, i32 %x.i			%arrayidx = getelementptr inbounds [4 x i32], [4 x i32] addrspace(3)* @local_memory_two_objects.local_mem0, i32 0, i32 %x.i

llvm/test/CodeGen/AMDGPU/lshr.v2i16.ll

store <2 x i16> %result, <2 x i16> addrspace(1)* %out.gep		store <2 x i16> %result, <2 x i16> addrspace(1)* %out.gep
ret void		ret void
}		}

define amdgpu_kernel void @lshr_v_s_v2i16(<2 x i16> addrspace(1)* %out, <2 x i16> addrspace(1)* %in, <2 x i16> %sgpr) #0 {		define amdgpu_kernel void @lshr_v_s_v2i16(<2 x i16> addrspace(1)* %out, <2 x i16> addrspace(1)* %in, <2 x i16> %sgpr) #0 {
; GFX9-LABEL: lshr_v_s_v2i16:		; GFX9-LABEL: lshr_v_s_v2i16:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NEXT: v_lshlrev_b32_e32 v2, 2, v0
; GFX9-NEXT: s_load_dword s0, s[0:1], 0x34		; GFX9-NEXT: s_load_dword s0, s[0:1], 0x34
		; GFX9-NEXT: v_lshlrev_b32_e32 v2, 2, v0
		foad: Nice.
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v1, s7		; GFX9-NEXT: v_mov_b32_e32 v1, s7
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s6, v2		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s6, v2
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
; GFX9-NEXT: global_load_dword v3, v[0:1], off		; GFX9-NEXT: global_load_dword v3, v[0:1], off
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s4, v2		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s4, v2
; GFX9-NEXT: v_mov_b32_e32 v1, s5		; GFX9-NEXT: v_mov_b32_e32 v1, s5
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: v_pk_lshrrev_b16 v2, s0, v3		; GFX9-NEXT: v_pk_lshrrev_b16 v2, s0, v3
; GFX9-NEXT: global_store_dword v[0:1], v2, off		; GFX9-NEXT: global_store_dword v[0:1], v2, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; VI-LABEL: lshr_v_s_v2i16:		; VI-LABEL: lshr_v_s_v2i16:
; VI: ; %bb.0:		; VI: ; %bb.0:
; VI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; VI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; VI-NEXT: v_lshlrev_b32_e32 v2, 2, v0
; VI-NEXT: s_load_dword s0, s[0:1], 0x34		; VI-NEXT: s_load_dword s0, s[0:1], 0x34
		; VI-NEXT: v_lshlrev_b32_e32 v2, 2, v0
; VI-NEXT: s_waitcnt lgkmcnt(0)		; VI-NEXT: s_waitcnt lgkmcnt(0)
; VI-NEXT: v_mov_b32_e32 v1, s7		; VI-NEXT: v_mov_b32_e32 v1, s7
; VI-NEXT: v_add_u32_e32 v0, vcc, s6, v2		; VI-NEXT: v_add_u32_e32 v0, vcc, s6, v2
; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc		; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; VI-NEXT: flat_load_dword v3, v[0:1]		; VI-NEXT: flat_load_dword v3, v[0:1]
; VI-NEXT: s_lshr_b32 s1, s0, 16		; VI-NEXT: s_lshr_b32 s1, s0, 16
; VI-NEXT: v_mov_b32_e32 v4, s1		; VI-NEXT: v_mov_b32_e32 v4, s1
; VI-NEXT: v_add_u32_e32 v0, vcc, s4, v2		; VI-NEXT: v_add_u32_e32 v0, vcc, s4, v2
store <2 x i16> %result, <2 x i16> addrspace(1)* %out.gep		store <2 x i16> %result, <2 x i16> addrspace(1)* %out.gep
ret void		ret void
}		}

define amdgpu_kernel void @lshr_s_v_v2i16(<2 x i16> addrspace(1)* %out, <2 x i16> addrspace(1)* %in, <2 x i16> %sgpr) #0 {		define amdgpu_kernel void @lshr_s_v_v2i16(<2 x i16> addrspace(1)* %out, <2 x i16> addrspace(1)* %in, <2 x i16> %sgpr) #0 {
; GFX9-LABEL: lshr_s_v_v2i16:		; GFX9-LABEL: lshr_s_v_v2i16:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NEXT: v_lshlrev_b32_e32 v2, 2, v0
; GFX9-NEXT: s_load_dword s0, s[0:1], 0x34		; GFX9-NEXT: s_load_dword s0, s[0:1], 0x34
		; GFX9-NEXT: v_lshlrev_b32_e32 v2, 2, v0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v1, s7		; GFX9-NEXT: v_mov_b32_e32 v1, s7
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s6, v2		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s6, v2
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
; GFX9-NEXT: global_load_dword v3, v[0:1], off		; GFX9-NEXT: global_load_dword v3, v[0:1], off
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s4, v2		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s4, v2
; GFX9-NEXT: v_mov_b32_e32 v1, s5		; GFX9-NEXT: v_mov_b32_e32 v1, s5
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: v_pk_lshrrev_b16 v2, v3, s0		; GFX9-NEXT: v_pk_lshrrev_b16 v2, v3, s0
; GFX9-NEXT: global_store_dword v[0:1], v2, off		; GFX9-NEXT: global_store_dword v[0:1], v2, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; VI-LABEL: lshr_s_v_v2i16:		; VI-LABEL: lshr_s_v_v2i16:
; VI: ; %bb.0:		; VI: ; %bb.0:
; VI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; VI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; VI-NEXT: v_lshlrev_b32_e32 v2, 2, v0
; VI-NEXT: s_load_dword s0, s[0:1], 0x34		; VI-NEXT: s_load_dword s0, s[0:1], 0x34
		; VI-NEXT: v_lshlrev_b32_e32 v2, 2, v0
; VI-NEXT: s_waitcnt lgkmcnt(0)		; VI-NEXT: s_waitcnt lgkmcnt(0)
; VI-NEXT: v_mov_b32_e32 v1, s7		; VI-NEXT: v_mov_b32_e32 v1, s7
; VI-NEXT: v_add_u32_e32 v0, vcc, s6, v2		; VI-NEXT: v_add_u32_e32 v0, vcc, s6, v2
; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc		; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; VI-NEXT: flat_load_dword v3, v[0:1]		; VI-NEXT: flat_load_dword v3, v[0:1]
; VI-NEXT: s_lshr_b32 s1, s0, 16		; VI-NEXT: s_lshr_b32 s1, s0, 16
; VI-NEXT: v_mov_b32_e32 v4, s1		; VI-NEXT: v_mov_b32_e32 v4, s1
; VI-NEXT: v_add_u32_e32 v0, vcc, s4, v2		; VI-NEXT: v_add_u32_e32 v0, vcc, s4, v2

llvm/test/CodeGen/AMDGPU/memory-legalizer-load.ll

ret void		ret void
}		}

; GCN-LABEL: {{^}}workgroup_acquire:		; GCN-LABEL: {{^}}workgroup_acquire:
; GFX10-NOT: s_waitcnt_v{{[ms]}}cnt {{[^,]+, (0x)*0$}}		; GFX10-NOT: s_waitcnt_v{{[ms]}}cnt {{[^,]+, (0x)*0$}}
; GFX89: flat_load_dword [[RET:v[0-9]+]], v[{{[0-9]+}}:{{[0-9]+}}]{{$}}		; GFX89: flat_load_dword [[RET:v[0-9]+]], v[{{[0-9]+}}:{{[0-9]+}}]{{$}}
; GFX10WGP: flat_load_dword [[RET:v[0-9]+]], v[{{[0-9]+}}:{{[0-9]+}}] glc{{$}}		; GFX10WGP: flat_load_dword [[RET:v[0-9]+]], v[{{[0-9]+}}:{{[0-9]+}}] glc{{$}}
; GFX10CU-NOT: flat_load_dword [[RET:v[0-9]+]], v[{{[0-9]+}}:{{[0-9]+}}] glc{{$}}		; GFX10CU-NOT: flat_load_dword [[RET:v[0-9]+]], v[{{[0-9]+}}:{{[0-9]+}}] glc{{$}}
; GFX89-NOT: s_waitcnt vmcnt(0){{$}}		; GFX89: s_waitcnt lgkmcnt(0){{$}}
		; GFX89: s_waitcnt vmcnt(0){{$}}
; GFX10WGP: s_waitcnt vmcnt(0) lgkmcnt(0){{$}}		; GFX10WGP: s_waitcnt vmcnt(0) lgkmcnt(0){{$}}
; GFX89-NOT: buffer_wbinvl1_vol		; GFX89-NOT: buffer_wbinvl1_vol
; GFX10WGP-NEXT: buffer_gl0_inv		; GFX10WGP-NEXT: buffer_gl0_inv
; GFX10CU-NOT: buffer_gl0_inv		; GFX10CU-NOT: buffer_gl0_inv
; GCN: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[RET]]		; GCN: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[RET]]
; GFX10: .amdhsa_kernel workgroup_acquire		; GFX10: .amdhsa_kernel workgroup_acquire
; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0		; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0
; GFX10CU: .amdhsa_workgroup_processor_mode 0		; GFX10CU: .amdhsa_workgroup_processor_mode 0
; GFX89-NOT: s_waitcnt vmcnt(0){{$}}		; GFX89-NOT: s_waitcnt vmcnt(0){{$}}
; GFX10WGP: s_waitcnt vmcnt(0) lgkmcnt(0){{$}}		; GFX10WGP: s_waitcnt vmcnt(0) lgkmcnt(0){{$}}
; GFX10WGP-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10WGP-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10CU-NOT: s_waitcnt vmcnt(0){{$}}		; GFX10CU-NOT: s_waitcnt vmcnt(0){{$}}
; GFX10CU-NOT: s_waitcnt_vscnt null, 0x0		; GFX10CU-NOT: s_waitcnt_vscnt null, 0x0
; GFX89: flat_load_dword [[RET:v[0-9]+]], v[{{[0-9]+}}:{{[0-9]+}}]{{$}}		; GFX89: flat_load_dword [[RET:v[0-9]+]], v[{{[0-9]+}}:{{[0-9]+}}]{{$}}
; GFX10WGP: flat_load_dword [[RET:v[0-9]+]], v[{{[0-9]+}}:{{[0-9]+}}] glc{{$}}		; GFX10WGP: flat_load_dword [[RET:v[0-9]+]], v[{{[0-9]+}}:{{[0-9]+}}] glc{{$}}
; GFX10CU: flat_load_dword [[RET:v[0-9]+]], v[{{[0-9]+}}:{{[0-9]+}}]{{$}}		; GFX10CU: flat_load_dword [[RET:v[0-9]+]], v[{{[0-9]+}}:{{[0-9]+}}]{{$}}
; GFX89-NOT: s_waitcnt vmcnt(0){{$}}		; GFX89: s_waitcnt lgkmcnt(0){{$}}
		; GFX89: s_waitcnt vmcnt(0){{$}}
; GFX10WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0){{$}}		; GFX10WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0){{$}}
; GFX89-NOT: buffer_wbinvl1_vol		; GFX89-NOT: buffer_wbinvl1_vol
; GFX10WGP-NEXT: buffer_gl0_inv		; GFX10WGP-NEXT: buffer_gl0_inv
; GFX10CU-NOT: buffer_gl0_inv		; GFX10CU-NOT: buffer_gl0_inv
; GCN: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[RET]]		; GCN: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[RET]]
; GFX10: .amdhsa_kernel workgroup_seq_cst		; GFX10: .amdhsa_kernel workgroup_seq_cst
; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0		; GFX10WGP-NOT: .amdhsa_workgroup_processor_mode 0
; GFX10CU: .amdhsa_workgroup_processor_mode 0		; GFX10CU: .amdhsa_workgroup_processor_mode 0

llvm/test/CodeGen/AMDGPU/memory_clause.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -march=amdgcn -mcpu=gfx902 -verify-machineinstrs -amdgpu-enable-global-sgpr-addr < %s | FileCheck -check-prefix=GCN %s		; RUN: llc -march=amdgcn -mcpu=gfx902 -verify-machineinstrs -amdgpu-enable-global-sgpr-addr < %s | FileCheck -check-prefix=GCN %s

define amdgpu_kernel void @vector_clause(<4 x i32> addrspace(1)* noalias nocapture readonly %arg, <4 x i32> addrspace(1)* noalias nocapture %arg1) {		define amdgpu_kernel void @vector_clause(<4 x i32> addrspace(1)* noalias nocapture readonly %arg, <4 x i32> addrspace(1)* noalias nocapture %arg1) {
; GCN-LABEL: vector_clause:		; GCN-LABEL: vector_clause:
; GCN: ; %bb.0: ; %bb		; GCN: ; %bb.0: ; %bb
; GCN-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x24		; GCN-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x24
		; GCN-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x2c
		foad: Nice.
; GCN-NEXT: v_mov_b32_e32 v17, 0		; GCN-NEXT: v_mov_b32_e32 v17, 0
; GCN-NEXT: v_lshlrev_b32_e32 v16, 4, v0		; GCN-NEXT: v_lshlrev_b32_e32 v16, 4, v0
; GCN-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x2c		; GCN-NEXT: s_nop 0
; GCN-NEXT: s_waitcnt lgkmcnt(0)		; GCN-NEXT: s_waitcnt lgkmcnt(0)
		; GCN-NEXT: s_nop 0
; GCN-NEXT: global_load_dwordx4 v[0:3], v[16:17], s[2:3]		; GCN-NEXT: global_load_dwordx4 v[0:3], v[16:17], s[2:3]
; GCN-NEXT: global_load_dwordx4 v[4:7], v[16:17], s[2:3] offset:16		; GCN-NEXT: global_load_dwordx4 v[4:7], v[16:17], s[2:3] offset:16
; GCN-NEXT: global_load_dwordx4 v[8:11], v[16:17], s[2:3] offset:32		; GCN-NEXT: global_load_dwordx4 v[8:11], v[16:17], s[2:3] offset:32
; GCN-NEXT: global_load_dwordx4 v[12:15], v[16:17], s[2:3] offset:48		; GCN-NEXT: global_load_dwordx4 v[12:15], v[16:17], s[2:3] offset:48
; GCN-NEXT: s_nop 0		; GCN-NEXT: s_nop 0
; GCN-NEXT: s_waitcnt vmcnt(3)		; GCN-NEXT: s_waitcnt vmcnt(3)
; GCN-NEXT: s_nop 0		; GCN-NEXT: s_nop 0
; GCN-NEXT: global_store_dwordx4 v[16:17], v[0:3], s[4:5]		; GCN-NEXT: global_store_dwordx4 v[16:17], v[0:3], s[4:5]
; GCN-NEXT: s_load_dwordx4 s[0:3], s[16:17], 0x0		; GCN-NEXT: s_load_dwordx4 s[0:3], s[16:17], 0x0
; GCN-NEXT: s_load_dwordx4 s[4:7], s[16:17], 0x10		; GCN-NEXT: s_load_dwordx4 s[4:7], s[16:17], 0x10
; GCN-NEXT: s_load_dwordx4 s[8:11], s[16:17], 0x20		; GCN-NEXT: s_load_dwordx4 s[8:11], s[16:17], 0x20
; GCN-NEXT: s_load_dwordx4 s[12:15], s[16:17], 0x30		; GCN-NEXT: s_load_dwordx4 s[12:15], s[16:17], 0x30
; GCN-NEXT: v_mov_b32_e32 v12, s18		; GCN-NEXT: v_mov_b32_e32 v12, s18
; GCN-NEXT: s_waitcnt lgkmcnt(0)		; GCN-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: v_mov_b32_e32 v0, s0		; GCN-NEXT: v_mov_b32_e32 v0, s0
; GCN-NEXT: v_mov_b32_e32 v4, s4		; GCN-NEXT: v_mov_b32_e32 v4, s4
		; GCN-NEXT: v_mov_b32_e32 v8, s8
; GCN-NEXT: v_mov_b32_e32 v13, s19		; GCN-NEXT: v_mov_b32_e32 v13, s19
; GCN-NEXT: v_mov_b32_e32 v1, s1		; GCN-NEXT: v_mov_b32_e32 v1, s1
; GCN-NEXT: v_mov_b32_e32 v2, s2		; GCN-NEXT: v_mov_b32_e32 v2, s2
; GCN-NEXT: v_mov_b32_e32 v3, s3		; GCN-NEXT: v_mov_b32_e32 v3, s3
; GCN-NEXT: v_mov_b32_e32 v5, s5		; GCN-NEXT: v_mov_b32_e32 v5, s5
; GCN-NEXT: v_mov_b32_e32 v6, s6		; GCN-NEXT: v_mov_b32_e32 v6, s6
; GCN-NEXT: v_mov_b32_e32 v7, s7		; GCN-NEXT: v_mov_b32_e32 v7, s7
; GCN-NEXT: v_mov_b32_e32 v8, s8
; GCN-NEXT: v_mov_b32_e32 v9, s9
; GCN-NEXT: v_mov_b32_e32 v10, s10
; GCN-NEXT: v_mov_b32_e32 v11, s11
; GCN-NEXT: s_nop 0		; GCN-NEXT: s_nop 0
; GCN-NEXT: s_nop 0		; GCN-NEXT: s_nop 0
; GCN-NEXT: global_store_dwordx4 v[12:13], v[0:3], off		; GCN-NEXT: global_store_dwordx4 v[12:13], v[0:3], off
; GCN-NEXT: global_store_dwordx4 v[12:13], v[4:7], off offset:16		; GCN-NEXT: global_store_dwordx4 v[12:13], v[4:7], off offset:16
; GCN-NEXT: v_mov_b32_e32 v0, s12		; GCN-NEXT: v_mov_b32_e32 v0, s12
		; GCN-NEXT: v_mov_b32_e32 v9, s9
		; GCN-NEXT: v_mov_b32_e32 v10, s10
		; GCN-NEXT: v_mov_b32_e32 v11, s11
; GCN-NEXT: v_mov_b32_e32 v1, s13		; GCN-NEXT: v_mov_b32_e32 v1, s13
; GCN-NEXT: v_mov_b32_e32 v2, s14		; GCN-NEXT: v_mov_b32_e32 v2, s14
; GCN-NEXT: v_mov_b32_e32 v3, s15		; GCN-NEXT: v_mov_b32_e32 v3, s15
; GCN-NEXT: s_nop 0		; GCN-NEXT: s_nop 0
; GCN-NEXT: s_nop 0		; GCN-NEXT: s_nop 0
; GCN-NEXT: global_store_dwordx4 v[12:13], v[8:11], off offset:32		; GCN-NEXT: global_store_dwordx4 v[12:13], v[8:11], off offset:32
; GCN-NEXT: global_store_dwordx4 v[12:13], v[0:3], off offset:48		; GCN-NEXT: global_store_dwordx4 v[12:13], v[0:3], off offset:48
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm

define void @mubuf_clause(<4 x i32> addrspace(5)* noalias nocapture readonly %arg, <4 x i32> addrspace(5)* noalias nocapture %arg1) {		define void @mubuf_clause(<4 x i32> addrspace(5)* noalias nocapture readonly %arg, <4 x i32> addrspace(5)* noalias nocapture %arg1) {
; GCN-LABEL: mubuf_clause:		; GCN-LABEL: mubuf_clause:
; GCN: ; %bb.0: ; %bb		; GCN: ; %bb.0: ; %bb
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: v_and_b32_e32 v2, 0x3ff, v2		; GCN-NEXT: v_and_b32_e32 v2, 0x3ff, v2
; GCN-NEXT: v_lshlrev_b32_e32 v2, 4, v2		; GCN-NEXT: v_lshlrev_b32_e32 v2, 4, v2
; GCN-NEXT: v_add_u32_e32 v0, v0, v2		; GCN-NEXT: v_add_u32_e32 v0, v0, v2
; GCN-NEXT: v_add_u32_e32 v1, v1, v2
; GCN-NEXT: s_nop 0		; GCN-NEXT: s_nop 0
; GCN-NEXT: s_nop 0		; GCN-NEXT: s_nop 0
; GCN-NEXT: buffer_load_dword v3, v0, s[0:3], s33 offen		; GCN-NEXT: buffer_load_dword v3, v0, s[0:3], s33 offen
; GCN-NEXT: buffer_load_dword v4, v0, s[0:3], s33 offen offset:4		; GCN-NEXT: buffer_load_dword v4, v0, s[0:3], s33 offen offset:4
; GCN-NEXT: buffer_load_dword v5, v0, s[0:3], s33 offen offset:8		; GCN-NEXT: buffer_load_dword v5, v0, s[0:3], s33 offen offset:8
; GCN-NEXT: buffer_load_dword v6, v0, s[0:3], s33 offen offset:12		; GCN-NEXT: buffer_load_dword v6, v0, s[0:3], s33 offen offset:12
; GCN-NEXT: buffer_load_dword v7, v0, s[0:3], s33 offen offset:16		; GCN-NEXT: buffer_load_dword v7, v0, s[0:3], s33 offen offset:16
; GCN-NEXT: buffer_load_dword v8, v0, s[0:3], s33 offen offset:20		; GCN-NEXT: buffer_load_dword v8, v0, s[0:3], s33 offen offset:20
; GCN-NEXT: buffer_load_dword v9, v0, s[0:3], s33 offen offset:24		; GCN-NEXT: buffer_load_dword v9, v0, s[0:3], s33 offen offset:24
; GCN-NEXT: buffer_load_dword v10, v0, s[0:3], s33 offen offset:28		; GCN-NEXT: buffer_load_dword v10, v0, s[0:3], s33 offen offset:28
; GCN-NEXT: buffer_load_dword v11, v0, s[0:3], s33 offen offset:32		; GCN-NEXT: buffer_load_dword v11, v0, s[0:3], s33 offen offset:32
; GCN-NEXT: buffer_load_dword v12, v0, s[0:3], s33 offen offset:36		; GCN-NEXT: buffer_load_dword v12, v0, s[0:3], s33 offen offset:36
; GCN-NEXT: buffer_load_dword v13, v0, s[0:3], s33 offen offset:40		; GCN-NEXT: buffer_load_dword v13, v0, s[0:3], s33 offen offset:40
; GCN-NEXT: buffer_load_dword v14, v0, s[0:3], s33 offen offset:44		; GCN-NEXT: buffer_load_dword v14, v0, s[0:3], s33 offen offset:44
; GCN-NEXT: buffer_load_dword v15, v0, s[0:3], s33 offen offset:48		; GCN-NEXT: buffer_load_dword v15, v0, s[0:3], s33 offen offset:48
; GCN-NEXT: buffer_load_dword v16, v0, s[0:3], s33 offen offset:52		; GCN-NEXT: buffer_load_dword v16, v0, s[0:3], s33 offen offset:52
; GCN-NEXT: buffer_load_dword v17, v0, s[0:3], s33 offen offset:56		; GCN-NEXT: buffer_load_dword v17, v0, s[0:3], s33 offen offset:56
		; GCN-NEXT: v_add_u32_e32 v1, v1, v2
; GCN-NEXT: s_nop 0		; GCN-NEXT: s_nop 0
; GCN-NEXT: s_nop 0		; GCN-NEXT: s_nop 0
; GCN-NEXT: buffer_load_dword v0, v0, s[0:3], s33 offen offset:60		; GCN-NEXT: buffer_load_dword v0, v0, s[0:3], s33 offen offset:60
; GCN-NEXT: s_nop 0		; GCN-NEXT: s_nop 0
; GCN-NEXT: s_waitcnt vmcnt(15)		; GCN-NEXT: s_waitcnt vmcnt(15)
; GCN-NEXT: s_nop 0		; GCN-NEXT: s_nop 0
; GCN-NEXT: buffer_store_dword v3, v1, s[0:3], s33 offen		; GCN-NEXT: buffer_store_dword v3, v1, s[0:3], s33 offen
; GCN-NEXT: s_waitcnt vmcnt(15)		; GCN-NEXT: s_waitcnt vmcnt(15)
store <4 x i32> %tmp15, <4 x i32> addrspace(5)* %tmp16, align 16		store <4 x i32> %tmp15, <4 x i32> addrspace(5)* %tmp16, align 16
ret void		ret void
}		}

define amdgpu_kernel void @vector_clause_indirect(i64 addrspace(1)* noalias nocapture readonly %arg, <4 x i32> addrspace(1)* noalias nocapture readnone %arg1, <4 x i32> addrspace(1)* noalias nocapture %arg2) {		define amdgpu_kernel void @vector_clause_indirect(i64 addrspace(1)* noalias nocapture readonly %arg, <4 x i32> addrspace(1)* noalias nocapture readnone %arg1, <4 x i32> addrspace(1)* noalias nocapture %arg2) {
; GCN-LABEL: vector_clause_indirect:		; GCN-LABEL: vector_clause_indirect:
; GCN: ; %bb.0: ; %bb		; GCN: ; %bb.0: ; %bb
; GCN-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x24		; GCN-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x24
		; GCN-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34
; GCN-NEXT: v_mov_b32_e32 v1, 0		; GCN-NEXT: v_mov_b32_e32 v1, 0
; GCN-NEXT: v_lshlrev_b32_e32 v0, 3, v0		; GCN-NEXT: v_lshlrev_b32_e32 v0, 3, v0
; GCN-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x34		; GCN-NEXT: s_nop 0
; GCN-NEXT: s_waitcnt lgkmcnt(0)		; GCN-NEXT: s_waitcnt lgkmcnt(0)
		; GCN-NEXT: s_nop 0
; GCN-NEXT: global_load_dwordx2 v[8:9], v[0:1], s[2:3]		; GCN-NEXT: global_load_dwordx2 v[8:9], v[0:1], s[2:3]
; GCN-NEXT: s_nop 0		; GCN-NEXT: s_nop 0
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_nop 0		; GCN-NEXT: s_nop 0
; GCN-NEXT: global_load_dwordx4 v[0:3], v[8:9], off		; GCN-NEXT: global_load_dwordx4 v[0:3], v[8:9], off
; GCN-NEXT: global_load_dwordx4 v[4:7], v[8:9], off offset:16		; GCN-NEXT: global_load_dwordx4 v[4:7], v[8:9], off offset:16
; GCN-NEXT: v_mov_b32_e32 v9, s5		; GCN-NEXT: v_mov_b32_e32 v9, s5
; GCN-NEXT: v_mov_b32_e32 v8, s4		; GCN-NEXT: v_mov_b32_e32 v8, s4

llvm/test/CodeGen/AMDGPU/merge-store-crash.ll

	; RUN: llc -march=amdgcn -mcpu=verde -verify-machineinstrs < %s | FileCheck %s			; RUN: llc -march=amdgcn -mcpu=verde -verify-machineinstrs < %s | FileCheck %s
	; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck %s			; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck %s

	; This is used to crash in LiveIntervalAnalysis via SILoadStoreOptimizer			; This is used to crash in LiveIntervalAnalysis via SILoadStoreOptimizer
	; while fixing up the merge of two ds_write instructions.			; while fixing up the merge of two ds_write instructions.

	@tess_lds = external addrspace(3) global [8192 x i32]			@tess_lds = external addrspace(3) global [8192 x i32]

	; CHECK-LABEL: {{^}}main:			; CHECK-LABEL: {{^}}main:
	; CHECK: ds_write_b32			; CHECK-DAG: ds_write_b32
	; CHECK: ds_write_b32			; CHECK-DAG: ds_write_b32
	; CHECK: v_mov_b32_e32 v1, v0			; CHECK-DAG: v_mov_b32_e32 v1, v0
	; CHECK: tbuffer_store_format_xyzw v[0:3],			; CHECK: tbuffer_store_format_xyzw v[0:3],
	define amdgpu_vs void @main(i32 inreg %arg) {			define amdgpu_vs void @main(i32 inreg %arg) {
	main_body:			main_body:
	%tmp = load float, float addrspace(3)* undef, align 4			%tmp = load float, float addrspace(3)* undef, align 4
	%tmp1 = load float, float addrspace(3)* undef, align 4			%tmp1 = load float, float addrspace(3)* undef, align 4
	store float %tmp, float addrspace(3)* null, align 4			store float %tmp, float addrspace(3)* null, align 4
	%tmp2 = bitcast float %tmp to i32			%tmp2 = bitcast float %tmp to i32
	%tmp3 = add nuw nsw i32 0, 1			%tmp3 = add nuw nsw i32 0, 1

llvm/test/CodeGen/AMDGPU/postra-bundle-memops.mir

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				# RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs -run-pass=si-post-ra-bundler %s -o - | FileCheck -check-prefix=GCN %s

				---
				name: bundle_memops
				tracksRegLiveness: true
				body: |
				bb.0:
				; GCN-LABEL: name: bundle_memops
				; GCN: $vgpr0 = GLOBAL_LOAD_DWORD undef $vgpr3_vgpr4, 0, 0, 0, 0, implicit $exec
				; GCN: S_NOP 0
				; GCN: BUNDLE implicit-def $vgpr0, implicit-def $vgpr1, implicit undef $vgpr3_vgpr4, implicit $exec {
				; GCN: $vgpr0 = GLOBAL_LOAD_DWORD undef $vgpr3_vgpr4, 0, 0, 0, 0, implicit $exec
				; GCN: $vgpr1 = GLOBAL_LOAD_DWORD undef $vgpr3_vgpr4, 4, 0, 0, 0, implicit $exec
				; GCN: }
				; GCN: S_NOP 0
				; GCN: $vgpr0 = GLOBAL_LOAD_DWORD undef $vgpr3_vgpr4, 0, 0, 0, 0, implicit $exec
				; GCN: BUNDLE implicit-def $vgpr1, implicit-def $vgpr2, implicit-def $vgpr5, implicit undef $vgpr0_vgpr1, implicit $exec, implicit undef $vgpr3_vgpr4 {
				; GCN: $vgpr1 = GLOBAL_LOAD_DWORD undef $vgpr0_vgpr1, 4, 0, 0, 0, implicit $exec
				; GCN: $vgpr2 = GLOBAL_LOAD_DWORD undef $vgpr3_vgpr4, 4, 0, 0, 0, implicit $exec
				; GCN: $vgpr5 = GLOBAL_LOAD_DWORD undef $vgpr3_vgpr4, 0, 0, 0, 0, implicit $exec
				; GCN: }
				; GCN: BUNDLE implicit undef $vgpr3_vgpr4, implicit $vgpr1, implicit $exec, implicit $vgpr0 {
				; GCN: GLOBAL_STORE_DWORD undef $vgpr3_vgpr4, $vgpr1, 0, 0, 0, 0, implicit $exec
				; GCN: GLOBAL_STORE_DWORD undef $vgpr3_vgpr4, $vgpr0, 4, 0, 0, 0, implicit $exec
				; GCN: }
				; GCN: S_NOP 0
				; GCN: BUNDLE implicit undef $vgpr3_vgpr4, implicit $vgpr1, implicit $exec, implicit $vgpr0 {
				; GCN: GLOBAL_STORE_DWORD undef $vgpr3_vgpr4, $vgpr1, 0, 0, 0, 0, implicit $exec
				; GCN: GLOBAL_STORE_DWORD undef $vgpr3_vgpr4, $vgpr0, 4, 0, 0, 0, implicit $exec
				; GCN: GLOBAL_STORE_DWORD undef $vgpr3_vgpr4, $vgpr1, 0, 0, 0, 0, implicit $exec
				; GCN: }
				; GCN: S_NOP 0
				; GCN: $vgpr0 = GLOBAL_LOAD_DWORD undef $vgpr3_vgpr4, 0, 0, 0, 0, implicit $exec
				; GCN: S_NOP 0
				; GCN: GLOBAL_STORE_DWORD undef $vgpr3_vgpr4, $vgpr0, 4, 0, 0, 0, implicit $exec
				; GCN: BUNDLE implicit-def $vgpr2, implicit-def $vgpr3, implicit $vgpr0, implicit $exec, implicit $vgpr1 {
				; GCN: $vgpr2 = DS_READ_B32_gfx9 $vgpr0, 0, 0, implicit $exec
				; GCN: $vgpr3 = DS_READ_B32_gfx9 $vgpr1, 0, 0, implicit $exec
				; GCN: }
				; GCN: BUNDLE implicit $vgpr0, implicit $vgpr2, implicit killed $m0, implicit $exec, implicit $vgpr3 {
				; GCN: DS_WRITE_B32_gfx9 $vgpr0, $vgpr2, 0, 0, implicit killed $m0, implicit $exec
				; GCN: DS_WRITE_B32_gfx9 $vgpr0, $vgpr3, 4, 0, implicit killed $m0, implicit $exec
				; GCN: }
				; GCN: S_NOP 0
				; GCN: BUNDLE implicit-def $sgpr2, implicit-def $sgpr3, implicit undef $sgpr0_sgpr1, implicit undef $sgpr10 {
				; GCN: $sgpr2 = S_LOAD_DWORD_IMM undef $sgpr0_sgpr1, 0, 0, 0
				; GCN: $sgpr3 = S_LOAD_DWORD_SGPR undef $sgpr0_sgpr1, undef $sgpr10, 0, 0
				; GCN: }
				; GCN: BUNDLE implicit-def $vgpr2, implicit-def $vgpr3, implicit $vgpr0, implicit undef $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr2, implicit $exec, implicit $vgpr1 {
				; GCN: $vgpr2 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, undef $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr2, 0, 0, 0, 0, 0, 0, implicit $exec
				; GCN: $vgpr3 = BUFFER_LOAD_DWORD_OFFEN $vgpr1, undef $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr2, 0, 0, 0, 0, 0, 0, implicit $exec
				; GCN: }
				; GCN: BUNDLE implicit $vgpr0, implicit $vgpr2_vgpr3, implicit undef $sgpr0_sgpr1_sgpr2_sgpr3, implicit $exec {
				; GCN: BUFFER_STORE_DWORD_ADDR64 $vgpr0, $vgpr2_vgpr3, undef $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
				; GCN: BUFFER_STORE_DWORD_ADDR64 $vgpr0, $vgpr2_vgpr3, undef $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
				; GCN: }
				; GCN: BUNDLE implicit-def $vgpr2, implicit-def $vgpr3, implicit undef $vgpr4_vgpr5_vgpr6_vgpr7, implicit undef $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, implicit $exec {
				; GCN: $vgpr2 = IMAGE_LOAD_V1_V4 undef $vgpr4_vgpr5_vgpr6_vgpr7, undef $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, 2, -1, 0, 0, 0, 0, 0, 0, 0, implicit $exec
				; GCN: $vgpr3 = IMAGE_LOAD_V1_V4 undef $vgpr4_vgpr5_vgpr6_vgpr7, undef $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, 2, -1, 0, 0, 0, 0, 0, 0, 0, implicit $exec
				; GCN: }
				; GCN: BUNDLE implicit undef $vgpr0_vgpr1_vgpr2_vgpr3, implicit $vgpr0_vgpr1, implicit undef $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, implicit $exec {
				; GCN: IMAGE_STORE_V4_V2 undef $vgpr0_vgpr1_vgpr2_vgpr3, $vgpr0_vgpr1, undef $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, 15, -1, 1, 0, 0, 0, 0, 0, 0, implicit $exec
				; GCN: IMAGE_STORE_V4_V2 undef $vgpr0_vgpr1_vgpr2_vgpr3, $vgpr0_vgpr1, undef $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, 15, -1, 1, 0, 0, 0, 0, 0, 0, implicit $exec
				; GCN: }
				; GCN: S_NOP 0
				; GCN: BUNDLE implicit-def $vgpr2, implicit-def $vgpr3, implicit $vgpr0, implicit $exec, implicit $vgpr1 {
				; GCN: $vgpr2 = DS_READ_B32_gfx9 $vgpr0, 0, 0, implicit $exec
				; GCN: $vgpr3 = DS_READ_B32_gfx9 $vgpr1, 0, 0, implicit $exec
				; GCN: }
				$vgpr0 = GLOBAL_LOAD_DWORD undef $vgpr3_vgpr4, 0, 0, 0, 0, implicit $exec
				S_NOP 0
				$vgpr0 = GLOBAL_LOAD_DWORD undef $vgpr3_vgpr4, 0, 0, 0, 0, implicit $exec
				$vgpr1 = GLOBAL_LOAD_DWORD undef $vgpr3_vgpr4, 4, 0, 0, 0, implicit $exec
				S_NOP 0
				$vgpr0 = GLOBAL_LOAD_DWORD undef $vgpr3_vgpr4, 0, 0, 0, 0, implicit $exec
				$vgpr1 = GLOBAL_LOAD_DWORD undef $vgpr0_vgpr1, 4, 0, 0, 0, implicit $exec
				$vgpr2 = GLOBAL_LOAD_DWORD undef $vgpr3_vgpr4, 4, 0, 0, 0, implicit $exec
				$vgpr5 = GLOBAL_LOAD_DWORD undef $vgpr3_vgpr4, 0, 0, 0, 0, implicit $exec
				GLOBAL_STORE_DWORD undef $vgpr3_vgpr4, $vgpr1, 0, 0, 0, 0, implicit $exec
				GLOBAL_STORE_DWORD undef $vgpr3_vgpr4, $vgpr0, 4, 0, 0, 0, implicit $exec
				S_NOP 0
				GLOBAL_STORE_DWORD undef $vgpr3_vgpr4, $vgpr1, 0, 0, 0, 0, implicit $exec
				GLOBAL_STORE_DWORD undef $vgpr3_vgpr4, $vgpr0, 4, 0, 0, 0, implicit $exec
				GLOBAL_STORE_DWORD undef $vgpr3_vgpr4, $vgpr1, 0, 0, 0, 0, implicit $exec
				S_NOP 0
				$vgpr0 = GLOBAL_LOAD_DWORD undef $vgpr3_vgpr4, 0, 0, 0, 0, implicit $exec
				S_NOP 0
				GLOBAL_STORE_DWORD undef $vgpr3_vgpr4, $vgpr0, 4, 0, 0, 0, implicit $exec
				$vgpr2 = DS_READ_B32_gfx9 $vgpr0, 0, 0, implicit $exec
				$vgpr3 = DS_READ_B32_gfx9 $vgpr1, 0, 0, implicit $exec
				DS_WRITE_B32_gfx9 $vgpr0, $vgpr2, 0, 0, implicit killed $m0, implicit $exec
				DS_WRITE_B32_gfx9 $vgpr0, $vgpr3, 4, 0, implicit killed $m0, implicit $exec
				S_NOP 0
				$sgpr2 = S_LOAD_DWORD_IMM undef $sgpr0_sgpr1, 0, 0, 0
				$sgpr3 = S_LOAD_DWORD_SGPR undef $sgpr0_sgpr1, undef $sgpr10, 0, 0
				$vgpr2 = BUFFER_LOAD_DWORD_OFFEN $vgpr0, undef $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr2, 0, 0, 0, 0, 0, 0, implicit $exec
				$vgpr3 = BUFFER_LOAD_DWORD_OFFEN $vgpr1, undef $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr2, 0, 0, 0, 0, 0, 0, implicit $exec
				BUFFER_STORE_DWORD_ADDR64 $vgpr0, $vgpr2_vgpr3, undef $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
				BUFFER_STORE_DWORD_ADDR64 $vgpr0, $vgpr2_vgpr3, undef $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
				$vgpr2 = IMAGE_LOAD_V1_V4 undef $vgpr4_vgpr5_vgpr6_vgpr7, undef $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, 2, -1, 0, 0, 0, 0, 0, 0, 0, implicit $exec
				$vgpr3 = IMAGE_LOAD_V1_V4 undef $vgpr4_vgpr5_vgpr6_vgpr7, undef $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, 2, -1, 0, 0, 0, 0, 0, 0, 0, implicit $exec
				IMAGE_STORE_V4_V2 undef $vgpr0_vgpr1_vgpr2_vgpr3, $vgpr0_vgpr1, undef $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, 15, -1, 1, 0, 0, 0, 0, 0, 0, implicit $exec
				IMAGE_STORE_V4_V2 undef $vgpr0_vgpr1_vgpr2_vgpr3, $vgpr0_vgpr1, undef $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, 15, -1, 1, 0, 0, 0, 0, 0, 0, implicit $exec
				S_NOP 0
				$vgpr2 = DS_READ_B32_gfx9 $vgpr0, 0, 0, implicit $exec
				$vgpr3 = DS_READ_B32_gfx9 $vgpr1, 0, 0, implicit $exec
				...

llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm.ll

	; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]			; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]
	; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]			; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]
	; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]			; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]
	; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]			; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]
	; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]			; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]
	; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]			; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]
	; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]			; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]
	;			;
				; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-2048
	; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-4096			; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-4096
	; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-2048			; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-2048
	; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}			; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}
	; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:2048			; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:2048
	; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-4096			; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-4096
	; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-2048			; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-2048
	; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}			; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}
	; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:2048			; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:2048
	; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off
	; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-4096			; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-4096
	; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off			; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off
	;			;
	; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-2048			; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-2048
	; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}			; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}
	; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}			; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}
	; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}			; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}
	; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-2048			; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:-2048
	; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]			; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]
	; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]			; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]
	; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]			; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]
	; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]			; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]
	; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]			; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]
	; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]			; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]
	; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]			; GFX8: flat_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]
	;			;
	; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:2048
	; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}			; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}
	; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:2048			; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:2048
	; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}			; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}
				; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:2048
				; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:2048
	; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}			; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}
	; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}			; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}
	; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:2048			; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:2048
	; GFX9: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off offset:2048
	;			;
	; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}			; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}
	; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}			; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}
	; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}			; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}
	; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}			; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}
	; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}			; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}
	; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}			; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}
	; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}			; GFX10: global_load_dwordx2 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], off{{$}}

llvm/test/CodeGen/AMDGPU/saddo.ll

	; SI-NEXT: buffer_store_byte v0, off, s[12:15], 0			; SI-NEXT: buffer_store_byte v0, off, s[12:15], 0
	; SI-NEXT: s_endpgm			; SI-NEXT: s_endpgm
	;			;
	; VI-LABEL: v_saddo_i32:			; VI-LABEL: v_saddo_i32:
	; VI: ; %bb.0:			; VI: ; %bb.0:
	; VI-NEXT: s_load_dwordx8 s[0:7], s[0:1], 0x24			; VI-NEXT: s_load_dwordx8 s[0:7], s[0:1], 0x24
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: v_mov_b32_e32 v4, s6			; VI-NEXT: v_mov_b32_e32 v4, s6
	; VI-NEXT: v_mov_b32_e32 v5, s7
	; VI-NEXT: v_mov_b32_e32 v6, s4			; VI-NEXT: v_mov_b32_e32 v6, s4
	; VI-NEXT: v_mov_b32_e32 v7, s5			; VI-NEXT: v_mov_b32_e32 v7, s5
				; VI-NEXT: v_mov_b32_e32 v5, s7
	; VI-NEXT: flat_load_dword v6, v[6:7]			; VI-NEXT: flat_load_dword v6, v[6:7]
	; VI-NEXT: flat_load_dword v4, v[4:5]			; VI-NEXT: flat_load_dword v4, v[4:5]
	; VI-NEXT: v_mov_b32_e32 v2, s0			; VI-NEXT: v_mov_b32_e32 v2, s0
	; VI-NEXT: v_mov_b32_e32 v3, s1			; VI-NEXT: v_mov_b32_e32 v3, s1
	; VI-NEXT: v_mov_b32_e32 v0, s2			; VI-NEXT: v_mov_b32_e32 v0, s2
	; VI-NEXT: v_mov_b32_e32 v1, s3			; VI-NEXT: v_mov_b32_e32 v1, s3
	; VI-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; VI-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; VI-NEXT: v_add_u32_e32 v5, vcc, v4, v6			; VI-NEXT: v_add_u32_e32 v5, vcc, v4, v6
	; VI-NEXT: v_cmp_gt_i32_e32 vcc, 0, v4			; VI-NEXT: v_cmp_gt_i32_e32 vcc, 0, v4
	; VI-NEXT: v_cmp_lt_i32_e64 s[0:1], v5, v6			; VI-NEXT: v_cmp_lt_i32_e64 s[0:1], v5, v6
	; VI-NEXT: s_xor_b64 s[0:1], vcc, s[0:1]			; VI-NEXT: s_xor_b64 s[0:1], vcc, s[0:1]
	; VI-NEXT: flat_store_dword v[2:3], v5			; VI-NEXT: flat_store_dword v[2:3], v5
	; VI-NEXT: v_cndmask_b32_e64 v2, 0, 1, s[0:1]			; VI-NEXT: v_cndmask_b32_e64 v2, 0, 1, s[0:1]
	; VI-NEXT: flat_store_byte v[0:1], v2			; VI-NEXT: flat_store_byte v[0:1], v2
	; VI-NEXT: s_endpgm			; VI-NEXT: s_endpgm
	;			;
	; GFX9-LABEL: v_saddo_i32:			; GFX9-LABEL: v_saddo_i32:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_load_dwordx8 s[0:7], s[0:1], 0x24			; GFX9-NEXT: s_load_dwordx8 s[0:7], s[0:1], 0x24
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: v_mov_b32_e32 v4, s6			; GFX9-NEXT: v_mov_b32_e32 v4, s6
	; GFX9-NEXT: v_mov_b32_e32 v5, s7
	; GFX9-NEXT: v_mov_b32_e32 v6, s4			; GFX9-NEXT: v_mov_b32_e32 v6, s4
	; GFX9-NEXT: v_mov_b32_e32 v7, s5			; GFX9-NEXT: v_mov_b32_e32 v7, s5
				; GFX9-NEXT: v_mov_b32_e32 v5, s7
	; GFX9-NEXT: global_load_dword v6, v[6:7], off			; GFX9-NEXT: global_load_dword v6, v[6:7], off
	; GFX9-NEXT: global_load_dword v4, v[4:5], off			; GFX9-NEXT: global_load_dword v4, v[4:5], off
	; GFX9-NEXT: v_mov_b32_e32 v2, s0			; GFX9-NEXT: v_mov_b32_e32 v2, s0
	; GFX9-NEXT: v_mov_b32_e32 v3, s1			; GFX9-NEXT: v_mov_b32_e32 v3, s1
	; GFX9-NEXT: v_mov_b32_e32 v0, s2			; GFX9-NEXT: v_mov_b32_e32 v0, s2
	; GFX9-NEXT: v_mov_b32_e32 v1, s3			; GFX9-NEXT: v_mov_b32_e32 v1, s3
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_add_u32_e32 v5, v6, v4			; GFX9-NEXT: v_add_u32_e32 v5, v6, v4

llvm/test/CodeGen/AMDGPU/salu-to-valu.ll

store <4 x i32> %tmp5, <4 x i32> addrspace(1)* %out		store <4 x i32> %tmp5, <4 x i32> addrspace(1)* %out
ret void		ret void
}		}

; Original scalar load uses SGPR offset on SI and 32-bit literal on		; Original scalar load uses SGPR offset on SI and 32-bit literal on
; CI.		; CI.

; GCN-LABEL: {{^}}smrd_valu_ci_offset_x8:		; GCN-LABEL: {{^}}smrd_valu_ci_offset_x8:
; GCN-NOHSA: s_mov_b32 [[OFFSET0:s[0-9]+]], 0x9a40{{$}}		; GCN-NOHSA-DAG: s_mov_b32 [[OFFSET0:s[0-9]+]], 0x9a40{{$}}
; GCN-NOHSA-NOT: v_add		; CI-NOHSA-DAG: s_mov_b32 [[OFFSET1:s[0-9]+]], 0x9a50{{$}}
; CI-NOHSA: s_mov_b32 [[OFFSET1:s[0-9]+]], 0x9a50{{$}}
; CI-NOHSA-NOT: v_add		; CI-NOHSA-NOT: v_add
; SI: buffer_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s[{{[0-9]+:[0-9]+}}], 0 addr64 offset:16		; SI: buffer_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s[{{[0-9]+:[0-9]+}}], 0 addr64 offset:16
; CI-NOHSA: buffer_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s[{{[0-9]+:[0-9]+}}], [[OFFSET1]] addr64{{$}}		; CI-NOHSA: buffer_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s[{{[0-9]+:[0-9]+}}], [[OFFSET1]] addr64{{$}}
; GCN-NOHSA: buffer_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s[{{[0-9]+:[0-9]+}}], [[OFFSET0]] addr64{{$}}		; GCN-NOHSA: buffer_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s[{{[0-9]+:[0-9]+}}], [[OFFSET0]] addr64{{$}}

; GCN-NOHSA: v_or_b32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}		; GCN-NOHSA: v_or_b32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}
; GCN-NOHSA: v_or_b32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}		; GCN-NOHSA: v_or_b32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}
; GCN-NOHSA: v_or_b32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}		; GCN-NOHSA: v_or_b32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}

llvm/test/CodeGen/AMDGPU/scratch-simple.ll

	; GCN-LABEL: {{^}}hs_ir_uses_scratch_offset:			; GCN-LABEL: {{^}}hs_ir_uses_scratch_offset:
	; GCN: s_mov_b32 s8, SCRATCH_RSRC_DWORD0			; GCN: s_mov_b32 s8, SCRATCH_RSRC_DWORD0

	; SIVI-NOT: s_mov_b32 s6			; SIVI-NOT: s_mov_b32 s6
	; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s6 offen			; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s6 offen
	; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s6 offen			; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s6 offen

	; GFX9_10-NOT: s_mov_b32 s5			; GFX9_10-NOT: s_mov_b32 s5
	; GFX9_10: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s5 offen			; GFX9_10-DAG: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s5 offen
	; GFX9_10: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s5 offen			; GFX9_10-DAG: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s5 offen

	; GCN: s_mov_b32 s2, s5			; GCN-DAG: s_mov_b32 s2, s5
	define amdgpu_hs <{i32, i32, i32, float}> @hs_ir_uses_scratch_offset(i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg %swo, i32 %idx) {			define amdgpu_hs <{i32, i32, i32, float}> @hs_ir_uses_scratch_offset(i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg %swo, i32 %idx) {
	%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx			%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx
	%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx			%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx
	%f = fadd float %v1, %v2			%f = fadd float %v1, %v2
	%r1 = insertvalue <{i32, i32, i32, float}> undef, i32 %swo, 2			%r1 = insertvalue <{i32, i32, i32, float}> undef, i32 %swo, 2
	%r2 = insertvalue <{i32, i32, i32, float}> %r1, float %f, 3			%r2 = insertvalue <{i32, i32, i32, float}> %r1, float %f, 3
	ret <{i32, i32, i32, float}> %r2			ret <{i32, i32, i32, float}> %r2
	}			}

	; GCN-LABEL: {{^}}gs_ir_uses_scratch_offset:			; GCN-LABEL: {{^}}gs_ir_uses_scratch_offset:
	; GCN: s_mov_b32 s8, SCRATCH_RSRC_DWORD0			; GCN: s_mov_b32 s8, SCRATCH_RSRC_DWORD0

	; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s6 offen			; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s6 offen
	; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s6 offen			; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s6 offen

	; GFX9_10: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s5 offen			; GFX9_10-DAG: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s5 offen
	; GFX9_10: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s5 offen			; GFX9_10-DAG: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, s5 offen

	; GCN: s_mov_b32 s2, s5			; GCN-DAG: s_mov_b32 s2, s5
	define amdgpu_gs <{i32, i32, i32, float}> @gs_ir_uses_scratch_offset(i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg %swo, i32 %idx) {			define amdgpu_gs <{i32, i32, i32, float}> @gs_ir_uses_scratch_offset(i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg %swo, i32 %idx) {
	%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx			%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx
	%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx			%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx
	%f = fadd float %v1, %v2			%f = fadd float %v1, %v2
	%r1 = insertvalue <{i32, i32, i32, float}> undef, i32 %swo, 2			%r1 = insertvalue <{i32, i32, i32, float}> undef, i32 %swo, 2
	%r2 = insertvalue <{i32, i32, i32, float}> %r1, float %f, 3			%r2 = insertvalue <{i32, i32, i32, float}> %r1, float %f, 3
	ret <{i32, i32, i32, float}> %r2			ret <{i32, i32, i32, float}> %r2
	}			}

llvm/test/CodeGen/AMDGPU/select.f16.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -amdgpu-scalarize-global-loads=false -mtriple=amdgcn-- -mcpu=tahiti -verify-machineinstrs \| FileCheck %s -check-prefixes=GCN,SI			; RUN: llc < %s -amdgpu-scalarize-global-loads=false -mtriple=amdgcn-- -mcpu=tahiti -verify-machineinstrs \| FileCheck %s -check-prefixes=GCN,SI
	; RUN: llc < %s -amdgpu-scalarize-global-loads=false -mtriple=amdgcn-- -mcpu=fiji -mattr=-flat-for-global -verify-machineinstrs \| FileCheck %s -check-prefixes=GCN,VI			; RUN: llc < %s -amdgpu-scalarize-global-loads=false -mtriple=amdgcn-- -mcpu=fiji -mattr=-flat-for-global -verify-machineinstrs \| FileCheck %s -check-prefixes=GCN,VI

	define amdgpu_kernel void @select_f16(			define amdgpu_kernel void @select_f16(
	; SI-LABEL: select_f16:			; SI-LABEL: select_f16:
	; SI: ; %bb.0: ; %entry			; SI: ; %bb.0: ; %entry
	; SI-NEXT: s_load_dwordx8 s[4:11], s[0:1], 0x9			; SI-NEXT: s_load_dwordx8 s[4:11], s[0:1], 0x9
	; SI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x11			; SI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x11
	; SI-NEXT: s_mov_b32 s15, 0xf000			; SI-NEXT: s_mov_b32 s15, 0xf000
	; SI-NEXT: s_mov_b32 s14, -1			; SI-NEXT: s_mov_b32 s14, -1
	; SI-NEXT: s_mov_b32 s22, s14			; SI-NEXT: s_mov_b32 s22, s14
	; SI-NEXT: s_waitcnt lgkmcnt(0)			; SI-NEXT: s_waitcnt lgkmcnt(0)
	; SI-NEXT: s_mov_b32 s16, s10
	; SI-NEXT: s_mov_b32 s17, s11
	; SI-NEXT: s_mov_b32 s10, s14
	; SI-NEXT: s_mov_b32 s11, s15
	; SI-NEXT: s_mov_b32 s20, s6			; SI-NEXT: s_mov_b32 s20, s6
	; SI-NEXT: s_mov_b32 s21, s7			; SI-NEXT: s_mov_b32 s21, s7
	; SI-NEXT: s_mov_b32 s23, s15			; SI-NEXT: s_mov_b32 s23, s15
				; SI-NEXT: s_mov_b32 s16, s10
				; SI-NEXT: s_mov_b32 s17, s11
	; SI-NEXT: s_mov_b32 s2, s14			; SI-NEXT: s_mov_b32 s2, s14
	; SI-NEXT: s_mov_b32 s3, s15			; SI-NEXT: s_mov_b32 s3, s15
	; SI-NEXT: s_mov_b32 s18, s14			; SI-NEXT: s_mov_b32 s18, s14
	; SI-NEXT: s_mov_b32 s19, s15			; SI-NEXT: s_mov_b32 s19, s15
				; SI-NEXT: s_mov_b32 s10, s14
				; SI-NEXT: s_mov_b32 s11, s15
	; SI-NEXT: buffer_load_ushort v0, off, s[20:23], 0			; SI-NEXT: buffer_load_ushort v0, off, s[20:23], 0
	; SI-NEXT: buffer_load_ushort v1, off, s[8:11], 0			; SI-NEXT: buffer_load_ushort v1, off, s[8:11], 0
	; SI-NEXT: buffer_load_ushort v2, off, s[16:19], 0			; SI-NEXT: buffer_load_ushort v2, off, s[16:19], 0
	; SI-NEXT: buffer_load_ushort v3, off, s[0:3], 0			; SI-NEXT: buffer_load_ushort v3, off, s[0:3], 0
	; SI-NEXT: s_mov_b32 s12, s4			; SI-NEXT: s_mov_b32 s12, s4
	; SI-NEXT: s_mov_b32 s13, s5			; SI-NEXT: s_mov_b32 s13, s5
	; SI-NEXT: s_waitcnt vmcnt(3)			; SI-NEXT: s_waitcnt vmcnt(3)
	; SI-NEXT: v_cvt_f32_f16_e32 v0, v0			; SI-NEXT: v_cvt_f32_f16_e32 v0, v0
	(14 lines not shown)
	; VI-NEXT: s_load_dwordx8 s[4:11], s[0:1], 0x24			; VI-NEXT: s_load_dwordx8 s[4:11], s[0:1], 0x24
	; VI-NEXT: s_load_dwordx2 s[12:13], s[0:1], 0x44			; VI-NEXT: s_load_dwordx2 s[12:13], s[0:1], 0x44
	; VI-NEXT: s_mov_b32 s3, 0xf000			; VI-NEXT: s_mov_b32 s3, 0xf000
	; VI-NEXT: s_mov_b32 s2, -1			; VI-NEXT: s_mov_b32 s2, -1
	; VI-NEXT: s_mov_b32 s14, s2			; VI-NEXT: s_mov_b32 s14, s2
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: s_mov_b32 s0, s4			; VI-NEXT: s_mov_b32 s0, s4
	; VI-NEXT: s_mov_b32 s1, s5			; VI-NEXT: s_mov_b32 s1, s5
	; VI-NEXT: s_mov_b32 s16, s10
	; VI-NEXT: s_mov_b32 s17, s11
	; VI-NEXT: s_mov_b32 s4, s6			; VI-NEXT: s_mov_b32 s4, s6
	; VI-NEXT: s_mov_b32 s5, s7			; VI-NEXT: s_mov_b32 s5, s7
	; VI-NEXT: s_mov_b32 s10, s2
	; VI-NEXT: s_mov_b32 s11, s3
	; VI-NEXT: s_mov_b32 s6, s2			; VI-NEXT: s_mov_b32 s6, s2
	; VI-NEXT: s_mov_b32 s7, s3			; VI-NEXT: s_mov_b32 s7, s3
				; VI-NEXT: s_mov_b32 s16, s10
				; VI-NEXT: s_mov_b32 s17, s11
	; VI-NEXT: s_mov_b32 s15, s3			; VI-NEXT: s_mov_b32 s15, s3
	; VI-NEXT: s_mov_b32 s18, s2			; VI-NEXT: s_mov_b32 s18, s2
	; VI-NEXT: s_mov_b32 s19, s3			; VI-NEXT: s_mov_b32 s19, s3
				; VI-NEXT: s_mov_b32 s10, s2
				; VI-NEXT: s_mov_b32 s11, s3
	; VI-NEXT: buffer_load_ushort v0, off, s[4:7], 0			; VI-NEXT: buffer_load_ushort v0, off, s[4:7], 0
	; VI-NEXT: buffer_load_ushort v1, off, s[8:11], 0			; VI-NEXT: buffer_load_ushort v1, off, s[8:11], 0
	; VI-NEXT: buffer_load_ushort v2, off, s[16:19], 0			; VI-NEXT: buffer_load_ushort v2, off, s[16:19], 0
	; VI-NEXT: buffer_load_ushort v3, off, s[12:15], 0			; VI-NEXT: buffer_load_ushort v3, off, s[12:15], 0
	; VI-NEXT: s_waitcnt vmcnt(2)			; VI-NEXT: s_waitcnt vmcnt(2)
	; VI-NEXT: v_cmp_lt_f16_e32 vcc, v0, v1			; VI-NEXT: v_cmp_lt_f16_e32 vcc, v0, v1
	; VI-NEXT: s_waitcnt vmcnt(0)			; VI-NEXT: s_waitcnt vmcnt(0)
	; VI-NEXT: v_cndmask_b32_e32 v0, v3, v2, vcc			; VI-NEXT: v_cndmask_b32_e32 v0, v3, v2, vcc
	(169 lines not shown)
	; SI-LABEL: select_f16_imm_c:			; SI-LABEL: select_f16_imm_c:
	; SI: ; %bb.0: ; %entry			; SI: ; %bb.0: ; %entry
	; SI-NEXT: s_load_dwordx8 s[0:7], s[0:1], 0x9			; SI-NEXT: s_load_dwordx8 s[0:7], s[0:1], 0x9
	; SI-NEXT: s_mov_b32 s11, 0xf000			; SI-NEXT: s_mov_b32 s11, 0xf000
	; SI-NEXT: s_mov_b32 s10, -1			; SI-NEXT: s_mov_b32 s10, -1
	; SI-NEXT: s_mov_b32 s18, s10			; SI-NEXT: s_mov_b32 s18, s10
	; SI-NEXT: s_mov_b32 s19, s11			; SI-NEXT: s_mov_b32 s19, s11
	; SI-NEXT: s_waitcnt lgkmcnt(0)			; SI-NEXT: s_waitcnt lgkmcnt(0)
	; SI-NEXT: s_mov_b32 s12, s6
	; SI-NEXT: s_mov_b32 s13, s7
	; SI-NEXT: s_mov_b32 s6, s10
	; SI-NEXT: s_mov_b32 s7, s11
	; SI-NEXT: s_mov_b32 s16, s2			; SI-NEXT: s_mov_b32 s16, s2
	; SI-NEXT: s_mov_b32 s17, s3			; SI-NEXT: s_mov_b32 s17, s3
				; SI-NEXT: s_mov_b32 s12, s6
				; SI-NEXT: s_mov_b32 s13, s7
	; SI-NEXT: s_mov_b32 s14, s10			; SI-NEXT: s_mov_b32 s14, s10
	; SI-NEXT: s_mov_b32 s15, s11			; SI-NEXT: s_mov_b32 s15, s11
				; SI-NEXT: s_mov_b32 s6, s10
				; SI-NEXT: s_mov_b32 s7, s11
	; SI-NEXT: buffer_load_ushort v0, off, s[16:19], 0			; SI-NEXT: buffer_load_ushort v0, off, s[16:19], 0
	; SI-NEXT: buffer_load_ushort v1, off, s[4:7], 0			; SI-NEXT: buffer_load_ushort v1, off, s[4:7], 0
	; SI-NEXT: buffer_load_ushort v2, off, s[12:15], 0			; SI-NEXT: buffer_load_ushort v2, off, s[12:15], 0
	; SI-NEXT: s_mov_b32 s8, s0			; SI-NEXT: s_mov_b32 s8, s0
	; SI-NEXT: s_mov_b32 s9, s1			; SI-NEXT: s_mov_b32 s9, s1
	; SI-NEXT: s_waitcnt vmcnt(2)			; SI-NEXT: s_waitcnt vmcnt(2)
	; SI-NEXT: v_cvt_f32_f16_e32 v0, v0			; SI-NEXT: v_cvt_f32_f16_e32 v0, v0
	; SI-NEXT: s_waitcnt vmcnt(1)			; SI-NEXT: s_waitcnt vmcnt(1)
	(11 lines not shown)
	; VI-NEXT: s_load_dwordx8 s[0:7], s[0:1], 0x24			; VI-NEXT: s_load_dwordx8 s[0:7], s[0:1], 0x24
	; VI-NEXT: s_mov_b32 s11, 0xf000			; VI-NEXT: s_mov_b32 s11, 0xf000
	; VI-NEXT: s_mov_b32 s10, -1			; VI-NEXT: s_mov_b32 s10, -1
	; VI-NEXT: s_mov_b32 s14, s10			; VI-NEXT: s_mov_b32 s14, s10
	; VI-NEXT: s_mov_b32 s15, s11			; VI-NEXT: s_mov_b32 s15, s11
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: s_mov_b32 s8, s0			; VI-NEXT: s_mov_b32 s8, s0
	; VI-NEXT: s_mov_b32 s9, s1			; VI-NEXT: s_mov_b32 s9, s1
	; VI-NEXT: s_mov_b32 s12, s6
	; VI-NEXT: s_mov_b32 s13, s7
	; VI-NEXT: s_mov_b32 s0, s2			; VI-NEXT: s_mov_b32 s0, s2
	; VI-NEXT: s_mov_b32 s1, s3			; VI-NEXT: s_mov_b32 s1, s3
	; VI-NEXT: s_mov_b32 s6, s10
	; VI-NEXT: s_mov_b32 s7, s11
	; VI-NEXT: s_mov_b32 s2, s10			; VI-NEXT: s_mov_b32 s2, s10
	; VI-NEXT: s_mov_b32 s3, s11			; VI-NEXT: s_mov_b32 s3, s11
				; VI-NEXT: s_mov_b32 s12, s6
				; VI-NEXT: s_mov_b32 s13, s7
				; VI-NEXT: s_mov_b32 s6, s10
				; VI-NEXT: s_mov_b32 s7, s11
	; VI-NEXT: buffer_load_ushort v0, off, s[0:3], 0			; VI-NEXT: buffer_load_ushort v0, off, s[0:3], 0
	; VI-NEXT: buffer_load_ushort v1, off, s[4:7], 0			; VI-NEXT: buffer_load_ushort v1, off, s[4:7], 0
	; VI-NEXT: buffer_load_ushort v3, off, s[12:15], 0			; VI-NEXT: buffer_load_ushort v3, off, s[12:15], 0
	; VI-NEXT: v_mov_b32_e32 v2, 0x3800			; VI-NEXT: v_mov_b32_e32 v2, 0x3800
	; VI-NEXT: s_waitcnt vmcnt(1)			; VI-NEXT: s_waitcnt vmcnt(1)
	; VI-NEXT: v_cmp_nlt_f16_e32 vcc, v0, v1			; VI-NEXT: v_cmp_nlt_f16_e32 vcc, v0, v1
	; VI-NEXT: s_waitcnt vmcnt(0)			; VI-NEXT: s_waitcnt vmcnt(0)
	; VI-NEXT: v_cndmask_b32_e32 v0, v2, v3, vcc			; VI-NEXT: v_cndmask_b32_e32 v0, v2, v3, vcc
	(17 lines not shown)
	; SI-LABEL: select_f16_imm_d:			; SI-LABEL: select_f16_imm_d:
	; SI: ; %bb.0: ; %entry			; SI: ; %bb.0: ; %entry
	; SI-NEXT: s_load_dwordx8 s[0:7], s[0:1], 0x9			; SI-NEXT: s_load_dwordx8 s[0:7], s[0:1], 0x9
	; SI-NEXT: s_mov_b32 s11, 0xf000			; SI-NEXT: s_mov_b32 s11, 0xf000
	; SI-NEXT: s_mov_b32 s10, -1			; SI-NEXT: s_mov_b32 s10, -1
	; SI-NEXT: s_mov_b32 s18, s10			; SI-NEXT: s_mov_b32 s18, s10
	; SI-NEXT: s_mov_b32 s19, s11			; SI-NEXT: s_mov_b32 s19, s11
	; SI-NEXT: s_waitcnt lgkmcnt(0)			; SI-NEXT: s_waitcnt lgkmcnt(0)
	; SI-NEXT: s_mov_b32 s12, s6
	; SI-NEXT: s_mov_b32 s13, s7
	; SI-NEXT: s_mov_b32 s6, s10
	; SI-NEXT: s_mov_b32 s7, s11
	; SI-NEXT: s_mov_b32 s16, s2			; SI-NEXT: s_mov_b32 s16, s2
	; SI-NEXT: s_mov_b32 s17, s3			; SI-NEXT: s_mov_b32 s17, s3
				; SI-NEXT: s_mov_b32 s12, s6
				; SI-NEXT: s_mov_b32 s13, s7
	; SI-NEXT: s_mov_b32 s14, s10			; SI-NEXT: s_mov_b32 s14, s10
	; SI-NEXT: s_mov_b32 s15, s11			; SI-NEXT: s_mov_b32 s15, s11
				; SI-NEXT: s_mov_b32 s6, s10
				; SI-NEXT: s_mov_b32 s7, s11
	; SI-NEXT: buffer_load_ushort v0, off, s[16:19], 0			; SI-NEXT: buffer_load_ushort v0, off, s[16:19], 0
	; SI-NEXT: buffer_load_ushort v1, off, s[4:7], 0			; SI-NEXT: buffer_load_ushort v1, off, s[4:7], 0
	; SI-NEXT: buffer_load_ushort v2, off, s[12:15], 0			; SI-NEXT: buffer_load_ushort v2, off, s[12:15], 0
	; SI-NEXT: s_mov_b32 s8, s0			; SI-NEXT: s_mov_b32 s8, s0
	; SI-NEXT: s_mov_b32 s9, s1			; SI-NEXT: s_mov_b32 s9, s1
	; SI-NEXT: s_waitcnt vmcnt(2)			; SI-NEXT: s_waitcnt vmcnt(2)
	; SI-NEXT: v_cvt_f32_f16_e32 v0, v0			; SI-NEXT: v_cvt_f32_f16_e32 v0, v0
	; SI-NEXT: s_waitcnt vmcnt(1)			; SI-NEXT: s_waitcnt vmcnt(1)
	(11 lines not shown)
	; VI-NEXT: s_load_dwordx8 s[0:7], s[0:1], 0x24			; VI-NEXT: s_load_dwordx8 s[0:7], s[0:1], 0x24
	; VI-NEXT: s_mov_b32 s11, 0xf000			; VI-NEXT: s_mov_b32 s11, 0xf000
	; VI-NEXT: s_mov_b32 s10, -1			; VI-NEXT: s_mov_b32 s10, -1
	; VI-NEXT: s_mov_b32 s14, s10			; VI-NEXT: s_mov_b32 s14, s10
	; VI-NEXT: s_mov_b32 s15, s11			; VI-NEXT: s_mov_b32 s15, s11
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: s_mov_b32 s8, s0			; VI-NEXT: s_mov_b32 s8, s0
	; VI-NEXT: s_mov_b32 s9, s1			; VI-NEXT: s_mov_b32 s9, s1
	; VI-NEXT: s_mov_b32 s12, s6
	; VI-NEXT: s_mov_b32 s13, s7
	; VI-NEXT: s_mov_b32 s0, s2			; VI-NEXT: s_mov_b32 s0, s2
	; VI-NEXT: s_mov_b32 s1, s3			; VI-NEXT: s_mov_b32 s1, s3
	; VI-NEXT: s_mov_b32 s6, s10
	; VI-NEXT: s_mov_b32 s7, s11
	; VI-NEXT: s_mov_b32 s2, s10			; VI-NEXT: s_mov_b32 s2, s10
	; VI-NEXT: s_mov_b32 s3, s11			; VI-NEXT: s_mov_b32 s3, s11
				; VI-NEXT: s_mov_b32 s12, s6
				; VI-NEXT: s_mov_b32 s13, s7
				; VI-NEXT: s_mov_b32 s6, s10
				; VI-NEXT: s_mov_b32 s7, s11
	; VI-NEXT: buffer_load_ushort v0, off, s[0:3], 0			; VI-NEXT: buffer_load_ushort v0, off, s[0:3], 0
	; VI-NEXT: buffer_load_ushort v1, off, s[4:7], 0			; VI-NEXT: buffer_load_ushort v1, off, s[4:7], 0
	; VI-NEXT: buffer_load_ushort v3, off, s[12:15], 0			; VI-NEXT: buffer_load_ushort v3, off, s[12:15], 0
	; VI-NEXT: v_mov_b32_e32 v2, 0x3800			; VI-NEXT: v_mov_b32_e32 v2, 0x3800
	; VI-NEXT: s_waitcnt vmcnt(1)			; VI-NEXT: s_waitcnt vmcnt(1)
	; VI-NEXT: v_cmp_lt_f16_e32 vcc, v0, v1			; VI-NEXT: v_cmp_lt_f16_e32 vcc, v0, v1
	; VI-NEXT: s_waitcnt vmcnt(0)			; VI-NEXT: s_waitcnt vmcnt(0)
	; VI-NEXT: v_cndmask_b32_e32 v0, v2, v3, vcc			; VI-NEXT: v_cndmask_b32_e32 v0, v2, v3, vcc
	(17 lines not shown)
	; SI-LABEL: select_v2f16:			; SI-LABEL: select_v2f16:
	; SI: ; %bb.0: ; %entry			; SI: ; %bb.0: ; %entry
	; SI-NEXT: s_load_dwordx8 s[4:11], s[0:1], 0x9			; SI-NEXT: s_load_dwordx8 s[4:11], s[0:1], 0x9
	; SI-NEXT: s_load_dwordx2 s[12:13], s[0:1], 0x11			; SI-NEXT: s_load_dwordx2 s[12:13], s[0:1], 0x11
	; SI-NEXT: s_mov_b32 s3, 0xf000			; SI-NEXT: s_mov_b32 s3, 0xf000
	; SI-NEXT: s_mov_b32 s2, -1			; SI-NEXT: s_mov_b32 s2, -1
	; SI-NEXT: s_mov_b32 s22, s2			; SI-NEXT: s_mov_b32 s22, s2
	; SI-NEXT: s_waitcnt lgkmcnt(0)			; SI-NEXT: s_waitcnt lgkmcnt(0)
	; SI-NEXT: s_mov_b32 s16, s10
	; SI-NEXT: s_mov_b32 s17, s11
	; SI-NEXT: s_mov_b32 s10, s2
	; SI-NEXT: s_mov_b32 s11, s3
	; SI-NEXT: s_mov_b32 s20, s6			; SI-NEXT: s_mov_b32 s20, s6
	; SI-NEXT: s_mov_b32 s21, s7			; SI-NEXT: s_mov_b32 s21, s7
	; SI-NEXT: s_mov_b32 s23, s3			; SI-NEXT: s_mov_b32 s23, s3
				; SI-NEXT: s_mov_b32 s16, s10
				; SI-NEXT: s_mov_b32 s17, s11
	; SI-NEXT: s_mov_b32 s14, s2			; SI-NEXT: s_mov_b32 s14, s2
	; SI-NEXT: s_mov_b32 s15, s3			; SI-NEXT: s_mov_b32 s15, s3
	; SI-NEXT: buffer_load_dword v0, off, s[20:23], 0
	; SI-NEXT: s_mov_b32 s18, s2			; SI-NEXT: s_mov_b32 s18, s2
	; SI-NEXT: s_mov_b32 s19, s3			; SI-NEXT: s_mov_b32 s19, s3
				; SI-NEXT: s_mov_b32 s10, s2
				; SI-NEXT: s_mov_b32 s11, s3
				; SI-NEXT: buffer_load_dword v0, off, s[20:23], 0
	; SI-NEXT: buffer_load_dword v1, off, s[8:11], 0			; SI-NEXT: buffer_load_dword v1, off, s[8:11], 0
	; SI-NEXT: buffer_load_dword v2, off, s[12:15], 0			; SI-NEXT: buffer_load_dword v2, off, s[12:15], 0
	; SI-NEXT: buffer_load_dword v3, off, s[16:19], 0			; SI-NEXT: buffer_load_dword v3, off, s[16:19], 0
	; SI-NEXT: s_mov_b32 s0, s4			; SI-NEXT: s_mov_b32 s0, s4
	; SI-NEXT: s_mov_b32 s1, s5			; SI-NEXT: s_mov_b32 s1, s5
	; SI-NEXT: s_waitcnt vmcnt(3)			; SI-NEXT: s_waitcnt vmcnt(3)
	; SI-NEXT: v_lshrrev_b32_e32 v5, 16, v0			; SI-NEXT: v_lshrrev_b32_e32 v5, 16, v0
	; SI-NEXT: v_cvt_f32_f16_e32 v5, v5
	; SI-NEXT: v_cvt_f32_f16_e32 v0, v0
	; SI-NEXT: s_waitcnt vmcnt(2)			; SI-NEXT: s_waitcnt vmcnt(2)
	; SI-NEXT: v_lshrrev_b32_e32 v6, 16, v1			; SI-NEXT: v_lshrrev_b32_e32 v6, 16, v1
	; SI-NEXT: s_waitcnt vmcnt(1)			; SI-NEXT: s_waitcnt vmcnt(1)
	; SI-NEXT: v_cvt_f32_f16_e32 v4, v2			; SI-NEXT: v_cvt_f32_f16_e32 v4, v2
	; SI-NEXT: s_waitcnt vmcnt(0)			; SI-NEXT: s_waitcnt vmcnt(0)
	; SI-NEXT: v_lshrrev_b32_e32 v7, 16, v3			; SI-NEXT: v_lshrrev_b32_e32 v7, 16, v3
	; SI-NEXT: v_lshrrev_b32_e32 v2, 16, v2			; SI-NEXT: v_lshrrev_b32_e32 v2, 16, v2
				; SI-NEXT: v_cvt_f32_f16_e32 v5, v5
	; SI-NEXT: v_cvt_f32_f16_e32 v6, v6			; SI-NEXT: v_cvt_f32_f16_e32 v6, v6
				; SI-NEXT: v_cvt_f32_f16_e32 v0, v0
	; SI-NEXT: v_cvt_f32_f16_e32 v2, v2			; SI-NEXT: v_cvt_f32_f16_e32 v2, v2
	; SI-NEXT: v_cvt_f32_f16_e32 v7, v7			; SI-NEXT: v_cvt_f32_f16_e32 v7, v7
	; SI-NEXT: v_cvt_f32_f16_e32 v1, v1			; SI-NEXT: v_cvt_f32_f16_e32 v1, v1
	; SI-NEXT: v_cvt_f32_f16_e32 v3, v3			; SI-NEXT: v_cvt_f32_f16_e32 v3, v3
	; SI-NEXT: v_cmp_lt_f32_e32 vcc, v5, v6			; SI-NEXT: v_cmp_lt_f32_e32 vcc, v5, v6
	; SI-NEXT: v_cndmask_b32_e32 v2, v2, v7, vcc			; SI-NEXT: v_cndmask_b32_e32 v2, v2, v7, vcc
	; SI-NEXT: v_cmp_lt_f32_e32 vcc, v0, v1			; SI-NEXT: v_cmp_lt_f32_e32 vcc, v0, v1
	; SI-NEXT: v_cndmask_b32_e32 v0, v4, v3, vcc			; SI-NEXT: v_cndmask_b32_e32 v0, v4, v3, vcc
	(9 lines not shown)
	; VI-NEXT: s_load_dwordx8 s[4:11], s[0:1], 0x24			; VI-NEXT: s_load_dwordx8 s[4:11], s[0:1], 0x24
	; VI-NEXT: s_load_dwordx2 s[12:13], s[0:1], 0x44			; VI-NEXT: s_load_dwordx2 s[12:13], s[0:1], 0x44
	; VI-NEXT: s_mov_b32 s3, 0xf000			; VI-NEXT: s_mov_b32 s3, 0xf000
	; VI-NEXT: s_mov_b32 s2, -1			; VI-NEXT: s_mov_b32 s2, -1
	; VI-NEXT: s_mov_b32 s14, s2			; VI-NEXT: s_mov_b32 s14, s2
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: s_mov_b32 s0, s4			; VI-NEXT: s_mov_b32 s0, s4
	; VI-NEXT: s_mov_b32 s1, s5			; VI-NEXT: s_mov_b32 s1, s5
	; VI-NEXT: s_mov_b32 s16, s10
	; VI-NEXT: s_mov_b32 s17, s11
	; VI-NEXT: s_mov_b32 s4, s6			; VI-NEXT: s_mov_b32 s4, s6
	; VI-NEXT: s_mov_b32 s5, s7			; VI-NEXT: s_mov_b32 s5, s7
	; VI-NEXT: s_mov_b32 s10, s2
	; VI-NEXT: s_mov_b32 s11, s3
	; VI-NEXT: s_mov_b32 s6, s2			; VI-NEXT: s_mov_b32 s6, s2
	; VI-NEXT: s_mov_b32 s7, s3			; VI-NEXT: s_mov_b32 s7, s3
				; VI-NEXT: s_mov_b32 s16, s10
				; VI-NEXT: s_mov_b32 s17, s11
	; VI-NEXT: s_mov_b32 s15, s3			; VI-NEXT: s_mov_b32 s15, s3
	; VI-NEXT: buffer_load_dword v0, off, s[4:7], 0
	; VI-NEXT: s_mov_b32 s18, s2			; VI-NEXT: s_mov_b32 s18, s2
	; VI-NEXT: s_mov_b32 s19, s3			; VI-NEXT: s_mov_b32 s19, s3
				; VI-NEXT: s_mov_b32 s10, s2
				; VI-NEXT: s_mov_b32 s11, s3
				; VI-NEXT: buffer_load_dword v0, off, s[4:7], 0
	; VI-NEXT: buffer_load_dword v1, off, s[8:11], 0			; VI-NEXT: buffer_load_dword v1, off, s[8:11], 0
	; VI-NEXT: buffer_load_dword v2, off, s[12:15], 0			; VI-NEXT: buffer_load_dword v2, off, s[12:15], 0
	; VI-NEXT: buffer_load_dword v3, off, s[16:19], 0			; VI-NEXT: buffer_load_dword v3, off, s[16:19], 0
	; VI-NEXT: s_waitcnt vmcnt(3)			; VI-NEXT: s_waitcnt vmcnt(3)
	; VI-NEXT: v_lshrrev_b32_e32 v6, 16, v0			; VI-NEXT: v_lshrrev_b32_e32 v6, 16, v0
	; VI-NEXT: s_waitcnt vmcnt(2)			; VI-NEXT: s_waitcnt vmcnt(2)
	; VI-NEXT: v_cmp_lt_f16_e32 vcc, v0, v1			; VI-NEXT: v_cmp_lt_f16_e32 vcc, v0, v1
	; VI-NEXT: v_lshrrev_b32_e32 v5, 16, v1			; VI-NEXT: v_lshrrev_b32_e32 v5, 16, v1
	(24 lines not shown)
	}			}

	define amdgpu_kernel void @select_v2f16_imm_a(			define amdgpu_kernel void @select_v2f16_imm_a(
	; SI-LABEL: select_v2f16_imm_a:			; SI-LABEL: select_v2f16_imm_a:
	; SI: ; %bb.0: ; %entry			; SI: ; %bb.0: ; %entry
	; SI-NEXT: s_load_dwordx8 s[0:7], s[0:1], 0x9			; SI-NEXT: s_load_dwordx8 s[0:7], s[0:1], 0x9
	; SI-NEXT: s_mov_b32 s11, 0xf000			; SI-NEXT: s_mov_b32 s11, 0xf000
	; SI-NEXT: s_mov_b32 s10, -1			; SI-NEXT: s_mov_b32 s10, -1
	; SI-NEXT: s_mov_b32 s18, s10			; SI-NEXT: s_mov_b32 s14, s10
	; SI-NEXT: s_mov_b32 s19, s11			; SI-NEXT: s_mov_b32 s15, s11
	; SI-NEXT: s_waitcnt lgkmcnt(0)			; SI-NEXT: s_waitcnt lgkmcnt(0)
	; SI-NEXT: s_mov_b32 s16, s2
	; SI-NEXT: s_mov_b32 s17, s3
	; SI-NEXT: s_mov_b32 s12, s6			; SI-NEXT: s_mov_b32 s12, s6
	; SI-NEXT: s_mov_b32 s13, s7			; SI-NEXT: s_mov_b32 s13, s7
	; SI-NEXT: s_mov_b32 s14, s10			; SI-NEXT: s_mov_b32 s16, s2
	; SI-NEXT: s_mov_b32 s15, s11			; SI-NEXT: s_mov_b32 s17, s3
				; SI-NEXT: s_mov_b32 s18, s10
				; SI-NEXT: s_mov_b32 s19, s11
	; SI-NEXT: s_mov_b32 s6, s10			; SI-NEXT: s_mov_b32 s6, s10
	; SI-NEXT: s_mov_b32 s7, s11			; SI-NEXT: s_mov_b32 s7, s11
	; SI-NEXT: buffer_load_dword v0, off, s[16:19], 0			; SI-NEXT: buffer_load_dword v0, off, s[16:19], 0
	; SI-NEXT: buffer_load_dword v1, off, s[4:7], 0			; SI-NEXT: buffer_load_dword v1, off, s[4:7], 0
	; SI-NEXT: buffer_load_dword v2, off, s[12:15], 0			; SI-NEXT: buffer_load_dword v2, off, s[12:15], 0
	; SI-NEXT: s_mov_b32 s2, 0x3f200000			; SI-NEXT: s_mov_b32 s2, 0x3f200000
	; SI-NEXT: s_mov_b32 s8, s0			; SI-NEXT: s_mov_b32 s8, s0
	; SI-NEXT: s_mov_b32 s9, s1			; SI-NEXT: s_mov_b32 s9, s1
	(25 lines not shown)
	; VI-NEXT: s_load_dwordx8 s[0:7], s[0:1], 0x24			; VI-NEXT: s_load_dwordx8 s[0:7], s[0:1], 0x24
	; VI-NEXT: s_mov_b32 s11, 0xf000			; VI-NEXT: s_mov_b32 s11, 0xf000
	; VI-NEXT: s_mov_b32 s10, -1			; VI-NEXT: s_mov_b32 s10, -1
	; VI-NEXT: s_mov_b32 s14, s10			; VI-NEXT: s_mov_b32 s14, s10
	; VI-NEXT: s_mov_b32 s15, s11			; VI-NEXT: s_mov_b32 s15, s11
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: s_mov_b32 s8, s0			; VI-NEXT: s_mov_b32 s8, s0
	; VI-NEXT: s_mov_b32 s9, s1			; VI-NEXT: s_mov_b32 s9, s1
				; VI-NEXT: s_mov_b32 s12, s6
				; VI-NEXT: s_mov_b32 s13, s7
	; VI-NEXT: s_mov_b32 s0, s2			; VI-NEXT: s_mov_b32 s0, s2
	; VI-NEXT: s_mov_b32 s1, s3			; VI-NEXT: s_mov_b32 s1, s3
	; VI-NEXT: s_mov_b32 s2, s10			; VI-NEXT: s_mov_b32 s2, s10
	; VI-NEXT: s_mov_b32 s3, s11			; VI-NEXT: s_mov_b32 s3, s11
	; VI-NEXT: s_mov_b32 s12, s6
	; VI-NEXT: s_mov_b32 s13, s7
	; VI-NEXT: s_mov_b32 s6, s10			; VI-NEXT: s_mov_b32 s6, s10
	; VI-NEXT: s_mov_b32 s7, s11			; VI-NEXT: s_mov_b32 s7, s11
	; VI-NEXT: buffer_load_dword v0, off, s[0:3], 0			; VI-NEXT: buffer_load_dword v0, off, s[0:3], 0
	; VI-NEXT: buffer_load_dword v1, off, s[4:7], 0			; VI-NEXT: buffer_load_dword v1, off, s[4:7], 0
	; VI-NEXT: buffer_load_dword v2, off, s[12:15], 0			; VI-NEXT: buffer_load_dword v2, off, s[12:15], 0
	; VI-NEXT: s_movk_i32 s0, 0x3900			; VI-NEXT: s_movk_i32 s0, 0x3900
	; VI-NEXT: s_waitcnt vmcnt(2)			; VI-NEXT: s_waitcnt vmcnt(2)
	; VI-NEXT: v_lshrrev_b32_e32 v3, 16, v0			; VI-NEXT: v_lshrrev_b32_e32 v3, 16, v0
	(23 lines not shown)
	}			}

	define amdgpu_kernel void @select_v2f16_imm_b(			define amdgpu_kernel void @select_v2f16_imm_b(
	; SI-LABEL: select_v2f16_imm_b:			; SI-LABEL: select_v2f16_imm_b:
	; SI: ; %bb.0: ; %entry			; SI: ; %bb.0: ; %entry
	; SI-NEXT: s_load_dwordx8 s[0:7], s[0:1], 0x9			; SI-NEXT: s_load_dwordx8 s[0:7], s[0:1], 0x9
	; SI-NEXT: s_mov_b32 s11, 0xf000			; SI-NEXT: s_mov_b32 s11, 0xf000
	; SI-NEXT: s_mov_b32 s10, -1			; SI-NEXT: s_mov_b32 s10, -1
	; SI-NEXT: s_mov_b32 s18, s10			; SI-NEXT: s_mov_b32 s14, s10
	; SI-NEXT: s_mov_b32 s19, s11			; SI-NEXT: s_mov_b32 s15, s11
	; SI-NEXT: s_waitcnt lgkmcnt(0)			; SI-NEXT: s_waitcnt lgkmcnt(0)
	; SI-NEXT: s_mov_b32 s16, s2
	; SI-NEXT: s_mov_b32 s17, s3
	; SI-NEXT: s_mov_b32 s12, s6			; SI-NEXT: s_mov_b32 s12, s6
	; SI-NEXT: s_mov_b32 s13, s7			; SI-NEXT: s_mov_b32 s13, s7
	; SI-NEXT: s_mov_b32 s14, s10			; SI-NEXT: s_mov_b32 s16, s2
	; SI-NEXT: s_mov_b32 s15, s11			; SI-NEXT: s_mov_b32 s17, s3
				; SI-NEXT: s_mov_b32 s18, s10
				; SI-NEXT: s_mov_b32 s19, s11
	; SI-NEXT: s_mov_b32 s6, s10			; SI-NEXT: s_mov_b32 s6, s10
	; SI-NEXT: s_mov_b32 s7, s11			; SI-NEXT: s_mov_b32 s7, s11
	; SI-NEXT: buffer_load_dword v0, off, s[16:19], 0			; SI-NEXT: buffer_load_dword v0, off, s[16:19], 0
	; SI-NEXT: buffer_load_dword v1, off, s[4:7], 0			; SI-NEXT: buffer_load_dword v1, off, s[4:7], 0
	; SI-NEXT: buffer_load_dword v2, off, s[12:15], 0			; SI-NEXT: buffer_load_dword v2, off, s[12:15], 0
	; SI-NEXT: s_mov_b32 s2, 0x3f200000			; SI-NEXT: s_mov_b32 s2, 0x3f200000
	; SI-NEXT: s_mov_b32 s8, s0			; SI-NEXT: s_mov_b32 s8, s0
	; SI-NEXT: s_mov_b32 s9, s1			; SI-NEXT: s_mov_b32 s9, s1
	(25 lines not shown)
	; VI-NEXT: s_load_dwordx8 s[0:7], s[0:1], 0x24			; VI-NEXT: s_load_dwordx8 s[0:7], s[0:1], 0x24
	; VI-NEXT: s_mov_b32 s11, 0xf000			; VI-NEXT: s_mov_b32 s11, 0xf000
	; VI-NEXT: s_mov_b32 s10, -1			; VI-NEXT: s_mov_b32 s10, -1
	; VI-NEXT: s_mov_b32 s14, s10			; VI-NEXT: s_mov_b32 s14, s10
	; VI-NEXT: s_mov_b32 s15, s11			; VI-NEXT: s_mov_b32 s15, s11
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: s_mov_b32 s8, s0			; VI-NEXT: s_mov_b32 s8, s0
	; VI-NEXT: s_mov_b32 s9, s1			; VI-NEXT: s_mov_b32 s9, s1
				; VI-NEXT: s_mov_b32 s12, s6
				; VI-NEXT: s_mov_b32 s13, s7
	; VI-NEXT: s_mov_b32 s0, s2			; VI-NEXT: s_mov_b32 s0, s2
	; VI-NEXT: s_mov_b32 s1, s3			; VI-NEXT: s_mov_b32 s1, s3
	; VI-NEXT: s_mov_b32 s2, s10			; VI-NEXT: s_mov_b32 s2, s10
	; VI-NEXT: s_mov_b32 s3, s11			; VI-NEXT: s_mov_b32 s3, s11
	; VI-NEXT: s_mov_b32 s12, s6
	; VI-NEXT: s_mov_b32 s13, s7
	; VI-NEXT: s_mov_b32 s6, s10			; VI-NEXT: s_mov_b32 s6, s10
	; VI-NEXT: s_mov_b32 s7, s11			; VI-NEXT: s_mov_b32 s7, s11
	; VI-NEXT: buffer_load_dword v0, off, s[0:3], 0			; VI-NEXT: buffer_load_dword v0, off, s[0:3], 0
	; VI-NEXT: buffer_load_dword v1, off, s[4:7], 0			; VI-NEXT: buffer_load_dword v1, off, s[4:7], 0
	; VI-NEXT: buffer_load_dword v2, off, s[12:15], 0			; VI-NEXT: buffer_load_dword v2, off, s[12:15], 0
	; VI-NEXT: s_movk_i32 s0, 0x3900			; VI-NEXT: s_movk_i32 s0, 0x3900
	; VI-NEXT: s_waitcnt vmcnt(2)			; VI-NEXT: s_waitcnt vmcnt(2)
	; VI-NEXT: v_lshrrev_b32_e32 v3, 16, v0			; VI-NEXT: v_lshrrev_b32_e32 v3, 16, v0
	(32 lines not shown)
	; SI-NEXT: s_mov_b32 s19, s11			; SI-NEXT: s_mov_b32 s19, s11
	; SI-NEXT: s_waitcnt lgkmcnt(0)			; SI-NEXT: s_waitcnt lgkmcnt(0)
	; SI-NEXT: s_mov_b32 s12, s6			; SI-NEXT: s_mov_b32 s12, s6
	; SI-NEXT: s_mov_b32 s13, s7			; SI-NEXT: s_mov_b32 s13, s7
	; SI-NEXT: s_mov_b32 s6, s10			; SI-NEXT: s_mov_b32 s6, s10
	; SI-NEXT: s_mov_b32 s7, s11			; SI-NEXT: s_mov_b32 s7, s11
	; SI-NEXT: s_mov_b32 s16, s2			; SI-NEXT: s_mov_b32 s16, s2
	; SI-NEXT: s_mov_b32 s17, s3			; SI-NEXT: s_mov_b32 s17, s3
	; SI-NEXT: buffer_load_dword v3, off, s[4:7], 0
	; SI-NEXT: s_mov_b32 s14, s10			; SI-NEXT: s_mov_b32 s14, s10
	; SI-NEXT: s_mov_b32 s15, s11			; SI-NEXT: s_mov_b32 s15, s11
	; SI-NEXT: buffer_load_dword v0, off, s[16:19], 0			; SI-NEXT: buffer_load_dword v0, off, s[16:19], 0
	; SI-NEXT: buffer_load_dword v1, off, s[12:15], 0			; SI-NEXT: buffer_load_dword v1, off, s[12:15], 0
				; SI-NEXT: buffer_load_dword v3, off, s[4:7], 0
	; SI-NEXT: v_mov_b32_e32 v2, 0x3f200000			; SI-NEXT: v_mov_b32_e32 v2, 0x3f200000
	; SI-NEXT: s_mov_b32 s8, s0			; SI-NEXT: s_mov_b32 s8, s0
	; SI-NEXT: s_mov_b32 s9, s1			; SI-NEXT: s_mov_b32 s9, s1
	; SI-NEXT: s_waitcnt vmcnt(2)			; SI-NEXT: s_waitcnt vmcnt(2)
	; SI-NEXT: v_lshrrev_b32_e32 v5, 16, v3
	; SI-NEXT: v_cvt_f32_f16_e32 v5, v5
	; SI-NEXT: v_cvt_f32_f16_e32 v3, v3
	; SI-NEXT: s_waitcnt vmcnt(1)
	; SI-NEXT: v_cvt_f32_f16_e32 v4, v0			; SI-NEXT: v_cvt_f32_f16_e32 v4, v0
	; SI-NEXT: v_lshrrev_b32_e32 v0, 16, v0			; SI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
	; SI-NEXT: s_waitcnt vmcnt(0)			; SI-NEXT: s_waitcnt vmcnt(0)
				; SI-NEXT: v_lshrrev_b32_e32 v5, 16, v3
	; SI-NEXT: v_lshrrev_b32_e32 v6, 16, v1			; SI-NEXT: v_lshrrev_b32_e32 v6, 16, v1
	; SI-NEXT: v_cvt_f32_f16_e32 v0, v0			; SI-NEXT: v_cvt_f32_f16_e32 v0, v0
				; SI-NEXT: v_cvt_f32_f16_e32 v5, v5
	; SI-NEXT: v_cvt_f32_f16_e32 v6, v6			; SI-NEXT: v_cvt_f32_f16_e32 v6, v6
				; SI-NEXT: v_cvt_f32_f16_e32 v3, v3
	; SI-NEXT: v_cvt_f32_f16_e32 v1, v1			; SI-NEXT: v_cvt_f32_f16_e32 v1, v1
	; SI-NEXT: v_cmp_nlt_f32_e32 vcc, v0, v5			; SI-NEXT: v_cmp_nlt_f32_e32 vcc, v0, v5
	; SI-NEXT: v_cndmask_b32_e32 v0, v2, v6, vcc			; SI-NEXT: v_cndmask_b32_e32 v0, v2, v6, vcc
	; SI-NEXT: v_cmp_nlt_f32_e32 vcc, v4, v3			; SI-NEXT: v_cmp_nlt_f32_e32 vcc, v4, v3
	; SI-NEXT: v_cvt_f16_f32_e32 v0, v0			; SI-NEXT: v_cvt_f16_f32_e32 v0, v0
	; SI-NEXT: v_cndmask_b32_e32 v1, 0.5, v1, vcc			; SI-NEXT: v_cndmask_b32_e32 v1, 0.5, v1, vcc
	; SI-NEXT: v_cvt_f16_f32_e32 v1, v1			; SI-NEXT: v_cvt_f16_f32_e32 v1, v1
	; SI-NEXT: v_lshlrev_b32_e32 v0, 16, v0			; SI-NEXT: v_lshlrev_b32_e32 v0, 16, v0
	(15 lines not shown)
	; VI-NEXT: s_mov_b32 s13, s7			; VI-NEXT: s_mov_b32 s13, s7
	; VI-NEXT: s_mov_b32 s0, s2			; VI-NEXT: s_mov_b32 s0, s2
	; VI-NEXT: s_mov_b32 s1, s3			; VI-NEXT: s_mov_b32 s1, s3
	; VI-NEXT: s_mov_b32 s6, s10			; VI-NEXT: s_mov_b32 s6, s10
	; VI-NEXT: s_mov_b32 s7, s11			; VI-NEXT: s_mov_b32 s7, s11
	; VI-NEXT: s_mov_b32 s2, s10			; VI-NEXT: s_mov_b32 s2, s10
	; VI-NEXT: s_mov_b32 s3, s11			; VI-NEXT: s_mov_b32 s3, s11
	; VI-NEXT: buffer_load_dword v0, off, s[0:3], 0			; VI-NEXT: buffer_load_dword v0, off, s[0:3], 0
	; VI-NEXT: buffer_load_dword v4, off, s[4:7], 0
	; VI-NEXT: buffer_load_dword v1, off, s[12:15], 0			; VI-NEXT: buffer_load_dword v1, off, s[12:15], 0
				; VI-NEXT: buffer_load_dword v4, off, s[4:7], 0
	; VI-NEXT: v_mov_b32_e32 v2, 0x3800			; VI-NEXT: v_mov_b32_e32 v2, 0x3800
	; VI-NEXT: v_mov_b32_e32 v3, 0x3900			; VI-NEXT: v_mov_b32_e32 v3, 0x3900
	; VI-NEXT: s_waitcnt vmcnt(2)			; VI-NEXT: s_waitcnt vmcnt(2)
	; VI-NEXT: v_lshrrev_b32_e32 v6, 16, v0			; VI-NEXT: v_lshrrev_b32_e32 v6, 16, v0
	; VI-NEXT: s_waitcnt vmcnt(1)			; VI-NEXT: s_waitcnt vmcnt(0)
	; VI-NEXT: v_cmp_nlt_f16_e32 vcc, v0, v4			; VI-NEXT: v_cmp_nlt_f16_e32 vcc, v0, v4
	; VI-NEXT: v_lshrrev_b32_e32 v5, 16, v4			; VI-NEXT: v_lshrrev_b32_e32 v5, 16, v4
	; VI-NEXT: s_waitcnt vmcnt(0)
	; VI-NEXT: v_cndmask_b32_e32 v0, v2, v1, vcc			; VI-NEXT: v_cndmask_b32_e32 v0, v2, v1, vcc
	; VI-NEXT: v_lshrrev_b32_e32 v1, 16, v1			; VI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
	; VI-NEXT: v_cmp_nlt_f16_e32 vcc, v6, v5			; VI-NEXT: v_cmp_nlt_f16_e32 vcc, v6, v5
	; VI-NEXT: v_cndmask_b32_e32 v1, v3, v1, vcc			; VI-NEXT: v_cndmask_b32_e32 v1, v3, v1, vcc
	; VI-NEXT: v_lshlrev_b32_e32 v1, 16, v1			; VI-NEXT: v_lshlrev_b32_e32 v1, 16, v1
	; VI-NEXT: v_or_b32_sdwa v0, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD			; VI-NEXT: v_or_b32_sdwa v0, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD
	; VI-NEXT: buffer_store_dword v0, off, s[8:11], 0			; VI-NEXT: buffer_store_dword v0, off, s[8:11], 0
	; VI-NEXT: s_endpgm			; VI-NEXT: s_endpgm
	(21 lines not shown)
	; SI-NEXT: s_mov_b32 s19, s11			; SI-NEXT: s_mov_b32 s19, s11
	; SI-NEXT: s_waitcnt lgkmcnt(0)			; SI-NEXT: s_waitcnt lgkmcnt(0)
	; SI-NEXT: s_mov_b32 s12, s6			; SI-NEXT: s_mov_b32 s12, s6
	; SI-NEXT: s_mov_b32 s13, s7			; SI-NEXT: s_mov_b32 s13, s7
	; SI-NEXT: s_mov_b32 s6, s10			; SI-NEXT: s_mov_b32 s6, s10
	; SI-NEXT: s_mov_b32 s7, s11			; SI-NEXT: s_mov_b32 s7, s11
	; SI-NEXT: s_mov_b32 s16, s2			; SI-NEXT: s_mov_b32 s16, s2
	; SI-NEXT: s_mov_b32 s17, s3			; SI-NEXT: s_mov_b32 s17, s3
	; SI-NEXT: buffer_load_dword v3, off, s[4:7], 0
	; SI-NEXT: s_mov_b32 s14, s10			; SI-NEXT: s_mov_b32 s14, s10
	; SI-NEXT: s_mov_b32 s15, s11			; SI-NEXT: s_mov_b32 s15, s11
	; SI-NEXT: buffer_load_dword v0, off, s[16:19], 0			; SI-NEXT: buffer_load_dword v0, off, s[16:19], 0
	; SI-NEXT: buffer_load_dword v1, off, s[12:15], 0			; SI-NEXT: buffer_load_dword v1, off, s[12:15], 0
				; SI-NEXT: buffer_load_dword v3, off, s[4:7], 0
	; SI-NEXT: v_mov_b32_e32 v2, 0x3f200000			; SI-NEXT: v_mov_b32_e32 v2, 0x3f200000
	; SI-NEXT: s_mov_b32 s8, s0			; SI-NEXT: s_mov_b32 s8, s0
	; SI-NEXT: s_mov_b32 s9, s1			; SI-NEXT: s_mov_b32 s9, s1
	; SI-NEXT: s_waitcnt vmcnt(2)			; SI-NEXT: s_waitcnt vmcnt(2)
	; SI-NEXT: v_lshrrev_b32_e32 v5, 16, v3
	; SI-NEXT: v_cvt_f32_f16_e32 v5, v5
	; SI-NEXT: v_cvt_f32_f16_e32 v3, v3
	; SI-NEXT: s_waitcnt vmcnt(1)
	; SI-NEXT: v_lshrrev_b32_e32 v4, 16, v0			; SI-NEXT: v_lshrrev_b32_e32 v4, 16, v0
	; SI-NEXT: s_waitcnt vmcnt(0)			; SI-NEXT: s_waitcnt vmcnt(1)
	; SI-NEXT: v_lshrrev_b32_e32 v6, 16, v1			; SI-NEXT: v_lshrrev_b32_e32 v6, 16, v1
				; SI-NEXT: s_waitcnt vmcnt(0)
				; SI-NEXT: v_lshrrev_b32_e32 v5, 16, v3
	; SI-NEXT: v_cvt_f32_f16_e32 v4, v4			; SI-NEXT: v_cvt_f32_f16_e32 v4, v4
				; SI-NEXT: v_cvt_f32_f16_e32 v5, v5
	; SI-NEXT: v_cvt_f32_f16_e32 v0, v0			; SI-NEXT: v_cvt_f32_f16_e32 v0, v0
	; SI-NEXT: v_cvt_f32_f16_e32 v6, v6			; SI-NEXT: v_cvt_f32_f16_e32 v6, v6
				; SI-NEXT: v_cvt_f32_f16_e32 v3, v3
	; SI-NEXT: v_cvt_f32_f16_e32 v1, v1			; SI-NEXT: v_cvt_f32_f16_e32 v1, v1
	; SI-NEXT: v_cmp_lt_f32_e32 vcc, v4, v5			; SI-NEXT: v_cmp_lt_f32_e32 vcc, v4, v5
	; SI-NEXT: v_cndmask_b32_e32 v2, v2, v6, vcc			; SI-NEXT: v_cndmask_b32_e32 v2, v2, v6, vcc
	; SI-NEXT: v_cmp_lt_f32_e32 vcc, v0, v3			; SI-NEXT: v_cmp_lt_f32_e32 vcc, v0, v3
	; SI-NEXT: v_cndmask_b32_e32 v0, 0.5, v1, vcc			; SI-NEXT: v_cndmask_b32_e32 v0, 0.5, v1, vcc
	; SI-NEXT: v_cvt_f16_f32_e32 v2, v2			; SI-NEXT: v_cvt_f16_f32_e32 v2, v2
	; SI-NEXT: v_cvt_f16_f32_e32 v0, v0			; SI-NEXT: v_cvt_f16_f32_e32 v0, v0
	; SI-NEXT: v_lshlrev_b32_e32 v1, 16, v2			; SI-NEXT: v_lshlrev_b32_e32 v1, 16, v2
	(15 lines not shown)
	; VI-NEXT: s_mov_b32 s13, s7			; VI-NEXT: s_mov_b32 s13, s7
	; VI-NEXT: s_mov_b32 s0, s2			; VI-NEXT: s_mov_b32 s0, s2
	; VI-NEXT: s_mov_b32 s1, s3			; VI-NEXT: s_mov_b32 s1, s3
	; VI-NEXT: s_mov_b32 s6, s10			; VI-NEXT: s_mov_b32 s6, s10
	; VI-NEXT: s_mov_b32 s7, s11			; VI-NEXT: s_mov_b32 s7, s11
	; VI-NEXT: s_mov_b32 s2, s10			; VI-NEXT: s_mov_b32 s2, s10
	; VI-NEXT: s_mov_b32 s3, s11			; VI-NEXT: s_mov_b32 s3, s11
	; VI-NEXT: buffer_load_dword v0, off, s[0:3], 0			; VI-NEXT: buffer_load_dword v0, off, s[0:3], 0
	; VI-NEXT: buffer_load_dword v4, off, s[4:7], 0
	; VI-NEXT: buffer_load_dword v1, off, s[12:15], 0			; VI-NEXT: buffer_load_dword v1, off, s[12:15], 0
				; VI-NEXT: buffer_load_dword v4, off, s[4:7], 0
	; VI-NEXT: v_mov_b32_e32 v2, 0x3800			; VI-NEXT: v_mov_b32_e32 v2, 0x3800
	; VI-NEXT: v_mov_b32_e32 v3, 0x3900			; VI-NEXT: v_mov_b32_e32 v3, 0x3900
	; VI-NEXT: s_waitcnt vmcnt(2)			; VI-NEXT: s_waitcnt vmcnt(2)
	; VI-NEXT: v_lshrrev_b32_e32 v6, 16, v0			; VI-NEXT: v_lshrrev_b32_e32 v6, 16, v0
	; VI-NEXT: s_waitcnt vmcnt(1)			; VI-NEXT: s_waitcnt vmcnt(0)
	; VI-NEXT: v_cmp_lt_f16_e32 vcc, v0, v4			; VI-NEXT: v_cmp_lt_f16_e32 vcc, v0, v4
	; VI-NEXT: v_lshrrev_b32_e32 v5, 16, v4			; VI-NEXT: v_lshrrev_b32_e32 v5, 16, v4
	; VI-NEXT: s_waitcnt vmcnt(0)
	; VI-NEXT: v_cndmask_b32_e32 v0, v2, v1, vcc			; VI-NEXT: v_cndmask_b32_e32 v0, v2, v1, vcc
	; VI-NEXT: v_lshrrev_b32_e32 v1, 16, v1			; VI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
	; VI-NEXT: v_cmp_lt_f16_e32 vcc, v6, v5			; VI-NEXT: v_cmp_lt_f16_e32 vcc, v6, v5
	; VI-NEXT: v_cndmask_b32_e32 v1, v3, v1, vcc			; VI-NEXT: v_cndmask_b32_e32 v1, v3, v1, vcc
	; VI-NEXT: v_lshlrev_b32_e32 v1, 16, v1			; VI-NEXT: v_lshlrev_b32_e32 v1, 16, v1
	; VI-NEXT: v_or_b32_sdwa v0, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD			; VI-NEXT: v_or_b32_sdwa v0, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD
	; VI-NEXT: buffer_store_dword v0, off, s[8:11], 0			; VI-NEXT: buffer_store_dword v0, off, s[8:11], 0
	; VI-NEXT: s_endpgm			; VI-NEXT: s_endpgm
	[13 lines hidden]

llvm/test/CodeGen/AMDGPU/shl.ll

[157 lines hidden]	; EG-NEXT: 2(2.802597e-45), 0(0.000000e+00)
store i16 %result, i16 addrspace(1)* %out		store i16 %result, i16 addrspace(1)* %out
ret void		ret void
}		}

define amdgpu_kernel void @shl_i16_v_s(i16 addrspace(1)* %out, i16 addrspace(1)* %in, i16 %b) {		define amdgpu_kernel void @shl_i16_v_s(i16 addrspace(1)* %out, i16 addrspace(1)* %in, i16 %b) {
; GCN-LABEL: shl_i16_v_s:		; GCN-LABEL: shl_i16_v_s:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9		; GCN-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
		; GCN-NEXT: s_load_dword s8, s[0:1], 0xd
		foad: Nice.
; GCN-NEXT: s_mov_b32 s3, 0xf000		; GCN-NEXT: s_mov_b32 s3, 0xf000
; GCN-NEXT: s_mov_b32 s2, -1		; GCN-NEXT: s_mov_b32 s2, -1
; GCN-NEXT: s_load_dword s8, s[0:1], 0xd
; GCN-NEXT: s_waitcnt lgkmcnt(0)		; GCN-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: s_mov_b32 s0, s4		; GCN-NEXT: s_mov_b32 s0, s4
; GCN-NEXT: s_mov_b32 s1, s5		; GCN-NEXT: s_mov_b32 s1, s5
; GCN-NEXT: s_mov_b32 s4, s6		; GCN-NEXT: s_mov_b32 s4, s6
; GCN-NEXT: s_mov_b32 s5, s7		; GCN-NEXT: s_mov_b32 s5, s7
; GCN-NEXT: s_mov_b32 s6, s2		; GCN-NEXT: s_mov_b32 s6, s2
; GCN-NEXT: s_mov_b32 s7, s3		; GCN-NEXT: s_mov_b32 s7, s3
; GCN-NEXT: buffer_load_ushort v0, off, s[4:7], 0		; GCN-NEXT: buffer_load_ushort v0, off, s[4:7], 0
[36 lines hidden]	; EG-NEXT: 2(2.802597e-45), 0(0.000000e+00)
store i16 %result, i16 addrspace(1)* %out		store i16 %result, i16 addrspace(1)* %out
ret void		ret void
}		}

define amdgpu_kernel void @shl_i16_v_compute_s(i16 addrspace(1)* %out, i16 addrspace(1)* %in, i16 %b) {		define amdgpu_kernel void @shl_i16_v_compute_s(i16 addrspace(1)* %out, i16 addrspace(1)* %in, i16 %b) {
; GCN-LABEL: shl_i16_v_compute_s:		; GCN-LABEL: shl_i16_v_compute_s:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9		; GCN-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x9
		; GCN-NEXT: s_load_dword s8, s[0:1], 0xd
; GCN-NEXT: s_mov_b32 s3, 0xf000		; GCN-NEXT: s_mov_b32 s3, 0xf000
; GCN-NEXT: s_mov_b32 s2, -1		; GCN-NEXT: s_mov_b32 s2, -1
; GCN-NEXT: s_load_dword s8, s[0:1], 0xd
; GCN-NEXT: s_waitcnt lgkmcnt(0)		; GCN-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: s_mov_b32 s0, s4		; GCN-NEXT: s_mov_b32 s0, s4
; GCN-NEXT: s_mov_b32 s1, s5		; GCN-NEXT: s_mov_b32 s1, s5
; GCN-NEXT: s_mov_b32 s4, s6		; GCN-NEXT: s_mov_b32 s4, s6
; GCN-NEXT: s_mov_b32 s5, s7		; GCN-NEXT: s_mov_b32 s5, s7
; GCN-NEXT: s_mov_b32 s6, s2		; GCN-NEXT: s_mov_b32 s6, s2
; GCN-NEXT: s_mov_b32 s7, s3		; GCN-NEXT: s_mov_b32 s7, s3
; GCN-NEXT: buffer_load_ushort v0, off, s[4:7], 0		; GCN-NEXT: buffer_load_ushort v0, off, s[4:7], 0
[1,496 lines hidden]

llvm/test/CodeGen/AMDGPU/shl.v2i16.ll

[140 lines hidden]	; CI-NEXT: s_endpgm
store <2 x i16> %result, <2 x i16> addrspace(1)* %out.gep		store <2 x i16> %result, <2 x i16> addrspace(1)* %out.gep
ret void		ret void
}		}

define amdgpu_kernel void @shl_v_s_v2i16(<2 x i16> addrspace(1)* %out, <2 x i16> addrspace(1)* %in, <2 x i16> %sgpr) #0 {		define amdgpu_kernel void @shl_v_s_v2i16(<2 x i16> addrspace(1)* %out, <2 x i16> addrspace(1)* %in, <2 x i16> %sgpr) #0 {
; GFX9-LABEL: shl_v_s_v2i16:		; GFX9-LABEL: shl_v_s_v2i16:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NEXT: v_lshlrev_b32_e32 v2, 2, v0
; GFX9-NEXT: s_load_dword s0, s[0:1], 0x34		; GFX9-NEXT: s_load_dword s0, s[0:1], 0x34
		; GFX9-NEXT: v_lshlrev_b32_e32 v2, 2, v0
		foad: Nice.
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v1, s7		; GFX9-NEXT: v_mov_b32_e32 v1, s7
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s6, v2		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s6, v2
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
; GFX9-NEXT: global_load_dword v3, v[0:1], off		; GFX9-NEXT: global_load_dword v3, v[0:1], off
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s4, v2		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s4, v2
; GFX9-NEXT: v_mov_b32_e32 v1, s5		; GFX9-NEXT: v_mov_b32_e32 v1, s5
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: v_pk_lshlrev_b16 v2, s0, v3		; GFX9-NEXT: v_pk_lshlrev_b16 v2, s0, v3
; GFX9-NEXT: global_store_dword v[0:1], v2, off		; GFX9-NEXT: global_store_dword v[0:1], v2, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; VI-LABEL: shl_v_s_v2i16:		; VI-LABEL: shl_v_s_v2i16:
; VI: ; %bb.0:		; VI: ; %bb.0:
; VI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; VI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; VI-NEXT: v_lshlrev_b32_e32 v2, 2, v0
; VI-NEXT: s_load_dword s0, s[0:1], 0x34		; VI-NEXT: s_load_dword s0, s[0:1], 0x34
		; VI-NEXT: v_lshlrev_b32_e32 v2, 2, v0
; VI-NEXT: s_waitcnt lgkmcnt(0)		; VI-NEXT: s_waitcnt lgkmcnt(0)
; VI-NEXT: v_mov_b32_e32 v1, s7		; VI-NEXT: v_mov_b32_e32 v1, s7
; VI-NEXT: v_add_u32_e32 v0, vcc, s6, v2		; VI-NEXT: v_add_u32_e32 v0, vcc, s6, v2
; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc		; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; VI-NEXT: flat_load_dword v3, v[0:1]		; VI-NEXT: flat_load_dword v3, v[0:1]
; VI-NEXT: s_lshr_b32 s1, s0, 16		; VI-NEXT: s_lshr_b32 s1, s0, 16
; VI-NEXT: v_mov_b32_e32 v4, s1		; VI-NEXT: v_mov_b32_e32 v4, s1
; VI-NEXT: v_add_u32_e32 v0, vcc, s4, v2		; VI-NEXT: v_add_u32_e32 v0, vcc, s4, v2
[39 lines hidden]	; CI-NEXT: s_endpgm
store <2 x i16> %result, <2 x i16> addrspace(1)* %out.gep		store <2 x i16> %result, <2 x i16> addrspace(1)* %out.gep
ret void		ret void
}		}

define amdgpu_kernel void @shl_s_v_v2i16(<2 x i16> addrspace(1)* %out, <2 x i16> addrspace(1)* %in, <2 x i16> %sgpr) #0 {		define amdgpu_kernel void @shl_s_v_v2i16(<2 x i16> addrspace(1)* %out, <2 x i16> addrspace(1)* %in, <2 x i16> %sgpr) #0 {
; GFX9-LABEL: shl_s_v_v2i16:		; GFX9-LABEL: shl_s_v_v2i16:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; GFX9-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; GFX9-NEXT: v_lshlrev_b32_e32 v2, 2, v0
; GFX9-NEXT: s_load_dword s0, s[0:1], 0x34		; GFX9-NEXT: s_load_dword s0, s[0:1], 0x34
		; GFX9-NEXT: v_lshlrev_b32_e32 v2, 2, v0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: v_mov_b32_e32 v1, s7		; GFX9-NEXT: v_mov_b32_e32 v1, s7
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s6, v2		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s6, v2
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
; GFX9-NEXT: global_load_dword v3, v[0:1], off		; GFX9-NEXT: global_load_dword v3, v[0:1], off
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s4, v2		; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s4, v2
; GFX9-NEXT: v_mov_b32_e32 v1, s5		; GFX9-NEXT: v_mov_b32_e32 v1, s5
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc		; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: v_pk_lshlrev_b16 v2, v3, s0		; GFX9-NEXT: v_pk_lshlrev_b16 v2, v3, s0
; GFX9-NEXT: global_store_dword v[0:1], v2, off		; GFX9-NEXT: global_store_dword v[0:1], v2, off
; GFX9-NEXT: s_endpgm		; GFX9-NEXT: s_endpgm
;		;
; VI-LABEL: shl_s_v_v2i16:		; VI-LABEL: shl_s_v_v2i16:
; VI: ; %bb.0:		; VI: ; %bb.0:
; VI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24		; VI-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
; VI-NEXT: v_lshlrev_b32_e32 v2, 2, v0
; VI-NEXT: s_load_dword s0, s[0:1], 0x34		; VI-NEXT: s_load_dword s0, s[0:1], 0x34
		; VI-NEXT: v_lshlrev_b32_e32 v2, 2, v0
; VI-NEXT: s_waitcnt lgkmcnt(0)		; VI-NEXT: s_waitcnt lgkmcnt(0)
; VI-NEXT: v_mov_b32_e32 v1, s7		; VI-NEXT: v_mov_b32_e32 v1, s7
; VI-NEXT: v_add_u32_e32 v0, vcc, s6, v2		; VI-NEXT: v_add_u32_e32 v0, vcc, s6, v2
; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc		; VI-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; VI-NEXT: flat_load_dword v3, v[0:1]		; VI-NEXT: flat_load_dword v3, v[0:1]
; VI-NEXT: s_lshr_b32 s1, s0, 16		; VI-NEXT: s_lshr_b32 s1, s0, 16
; VI-NEXT: v_mov_b32_e32 v4, s1		; VI-NEXT: v_mov_b32_e32 v4, s1
; VI-NEXT: v_add_u32_e32 v0, vcc, s4, v2		; VI-NEXT: v_add_u32_e32 v0, vcc, s4, v2
[347 lines hidden]

llvm/test/CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll

	[119 lines hidden]
	; GCN: v_readfirstlane_b32 s[[PTR_HI:[0-9]+]], v{{[0-9]+}}			; GCN: v_readfirstlane_b32 s[[PTR_HI:[0-9]+]], v{{[0-9]+}}

	; CI-DAG: s_load_dword s{{[0-9]+}}, s{{\[}}[[PTR_LO]]:[[PTR_HI]]{{\]}}, 0x1			; CI-DAG: s_load_dword s{{[0-9]+}}, s{{\[}}[[PTR_LO]]:[[PTR_HI]]{{\]}}, 0x1
	; CI-DAG: s_load_dword s{{[0-9]+}}, s{{\[}}[[PTR_LO]]:[[PTR_HI]]{{\]}}, 0x3			; CI-DAG: s_load_dword s{{[0-9]+}}, s{{\[}}[[PTR_LO]]:[[PTR_HI]]{{\]}}, 0x3

	; GFX9-DAG: s_load_dword s{{[0-9]+}}, s{{\[}}[[PTR_LO]]:[[PTR_HI]]{{\]}}, 0x4			; GFX9-DAG: s_load_dword s{{[0-9]+}}, s{{\[}}[[PTR_LO]]:[[PTR_HI]]{{\]}}, 0x4
	; GFX9-DAG: s_load_dword s{{[0-9]+}}, s{{\[}}[[PTR_LO]]:[[PTR_HI]]{{\]}}, 0xc			; GFX9-DAG: s_load_dword s{{[0-9]+}}, s{{\[}}[[PTR_LO]]:[[PTR_HI]]{{\]}}, 0xc

	; GCN: ds_write_b32			; GCN-DAG: ds_write_b32
	; CI: buffer_store_dword			; CI: buffer_store_dword
	; GFX9: global_store_dword			; GFX9: global_store_dword
	define amdgpu_kernel void @reorder_constant_load_local_store_constant_load(i32 addrspace(1)* %out, i32 addrspace(3)* %lptr) #0 {			define amdgpu_kernel void @reorder_constant_load_local_store_constant_load(i32 addrspace(1)* %out, i32 addrspace(3)* %lptr) #0 {
	%ptr0 = load i32 addrspace(4), i32 addrspace(4) addrspace(3)* @stored_constant_ptr, align 8			%ptr0 = load i32 addrspace(4), i32 addrspace(4) addrspace(3)* @stored_constant_ptr, align 8

	%ptr1 = getelementptr inbounds i32, i32 addrspace(4)* %ptr0, i64 1			%ptr1 = getelementptr inbounds i32, i32 addrspace(4)* %ptr0, i64 1
	%ptr2 = getelementptr inbounds i32, i32 addrspace(4)* %ptr0, i64 3			%ptr2 = getelementptr inbounds i32, i32 addrspace(4)* %ptr0, i64 3

	[113 lines hidden]
	; GCN-LABEL: {{^}}reorder_global_offsets_addr64_soffset0:			; GCN-LABEL: {{^}}reorder_global_offsets_addr64_soffset0:
	; CI: buffer_load_dword v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:12{{$}}			; CI: buffer_load_dword v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:12{{$}}
	; CI-NEXT: buffer_load_dword v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:28{{$}}			; CI-NEXT: buffer_load_dword v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:28{{$}}
	; CI-NEXT: buffer_load_dword v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:44{{$}}			; CI-NEXT: buffer_load_dword v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:44{{$}}

	; CI: v_mov_b32			; CI: v_mov_b32
	; CI: v_mov_b32			; CI: v_mov_b32

	; CI: buffer_store_dword v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0 addr64{{$}}

	; CI: v_add_i32			; CI: v_add_i32
	; CI: v_add_i32			; CI: v_add_i32

				; CI: buffer_store_dword v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0 addr64{{$}}
	; CI: buffer_store_dword v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:20{{$}}			; CI: buffer_store_dword v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:20{{$}}

	; CI: buffer_store_dword v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:36{{$}}			; CI: buffer_store_dword v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:36{{$}}
	; CI-NEXT: buffer_store_dword v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:52{{$}}			; CI-NEXT: buffer_store_dword v{{[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0 addr64 offset:52{{$}}

				; GFX9: global_load_dword {{v[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}} offset:12
				; GFX9: global_load_dword {{v[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}} offset:44
	; GFX9: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s{{\[[0-9]+:[0-9]+\]}}{{$}}			; GFX9: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s{{\[[0-9]+:[0-9]+\]}}{{$}}
	; GFX9: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s{{\[[0-9]+:[0-9]+\]}} offset:20			; GFX9: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s{{\[[0-9]+:[0-9]+\]}} offset:20
	; GFX9: global_load_dword {{v[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}} offset:12
	; GFX9: global_load_dword {{v[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}} offset:28			; GFX9: global_load_dword {{v[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}} offset:28
	; GFX9: global_load_dword {{v[0-9]+}}, v{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}} offset:44

	; GFX9: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s{{\[[0-9]+:[0-9]+\]}} offset:36			; GFX9: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s{{\[[0-9]+:[0-9]+\]}} offset:36
	; GFX9: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s{{\[[0-9]+:[0-9]+\]}} offset:52			; GFX9: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s{{\[[0-9]+:[0-9]+\]}} offset:52

	define amdgpu_kernel void @reorder_global_offsets_addr64_soffset0(i32 addrspace(1)* noalias nocapture %ptr.base) #0 {			define amdgpu_kernel void @reorder_global_offsets_addr64_soffset0(i32 addrspace(1)* noalias nocapture %ptr.base) #0 {
	%id = call i32 @llvm.amdgcn.workitem.id.x()			%id = call i32 @llvm.amdgcn.workitem.id.x()
	%id.ext = sext i32 %id to i64			%id.ext = sext i32 %id to i64

	[47 lines hidden]

llvm/test/CodeGen/AMDGPU/sign_extend.ll

	[550 lines hidden]
	; VI-NEXT: s_mov_b32 s0, s4			; VI-NEXT: s_mov_b32 s0, s4
	; VI-NEXT: s_mov_b32 s1, s5			; VI-NEXT: s_mov_b32 s1, s5
	; VI-NEXT: s_mov_b32 s4, s6			; VI-NEXT: s_mov_b32 s4, s6
	; VI-NEXT: s_mov_b32 s5, s7			; VI-NEXT: s_mov_b32 s5, s7
	; VI-NEXT: s_mov_b32 s6, s2			; VI-NEXT: s_mov_b32 s6, s2
	; VI-NEXT: s_mov_b32 s7, s3			; VI-NEXT: s_mov_b32 s7, s3
	; VI-NEXT: buffer_load_dwordx2 v[0:1], off, s[4:7], 0			; VI-NEXT: buffer_load_dwordx2 v[0:1], off, s[4:7], 0
	; VI-NEXT: s_waitcnt vmcnt(0)			; VI-NEXT: s_waitcnt vmcnt(0)
	; VI-NEXT: v_ashrrev_i32_e32 v2, 16, v1
	; VI-NEXT: v_ashrrev_i32_e32 v3, 16, v0			; VI-NEXT: v_ashrrev_i32_e32 v3, 16, v0
	; VI-NEXT: v_bfe_i32 v0, v0, 0, 16			; VI-NEXT: v_bfe_i32 v0, v0, 0, 16
				; VI-NEXT: v_ashrrev_i32_e32 v2, 16, v1
	; VI-NEXT: v_bfe_i32 v1, v1, 0, 16			; VI-NEXT: v_bfe_i32 v1, v1, 0, 16
	; VI-NEXT: buffer_store_dword v0, off, s[0:3], 0			; VI-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; VI-NEXT: buffer_store_dword v3, off, s[0:3], 0			; VI-NEXT: buffer_store_dword v3, off, s[0:3], 0
	; VI-NEXT: buffer_store_dword v1, off, s[0:3], 0			; VI-NEXT: buffer_store_dword v1, off, s[0:3], 0
	; VI-NEXT: buffer_store_dword v2, off, s[0:3], 0			; VI-NEXT: buffer_store_dword v2, off, s[0:3], 0
	; VI-NEXT: s_endpgm			; VI-NEXT: s_endpgm
	%a = load i64, i64 addrspace(1)* %in			%a = load i64, i64 addrspace(1)* %in
	%cast = bitcast i64 %a to <4 x i16>			%cast = bitcast i64 %a to <4 x i16>
	[15 lines hidden]

llvm/test/CodeGen/AMDGPU/sminmax.v2i16.ll

[184 lines hidden]	define amdgpu_kernel void @v_min_max_v2i16(<2 x i16> addrspace(1)* %out0, <2 x i16> addrspace(1)* %out1, <2 x i16> addrspace(1)* %ptr0, <2 x i16> addrspace(1)* %ptr1) #0 {
%sel1 = select <2 x i1> %cond0, <2 x i16> %val1, <2 x i16> %val0		%sel1 = select <2 x i1> %cond0, <2 x i16> %val1, <2 x i16> %val0

store volatile <2 x i16> %sel0, <2 x i16> addrspace(1)* %out0, align 4		store volatile <2 x i16> %sel0, <2 x i16> addrspace(1)* %out0, align 4
store volatile <2 x i16> %sel1, <2 x i16> addrspace(1)* %out1, align 4		store volatile <2 x i16> %sel1, <2 x i16> addrspace(1)* %out1, align 4
ret void		ret void
}		}

; GCN-LABEL: {{^}}s_min_max_v4i16:		; GCN-LABEL: {{^}}s_min_max_v4i16:
; GFX9: v_pk_max_i16		; GFX9-DAG: v_pk_max_i16
; GFX9: v_pk_min_i16		; GFX9-DAG: v_pk_min_i16
; GFX9: v_pk_max_i16		; GFX9-DAG: v_pk_max_i16
; GFX9: v_pk_min_i16		; GFX9-DAG: v_pk_min_i16
define amdgpu_kernel void @s_min_max_v4i16(<4 x i16> addrspace(1)* %out0, <4 x i16> addrspace(1)* %out1, <4 x i16> %val0, <4 x i16> %val1) #0 {		define amdgpu_kernel void @s_min_max_v4i16(<4 x i16> addrspace(1)* %out0, <4 x i16> addrspace(1)* %out1, <4 x i16> %val0, <4 x i16> %val1) #0 {
%cond0 = icmp sgt <4 x i16> %val0, %val1		%cond0 = icmp sgt <4 x i16> %val0, %val1
%sel0 = select <4 x i1> %cond0, <4 x i16> %val0, <4 x i16> %val1		%sel0 = select <4 x i1> %cond0, <4 x i16> %val0, <4 x i16> %val1
%sel1 = select <4 x i1> %cond0, <4 x i16> %val1, <4 x i16> %val0		%sel1 = select <4 x i1> %cond0, <4 x i16> %val1, <4 x i16> %val0

store volatile <4 x i16> %sel0, <4 x i16> addrspace(1)* %out0, align 4		store volatile <4 x i16> %sel0, <4 x i16> addrspace(1)* %out0, align 4
store volatile <4 x i16> %sel1, <4 x i16> addrspace(1)* %out1, align 4		store volatile <4 x i16> %sel1, <4 x i16> addrspace(1)* %out1, align 4
ret void		ret void
[34 lines hidden]

llvm/test/CodeGen/AMDGPU/store-weird-sizes.ll

[209 lines hidden]	; GFX9-NEXT: s_setpc_b64 s[30:31]
store i13 %arg, i13 addrspace(3)* %ptr, align 8		store i13 %arg, i13 addrspace(3)* %ptr, align 8
ret void		ret void
}		}

define void @local_store_i17(i17 addrspace(3)* %ptr, i17 %arg) #0 {		define void @local_store_i17(i17 addrspace(3)* %ptr, i17 %arg) #0 {
; CIVI-LABEL: local_store_i17:		; CIVI-LABEL: local_store_i17:
; CIVI: ; %bb.0:		; CIVI: ; %bb.0:
; CIVI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; CIVI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; CIVI-NEXT: v_bfe_u32 v2, v1, 16, 1
; CIVI-NEXT: s_mov_b32 m0, -1		; CIVI-NEXT: s_mov_b32 m0, -1
		; CIVI-NEXT: v_bfe_u32 v2, v1, 16, 1
; CIVI-NEXT: ds_write_b16 v0, v1		; CIVI-NEXT: ds_write_b16 v0, v1
; CIVI-NEXT: ds_write_b8 v0, v2 offset:2		; CIVI-NEXT: ds_write_b8 v0, v2 offset:2
; CIVI-NEXT: s_waitcnt lgkmcnt(0)		; CIVI-NEXT: s_waitcnt lgkmcnt(0)
; CIVI-NEXT: s_setpc_b64 s[30:31]		; CIVI-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX9-LABEL: local_store_i17:		; GFX9-LABEL: local_store_i17:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
[10 lines hidden]

llvm/test/CodeGen/AMDGPU/v_mac_f16.ll

[346 lines hidden]	entry:
%t.val = fmul <2 x half> %a.val, %b.val		%t.val = fmul <2 x half> %a.val, %b.val
%r.val = fadd <2 x half> %t.val, %c.val		%r.val = fadd <2 x half> %t.val, %c.val

store <2 x half> %r.val, <2 x half> addrspace(1)* %r		store <2 x half> %r.val, <2 x half> addrspace(1)* %r
ret void		ret void
}		}

; GCN-LABEL: {{^}}mac_v2f16_same_add:		; GCN-LABEL: {{^}}mac_v2f16_same_add:
; SI: v_mad_f32 v{{[0-9]}}, v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}		; SI-DAG: v_mad_f32 v{{[0-9]}}, v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
; SI: v_mac_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}		; SI-DAG: v_mac_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
; SI: v_mad_f32 v{{[0-9]}}, v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}		; SI-DAG: v_mad_f32 v{{[0-9]}}, v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
; SI: v_mac_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}		; SI-DAG: v_mac_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}

; VI-DAG: v_mac_f16_sdwa v{{[0-9]}}, v{{[0-9]+}}, v{{[0-9]+}} dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1		; VI-DAG: v_mac_f16_sdwa v{{[0-9]}}, v{{[0-9]+}}, v{{[0-9]+}} dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
; VI-DAG: v_mad_f16 v{{[0-9]}}, v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}		; VI-DAG: v_mad_f16 v{{[0-9]}}, v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
; VI-DAG: v_mac_f16_sdwa v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}} dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1		; VI-DAG: v_mac_f16_sdwa v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}} dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
; VI-DAG: v_mac_f16_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}		; VI-DAG: v_mac_f16_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}

; GCN: s_endpgm		; GCN: s_endpgm
define amdgpu_kernel void @mac_v2f16_same_add(		define amdgpu_kernel void @mac_v2f16_same_add(
[316 lines hidden]

llvm/test/CodeGen/AMDGPU/v_madak_f16.ll

	[35 lines hidden]
	; VI-NEXT: s_mov_b32 s3, 0xf000			; VI-NEXT: s_mov_b32 s3, 0xf000
	; VI-NEXT: s_mov_b32 s2, -1			; VI-NEXT: s_mov_b32 s2, -1
	; VI-NEXT: s_mov_b32 s10, s2			; VI-NEXT: s_mov_b32 s10, s2
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: s_mov_b32 s0, s4			; VI-NEXT: s_mov_b32 s0, s4
	; VI-NEXT: s_mov_b32 s1, s5			; VI-NEXT: s_mov_b32 s1, s5
	; VI-NEXT: s_mov_b32 s4, s6			; VI-NEXT: s_mov_b32 s4, s6
	; VI-NEXT: s_mov_b32 s5, s7			; VI-NEXT: s_mov_b32 s5, s7
	; VI-NEXT: s_mov_b32 s11, s3
	; VI-NEXT: s_mov_b32 s6, s2			; VI-NEXT: s_mov_b32 s6, s2
	; VI-NEXT: s_mov_b32 s7, s3			; VI-NEXT: s_mov_b32 s7, s3
				; VI-NEXT: s_mov_b32 s11, s3
	; VI-NEXT: buffer_load_ushort v0, off, s[4:7], 0			; VI-NEXT: buffer_load_ushort v0, off, s[4:7], 0
	; VI-NEXT: buffer_load_ushort v1, off, s[8:11], 0			; VI-NEXT: buffer_load_ushort v1, off, s[8:11], 0
	; VI-NEXT: s_waitcnt vmcnt(0)			; VI-NEXT: s_waitcnt vmcnt(0)
	; VI-NEXT: v_madak_f16 v0, v0, v1, 0x4900			; VI-NEXT: v_madak_f16 v0, v0, v1, 0x4900
	; VI-NEXT: buffer_store_short v0, off, s[0:3], 0			; VI-NEXT: buffer_store_short v0, off, s[0:3], 0
	; VI-NEXT: s_endpgm			; VI-NEXT: s_endpgm
	half addrspace(1)* %r,			half addrspace(1)* %r,
	half addrspace(1)* %a,			half addrspace(1)* %a,
	[15 lines hidden]
	; SI-NEXT: s_load_dwordx8 s[4:11], s[0:1], 0x9			; SI-NEXT: s_load_dwordx8 s[4:11], s[0:1], 0x9
	; SI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x11			; SI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x11
	; SI-NEXT: s_mov_b32 s15, 0xf000			; SI-NEXT: s_mov_b32 s15, 0xf000
	; SI-NEXT: s_mov_b32 s14, -1			; SI-NEXT: s_mov_b32 s14, -1
	; SI-NEXT: s_mov_b32 s2, s14			; SI-NEXT: s_mov_b32 s2, s14
	; SI-NEXT: s_waitcnt lgkmcnt(0)			; SI-NEXT: s_waitcnt lgkmcnt(0)
	; SI-NEXT: s_mov_b32 s16, s10			; SI-NEXT: s_mov_b32 s16, s10
	; SI-NEXT: s_mov_b32 s17, s11			; SI-NEXT: s_mov_b32 s17, s11
				; SI-NEXT: s_mov_b32 s10, s14
				; SI-NEXT: s_mov_b32 s11, s15
	; SI-NEXT: s_mov_b32 s3, s15			; SI-NEXT: s_mov_b32 s3, s15
	; SI-NEXT: s_mov_b32 s18, s14			; SI-NEXT: s_mov_b32 s18, s14
	; SI-NEXT: s_mov_b32 s19, s15			; SI-NEXT: s_mov_b32 s19, s15
	; SI-NEXT: s_mov_b32 s10, s14
	; SI-NEXT: s_mov_b32 s11, s15
	; SI-NEXT: buffer_load_ushort v0, off, s[8:11], 0			; SI-NEXT: buffer_load_ushort v0, off, s[8:11], 0
	; SI-NEXT: buffer_load_ushort v1, off, s[16:19], 0			; SI-NEXT: buffer_load_ushort v1, off, s[16:19], 0
	; SI-NEXT: buffer_load_ushort v2, off, s[0:3], 0			; SI-NEXT: buffer_load_ushort v2, off, s[0:3], 0
	; SI-NEXT: v_mov_b32_e32 v3, 0x41200000			; SI-NEXT: v_mov_b32_e32 v3, 0x41200000
	; SI-NEXT: s_mov_b32 s12, s6			; SI-NEXT: s_mov_b32 s12, s6
	; SI-NEXT: s_mov_b32 s13, s7			; SI-NEXT: s_mov_b32 s13, s7
	; SI-NEXT: s_mov_b32 s6, s14			; SI-NEXT: s_mov_b32 s6, s14
	; SI-NEXT: s_mov_b32 s7, s15			; SI-NEXT: s_mov_b32 s7, s15
	[16 lines hidden]
	; VI-NEXT: s_load_dwordx8 s[4:11], s[0:1], 0x24			; VI-NEXT: s_load_dwordx8 s[4:11], s[0:1], 0x24
	; VI-NEXT: s_load_dwordx2 s[12:13], s[0:1], 0x44			; VI-NEXT: s_load_dwordx2 s[12:13], s[0:1], 0x44
	; VI-NEXT: s_mov_b32 s3, 0xf000			; VI-NEXT: s_mov_b32 s3, 0xf000
	; VI-NEXT: s_mov_b32 s2, -1			; VI-NEXT: s_mov_b32 s2, -1
	; VI-NEXT: s_mov_b32 s14, s2			; VI-NEXT: s_mov_b32 s14, s2
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: s_mov_b32 s16, s10			; VI-NEXT: s_mov_b32 s16, s10
	; VI-NEXT: s_mov_b32 s17, s11			; VI-NEXT: s_mov_b32 s17, s11
				; VI-NEXT: s_mov_b32 s10, s2
				; VI-NEXT: s_mov_b32 s11, s3
	; VI-NEXT: s_mov_b32 s15, s3			; VI-NEXT: s_mov_b32 s15, s3
	; VI-NEXT: s_mov_b32 s18, s2			; VI-NEXT: s_mov_b32 s18, s2
	; VI-NEXT: s_mov_b32 s19, s3			; VI-NEXT: s_mov_b32 s19, s3
	; VI-NEXT: s_mov_b32 s10, s2
	; VI-NEXT: s_mov_b32 s11, s3
	; VI-NEXT: buffer_load_ushort v0, off, s[8:11], 0			; VI-NEXT: buffer_load_ushort v0, off, s[8:11], 0
	; VI-NEXT: buffer_load_ushort v1, off, s[16:19], 0			; VI-NEXT: buffer_load_ushort v1, off, s[16:19], 0
	; VI-NEXT: buffer_load_ushort v3, off, s[12:15], 0			; VI-NEXT: buffer_load_ushort v3, off, s[12:15], 0
	; VI-NEXT: v_mov_b32_e32 v2, 0x4900			; VI-NEXT: v_mov_b32_e32 v2, 0x4900
	; VI-NEXT: s_mov_b32 s0, s6			; VI-NEXT: s_mov_b32 s0, s6
	; VI-NEXT: s_mov_b32 s1, s7			; VI-NEXT: s_mov_b32 s1, s7
	; VI-NEXT: s_mov_b32 s6, s2			; VI-NEXT: s_mov_b32 s6, s2
	; VI-NEXT: s_mov_b32 s7, s3			; VI-NEXT: s_mov_b32 s7, s3
	[26 lines hidden]

Revision Contents

Diff 238308

llvm/lib/Target/AMDGPU/AMDGPU.h

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

llvm/lib/Target/AMDGPU/CMakeLists.txt

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp

llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp

llvm/lib/Target/AMDGPU/SIPostRABundler.cpp

llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll

llvm/test/CodeGen/AMDGPU/byval-frame-setup.ll

llvm/test/CodeGen/AMDGPU/call-argument-types.ll

llvm/test/CodeGen/AMDGPU/callee-special-input-vgprs.ll

llvm/test/CodeGen/AMDGPU/copy-illegal-type.ll

llvm/test/CodeGen/AMDGPU/cvt_f32_ubyte.ll

llvm/test/CodeGen/AMDGPU/ds_write2st64.ll

llvm/test/CodeGen/AMDGPU/global-saddr.ll

llvm/test/CodeGen/AMDGPU/half.ll

llvm/test/CodeGen/AMDGPU/idot2.ll

llvm/test/CodeGen/AMDGPU/idot4s.ll

llvm/test/CodeGen/AMDGPU/idot4u.ll

llvm/test/CodeGen/AMDGPU/idot8s.ll

llvm/test/CodeGen/AMDGPU/idot8u.ll

llvm/test/CodeGen/AMDGPU/insert_vector_elt.ll

llvm/test/CodeGen/AMDGPU/insert_vector_elt.v2i16.ll

llvm/test/CodeGen/AMDGPU/llvm.maxnum.f16.ll

llvm/test/CodeGen/AMDGPU/llvm.minnum.f16.ll

llvm/test/CodeGen/AMDGPU/llvm.round.f64.ll

llvm/test/CodeGen/AMDGPU/load-lo16.ll

llvm/test/CodeGen/AMDGPU/local-memory.amdgcn.ll

llvm/test/CodeGen/AMDGPU/lshr.v2i16.ll

llvm/test/CodeGen/AMDGPU/memory-legalizer-load.ll

llvm/test/CodeGen/AMDGPU/memory_clause.ll

llvm/test/CodeGen/AMDGPU/merge-store-crash.ll

llvm/test/CodeGen/AMDGPU/postra-bundle-memops.mir

llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm.ll

llvm/test/CodeGen/AMDGPU/saddo.ll

llvm/test/CodeGen/AMDGPU/salu-to-valu.ll

llvm/test/CodeGen/AMDGPU/scratch-simple.ll

llvm/test/CodeGen/AMDGPU/select.f16.ll

llvm/test/CodeGen/AMDGPU/shl.ll

llvm/test/CodeGen/AMDGPU/shl.v2i16.ll

llvm/test/CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll

llvm/test/CodeGen/AMDGPU/sign_extend.ll

llvm/test/CodeGen/AMDGPU/sminmax.v2i16.ll

llvm/test/CodeGen/AMDGPU/store-weird-sizes.ll

llvm/test/CodeGen/AMDGPU/v_mac_f16.ll

llvm/test/CodeGen/AMDGPU/v_madak_f16.ll
