This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] New AMDGPUInsertDelayAlu pass
Closed, Public

Authored by foad on Jun 21 2022, 6:00 AM.


Details

Reviewers

Joe_Nash
rampitec
piotr
jpages

Group Reviewers

Restricted Project

Commits

rGcfb7ffdec0eb: [AMDGPU] New AMDGPUInsertDelayAlu pass

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	60,100 ms	x64 debian > LLVM.CodeGen/NVPTX::wmma.py

Event Timeline

foad created this revision. Jun 21 2022, 6:00 AM

Herald added a project: Restricted Project. Jun 21 2022, 6:00 AM

Herald added subscribers: kosarev, jsilvanus, wenlei and 12 others.

foad requested review of this revision. Jun 21 2022, 6:00 AM

Herald added a project: Restricted Project. Jun 21 2022, 6:00 AM

Herald added subscribers: llvm-commits, wdng.

foad added a reviewer: Restricted Project. Jun 21 2022, 6:01 AM

Harbormaster completed remote builds in B171070: Diff 438671. Jun 21 2022, 6:01 AM

For tests with generated checks: if the only difference was adding a bunch of s_delay_alu instructions then I regenerated the checks. If it was more disruptive (e.g. if it meant that shared prefixes could no longer be used) then I added -amdgpu-enable-delay-alu=0 to the GFX11 RUN lines instead.

arsenm added inline comments. Jun 22 2022, 5:54 PM

llvm/lib/Target/AMDGPU/AMDGPUInsertDelayAlu.cpp
163	Should check LLVM_ENABLE_DUMP instead
218	Bad autos, I'm not even sure what the type is supposed to be here
302–303	Can merge these into one LLVM_DEBUG. Also should use printMBBReference
314–321	This is D128313 (except for SI_RETURN_TO_EPILOG which I'm not sure about since it's a bit weird)
324–329	Should extract this into a separate predicate function
331–333	Ditto
346–349	Why does the opcode need special casing here? Why not every tied operand?
386–387	One LLVM_DEBUG

Address review comments.

foad marked 7 inline comments as done. Jun 23 2022, 2:59 AM

foad added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUInsertDelayAlu.cpp
302–303	Done, but the single LLVM_DEBUG looks uglier to me.
314–321	Can update this when that patch lands.
346–349	Because the hardware does not see a RAW dependency here:
	v_mov_b32 v0, 0
	v_writelane_b32 v0, s0, 0
At the thread level there is no dependency. The MIR instruction uses a tied read-write operand for v0, which I suppose is a wave-level representation where the read represents the lanes that are not modified. Other instructions with tied operands (like V_MAC) represent a real read of a VGPR in all active lanes, so we do want to model a delay for them.
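
The lane-level argument above can be illustrated with a toy model. This is only a sketch under stated assumptions: `VGPR`, `vMov` and `vWriteLane` are hypothetical names invented for illustration, a 32-lane register is assumed, and nothing here is LLVM or hardware code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy model of a single 32-lane VGPR (hypothetical, for illustration only).
using VGPR = std::array<uint32_t, 32>;

// Models "v_mov_b32 vDst, Imm": every lane of the destination is overwritten.
void vMov(VGPR &Dst, uint32_t Imm) { Dst.fill(Imm); }

// Models "v_writelane_b32 vDst, SVal, Lane": exactly one lane is overwritten.
// The other lanes are simply left alone, so the value written to the selected
// lane never depends on what the register held before.
void vWriteLane(VGPR &Dst, uint32_t SVal, unsigned Lane) {
  Dst[Lane % Dst.size()] = SVal;
}
```

In this model the lane written by `vWriteLane` ends up the same whatever `vMov` stored earlier, which matches the point that the tied operand only stands for the lanes that are not modified.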

Harbormaster completed remote builds in B171536: Diff 439299. Jun 23 2022, 3:23 AM

Joe_Nash added a child revision: D128656: [AMDGPU] gfx11 Generate VOPD Instructions. Jun 27 2022, 8:38 AM

Joe_Nash mentioned this in D128656: [AMDGPU] gfx11 Generate VOPD Instructions. Jun 27 2022, 8:41 AM

Rebase on D128313.

Joe_Nash accepted this revision. Jun 29 2022, 10:24 AM

This revision is now accepted and ready to land. Jun 29 2022, 10:24 AM

jpages accepted this revision. Jun 29 2022, 10:43 AM

Harbormaster completed remote builds in B172806: Diff 441061. Jun 29 2022, 12:14 PM

This revision was landed with ongoing or failed builds. Jun 29 2022, 1:33 PM

Closed by commit rGcfb7ffdec0eb: [AMDGPU] New AMDGPUInsertDelayAlu pass (authored by foad).

This revision was automatically updated to reflect the committed changes.

foad added a commit: rGcfb7ffdec0eb: [AMDGPU] New AMDGPUInsertDelayAlu pass.

Revision Contents

Path

Size

llvm/

lib/

Target/

AMDGPU/

AMDGPU.h

3 lines

AMDGPUInsertDelayAlu.cpp

457 lines

AMDGPUTargetMachine.cpp

11 lines

CMakeLists.txt

1 line

test/

CodeGen/

AMDGPU/

GlobalISel/

llvm.amdgcn.fdot2.ll

2 lines

llvm.amdgcn.image.gather4.dim.ll

2 lines

llvm.amdgcn.image.load.1d.d16.ll

1 line

llvm.amdgcn.image.sample.g16.ll

2 lines

llvm.amdgcn.interp.inreg.ll

9 lines

llvm.amdgcn.intersect_ray.ll

14 lines

2 lines

2 lines

2 lines

2 lines

atomic_optimizations_local_pointer.ll

231 lines

cluster_stores.ll

2 lines

dual-source-blend-export.ll

7 lines

15 lines

14 lines

561 lines

4 lines

llvm.amdgcn.exp.row.ll

1 line

llvm.amdgcn.fdot2.f32.bf16.ll

2 lines

llvm.amdgcn.image.dim.ll

2 lines

llvm.amdgcn.image.gather4.a16.dim.ll

2 lines

llvm.amdgcn.image.sample.a16.dim.ll

2 lines

llvm.amdgcn.image.sample.d16.dim.ll

2 lines

llvm.amdgcn.image.sample.dim.ll

2 lines

llvm.amdgcn.image.sample.g16.encode.ll

8 lines

llvm.amdgcn.image.sample.g16.ll

2 lines

llvm.amdgcn.interp.inreg.ll

9 lines

llvm.amdgcn.intersect_ray.ll

4 lines

llvm.amdgcn.permlane64.ll

2 lines

llvm.mulo.ll

23 lines

mad_64_32.ll

30 lines

mad_u64_u32.ll

4 lines

memory-legalizer-flat-nontemporal.ll

4 lines

memory-legalizer-flat-volatile.ll

4 lines

Diff 441061

llvm/lib/Target/AMDGPU/AMDGPU.h

	extern char &SIAnnotateControlFlowPassID;			extern char &SIAnnotateControlFlowPassID;

	void initializeSIMemoryLegalizerPass(PassRegistry&);			void initializeSIMemoryLegalizerPass(PassRegistry&);
	extern char &SIMemoryLegalizerID;			extern char &SIMemoryLegalizerID;

	void initializeSIModeRegisterPass(PassRegistry&);			void initializeSIModeRegisterPass(PassRegistry&);
	extern char &SIModeRegisterID;			extern char &SIModeRegisterID;

				void initializeAMDGPUInsertDelayAluPass(PassRegistry &);
				extern char &AMDGPUInsertDelayAluID;

	void initializeSIInsertHardClausesPass(PassRegistry &);			void initializeSIInsertHardClausesPass(PassRegistry &);
	extern char &SIInsertHardClausesID;			extern char &SIInsertHardClausesID;

	void initializeSIInsertWaitcntsPass(PassRegistry&);			void initializeSIInsertWaitcntsPass(PassRegistry&);
	extern char &SIInsertWaitcntsID;			extern char &SIInsertWaitcntsID;

	void initializeSIFormMemoryClausesPass(PassRegistry&);			void initializeSIFormMemoryClausesPass(PassRegistry&);
	extern char &SIFormMemoryClausesID;			extern char &SIFormMemoryClausesID;

llvm/lib/Target/AMDGPU/AMDGPUInsertDelayAlu.cpp

This file was added.

				//===- AMDGPUInsertDelayAlu.cpp - Insert s_delay_alu instructions ---------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				/// \file
				/// Insert s_delay_alu instructions to avoid stalls on GFX11+.
				//
				//===----------------------------------------------------------------------===//

				#include "AMDGPU.h"
				#include "GCNSubtarget.h"
				#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
				#include "SIInstrInfo.h"
				#include "llvm/ADT/SetVector.h"

				using namespace llvm;

				#define DEBUG_TYPE "amdgpu-insert-delay-alu"

				namespace {

				class AMDGPUInsertDelayAlu : public MachineFunctionPass {
				public:
				static char ID;

				const SIInstrInfo *SII;
				const TargetRegisterInfo *TRI;

				TargetSchedModel SchedModel;

				AMDGPUInsertDelayAlu() : MachineFunctionPass(ID) {}

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.setPreservesCFG();
				MachineFunctionPass::getAnalysisUsage(AU);
				}

				// Return true if MI waits for all outstanding VALU instructions to complete.
				static bool instructionWaitsForVALU(const MachineInstr &MI) {
				// These instruction types wait for VA_VDST==0 before issuing.
				const uint64_t VA_VDST_0 = SIInstrFlags::DS | SIInstrFlags::EXP |
				SIInstrFlags::FLAT | SIInstrFlags::MIMG |
				SIInstrFlags::MTBUF | SIInstrFlags::MUBUF;
				if (MI.getDesc().TSFlags & VA_VDST_0)
				return true;
				if (MI.getOpcode() == AMDGPU::S_SENDMSG_RTN_B32 ||
				MI.getOpcode() == AMDGPU::S_SENDMSG_RTN_B64)
				return true;
				if (MI.getOpcode() == AMDGPU::S_WAITCNT_DEPCTR &&
				(MI.getOperand(0).getImm() & 0xf000) == 0)
				return true;
				return false;
				}

				// Types of delay that can be encoded in an s_delay_alu instruction.
				enum DelayType { VALU, TRANS, SALU, OTHER };

				// Get the delay type for an instruction with the specified TSFlags.
				static DelayType getDelayType(uint64_t TSFlags) {
				if (TSFlags & SIInstrFlags::TRANS)
				return TRANS;
				if (TSFlags & SIInstrFlags::VALU)
				return VALU;
				if (TSFlags & SIInstrFlags::SALU)
				return SALU;
				return OTHER;
				}

				// Information about the last instruction(s) that wrote to a particular
				// regunit. In straight-line code there will only be one such instruction, but
				// when control flow converges we merge the delay information from each path
				// to represent the union of the worst-case delays of each type.
				struct DelayInfo {
				// One larger than the maximum number of (non-TRANS) VALU instructions we
				// can encode in an s_delay_alu instruction.
				static const unsigned VALU_MAX = 5;

				// One larger than the maximum number of TRANS instructions we can encode in
				// an s_delay_alu instruction.
				static const unsigned TRANS_MAX = 4;

				// If it was written by a (non-TRANS) VALU, remember how many clock cycles
				// are left until it completes, and how many other (non-TRANS) VALU we have
				// seen since it was issued.
				uint8_t VALUCycles = 0;
				uint8_t VALUNum = VALU_MAX;

				// If it was written by a TRANS, remember how many clock cycles are left
				// until it completes, and how many other TRANS we have seen since it was
				// issued.
				uint8_t TRANSCycles = 0;
				uint8_t TRANSNum = TRANS_MAX;
				// Also remember how many other (non-TRANS) VALU we have seen since it was
				// issued. When an instruction depends on both a prior TRANS and a prior
				// non-TRANS VALU, this is used to decide whether to encode a wait for just
				// one or both of them.
				uint8_t TRANSNumVALU = VALU_MAX;

				// If it was written by an SALU, remember how many clock cycles are left
				// until it completes.
				uint8_t SALUCycles = 0;

				DelayInfo() = default;

				DelayInfo(DelayType Type, unsigned Cycles) {
				switch (Type) {
				default:
				llvm_unreachable("unexpected type");
				case VALU:
				VALUCycles = Cycles;
				VALUNum = 0;
				break;
				case TRANS:
				TRANSCycles = Cycles;
				TRANSNum = 0;
				TRANSNumVALU = 0;
				break;
				case SALU:
				SALUCycles = Cycles;
				break;
				}
				}

				bool operator==(const DelayInfo &RHS) const {
				return VALUCycles == RHS.VALUCycles && VALUNum == RHS.VALUNum &&
				TRANSCycles == RHS.TRANSCycles && TRANSNum == RHS.TRANSNum &&
				TRANSNumVALU == RHS.TRANSNumVALU && SALUCycles == RHS.SALUCycles;
				}

				bool operator!=(const DelayInfo &RHS) const { return !(*this == RHS); }

				// Merge another DelayInfo into this one, to represent the union of the
				// worst-case delays of each type.
				void merge(const DelayInfo &RHS) {
				VALUCycles = std::max(VALUCycles, RHS.VALUCycles);
				VALUNum = std::min(VALUNum, RHS.VALUNum);
				TRANSCycles = std::max(TRANSCycles, RHS.TRANSCycles);
				TRANSNum = std::min(TRANSNum, RHS.TRANSNum);
				TRANSNumVALU = std::min(TRANSNumVALU, RHS.TRANSNumVALU);
				SALUCycles = std::max(SALUCycles, RHS.SALUCycles);
				}

				// Update this DelayInfo after issuing an instruction. Type is the delay
				// type of the instruction being issued (VALU means non-TRANS VALU). Cycles
				// is the number of cycles it takes to issue the instruction. Return true
				// if there is no longer any useful delay info.
				bool advance(DelayType Type, unsigned Cycles) {
				bool Erase = true;

				VALUNum += (Type == VALU);
				if (VALUNum >= VALU_MAX || VALUCycles <= Cycles) {
				// Forget about the VALU instruction. It was too far back or has
				// definitely completed by now.
				VALUNum = VALU_MAX;
				VALUCycles = 0;
				} else {
				VALUCycles -= Cycles;
				Erase = false;
				}

				TRANSNum += (Type == TRANS);
				TRANSNumVALU += (Type == VALU);
				if (TRANSNum >= TRANS_MAX || TRANSCycles <= Cycles) {
				// Forget about any TRANS instruction. It was too far back or has
				// definitely completed by now.
				TRANSNum = TRANS_MAX;
				TRANSNumVALU = VALU_MAX;
				TRANSCycles = 0;
				} else {
				TRANSCycles -= Cycles;
				Erase = false;
				}

				if (SALUCycles <= Cycles) {
				// Forget about any SALU instruction. It has definitely completed by
				// now.
				SALUCycles = 0;
				} else {
				SALUCycles -= Cycles;
				Erase = false;
				}

				return Erase;
				}

				#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
				void dump() const {
				if (VALUCycles)
				dbgs() << " VALUCycles=" << (int)VALUCycles;
				if (VALUNum < VALU_MAX)
				dbgs() << " VALUNum=" << (int)VALUNum;
				if (TRANSCycles)
				dbgs() << " TRANSCycles=" << (int)TRANSCycles;
				if (TRANSNum < TRANS_MAX)
				dbgs() << " TRANSNum=" << (int)TRANSNum;
				if (TRANSNumVALU < VALU_MAX)
				dbgs() << " TRANSNumVALU=" << (int)TRANSNumVALU;
				if (SALUCycles)
				dbgs() << " SALUCycles=" << (int)SALUCycles;
				}
				#endif
				};

				// A map from regunits to the delay info for that regunit.
				struct DelayState : DenseMap<unsigned, DelayInfo> {
				// Merge another DelayState into this one by merging the delay info for each
				// regunit.
				void merge(const DelayState &RHS) {
				for (const auto &KV : RHS) {
				iterator It;
				bool Inserted;
				std::tie(It, Inserted) = insert(KV);
				if (!Inserted)
				It->second.merge(KV.second);
				}
				}

				// Advance the delay info for each regunit, erasing any that are no longer
				// useful.
				void advance(DelayType Type, unsigned Cycles) {
				iterator Next;
				for (auto I = begin(), E = end(); I != E; I = Next) {
				Next = std::next(I);
				if (I->second.advance(Type, Cycles))
				erase(I);
				}
				}

				#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
				void dump(const TargetRegisterInfo *TRI) const {
				if (empty()) {
				dbgs() << " empty\n";
				return;
				}

				// Dump DelayInfo for each RegUnit in numerical order.
				SmallVector<const_iterator, 8> Order;
				Order.reserve(size());
				for (const_iterator I = begin(), E = end(); I != E; ++I)
				Order.push_back(I);
				llvm::sort(Order, [](const const_iterator &A, const const_iterator &B) {
				return A->first < B->first;
				});
				for (const_iterator I : Order) {
				dbgs() << " " << printRegUnit(I->first, TRI);
				I->second.dump();
				dbgs() << "\n";
				}
				}
				#endif
				};

				// The saved delay state at the end of each basic block.
				DenseMap<MachineBasicBlock *, DelayState> BlockState;

				// Emit an s_delay_alu instruction if necessary before MI.
				MachineInstr *emitDelayAlu(MachineInstr &MI, DelayInfo Delay,
				MachineInstr *LastDelayAlu) {
				unsigned Imm = 0;

				// Wait for a TRANS instruction.
				if (Delay.TRANSNum < DelayInfo::TRANS_MAX)
				Imm |= 4 + Delay.TRANSNum;

				// Wait for a VALU instruction (if it's more recent than any TRANS
				// instruction that we're also waiting for).
				if (Delay.VALUNum < DelayInfo::VALU_MAX &&
				Delay.VALUNum <= Delay.TRANSNumVALU) {
				if (Imm & 0xf)
				Imm |= Delay.VALUNum << 7;
				else
				Imm |= Delay.VALUNum;
				}

				// Wait for an SALU instruction.
				if (Delay.SALUCycles) {
				if (Imm & 0x780) {
				// We have already encoded a VALU and a TRANS delay. There's no room in
				// the encoding for an SALU delay as well, so just drop it.
				} else if (Imm & 0xf) {
				Imm |= (Delay.SALUCycles + 8) << 7;
				} else {
				Imm |= Delay.SALUCycles + 8;
				}
				}

				// Don't emit the s_delay_alu instruction if there's nothing to wait for.
				if (!Imm)
				return LastDelayAlu;

				// If we only need to wait for one instruction, try encoding it in the last
				// s_delay_alu that we emitted.
				if (!(Imm & 0x780) && LastDelayAlu) {
				unsigned Skip = 0;
				for (auto I = MachineBasicBlock::instr_iterator(LastDelayAlu),
				E = MachineBasicBlock::instr_iterator(MI);
				++I != E;) {
				if (!I->isBundle() && !I->isMetaInstruction())
				++Skip;
				}
				if (Skip < 6) {
				MachineOperand &Op = LastDelayAlu->getOperand(0);
				unsigned LastImm = Op.getImm();
				assert((LastImm & ~0xf) == 0 &&
				"Remembered an s_delay_alu with no room for another delay!");
				LastImm |= Imm << 7 | Skip << 4;
				Op.setImm(LastImm);
				return nullptr;
				}
				}

				auto &MBB = *MI.getParent();
				MachineInstr *DelayAlu =
				BuildMI(MBB, MI, DebugLoc(), SII->get(AMDGPU::S_DELAY_ALU)).addImm(Imm);
				// Remember the s_delay_alu for next time if there is still room in it to
				// encode another delay.
				return (Imm & 0x780) ? nullptr : DelayAlu;
				}

				bool runOnMachineBasicBlock(MachineBasicBlock &MBB, bool Emit) {
				DelayState State;
				for (auto *Pred : MBB.predecessors())
				State.merge(BlockState[Pred]);

				LLVM_DEBUG(dbgs() << " State at start of " << printMBBReference(MBB)
				<< "\n";
				State.dump(TRI););

				bool Changed = false;
				MachineInstr *LastDelayAlu = nullptr;

				// Iterate over the contents of bundles, but don't emit any instructions
				// inside a bundle.
				for (auto &MI : MBB.instrs()) {
				if (MI.isBundle() || MI.isMetaInstruction())
				continue;

				// Ignore some more instructions that do not generate any code.
				switch (MI.getOpcode()) {
				case AMDGPU::SI_RETURN_TO_EPILOG:
				continue;
				}

				DelayType Type = getDelayType(MI.getDesc().TSFlags);

				if (instructionWaitsForVALU(MI)) {
				// Forget about all outstanding VALU delays.
				State = DelayState();
				} else if (Type != OTHER) {
				DelayInfo Delay;
				// TODO: Scan implicit uses too?
				for (const auto &Op : MI.explicit_uses()) {
				if (Op.isReg()) {
				// One of the operands of the writelane is also the output operand.
				// Treating it as a use would insert redundant delays, so ignore this
				// operand.
				if (MI.getOpcode() == AMDGPU::V_WRITELANE_B32 && Op.isTied())
				continue;
				for (MCRegUnitIterator UI(Op.getReg(), TRI); UI.isValid(); ++UI) {
				auto It = State.find(*UI);
				if (It != State.end()) {
				Delay.merge(It->second);
				State.erase(*UI);
				}
				}
				}
				}
				if (Emit && !MI.isBundledWithPred()) {
				// TODO: For VALU->SALU delays should we use s_delay_alu or s_nop or
				// just ignore them?
				LastDelayAlu = emitDelayAlu(MI, Delay, LastDelayAlu);
				}
				}

				if (Type != OTHER) {
				// TODO: Scan implicit defs too?
				for (const auto &Op : MI.defs()) {
				unsigned Latency = SchedModel.computeOperandLatency(
				&MI, MI.getOperandNo(&Op), nullptr, 0);
				for (MCRegUnitIterator UI(Op.getReg(), TRI); UI.isValid(); ++UI)
				State[*UI] = DelayInfo(Type, Latency);
				}
				}

				// Advance by the number of cycles it takes to issue this instruction.
				// TODO: Use a more advanced model that accounts for instructions that
				// take multiple cycles to issue on a particular pipeline.
				unsigned Cycles = SIInstrInfo::getNumWaitStates(MI);
				// TODO: In wave64 mode, double the number of cycles for VALU and VMEM
				// instructions on the assumption that they will usually have to be issued
				// twice?
				State.advance(Type, Cycles);

				LLVM_DEBUG(dbgs() << " State after " << MI; State.dump(TRI););
				}

				if (Emit) {
				assert(State == BlockState[&MBB] &&
				"Basic block state should not have changed on final pass!");
				} else if (State != BlockState[&MBB]) {
				BlockState[&MBB] = std::move(State);
				Changed = true;
				}
				return Changed;
				}

				bool runOnMachineFunction(MachineFunction &MF) override {
				if (skipFunction(MF.getFunction()))
				return false;

				LLVM_DEBUG(dbgs() << "AMDGPUInsertDelayAlu running on " << MF.getName()
				<< "\n");

				const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
				if (!ST.hasDelayAlu())
				return false;

				SII = ST.getInstrInfo();
				TRI = ST.getRegisterInfo();

				SchedModel.init(&ST);

				// Calculate the delay state for each basic block, iterating until we reach
				// a fixed point.
				SetVector<MachineBasicBlock *> WorkList;
				for (auto &MBB : reverse(MF))
				WorkList.insert(&MBB);
				while (!WorkList.empty()) {
				auto &MBB = *WorkList.pop_back_val();
				bool Changed = runOnMachineBasicBlock(MBB, false);
				if (Changed)
				WorkList.insert(MBB.succ_begin(), MBB.succ_end());
				}

				LLVM_DEBUG(dbgs() << "Final pass over all BBs\n");

				// Make one last pass over all basic blocks to emit s_delay_alu
				// instructions.
				bool Changed = false;
				for (auto &MBB : MF)
				Changed |= runOnMachineBasicBlock(MBB, true);
				return Changed;
				}
				};

				} // namespace

				char AMDGPUInsertDelayAlu::ID = 0;

				char &llvm::AMDGPUInsertDelayAluID = AMDGPUInsertDelayAlu::ID;

				INITIALIZE_PASS(AMDGPUInsertDelayAlu, DEBUG_TYPE, "AMDGPU Insert Delay ALU",
				false, false)
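
The immediate computed in emitDelayAlu can be tried out in isolation. The following is a minimal sketch, not the pass itself: `DelaySketch` and `encodeDelayImm` are hypothetical names, and the struct only mirrors the DelayInfo fields that emitDelayAlu reads. The merging into a previous s_delay_alu and the BuildMI call are deliberately left out.

```cpp
#include <cassert>

// Same limits as DelayInfo::VALU_MAX and DelayInfo::TRANS_MAX in the patch.
constexpr unsigned VALU_MAX = 5;
constexpr unsigned TRANS_MAX = 4;

// Hypothetical stand-in for the DelayInfo fields used when encoding.
struct DelaySketch {
  unsigned VALUNum = VALU_MAX;      // VALU_MAX means "no VALU to wait for"
  unsigned TRANSNum = TRANS_MAX;    // TRANS_MAX means "no TRANS to wait for"
  unsigned TRANSNumVALU = VALU_MAX; // VALUs seen since the TRANS was issued
  unsigned SALUCycles = 0;          // 0 means "no SALU to wait for"
};

// Mirrors the first half of emitDelayAlu: build the immediate from up to two
// dependencies. The first goes in bits 3:0, the second in bits 10:7.
unsigned encodeDelayImm(const DelaySketch &D) {
  unsigned Imm = 0;

  // Wait for a TRANS instruction.
  if (D.TRANSNum < TRANS_MAX)
    Imm |= 4 + D.TRANSNum;

  // Wait for a VALU instruction, but only if it is at least as recent as any
  // TRANS instruction that is also being waited for.
  if (D.VALUNum < VALU_MAX && D.VALUNum <= D.TRANSNumVALU) {
    if (Imm & 0xf)
      Imm |= D.VALUNum << 7;
    else
      Imm |= D.VALUNum;
  }

  // Wait for an SALU instruction if a slot is still free.
  if (D.SALUCycles) {
    if (Imm & 0x780) {
      // Both slots are taken, so the SALU delay is dropped.
    } else if (Imm & 0xf) {
      Imm |= (D.SALUCycles + 8) << 7;
    } else {
      Imm |= D.SALUCycles + 8;
    }
  }
  return Imm;
}
```

A result of 0 corresponds to the "nothing to wait for" case, in which the pass emits no instruction.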

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp


// Enable Mode register optimization		// Enable Mode register optimization
static cl::opt<bool> EnableSIModeRegisterPass(		static cl::opt<bool> EnableSIModeRegisterPass(
"amdgpu-mode-register",		"amdgpu-mode-register",
cl::desc("Enable mode register pass"),		cl::desc("Enable mode register pass"),
cl::init(true),		cl::init(true),
cl::Hidden);		cl::Hidden);

		// Enable GFX11+ s_delay_alu insertion
		static cl::opt<bool>
		EnableInsertDelayAlu("amdgpu-enable-delay-alu",
		cl::desc("Enable s_delay_alu insertion"),
		cl::init(true), cl::Hidden);

// Option is used in lit tests to prevent deadcoding of patterns inspected.		// Option is used in lit tests to prevent deadcoding of patterns inspected.
static cl::opt<bool>		static cl::opt<bool>
EnableDCEInRA("amdgpu-dce-in-ra",		EnableDCEInRA("amdgpu-dce-in-ra",
cl::init(true), cl::Hidden,		cl::init(true), cl::Hidden,
cl::desc("Enable machine DCE inside regalloc"));		cl::desc("Enable machine DCE inside regalloc"));

static cl::opt<bool> EnableSetWavePriority("amdgpu-set-wave-priority",		static cl::opt<bool> EnableSetWavePriority("amdgpu-set-wave-priority",
cl::desc("Adjust wave priority"),		cl::desc("Adjust wave priority"),
extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAMDGPUTarget() {
initializeAMDGPULateCodeGenPreparePass(*PR);		initializeAMDGPULateCodeGenPreparePass(*PR);
initializeAMDGPUPropagateAttributesEarlyPass(*PR);		initializeAMDGPUPropagateAttributesEarlyPass(*PR);
initializeAMDGPUPropagateAttributesLatePass(*PR);		initializeAMDGPUPropagateAttributesLatePass(*PR);
initializeAMDGPUReplaceLDSUseWithPointerPass(*PR);		initializeAMDGPUReplaceLDSUseWithPointerPass(*PR);
initializeAMDGPULowerModuleLDSPass(*PR);		initializeAMDGPULowerModuleLDSPass(*PR);
initializeAMDGPURewriteOutArgumentsPass(*PR);		initializeAMDGPURewriteOutArgumentsPass(*PR);
initializeAMDGPUUnifyMetadataPass(*PR);		initializeAMDGPUUnifyMetadataPass(*PR);
initializeSIAnnotateControlFlowPass(*PR);		initializeSIAnnotateControlFlowPass(*PR);
		initializeAMDGPUInsertDelayAluPass(*PR);
initializeSIInsertHardClausesPass(*PR);		initializeSIInsertHardClausesPass(*PR);
initializeSIInsertWaitcntsPass(*PR);		initializeSIInsertWaitcntsPass(*PR);
initializeSIModeRegisterPass(*PR);		initializeSIModeRegisterPass(*PR);
initializeSIWholeQuadModePass(*PR);		initializeSIWholeQuadModePass(*PR);
initializeSILowerControlFlowPass(*PR);		initializeSILowerControlFlowPass(*PR);
initializeSIPreEmitPeepholePass(*PR);		initializeSIPreEmitPeepholePass(*PR);
initializeSILateBranchLoweringPass(*PR);		initializeSILateBranchLoweringPass(*PR);
initializeSIMemoryLegalizerPass(*PR);		initializeSIMemoryLegalizerPass(*PR);
void GCNPassConfig::addPreEmitPass() {
// guarantee to be able to handle all hazards correctly. This is because if there		// guarantee to be able to handle all hazards correctly. This is because if there
// are multiple scheduling regions in a basic block, the regions are scheduled		// are multiple scheduling regions in a basic block, the regions are scheduled
// bottom up, so when we begin to schedule a region we don't know what		// bottom up, so when we begin to schedule a region we don't know what
// instructions were emitted directly before it.		// instructions were emitted directly before it.
//		//
// Here we add a stand-alone hazard recognizer pass which can handle all		// Here we add a stand-alone hazard recognizer pass which can handle all
// cases.		// cases.
addPass(&PostRAHazardRecognizerID);		addPass(&PostRAHazardRecognizerID);

		if (isPassEnabled(EnableInsertDelayAlu, CodeGenOpt::Less))
		addPass(&AMDGPUInsertDelayAluID);

addPass(&BranchRelaxationPassID);		addPass(&BranchRelaxationPassID);
}		}

TargetPassConfig *GCNTargetMachine::createPassConfig(PassManagerBase &PM) {		TargetPassConfig *GCNTargetMachine::createPassConfig(PassManagerBase &PM) {
return new GCNPassConfig(*this, PM);		return new GCNPassConfig(*this, PM);
}		}

yaml::MachineFunctionInfo *GCNTargetMachine::createDefaultFuncInfoYAML() const {		yaml::MachineFunctionInfo *GCNTargetMachine::createDefaultFuncInfoYAML() const {

llvm/lib/Target/AMDGPU/CMakeLists.txt

add_llvm_target(AMDGPUCodeGen
AMDGPUCallLowering.cpp		AMDGPUCallLowering.cpp
AMDGPUCodeGenPrepare.cpp		AMDGPUCodeGenPrepare.cpp
AMDGPUCombinerHelper.cpp		AMDGPUCombinerHelper.cpp
AMDGPUCtorDtorLowering.cpp		AMDGPUCtorDtorLowering.cpp
AMDGPUExportClustering.cpp		AMDGPUExportClustering.cpp
AMDGPUFrameLowering.cpp		AMDGPUFrameLowering.cpp
AMDGPUGlobalISelUtils.cpp		AMDGPUGlobalISelUtils.cpp
AMDGPUHSAMetadataStreamer.cpp		AMDGPUHSAMetadataStreamer.cpp
		AMDGPUInsertDelayAlu.cpp
AMDGPUInstCombineIntrinsic.cpp		AMDGPUInstCombineIntrinsic.cpp
AMDGPUInstrInfo.cpp		AMDGPUInstrInfo.cpp
AMDGPUInstructionSelector.cpp		AMDGPUInstructionSelector.cpp
AMDGPUISelDAGToDAG.cpp		AMDGPUISelDAGToDAG.cpp
AMDGPUISelLowering.cpp		AMDGPUISelLowering.cpp
AMDGPULateCodeGenPrepare.cpp		AMDGPULateCodeGenPrepare.cpp
AMDGPULegalizerInfo.cpp		AMDGPULegalizerInfo.cpp
AMDGPULibCalls.cpp		AMDGPULibCalls.cpp

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.fdot2.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -global-isel -march=amdgcn -mcpu=gfx906 -verify-machineinstrs < %s | FileCheck --check-prefix=GFX906 %s			; RUN: llc -global-isel -march=amdgcn -mcpu=gfx906 -verify-machineinstrs < %s | FileCheck --check-prefix=GFX906 %s
	; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1011 -verify-machineinstrs < %s | FileCheck --check-prefix=GFX10PLUS %s			; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1011 -verify-machineinstrs < %s | FileCheck --check-prefix=GFX10PLUS %s
	; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1012 -verify-machineinstrs < %s | FileCheck --check-prefix=GFX10PLUS %s			; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1012 -verify-machineinstrs < %s | FileCheck --check-prefix=GFX10PLUS %s
	; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck --check-prefix=GFX10PLUS %s			; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 -verify-machineinstrs < %s | FileCheck --check-prefix=GFX10PLUS %s

	define float @v_fdot2(<2 x half> %a, <2 x half> %b, float %c) {			define float @v_fdot2(<2 x half> %a, <2 x half> %b, float %c) {
	; GFX906-LABEL: v_fdot2:			; GFX906-LABEL: v_fdot2:
	; GFX906: ; %bb.0:			; GFX906: ; %bb.0:
	; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX906-NEXT: v_dot2_f32_f16 v0, v0, v1, v2			; GFX906-NEXT: v_dot2_f32_f16 v0, v0, v1, v2
	; GFX906-NEXT: s_setpc_b64 s[30:31]			; GFX906-NEXT: s_setpc_b64 s[30:31]
	;			;

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.gather4.dim.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=tahiti -o - %s | FileCheck -check-prefix=GFX6 %s			; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=tahiti -o - %s | FileCheck -check-prefix=GFX6 %s
	; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1010 -o - %s | FileCheck -check-prefix=GFX10NSA %s			; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1010 -o - %s | FileCheck -check-prefix=GFX10NSA %s
	; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1100 -o - %s | FileCheck -check-prefix=GFX10NSA %s			; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 -o - %s | FileCheck -check-prefix=GFX10NSA %s

	define amdgpu_ps <4 x float> @gather4_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t) {			define amdgpu_ps <4 x float> @gather4_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t) {
	; GFX6-LABEL: gather4_2d:			; GFX6-LABEL: gather4_2d:
	; GFX6: ; %bb.0: ; %main_body			; GFX6: ; %bb.0: ; %main_body
	; GFX6-NEXT: s_mov_b64 s[14:15], exec			; GFX6-NEXT: s_mov_b64 s[14:15], exec
	; GFX6-NEXT: s_mov_b32 s0, s2			; GFX6-NEXT: s_mov_b32 s0, s2
	; GFX6-NEXT: s_mov_b32 s1, s3			; GFX6-NEXT: s_mov_b32 s1, s3
	; GFX6-NEXT: s_mov_b32 s2, s4			; GFX6-NEXT: s_mov_b32 s2, s4

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.1d.d16.ll

	[583 lines not shown]
	; GFX11-NEXT: s_mov_b32 s5, s7			; GFX11-NEXT: s_mov_b32 s5, s7
	; GFX11-NEXT: s_mov_b32 s6, s8			; GFX11-NEXT: s_mov_b32 s6, s8
	; GFX11-NEXT: s_mov_b32 s7, s9			; GFX11-NEXT: s_mov_b32 s7, s9
	; GFX11-NEXT: image_load v[0:1], v0, s[0:7] dmask:0x7 dim:SQ_RSRC_IMG_1D unorm d16			; GFX11-NEXT: image_load v[0:1], v0, s[0:7] dmask:0x7 dim:SQ_RSRC_IMG_1D unorm d16
	; GFX11-NEXT: s_lshl_b32 s0, s0, 16			; GFX11-NEXT: s_lshl_b32 s0, s0, 16
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_lshrrev_b32_e32 v2, 16, v0			; GFX11-NEXT: v_lshrrev_b32_e32 v2, 16, v0
	; GFX11-NEXT: v_and_or_b32 v1, 0xffff, v1, s0			; GFX11-NEXT: v_and_or_b32 v1, 0xffff, v1, s0
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_lshlrev_b32_e32 v2, 16, v2			; GFX11-NEXT: v_lshlrev_b32_e32 v2, 16, v2
	; GFX11-NEXT: v_and_or_b32 v0, 0xffff, v0, v2			; GFX11-NEXT: v_and_or_b32 v0, 0xffff, v0, v2
	; GFX11-NEXT: ; return to shader part epilog			; GFX11-NEXT: ; return to shader part epilog
	%v = call <3 x half> @llvm.amdgcn.image.load.1d.v3f16.i32(i32 7, i32 %s, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <3 x half> @llvm.amdgcn.image.load.1d.v3f16.i32(i32 7, i32 %s, <8 x i32> %rsrc, i32 0, i32 0)
	ret <3 x half> %v			ret <3 x half> %v
	}			}

	define amdgpu_ps <4 x half> @load_1d_v4f16_xyzw(<8 x i32> inreg %rsrc, i32 %s) {			define amdgpu_ps <4 x half> @load_1d_v4f16_xyzw(<8 x i32> inreg %rsrc, i32 %s) {
	[378 lines not shown]

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.sample.g16.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10 %s			; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10 %s
	; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10 %s			; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10 %s

	define amdgpu_ps <4 x float> @sample_d_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dsdv, float %s) {			define amdgpu_ps <4 x float> @sample_d_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dsdv, float %s) {
	; GFX10-LABEL: sample_d_1d:			; GFX10-LABEL: sample_d_1d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_lshl_b32 s12, s0, 16			; GFX10-NEXT: s_lshl_b32 s12, s0, 16
	; GFX10-NEXT: v_and_or_b32 v0, 0xffff, v0, s12			; GFX10-NEXT: v_and_or_b32 v0, 0xffff, v0, s12
	; GFX10-NEXT: v_and_or_b32 v1, 0xffff, v1, s12			; GFX10-NEXT: v_and_or_b32 v1, 0xffff, v1, s12
	; GFX10-NEXT: image_sample_d_g16 v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D			; GFX10-NEXT: image_sample_d_g16 v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D
	[188 lines not shown]

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.interp.inreg.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

	define amdgpu_ps void @v_interp_f32(float inreg %i, float inreg %j, i32 inreg %m0) #0 {			define amdgpu_ps void @v_interp_f32(float inreg %i, float inreg %j, i32 inreg %m0) #0 {
	; GCN-LABEL: v_interp_f32:			; GCN-LABEL: v_interp_f32:
	; GCN: ; %bb.0: ; %main_body			; GCN: ; %bb.0: ; %main_body
	; GCN-NEXT: s_mov_b32 s3, exec_lo			; GCN-NEXT: s_mov_b32 s3, exec_lo
	; GCN-NEXT: s_wqm_b32 exec_lo, exec_lo			; GCN-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GCN-NEXT: s_mov_b32 m0, s2			; GCN-NEXT: s_mov_b32 m0, s2
	; GCN-NEXT: lds_param_load v0, attr0.y wait_vdst:15			; GCN-NEXT: lds_param_load v0, attr0.y wait_vdst:15
	; GCN-NEXT: lds_param_load v1, attr1.x wait_vdst:15			; GCN-NEXT: lds_param_load v1, attr1.x wait_vdst:15
	; GCN-NEXT: s_mov_b32 exec_lo, s3			; GCN-NEXT: s_mov_b32 exec_lo, s3
	; GCN-NEXT: v_mov_b32_e32 v2, s0			; GCN-NEXT: v_mov_b32_e32 v2, s0
	; GCN-NEXT: v_mov_b32_e32 v4, s1			; GCN-NEXT: v_mov_b32_e32 v4, s1
				; GCN-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_2)
	; GCN-NEXT: v_interp_p10_f32 v3, v0, v2, v0 wait_exp:1			; GCN-NEXT: v_interp_p10_f32 v3, v0, v2, v0 wait_exp:1
	; GCN-NEXT: v_interp_p10_f32 v2, v1, v2, v1			; GCN-NEXT: v_interp_p10_f32 v2, v1, v2, v1
	; GCN-NEXT: v_interp_p2_f32 v5, v0, v4, v3 wait_exp:7			; GCN-NEXT: v_interp_p2_f32 v5, v0, v4, v3 wait_exp:7
				; GCN-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GCN-NEXT: v_interp_p2_f32 v4, v1, v4, v5 wait_exp:7			; GCN-NEXT: v_interp_p2_f32 v4, v1, v4, v5 wait_exp:7
	; GCN-NEXT: exp mrt0 v3, v2, v5, v4 done			; GCN-NEXT: exp mrt0 v3, v2, v5, v4 done
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	main_body:			main_body:
	%p0 = call float @llvm.amdgcn.lds.param.load(i32 1, i32 0, i32 %m0)			%p0 = call float @llvm.amdgcn.lds.param.load(i32 1, i32 0, i32 %m0)
	%p1 = call float @llvm.amdgcn.lds.param.load(i32 0, i32 1, i32 %m0)			%p1 = call float @llvm.amdgcn.lds.param.load(i32 0, i32 1, i32 %m0)
	%p0_0 = call float @llvm.amdgcn.interp.inreg.p10(float %p0, float %i, float %p0)			%p0_0 = call float @llvm.amdgcn.interp.inreg.p10(float %p0, float %i, float %p0)
	%p1_0 = call float @llvm.amdgcn.interp.inreg.p2(float %p0, float %j, float %p0_0)			%p1_0 = call float @llvm.amdgcn.interp.inreg.p2(float %p0, float %j, float %p0_0)
	[11 lines not shown]
	; GCN-NEXT: s_mov_b32 m0, s2			; GCN-NEXT: s_mov_b32 m0, s2
	; GCN-NEXT: lds_param_load v0, attr0.x wait_vdst:15			; GCN-NEXT: lds_param_load v0, attr0.x wait_vdst:15
	; GCN-NEXT: lds_param_load v1, attr1.x wait_vdst:15			; GCN-NEXT: lds_param_load v1, attr1.x wait_vdst:15
	; GCN-NEXT: lds_param_load v2, attr2.x wait_vdst:15			; GCN-NEXT: lds_param_load v2, attr2.x wait_vdst:15
	; GCN-NEXT: lds_param_load v3, attr3.x wait_vdst:15			; GCN-NEXT: lds_param_load v3, attr3.x wait_vdst:15
	; GCN-NEXT: s_mov_b32 exec_lo, s3			; GCN-NEXT: s_mov_b32 exec_lo, s3
	; GCN-NEXT: v_mov_b32_e32 v4, s0			; GCN-NEXT: v_mov_b32_e32 v4, s0
	; GCN-NEXT: v_mov_b32_e32 v5, s1			; GCN-NEXT: v_mov_b32_e32 v5, s1
				; GCN-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_4)
	; GCN-NEXT: v_interp_p10_f32 v6, v0, v4, v0 wait_exp:3			; GCN-NEXT: v_interp_p10_f32 v6, v0, v4, v0 wait_exp:3
	; GCN-NEXT: v_interp_p10_f32 v7, v1, v4, v1 wait_exp:2			; GCN-NEXT: v_interp_p10_f32 v7, v1, v4, v1 wait_exp:2
	; GCN-NEXT: v_interp_p10_f32 v8, v2, v4, v2 wait_exp:1			; GCN-NEXT: v_interp_p10_f32 v8, v2, v4, v2 wait_exp:1
	; GCN-NEXT: v_interp_p10_f32 v4, v3, v4, v3			; GCN-NEXT: v_interp_p10_f32 v4, v3, v4, v3
	; GCN-NEXT: v_interp_p2_f32 v6, v0, v5, v6 wait_exp:7			; GCN-NEXT: v_interp_p2_f32 v6, v0, v5, v6 wait_exp:7
				; GCN-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
	; GCN-NEXT: v_interp_p2_f32 v7, v1, v5, v7 wait_exp:7			; GCN-NEXT: v_interp_p2_f32 v7, v1, v5, v7 wait_exp:7
	; GCN-NEXT: v_interp_p2_f32 v8, v2, v5, v8 wait_exp:7			; GCN-NEXT: v_interp_p2_f32 v8, v2, v5, v8 wait_exp:7
				; GCN-NEXT: s_delay_alu instid0(VALU_DEP_4)
	; GCN-NEXT: v_interp_p2_f32 v4, v3, v5, v4 wait_exp:7			; GCN-NEXT: v_interp_p2_f32 v4, v3, v5, v4 wait_exp:7
	; GCN-NEXT: exp mrt0 v6, v7, v8, v4 done			; GCN-NEXT: exp mrt0 v6, v7, v8, v4 done
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	main_body:			main_body:
	%p0 = call float @llvm.amdgcn.lds.param.load(i32 0, i32 0, i32 %m0)			%p0 = call float @llvm.amdgcn.lds.param.load(i32 0, i32 0, i32 %m0)
	%p1 = call float @llvm.amdgcn.lds.param.load(i32 0, i32 1, i32 %m0)			%p1 = call float @llvm.amdgcn.lds.param.load(i32 0, i32 1, i32 %m0)
	%p2 = call float @llvm.amdgcn.lds.param.load(i32 0, i32 2, i32 %m0)			%p2 = call float @llvm.amdgcn.lds.param.load(i32 0, i32 2, i32 %m0)
	%p3 = call float @llvm.amdgcn.lds.param.load(i32 0, i32 3, i32 %m0)			%p3 = call float @llvm.amdgcn.lds.param.load(i32 0, i32 3, i32 %m0)
	[21 lines not shown]
	; GCN-NEXT: lds_param_load v4, attr2.x wait_vdst:15			; GCN-NEXT: lds_param_load v4, attr2.x wait_vdst:15
	; GCN-NEXT: lds_param_load v5, attr3.x wait_vdst:15			; GCN-NEXT: lds_param_load v5, attr3.x wait_vdst:15
	; GCN-NEXT: s_mov_b32 exec_lo, s0			; GCN-NEXT: s_mov_b32 exec_lo, s0
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_interp_p10_f32 v6, v2, v0, v2 wait_exp:3			; GCN-NEXT: v_interp_p10_f32 v6, v2, v0, v2 wait_exp:3
	; GCN-NEXT: v_interp_p10_f32 v7, v3, v0, v3 wait_exp:2			; GCN-NEXT: v_interp_p10_f32 v7, v3, v0, v3 wait_exp:2
	; GCN-NEXT: v_interp_p10_f32 v8, v4, v0, v4 wait_exp:1			; GCN-NEXT: v_interp_p10_f32 v8, v4, v0, v4 wait_exp:1
	; GCN-NEXT: v_interp_p10_f32 v0, v5, v0, v5			; GCN-NEXT: v_interp_p10_f32 v0, v5, v0, v5
				; GCN-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
	; GCN-NEXT: v_interp_p2_f32 v6, v2, v1, v6 wait_exp:7			; GCN-NEXT: v_interp_p2_f32 v6, v2, v1, v6 wait_exp:7
	; GCN-NEXT: v_interp_p2_f32 v7, v3, v1, v7 wait_exp:7			; GCN-NEXT: v_interp_p2_f32 v7, v3, v1, v7 wait_exp:7
				; GCN-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
	; GCN-NEXT: v_interp_p2_f32 v8, v4, v1, v8 wait_exp:7			; GCN-NEXT: v_interp_p2_f32 v8, v4, v1, v8 wait_exp:7
	; GCN-NEXT: v_interp_p2_f32 v0, v5, v1, v0 wait_exp:7			; GCN-NEXT: v_interp_p2_f32 v0, v5, v1, v0 wait_exp:7
	; GCN-NEXT: exp mrt0 v6, v7, v8, v0 done			; GCN-NEXT: exp mrt0 v6, v7, v8, v0 done
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	main_body:			main_body:
	%i.ptr = getelementptr float, float addrspace(1)* %ptr, i32 1			%i.ptr = getelementptr float, float addrspace(1)* %ptr, i32 1
	%i = load float, float addrspace(1)* %i.ptr, align 4			%i = load float, float addrspace(1)* %i.ptr, align 4
	%j.ptr = getelementptr float, float addrspace(1)* %ptr, i32 2			%j.ptr = getelementptr float, float addrspace(1)* %ptr, i32 2
	[19 lines not shown]
	; GCN: ; %bb.0: ; %main_body			; GCN: ; %bb.0: ; %main_body
	; GCN-NEXT: s_mov_b32 s3, exec_lo			; GCN-NEXT: s_mov_b32 s3, exec_lo
	; GCN-NEXT: s_wqm_b32 exec_lo, exec_lo			; GCN-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GCN-NEXT: s_mov_b32 m0, s2			; GCN-NEXT: s_mov_b32 m0, s2
	; GCN-NEXT: lds_param_load v1, attr0.x wait_vdst:15			; GCN-NEXT: lds_param_load v1, attr0.x wait_vdst:15
	; GCN-NEXT: s_mov_b32 exec_lo, s3			; GCN-NEXT: s_mov_b32 exec_lo, s3
	; GCN-NEXT: v_mov_b32_e32 v0, s0			; GCN-NEXT: v_mov_b32_e32 v0, s0
	; GCN-NEXT: v_mov_b32_e32 v2, s1			; GCN-NEXT: v_mov_b32_e32 v2, s1
				; GCN-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_2)
	; GCN-NEXT: v_interp_p10_f16_f32 v3, v1, v0, v1			; GCN-NEXT: v_interp_p10_f16_f32 v3, v1, v0, v1
	; GCN-NEXT: v_interp_p10_f16_f32 v0, v1, v0, v1 op_sel:[1,0,1,0] wait_exp:7			; GCN-NEXT: v_interp_p10_f16_f32 v0, v1, v0, v1 op_sel:[1,0,1,0] wait_exp:7
	; GCN-NEXT: v_interp_p2_f16_f32 v3, v1, v2, v3 wait_exp:7			; GCN-NEXT: v_interp_p2_f16_f32 v3, v1, v2, v3 wait_exp:7
				; GCN-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GCN-NEXT: v_interp_p2_f16_f32 v0, v1, v2, v0 op_sel:[1,0,0,0] wait_exp:7			; GCN-NEXT: v_interp_p2_f16_f32 v0, v1, v2, v0 op_sel:[1,0,0,0] wait_exp:7
	; GCN-NEXT: v_add_f16_e32 v0, v3, v0			; GCN-NEXT: v_add_f16_e32 v0, v3, v0
	; GCN-NEXT: ; return to shader part epilog			; GCN-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%p0 = call float @llvm.amdgcn.lds.param.load(i32 0, i32 0, i32 %m0)			%p0 = call float @llvm.amdgcn.lds.param.load(i32 0, i32 0, i32 %m0)
	%l_p0 = call float @llvm.amdgcn.interp.inreg.p10.f16(float %p0, float %i, float %p0, i1 0)			%l_p0 = call float @llvm.amdgcn.interp.inreg.p10.f16(float %p0, float %i, float %p0, i1 0)
	%l_p1 = call half @llvm.amdgcn.interp.inreg.p2.f16(float %p0, float %j, float %l_p0, i1 0)			%l_p1 = call half @llvm.amdgcn.interp.inreg.p2.f16(float %p0, float %j, float %l_p0, i1 0)
	%h_p0 = call float @llvm.amdgcn.interp.inreg.p10.f16(float %p0, float %i, float %p0, i1 1)			%h_p0 = call float @llvm.amdgcn.interp.inreg.p10.f16(float %p0, float %i, float %p0, i1 1)
	[15 lines not shown]

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.intersect_ray.ll

	[62 lines not shown]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	;			;
	; GFX11-LABEL: image_bvh_intersect_ray_a16:			; GFX11-LABEL: image_bvh_intersect_ray_a16:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: v_lshrrev_b32_e32 v9, 16, v5			; GFX11-NEXT: v_lshrrev_b32_e32 v9, 16, v5
	; GFX11-NEXT: v_lshrrev_b32_e32 v10, 16, v7			; GFX11-NEXT: v_lshrrev_b32_e32 v10, 16, v7
	; GFX11-NEXT: v_lshlrev_b32_e32 v5, 16, v5			; GFX11-NEXT: v_lshlrev_b32_e32 v5, 16, v5
	; GFX11-NEXT: v_lshlrev_b32_e32 v11, 16, v6			; GFX11-NEXT: v_lshlrev_b32_e32 v11, 16, v6
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_3)
	; GFX11-NEXT: v_lshlrev_b32_e32 v9, 16, v9			; GFX11-NEXT: v_lshlrev_b32_e32 v9, 16, v9
	; GFX11-NEXT: v_and_or_b32 v5, 0xffff, v7, v5			; GFX11-NEXT: v_and_or_b32 v5, 0xffff, v7, v5
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_3)
	; GFX11-NEXT: v_and_or_b32 v7, 0xffff, v8, v11			; GFX11-NEXT: v_and_or_b32 v7, 0xffff, v8, v11
	; GFX11-NEXT: v_and_or_b32 v6, 0xffff, v10, v9			; GFX11-NEXT: v_and_or_b32 v6, 0xffff, v10, v9
	; GFX11-NEXT: image_bvh_intersect_ray v[0:3], v[0:7], s[0:3] a16			; GFX11-NEXT: image_bvh_intersect_ray v[0:3], v[0:7], s[0:3] a16
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: ; return to shader part epilog			; GFX11-NEXT: ; return to shader part epilog
	%v = call <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i32.v4f16(i32 %node_ptr, float %ray_extent, <3 x float> %ray_origin, <3 x half> %ray_dir, <3 x half> %ray_inv_dir, <4 x i32> %tdescr)			%v = call <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i32.v4f16(i32 %node_ptr, float %ray_extent, <3 x float> %ray_origin, <3 x half> %ray_dir, <3 x half> %ray_inv_dir, <4 x i32> %tdescr)
	%r = bitcast <4 x i32> %v to <4 x float>			%r = bitcast <4 x i32> %v to <4 x float>
	ret <4 x float> %r			ret <4 x float> %r
	[47 lines not shown]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	;			;
	; GFX11-LABEL: image_bvh64_intersect_ray_a16:			; GFX11-LABEL: image_bvh64_intersect_ray_a16:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: v_lshrrev_b32_e32 v10, 16, v6			; GFX11-NEXT: v_lshrrev_b32_e32 v10, 16, v6
	; GFX11-NEXT: v_lshrrev_b32_e32 v11, 16, v8			; GFX11-NEXT: v_lshrrev_b32_e32 v11, 16, v8
	; GFX11-NEXT: v_lshlrev_b32_e32 v6, 16, v6			; GFX11-NEXT: v_lshlrev_b32_e32 v6, 16, v6
	; GFX11-NEXT: v_lshlrev_b32_e32 v12, 16, v7			; GFX11-NEXT: v_lshlrev_b32_e32 v12, 16, v7
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_3)
	; GFX11-NEXT: v_lshlrev_b32_e32 v10, 16, v10			; GFX11-NEXT: v_lshlrev_b32_e32 v10, 16, v10
	; GFX11-NEXT: v_and_or_b32 v6, 0xffff, v8, v6			; GFX11-NEXT: v_and_or_b32 v6, 0xffff, v8, v6
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_3)
	; GFX11-NEXT: v_and_or_b32 v8, 0xffff, v9, v12			; GFX11-NEXT: v_and_or_b32 v8, 0xffff, v9, v12
	; GFX11-NEXT: v_and_or_b32 v7, 0xffff, v11, v10			; GFX11-NEXT: v_and_or_b32 v7, 0xffff, v11, v10
	; GFX11-NEXT: image_bvh64_intersect_ray v[0:3], v[0:15], s[0:3] a16			; GFX11-NEXT: image_bvh64_intersect_ray v[0:3], v[0:15], s[0:3] a16
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: ; return to shader part epilog			; GFX11-NEXT: ; return to shader part epilog
	%v = call <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i64.v4f16(i64 %node_ptr, float %ray_extent, <3 x float> %ray_origin, <3 x half> %ray_dir, <3 x half> %ray_inv_dir, <4 x i32> %tdescr)			%v = call <4 x i32> @llvm.amdgcn.image.bvh.intersect.ray.i64.v4f16(i64 %node_ptr, float %ray_extent, <3 x float> %ray_origin, <3 x half> %ray_dir, <3 x half> %ray_inv_dir, <4 x i32> %tdescr)
	%r = bitcast <4 x i32> %v to <4 x float>			%r = bitcast <4 x i32> %v to <4 x float>
	ret <4 x float> %r			ret <4 x float> %r
	[84 lines not shown]
	; GFX11-NEXT: v_mov_b32_e32 v16, v3			; GFX11-NEXT: v_mov_b32_e32 v16, v3
	; GFX11-NEXT: v_mov_b32_e32 v17, v4			; GFX11-NEXT: v_mov_b32_e32 v17, v4
	; GFX11-NEXT: s_mov_b32 s1, exec_lo			; GFX11-NEXT: s_mov_b32 s1, exec_lo
	; GFX11-NEXT: .LBB6_1: ; =>This Inner Loop Header: Depth=1			; GFX11-NEXT: .LBB6_1: ; =>This Inner Loop Header: Depth=1
	; GFX11-NEXT: v_readfirstlane_b32 s4, v11			; GFX11-NEXT: v_readfirstlane_b32 s4, v11
	; GFX11-NEXT: v_readfirstlane_b32 s5, v12			; GFX11-NEXT: v_readfirstlane_b32 s5, v12
	; GFX11-NEXT: v_readfirstlane_b32 s6, v13			; GFX11-NEXT: v_readfirstlane_b32 s6, v13
	; GFX11-NEXT: v_readfirstlane_b32 s7, v14			; GFX11-NEXT: v_readfirstlane_b32 s7, v14
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)
	; GFX11-NEXT: v_cmp_eq_u64_e32 vcc_lo, s[4:5], v[11:12]			; GFX11-NEXT: v_cmp_eq_u64_e32 vcc_lo, s[4:5], v[11:12]
	; GFX11-NEXT: v_cmp_eq_u64_e64 s0, s[6:7], v[13:14]			; GFX11-NEXT: v_cmp_eq_u64_e64 s0, s[6:7], v[13:14]
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(SALU_CYCLE_1)
	; GFX11-NEXT: s_and_b32 s0, vcc_lo, s0			; GFX11-NEXT: s_and_b32 s0, vcc_lo, s0
	; GFX11-NEXT: s_and_saveexec_b32 s0, s0			; GFX11-NEXT: s_and_saveexec_b32 s0, s0
	; GFX11-NEXT: image_bvh_intersect_ray v[0:3], [v18, v19, v[15:17], v[5:7], v[8:10]], s[4:7]			; GFX11-NEXT: image_bvh_intersect_ray v[0:3], [v18, v19, v[15:17], v[5:7], v[8:10]], s[4:7]
	; GFX11-NEXT: ; implicit-def: $vgpr11			; GFX11-NEXT: ; implicit-def: $vgpr11
	; GFX11-NEXT: ; implicit-def: $vgpr18			; GFX11-NEXT: ; implicit-def: $vgpr18
	; GFX11-NEXT: ; implicit-def: $vgpr19			; GFX11-NEXT: ; implicit-def: $vgpr19
	; GFX11-NEXT: ; implicit-def: $vgpr15_vgpr16_vgpr17			; GFX11-NEXT: ; implicit-def: $vgpr15_vgpr16_vgpr17
	; GFX11-NEXT: ; implicit-def: $vgpr5_vgpr6_vgpr7			; GFX11-NEXT: ; implicit-def: $vgpr5_vgpr6_vgpr7
	[106 lines not shown]
	; GFX11-NEXT: v_and_or_b32 v4, 0xffff, v7, v2			; GFX11-NEXT: v_and_or_b32 v4, 0xffff, v7, v2
	; GFX11-NEXT: v_and_or_b32 v5, 0xffff, v1, v0			; GFX11-NEXT: v_and_or_b32 v5, 0xffff, v1, v0
	; GFX11-NEXT: v_and_or_b32 v6, 0xffff, v8, v3			; GFX11-NEXT: v_and_or_b32 v6, 0xffff, v8, v3
	; GFX11-NEXT: .LBB7_1: ; =>This Inner Loop Header: Depth=1			; GFX11-NEXT: .LBB7_1: ; =>This Inner Loop Header: Depth=1
	; GFX11-NEXT: v_readfirstlane_b32 s4, v9			; GFX11-NEXT: v_readfirstlane_b32 s4, v9
	; GFX11-NEXT: v_readfirstlane_b32 s5, v10			; GFX11-NEXT: v_readfirstlane_b32 s5, v10
	; GFX11-NEXT: v_readfirstlane_b32 s6, v11			; GFX11-NEXT: v_readfirstlane_b32 s6, v11
	; GFX11-NEXT: v_readfirstlane_b32 s7, v12			; GFX11-NEXT: v_readfirstlane_b32 s7, v12
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)
	; GFX11-NEXT: v_cmp_eq_u64_e32 vcc_lo, s[4:5], v[9:10]			; GFX11-NEXT: v_cmp_eq_u64_e32 vcc_lo, s[4:5], v[9:10]
	; GFX11-NEXT: v_cmp_eq_u64_e64 s0, s[6:7], v[11:12]			; GFX11-NEXT: v_cmp_eq_u64_e64 s0, s[6:7], v[11:12]
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(SALU_CYCLE_1)
	; GFX11-NEXT: s_and_b32 s0, vcc_lo, s0			; GFX11-NEXT: s_and_b32 s0, vcc_lo, s0
	; GFX11-NEXT: s_and_saveexec_b32 s0, s0			; GFX11-NEXT: s_and_saveexec_b32 s0, s0
	; GFX11-NEXT: image_bvh_intersect_ray v[0:3], [v13, v14, v[15:17], v[4:6]], s[4:7] a16			; GFX11-NEXT: image_bvh_intersect_ray v[0:3], [v13, v14, v[15:17], v[4:6]], s[4:7] a16
	; GFX11-NEXT: ; implicit-def: $vgpr9			; GFX11-NEXT: ; implicit-def: $vgpr9
	; GFX11-NEXT: ; implicit-def: $vgpr13			; GFX11-NEXT: ; implicit-def: $vgpr13
	; GFX11-NEXT: ; implicit-def: $vgpr14			; GFX11-NEXT: ; implicit-def: $vgpr14
	; GFX11-NEXT: ; implicit-def: $vgpr15_vgpr16_vgpr17			; GFX11-NEXT: ; implicit-def: $vgpr15_vgpr16_vgpr17
	; GFX11-NEXT: ; implicit-def: $vgpr4_vgpr5_vgpr6			; GFX11-NEXT: ; implicit-def: $vgpr4_vgpr5_vgpr6
	[97 lines not shown]
	; GFX11-NEXT: v_mov_b32_e32 v17, v4			; GFX11-NEXT: v_mov_b32_e32 v17, v4
	; GFX11-NEXT: v_mov_b32_e32 v18, v5			; GFX11-NEXT: v_mov_b32_e32 v18, v5
	; GFX11-NEXT: s_mov_b32 s1, exec_lo			; GFX11-NEXT: s_mov_b32 s1, exec_lo
	; GFX11-NEXT: .LBB8_1: ; =>This Inner Loop Header: Depth=1			; GFX11-NEXT: .LBB8_1: ; =>This Inner Loop Header: Depth=1
	; GFX11-NEXT: v_readfirstlane_b32 s4, v12			; GFX11-NEXT: v_readfirstlane_b32 s4, v12
	; GFX11-NEXT: v_readfirstlane_b32 s5, v13			; GFX11-NEXT: v_readfirstlane_b32 s5, v13
	; GFX11-NEXT: v_readfirstlane_b32 s6, v14			; GFX11-NEXT: v_readfirstlane_b32 s6, v14
	; GFX11-NEXT: v_readfirstlane_b32 s7, v15			; GFX11-NEXT: v_readfirstlane_b32 s7, v15
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)
	; GFX11-NEXT: v_cmp_eq_u64_e32 vcc_lo, s[4:5], v[12:13]			; GFX11-NEXT: v_cmp_eq_u64_e32 vcc_lo, s[4:5], v[12:13]
	; GFX11-NEXT: v_cmp_eq_u64_e64 s0, s[6:7], v[14:15]			; GFX11-NEXT: v_cmp_eq_u64_e64 s0, s[6:7], v[14:15]
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(SALU_CYCLE_1)
	; GFX11-NEXT: s_and_b32 s0, vcc_lo, s0			; GFX11-NEXT: s_and_b32 s0, vcc_lo, s0
	; GFX11-NEXT: s_and_saveexec_b32 s0, s0			; GFX11-NEXT: s_and_saveexec_b32 s0, s0
	; GFX11-NEXT: image_bvh64_intersect_ray v[0:3], [v[19:20], v21, v[16:18], v[6:8], v[9:11]], s[4:7]			; GFX11-NEXT: image_bvh64_intersect_ray v[0:3], [v[19:20], v21, v[16:18], v[6:8], v[9:11]], s[4:7]
	; GFX11-NEXT: ; implicit-def: $vgpr12			; GFX11-NEXT: ; implicit-def: $vgpr12
	; GFX11-NEXT: ; implicit-def: $vgpr19_vgpr20			; GFX11-NEXT: ; implicit-def: $vgpr19_vgpr20
	; GFX11-NEXT: ; implicit-def: $vgpr21			; GFX11-NEXT: ; implicit-def: $vgpr21
	; GFX11-NEXT: ; implicit-def: $vgpr16_vgpr17_vgpr18			; GFX11-NEXT: ; implicit-def: $vgpr16_vgpr17_vgpr18
	; GFX11-NEXT: ; implicit-def: $vgpr6_vgpr7_vgpr8			; GFX11-NEXT: ; implicit-def: $vgpr6_vgpr7_vgpr8
	[113 lines not shown]
	; GFX11-NEXT: v_and_or_b32 v5, 0xffff, v1, v0			; GFX11-NEXT: v_and_or_b32 v5, 0xffff, v1, v0
	; GFX11-NEXT: v_and_or_b32 v6, 0xffff, v9, v3			; GFX11-NEXT: v_and_or_b32 v6, 0xffff, v9, v3
	; GFX11-NEXT: s_mov_b32 s1, exec_lo			; GFX11-NEXT: s_mov_b32 s1, exec_lo
	; GFX11-NEXT: .LBB9_1: ; =>This Inner Loop Header: Depth=1			; GFX11-NEXT: .LBB9_1: ; =>This Inner Loop Header: Depth=1
	; GFX11-NEXT: v_readfirstlane_b32 s4, v10			; GFX11-NEXT: v_readfirstlane_b32 s4, v10
	; GFX11-NEXT: v_readfirstlane_b32 s5, v11			; GFX11-NEXT: v_readfirstlane_b32 s5, v11
	; GFX11-NEXT: v_readfirstlane_b32 s6, v12			; GFX11-NEXT: v_readfirstlane_b32 s6, v12
	; GFX11-NEXT: v_readfirstlane_b32 s7, v13			; GFX11-NEXT: v_readfirstlane_b32 s7, v13
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)
	; GFX11-NEXT: v_cmp_eq_u64_e32 vcc_lo, s[4:5], v[10:11]			; GFX11-NEXT: v_cmp_eq_u64_e32 vcc_lo, s[4:5], v[10:11]
	; GFX11-NEXT: v_cmp_eq_u64_e64 s0, s[6:7], v[12:13]			; GFX11-NEXT: v_cmp_eq_u64_e64 s0, s[6:7], v[12:13]
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(SALU_CYCLE_1)
	; GFX11-NEXT: s_and_b32 s0, vcc_lo, s0			; GFX11-NEXT: s_and_b32 s0, vcc_lo, s0
	; GFX11-NEXT: s_and_saveexec_b32 s0, s0			; GFX11-NEXT: s_and_saveexec_b32 s0, s0
	; GFX11-NEXT: image_bvh64_intersect_ray v[0:3], [v[14:15], v16, v[17:19], v[4:6]], s[4:7] a16			; GFX11-NEXT: image_bvh64_intersect_ray v[0:3], [v[14:15], v16, v[17:19], v[4:6]], s[4:7] a16
	; GFX11-NEXT: ; implicit-def: $vgpr10			; GFX11-NEXT: ; implicit-def: $vgpr10
	; GFX11-NEXT: ; implicit-def: $vgpr14_vgpr15			; GFX11-NEXT: ; implicit-def: $vgpr14_vgpr15
	; GFX11-NEXT: ; implicit-def: $vgpr16			; GFX11-NEXT: ; implicit-def: $vgpr16
	; GFX11-NEXT: ; implicit-def: $vgpr17_vgpr18_vgpr19			; GFX11-NEXT: ; implicit-def: $vgpr17_vgpr18_vgpr19
	; GFX11-NEXT: ; implicit-def: $vgpr4_vgpr5_vgpr6			; GFX11-NEXT: ; implicit-def: $vgpr4_vgpr5_vgpr6
	[361 lines not shown]
	; GFX11-NEXT: v_mov_b32_e32 v5, s11			; GFX11-NEXT: v_mov_b32_e32 v5, s11
	; GFX11-NEXT: v_mov_b32_e32 v7, s13			; GFX11-NEXT: v_mov_b32_e32 v7, s13
	; GFX11-NEXT: v_mov_b32_e32 v8, s14			; GFX11-NEXT: v_mov_b32_e32 v8, s14
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: v_mov_b32_e32 v0, s4			; GFX11-NEXT: v_mov_b32_e32 v0, s4
	; GFX11-NEXT: v_mov_b32_e32 v1, s5			; GFX11-NEXT: v_mov_b32_e32 v1, s5
	; GFX11-NEXT: s_mov_b32 s4, 0xb36211c7			; GFX11-NEXT: s_mov_b32 s4, 0xb36211c7
	; GFX11-NEXT: s_movk_i32 s5, 0x102			; GFX11-NEXT: s_movk_i32 s5, 0x102
				; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: v_mov_b32_e32 v10, s5			; GFX11-NEXT: v_mov_b32_e32 v10, s5
	; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v0, v2			; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v0, v2
	; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo			; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
	; GFX11-NEXT: v_mov_b32_e32 v9, s4			; GFX11-NEXT: v_mov_b32_e32 v9, s4
	; GFX11-NEXT: flat_load_b32 v11, v[0:1]			; GFX11-NEXT: flat_load_b32 v11, v[0:1]
	; GFX11-NEXT: v_mov_b32_e32 v0, s6			; GFX11-NEXT: v_mov_b32_e32 v0, s6
	; GFX11-NEXT: v_mov_b32_e32 v1, s7			; GFX11-NEXT: v_mov_b32_e32 v1, s7
	; GFX11-NEXT: v_mov_b32_e32 v2, s8			; GFX11-NEXT: v_mov_b32_e32 v2, s8
	[123 lines not shown]
	; GFX11-NEXT: v_mov_b32_e32 v3, s9			; GFX11-NEXT: v_mov_b32_e32 v3, s9
	; GFX11-NEXT: v_mov_b32_e32 v4, s10			; GFX11-NEXT: v_mov_b32_e32 v4, s10
	; GFX11-NEXT: v_mov_b32_e32 v5, s11			; GFX11-NEXT: v_mov_b32_e32 v5, s11
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: v_mov_b32_e32 v0, s4			; GFX11-NEXT: v_mov_b32_e32 v0, s4
	; GFX11-NEXT: v_mov_b32_e32 v1, s5			; GFX11-NEXT: v_mov_b32_e32 v1, s5
	; GFX11-NEXT: s_mov_b32 s4, 0xb36211c6			; GFX11-NEXT: s_mov_b32 s4, 0xb36211c6
	; GFX11-NEXT: s_movk_i32 s5, 0x102			; GFX11-NEXT: s_movk_i32 s5, 0x102
				; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: v_mov_b32_e32 v7, s5			; GFX11-NEXT: v_mov_b32_e32 v7, s5
	; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v0, v2			; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v0, v2
	; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo			; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
	; GFX11-NEXT: v_mov_b32_e32 v6, s4			; GFX11-NEXT: v_mov_b32_e32 v6, s4
	; GFX11-NEXT: flat_load_b32 v8, v[0:1]			; GFX11-NEXT: flat_load_b32 v8, v[0:1]
	; GFX11-NEXT: v_mov_b32_e32 v0, s6			; GFX11-NEXT: v_mov_b32_e32 v0, s6
	; GFX11-NEXT: v_mov_b32_e32 v1, s7			; GFX11-NEXT: v_mov_b32_e32 v1, s7
	; GFX11-NEXT: v_mov_b32_e32 v2, s8			; GFX11-NEXT: v_mov_b32_e32 v2, s8
	[21 lines not shown]

llvm/test/CodeGen/AMDGPU/GlobalISel/saddsat.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=tahiti -o - %s | FileCheck -check-prefix=GFX6 %s			; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=tahiti -o - %s | FileCheck -check-prefix=GFX6 %s
	; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=fiji -o - %s | FileCheck -check-prefix=GFX8 %s			; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=fiji -o - %s | FileCheck -check-prefix=GFX8 %s
	; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=gfx900 -o - %s | FileCheck -check-prefix=GFX9 %s			; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=gfx900 -o - %s | FileCheck -check-prefix=GFX9 %s
	; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=gfx1010 -o - %s | FileCheck -check-prefixes=GFX10PLUS,GFX10 %s			; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=gfx1010 -o - %s | FileCheck -check-prefixes=GFX10PLUS,GFX10 %s
	; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=gfx1100 -o - %s | FileCheck -check-prefixes=GFX10PLUS,GFX11 %s			; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 -o - %s | FileCheck -check-prefixes=GFX10PLUS,GFX11 %s

	define i7 @v_saddsat_i7(i7 %lhs, i7 %rhs) {			define i7 @v_saddsat_i7(i7 %lhs, i7 %rhs) {
	; GFX6-LABEL: v_saddsat_i7:			; GFX6-LABEL: v_saddsat_i7:
	; GFX6: ; %bb.0:			; GFX6: ; %bb.0:
	; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX6-NEXT: v_lshlrev_b32_e32 v0, 25, v0			; GFX6-NEXT: v_lshlrev_b32_e32 v0, 25, v0
	; GFX6-NEXT: v_min_i32_e32 v3, 0, v0			; GFX6-NEXT: v_min_i32_e32 v3, 0, v0
	; GFX6-NEXT: v_lshlrev_b32_e32 v1, 25, v1			; GFX6-NEXT: v_lshlrev_b32_e32 v1, 25, v1
	[5,979 lines not shown]

llvm/test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=tahiti -o - %s | FileCheck -check-prefix=GFX6 %s			; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=tahiti -o - %s | FileCheck -check-prefix=GFX6 %s
	; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=fiji -o - %s | FileCheck -check-prefix=GFX8 %s			; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=fiji -o - %s | FileCheck -check-prefix=GFX8 %s
	; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=gfx900 -o - %s | FileCheck -check-prefix=GFX9 %s			; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=gfx900 -o - %s | FileCheck -check-prefix=GFX9 %s
	; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=gfx1010 -o - %s | FileCheck -check-prefixes=GFX10PLUS,GFX10 %s			; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=gfx1010 -o - %s | FileCheck -check-prefixes=GFX10PLUS,GFX10 %s
	; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=gfx1100 -o - %s | FileCheck -check-prefixes=GFX10PLUS,GFX11 %s			; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 -o - %s | FileCheck -check-prefixes=GFX10PLUS,GFX11 %s

	define i7 @v_ssubsat_i7(i7 %lhs, i7 %rhs) {			define i7 @v_ssubsat_i7(i7 %lhs, i7 %rhs) {
	; GFX6-LABEL: v_ssubsat_i7:			; GFX6-LABEL: v_ssubsat_i7:
	; GFX6: ; %bb.0:			; GFX6: ; %bb.0:
	; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX6-NEXT: v_lshlrev_b32_e32 v0, 25, v0			; GFX6-NEXT: v_lshlrev_b32_e32 v0, 25, v0
	; GFX6-NEXT: v_max_i32_e32 v2, -1, v0			; GFX6-NEXT: v_max_i32_e32 v2, -1, v0
	; GFX6-NEXT: v_lshlrev_b32_e32 v1, 25, v1			; GFX6-NEXT: v_lshlrev_b32_e32 v1, 25, v1
	[6,024 lines not shown]

llvm/test/CodeGen/AMDGPU/GlobalISel/uaddsat.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=tahiti -o - %s | FileCheck -check-prefix=GFX6 %s			; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=tahiti -o - %s | FileCheck -check-prefix=GFX6 %s
	; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=fiji -o - %s | FileCheck -check-prefix=GFX8 %s			; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=fiji -o - %s | FileCheck -check-prefix=GFX8 %s
	; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=gfx900 -o - %s | FileCheck -check-prefix=GFX9 %s			; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=gfx900 -o - %s | FileCheck -check-prefix=GFX9 %s
	; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=gfx1010 -o - %s | FileCheck -check-prefixes=GFX10PLUS,GFX10 %s			; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=gfx1010 -o - %s | FileCheck -check-prefixes=GFX10PLUS,GFX10 %s
	; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=gfx1100 -o - %s | FileCheck -check-prefixes=GFX10PLUS,GFX11 %s			; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 -o - %s | FileCheck -check-prefixes=GFX10PLUS,GFX11 %s

	define i7 @v_uaddsat_i7(i7 %lhs, i7 %rhs) {			define i7 @v_uaddsat_i7(i7 %lhs, i7 %rhs) {
	; GFX6-LABEL: v_uaddsat_i7:			; GFX6-LABEL: v_uaddsat_i7:
	; GFX6: ; %bb.0:			; GFX6: ; %bb.0:
	; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX6-NEXT: v_lshlrev_b32_e32 v0, 25, v0			; GFX6-NEXT: v_lshlrev_b32_e32 v0, 25, v0
	; GFX6-NEXT: v_lshlrev_b32_e32 v1, 25, v1			; GFX6-NEXT: v_lshlrev_b32_e32 v1, 25, v1
	; GFX6-NEXT: v_xor_b32_e32 v2, -1, v0			; GFX6-NEXT: v_xor_b32_e32 v2, -1, v0
	[...]

llvm/test/CodeGen/AMDGPU/GlobalISel/usubsat.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=tahiti -o - %s | FileCheck -check-prefix=GFX6 %s			; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=tahiti -o - %s | FileCheck -check-prefix=GFX6 %s
	; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=fiji -o - %s | FileCheck -check-prefix=GFX8 %s			; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=fiji -o - %s | FileCheck -check-prefix=GFX8 %s
	; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=gfx900 -o - %s | FileCheck -check-prefix=GFX9 %s			; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=gfx900 -o - %s | FileCheck -check-prefix=GFX9 %s
	; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=gfx1010 -o - %s | FileCheck -check-prefixes=GFX10PLUS,GFX10 %s			; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=gfx1010 -o - %s | FileCheck -check-prefixes=GFX10PLUS,GFX10 %s
	; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=gfx1100 -o - %s | FileCheck -check-prefixes=GFX10PLUS,GFX11 %s			; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 -o - %s | FileCheck -check-prefixes=GFX10PLUS,GFX11 %s

	define i7 @v_usubsat_i7(i7 %lhs, i7 %rhs) {			define i7 @v_usubsat_i7(i7 %lhs, i7 %rhs) {
	; GFX6-LABEL: v_usubsat_i7:			; GFX6-LABEL: v_usubsat_i7:
	; GFX6: ; %bb.0:			; GFX6: ; %bb.0:
	; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX6-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX6-NEXT: v_lshlrev_b32_e32 v0, 25, v0			; GFX6-NEXT: v_lshlrev_b32_e32 v0, 25, v0
	; GFX6-NEXT: v_lshlrev_b32_e32 v1, 25, v1			; GFX6-NEXT: v_lshlrev_b32_e32 v1, 25, v1
	; GFX6-NEXT: v_min_u32_e32 v1, v0, v1			; GFX6-NEXT: v_min_u32_e32 v1, v0, v1
	[...]

llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll

	[...]
	;			;
	; GFX1164-LABEL: add_i32_constant:			; GFX1164-LABEL: add_i32_constant:
	; GFX1164: ; %bb.0: ; %entry			; GFX1164: ; %bb.0: ; %entry
	; GFX1164-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX1164-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX1164-NEXT: s_mov_b64 s[2:3], exec			; GFX1164-NEXT: s_mov_b64 s[2:3], exec
	; GFX1164-NEXT: s_mov_b64 s[4:5], exec			; GFX1164-NEXT: s_mov_b64 s[4:5], exec
	; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0			; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0
	; GFX1164-NEXT: ; implicit-def: $vgpr1			; GFX1164-NEXT: ; implicit-def: $vgpr1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0			; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
	; GFX1164-NEXT: v_cmpx_eq_u32_e32 0, v0			; GFX1164-NEXT: v_cmpx_eq_u32_e32 0, v0
	; GFX1164-NEXT: s_cbranch_execz .LBB0_2			; GFX1164-NEXT: s_cbranch_execz .LBB0_2
	; GFX1164-NEXT: ; %bb.1:			; GFX1164-NEXT: ; %bb.1:
	; GFX1164-NEXT: s_bcnt1_i32_b64 s2, s[2:3]			; GFX1164-NEXT: s_bcnt1_i32_b64 s2, s[2:3]
	; GFX1164-NEXT: v_mov_b32_e32 v1, 0			; GFX1164-NEXT: v_mov_b32_e32 v1, 0
	; GFX1164-NEXT: s_mul_i32 s2, s2, 5			; GFX1164-NEXT: s_mul_i32 s2, s2, 5
				; GFX1164-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX1164-NEXT: v_mov_b32_e32 v2, s2			; GFX1164-NEXT: v_mov_b32_e32 v2, s2
	; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1164-NEXT: ds_add_rtn_u32 v1, v1, v2			; GFX1164-NEXT: ds_add_rtn_u32 v1, v1, v2
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_gl0_inv			; GFX1164-NEXT: buffer_gl0_inv
	; GFX1164-NEXT: .LBB0_2:			; GFX1164-NEXT: .LBB0_2:
	; GFX1164-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX1164-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX1164-NEXT: v_readfirstlane_b32 s2, v1			; GFX1164-NEXT: v_readfirstlane_b32 s2, v1
	; GFX1164-NEXT: s_mov_b32 s3, 0x31016000			; GFX1164-NEXT: s_mov_b32 s3, 0x31016000
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1164-NEXT: v_mad_u32_u24 v0, v0, 5, s2			; GFX1164-NEXT: v_mad_u32_u24 v0, v0, 5, s2
	; GFX1164-NEXT: s_mov_b32 s2, -1			; GFX1164-NEXT: s_mov_b32 s2, -1
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_store_b32 v0, off, s[0:3], 0			; GFX1164-NEXT: buffer_store_b32 v0, off, s[0:3], 0
	; GFX1164-NEXT: s_endpgm			; GFX1164-NEXT: s_endpgm
	;			;
	; GFX1132-LABEL: add_i32_constant:			; GFX1132-LABEL: add_i32_constant:
	; GFX1132: ; %bb.0: ; %entry			; GFX1132: ; %bb.0: ; %entry
	; GFX1132-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX1132-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX1132-NEXT: s_mov_b32 s3, exec_lo			; GFX1132-NEXT: s_mov_b32 s3, exec_lo
	; GFX1132-NEXT: s_mov_b32 s2, exec_lo			; GFX1132-NEXT: s_mov_b32 s2, exec_lo
	; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v0, s3, 0			; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v0, s3, 0
	; GFX1132-NEXT: ; implicit-def: $vgpr1			; GFX1132-NEXT: ; implicit-def: $vgpr1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1132-NEXT: v_cmpx_eq_u32_e32 0, v0			; GFX1132-NEXT: v_cmpx_eq_u32_e32 0, v0
	; GFX1132-NEXT: s_cbranch_execz .LBB0_2			; GFX1132-NEXT: s_cbranch_execz .LBB0_2
	; GFX1132-NEXT: ; %bb.1:			; GFX1132-NEXT: ; %bb.1:
	; GFX1132-NEXT: s_bcnt1_i32_b32 s3, s3			; GFX1132-NEXT: s_bcnt1_i32_b32 s3, s3
	; GFX1132-NEXT: v_mov_b32_e32 v1, 0			; GFX1132-NEXT: v_mov_b32_e32 v1, 0
	; GFX1132-NEXT: s_mul_i32 s3, s3, 5			; GFX1132-NEXT: s_mul_i32 s3, s3, 5
				; GFX1132-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX1132-NEXT: v_mov_b32_e32 v2, s3			; GFX1132-NEXT: v_mov_b32_e32 v2, s3
	; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1132-NEXT: ds_add_rtn_u32 v1, v1, v2			; GFX1132-NEXT: ds_add_rtn_u32 v1, v1, v2
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_gl0_inv			; GFX1132-NEXT: buffer_gl0_inv
	; GFX1132-NEXT: .LBB0_2:			; GFX1132-NEXT: .LBB0_2:
	; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s2			; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s2
	; GFX1132-NEXT: v_readfirstlane_b32 s2, v1			; GFX1132-NEXT: v_readfirstlane_b32 s2, v1
	; GFX1132-NEXT: s_mov_b32 s3, 0x31016000			; GFX1132-NEXT: s_mov_b32 s3, 0x31016000
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1132-NEXT: v_mad_u32_u24 v0, v0, 5, s2			; GFX1132-NEXT: v_mad_u32_u24 v0, v0, 5, s2
	; GFX1132-NEXT: s_mov_b32 s2, -1			; GFX1132-NEXT: s_mov_b32 s2, -1
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_store_b32 v0, off, s[0:3], 0			; GFX1132-NEXT: buffer_store_b32 v0, off, s[0:3], 0
	; GFX1132-NEXT: s_endpgm			; GFX1132-NEXT: s_endpgm
	entry:			entry:
	%old = atomicrmw add i32 addrspace(3)* @local_var32, i32 5 acq_rel			%old = atomicrmw add i32 addrspace(3)* @local_var32, i32 5 acq_rel
	store i32 %old, i32 addrspace(1)* %out			store i32 %old, i32 addrspace(1)* %out
	[...]
	; GFX1164: ; %bb.0: ; %entry			; GFX1164: ; %bb.0: ; %entry
	; GFX1164-NEXT: s_clause 0x1			; GFX1164-NEXT: s_clause 0x1
	; GFX1164-NEXT: s_load_b64 s[4:5], s[0:1], 0x24			; GFX1164-NEXT: s_load_b64 s[4:5], s[0:1], 0x24
	; GFX1164-NEXT: s_load_b32 s6, s[0:1], 0x2c			; GFX1164-NEXT: s_load_b32 s6, s[0:1], 0x2c
	; GFX1164-NEXT: s_mov_b64 s[2:3], exec			; GFX1164-NEXT: s_mov_b64 s[2:3], exec
	; GFX1164-NEXT: s_mov_b64 s[0:1], exec			; GFX1164-NEXT: s_mov_b64 s[0:1], exec
	; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0			; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0
	; GFX1164-NEXT: ; implicit-def: $vgpr1			; GFX1164-NEXT: ; implicit-def: $vgpr1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0			; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
	; GFX1164-NEXT: v_cmpx_eq_u32_e32 0, v0			; GFX1164-NEXT: v_cmpx_eq_u32_e32 0, v0
	; GFX1164-NEXT: s_cbranch_execz .LBB1_2			; GFX1164-NEXT: s_cbranch_execz .LBB1_2
	; GFX1164-NEXT: ; %bb.1:			; GFX1164-NEXT: ; %bb.1:
	; GFX1164-NEXT: s_bcnt1_i32_b64 s2, s[2:3]			; GFX1164-NEXT: s_bcnt1_i32_b64 s2, s[2:3]
	; GFX1164-NEXT: v_mov_b32_e32 v1, 0			; GFX1164-NEXT: v_mov_b32_e32 v1, 0
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: s_mul_i32 s2, s6, s2			; GFX1164-NEXT: s_mul_i32 s2, s6, s2
				; GFX1164-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX1164-NEXT: v_mov_b32_e32 v2, s2			; GFX1164-NEXT: v_mov_b32_e32 v2, s2
	; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1164-NEXT: ds_add_rtn_u32 v1, v1, v2			; GFX1164-NEXT: ds_add_rtn_u32 v1, v1, v2
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_gl0_inv			; GFX1164-NEXT: buffer_gl0_inv
	; GFX1164-NEXT: .LBB1_2:			; GFX1164-NEXT: .LBB1_2:
	; GFX1164-NEXT: s_or_b64 exec, exec, s[0:1]			; GFX1164-NEXT: s_or_b64 exec, exec, s[0:1]
	; GFX1164-NEXT: v_readfirstlane_b32 s0, v1			; GFX1164-NEXT: v_readfirstlane_b32 s0, v1
	; GFX1164-NEXT: s_mov_b32 s7, 0x31016000			; GFX1164-NEXT: s_mov_b32 s7, 0x31016000
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1164-NEXT: v_mad_u64_u32 v[1:2], null, s6, v0, s[0:1]			; GFX1164-NEXT: v_mad_u64_u32 v[1:2], null, s6, v0, s[0:1]
	; GFX1164-NEXT: s_mov_b32 s6, -1			; GFX1164-NEXT: s_mov_b32 s6, -1
	; GFX1164-NEXT: buffer_store_b32 v1, off, s[4:7], 0			; GFX1164-NEXT: buffer_store_b32 v1, off, s[4:7], 0
	; GFX1164-NEXT: s_endpgm			; GFX1164-NEXT: s_endpgm
	;			;
	; GFX1132-LABEL: add_i32_uniform:			; GFX1132-LABEL: add_i32_uniform:
	; GFX1132: ; %bb.0: ; %entry			; GFX1132: ; %bb.0: ; %entry
	; GFX1132-NEXT: s_clause 0x1			; GFX1132-NEXT: s_clause 0x1
	; GFX1132-NEXT: s_load_b64 s[4:5], s[0:1], 0x24			; GFX1132-NEXT: s_load_b64 s[4:5], s[0:1], 0x24
	; GFX1132-NEXT: s_load_b32 s0, s[0:1], 0x2c			; GFX1132-NEXT: s_load_b32 s0, s[0:1], 0x2c
	; GFX1132-NEXT: s_mov_b32 s2, exec_lo			; GFX1132-NEXT: s_mov_b32 s2, exec_lo
	; GFX1132-NEXT: s_mov_b32 s1, exec_lo			; GFX1132-NEXT: s_mov_b32 s1, exec_lo
	; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0			; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0
	; GFX1132-NEXT: ; implicit-def: $vgpr1			; GFX1132-NEXT: ; implicit-def: $vgpr1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1132-NEXT: v_cmpx_eq_u32_e32 0, v0			; GFX1132-NEXT: v_cmpx_eq_u32_e32 0, v0
	; GFX1132-NEXT: s_cbranch_execz .LBB1_2			; GFX1132-NEXT: s_cbranch_execz .LBB1_2
	; GFX1132-NEXT: ; %bb.1:			; GFX1132-NEXT: ; %bb.1:
	; GFX1132-NEXT: s_bcnt1_i32_b32 s2, s2			; GFX1132-NEXT: s_bcnt1_i32_b32 s2, s2
	; GFX1132-NEXT: v_mov_b32_e32 v1, 0			; GFX1132-NEXT: v_mov_b32_e32 v1, 0
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: s_mul_i32 s2, s0, s2			; GFX1132-NEXT: s_mul_i32 s2, s0, s2
				; GFX1132-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX1132-NEXT: v_mov_b32_e32 v2, s2			; GFX1132-NEXT: v_mov_b32_e32 v2, s2
	; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1132-NEXT: ds_add_rtn_u32 v1, v1, v2			; GFX1132-NEXT: ds_add_rtn_u32 v1, v1, v2
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_gl0_inv			; GFX1132-NEXT: buffer_gl0_inv
	; GFX1132-NEXT: .LBB1_2:			; GFX1132-NEXT: .LBB1_2:
	; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s1			; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s1
	; GFX1132-NEXT: v_readfirstlane_b32 s2, v1			; GFX1132-NEXT: v_readfirstlane_b32 s2, v1
	; GFX1132-NEXT: s_mov_b32 s7, 0x31016000			; GFX1132-NEXT: s_mov_b32 s7, 0x31016000
	; GFX1132-NEXT: s_mov_b32 s6, -1			; GFX1132-NEXT: s_mov_b32 s6, -1
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1132-NEXT: v_mad_u64_u32 v[1:2], null, s0, v0, s[2:3]			; GFX1132-NEXT: v_mad_u64_u32 v[1:2], null, s0, v0, s[2:3]
	; GFX1132-NEXT: buffer_store_b32 v1, off, s[4:7], 0			; GFX1132-NEXT: buffer_store_b32 v1, off, s[4:7], 0
	; GFX1132-NEXT: s_endpgm			; GFX1132-NEXT: s_endpgm
	entry:			entry:
	%old = atomicrmw add i32 addrspace(3)* @local_var32, i32 %additive acq_rel			%old = atomicrmw add i32 addrspace(3)* @local_var32, i32 %additive acq_rel
	store i32 %old, i32 addrspace(1)* %out			store i32 %old, i32 addrspace(1)* %out
	ret void			ret void
	}			}
	[...]
	;			;
	; GFX1164-LABEL: add_i32_varying:			; GFX1164-LABEL: add_i32_varying:
	; GFX1164: ; %bb.0: ; %entry			; GFX1164: ; %bb.0: ; %entry
	; GFX1164-NEXT: v_mov_b32_e32 v1, v0			; GFX1164-NEXT: v_mov_b32_e32 v1, v0
	; GFX1164-NEXT: s_not_b64 exec, exec			; GFX1164-NEXT: s_not_b64 exec, exec
	; GFX1164-NEXT: v_mov_b32_e32 v1, 0			; GFX1164-NEXT: v_mov_b32_e32 v1, 0
	; GFX1164-NEXT: s_not_b64 exec, exec			; GFX1164-NEXT: s_not_b64 exec, exec
	; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
	; GFX1164-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1164-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:1
	; GFX1164-NEXT: v_mov_b32_e32 v3, 0			; GFX1164-NEXT: v_mov_b32_e32 v3, 0
	; GFX1164-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1164-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1164-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:1
	; GFX1164-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1164-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_mov_b32_e32 v2, v1			; GFX1164-NEXT: v_mov_b32_e32 v2, v1
	; GFX1164-NEXT: v_permlanex16_b32 v2, v2, -1, -1			; GFX1164-NEXT: v_permlanex16_b32 v2, v2, -1, -1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_add_nc_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf			; GFX1164-NEXT: v_add_nc_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1164-NEXT: v_readlane_b32 s4, v1, 31			; GFX1164-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_mov_b32_e32 v2, s4			; GFX1164-NEXT: v_mov_b32_e32 v2, s4
	; GFX1164-NEXT: v_add_nc_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf			; GFX1164-NEXT: v_add_nc_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1164-NEXT: v_readlane_b32 s4, v1, 15			; GFX1164-NEXT: v_readlane_b32 s4, v1, 15
	; GFX1164-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1164-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1164-NEXT: s_mov_b64 exec, s[2:3]			; GFX1164-NEXT: s_mov_b64 exec, s[2:3]
	; GFX1164-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX1164-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX1164-NEXT: v_readlane_b32 s5, v1, 31			; GFX1164-NEXT: v_readlane_b32 s5, v1, 31
	; GFX1164-NEXT: v_writelane_b32 v3, s4, 16			; GFX1164-NEXT: v_writelane_b32 v3, s4, 16
	; GFX1164-NEXT: s_mov_b64 exec, s[2:3]			; GFX1164-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1164-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0			; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX1164-NEXT: v_readlane_b32 s7, v1, 63			; GFX1164-NEXT: v_readlane_b32 s7, v1, 63
	; GFX1164-NEXT: v_readlane_b32 s6, v1, 47			; GFX1164-NEXT: v_readlane_b32 s6, v1, 47
	; GFX1164-NEXT: v_writelane_b32 v3, s5, 32			; GFX1164-NEXT: v_writelane_b32 v3, s5, 32
	; GFX1164-NEXT: s_mov_b64 exec, s[2:3]			; GFX1164-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1164-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(SKIP_3) | instid1(VALU_DEP_2)
	; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0			; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX1164-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX1164-NEXT: v_writelane_b32 v3, s6, 48			; GFX1164-NEXT: v_writelane_b32 v3, s6, 48
	; GFX1164-NEXT: s_mov_b64 exec, s[4:5]			; GFX1164-NEXT: s_mov_b64 exec, s[4:5]
	; GFX1164-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX1164-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX1164-NEXT: s_mov_b32 s2, -1			; GFX1164-NEXT: s_mov_b32 s2, -1
	; GFX1164-NEXT: ; implicit-def: $vgpr0			; GFX1164-NEXT: ; implicit-def: $vgpr0
	; GFX1164-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX1164-NEXT: s_and_saveexec_b64 s[4:5], vcc
	; GFX1164-NEXT: s_cbranch_execz .LBB2_2			; GFX1164-NEXT: s_cbranch_execz .LBB2_2
	; GFX1164-NEXT: ; %bb.1:			; GFX1164-NEXT: ; %bb.1:
	; GFX1164-NEXT: v_mov_b32_e32 v0, 0			; GFX1164-NEXT: v_mov_b32_e32 v0, 0
	; GFX1164-NEXT: v_mov_b32_e32 v4, s7			; GFX1164-NEXT: v_mov_b32_e32 v4, s7
	; GFX1164-NEXT: s_mov_b32 s3, s7			; GFX1164-NEXT: s_mov_b32 s3, s7
	; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1164-NEXT: ds_add_rtn_u32 v0, v0, v4			; GFX1164-NEXT: ds_add_rtn_u32 v0, v0, v4
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_gl0_inv			; GFX1164-NEXT: buffer_gl0_inv
	; GFX1164-NEXT: .LBB2_2:			; GFX1164-NEXT: .LBB2_2:
	; GFX1164-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX1164-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX1164-NEXT: v_readfirstlane_b32 s3, v0			; GFX1164-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1164-NEXT: v_mov_b32_e32 v0, v3			; GFX1164-NEXT: v_mov_b32_e32 v0, v3
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1164-NEXT: v_add_nc_u32_e32 v0, s3, v0			; GFX1164-NEXT: v_add_nc_u32_e32 v0, s3, v0
	; GFX1164-NEXT: s_mov_b32 s3, 0x31016000			; GFX1164-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_store_b32 v0, off, s[0:3], 0			; GFX1164-NEXT: buffer_store_b32 v0, off, s[0:3], 0
	; GFX1164-NEXT: s_endpgm			; GFX1164-NEXT: s_endpgm
	;			;
	; GFX1132-LABEL: add_i32_varying:			; GFX1132-LABEL: add_i32_varying:
	; GFX1132: ; %bb.0: ; %entry			; GFX1132: ; %bb.0: ; %entry
	; GFX1132-NEXT: v_mov_b32_e32 v1, v0			; GFX1132-NEXT: v_mov_b32_e32 v1, v0
	; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1132-NEXT: v_mov_b32_e32 v1, 0			; GFX1132-NEXT: v_mov_b32_e32 v1, 0
	; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1132-NEXT: s_or_saveexec_b32 s2, -1			; GFX1132-NEXT: s_or_saveexec_b32 s2, -1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1132-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:1
	; GFX1132-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1132-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1132-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:1
	; GFX1132-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1132-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_mov_b32_e32 v2, v1			; GFX1132-NEXT: v_mov_b32_e32 v2, v1
	; GFX1132-NEXT: v_permlanex16_b32 v2, v2, -1, -1			; GFX1132-NEXT: v_permlanex16_b32 v2, v2, -1, -1
	; GFX1132-NEXT: s_mov_b32 exec_lo, s2			; GFX1132-NEXT: s_mov_b32 exec_lo, s2
	; GFX1132-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX1132-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX1132-NEXT: s_or_saveexec_b32 s2, -1			; GFX1132-NEXT: s_or_saveexec_b32 s2, -1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
	; GFX1132-NEXT: v_add_nc_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf			; GFX1132-NEXT: v_add_nc_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1132-NEXT: v_mov_b32_e32 v3, 0			; GFX1132-NEXT: v_mov_b32_e32 v3, 0
	; GFX1132-NEXT: v_readlane_b32 s3, v1, 15			; GFX1132-NEXT: v_readlane_b32 s3, v1, 15
	; GFX1132-NEXT: v_readlane_b32 s4, v1, 31			; GFX1132-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(SALU_CYCLE_1)
	; GFX1132-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1132-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1132-NEXT: s_mov_b32 exec_lo, s2			; GFX1132-NEXT: s_mov_b32 exec_lo, s2
	; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0			; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX1132-NEXT: s_or_saveexec_b32 s2, -1			; GFX1132-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1132-NEXT: v_writelane_b32 v3, s3, 16			; GFX1132-NEXT: v_writelane_b32 v3, s3, 16
	; GFX1132-NEXT: s_mov_b32 exec_lo, s2			; GFX1132-NEXT: s_mov_b32 exec_lo, s2
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_2)
	; GFX1132-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0			; GFX1132-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
	; GFX1132-NEXT: s_mov_b32 s2, -1			; GFX1132-NEXT: s_mov_b32 s2, -1
	; GFX1132-NEXT: ; implicit-def: $vgpr0			; GFX1132-NEXT: ; implicit-def: $vgpr0
	; GFX1132-NEXT: s_and_saveexec_b32 s3, vcc_lo			; GFX1132-NEXT: s_and_saveexec_b32 s3, vcc_lo
	; GFX1132-NEXT: s_cbranch_execz .LBB2_2			; GFX1132-NEXT: s_cbranch_execz .LBB2_2
	; GFX1132-NEXT: ; %bb.1:			; GFX1132-NEXT: ; %bb.1:
	; GFX1132-NEXT: v_mov_b32_e32 v0, 0			; GFX1132-NEXT: v_mov_b32_e32 v0, 0
	; GFX1132-NEXT: v_mov_b32_e32 v4, s4			; GFX1132-NEXT: v_mov_b32_e32 v4, s4
	; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1132-NEXT: ds_add_rtn_u32 v0, v0, v4			; GFX1132-NEXT: ds_add_rtn_u32 v0, v0, v4
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_gl0_inv			; GFX1132-NEXT: buffer_gl0_inv
	; GFX1132-NEXT: .LBB2_2:			; GFX1132-NEXT: .LBB2_2:
	; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s3			; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s3
	; GFX1132-NEXT: v_readfirstlane_b32 s3, v0			; GFX1132-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1132-NEXT: v_mov_b32_e32 v0, v3			; GFX1132-NEXT: v_mov_b32_e32 v0, v3
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1132-NEXT: v_add_nc_u32_e32 v0, s3, v0			; GFX1132-NEXT: v_add_nc_u32_e32 v0, s3, v0
	; GFX1132-NEXT: s_mov_b32 s3, 0x31016000			; GFX1132-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_store_b32 v0, off, s[0:3], 0			; GFX1132-NEXT: buffer_store_b32 v0, off, s[0:3], 0
	; GFX1132-NEXT: s_endpgm			; GFX1132-NEXT: s_endpgm
	entry:			entry:
	%lane = call i32 @llvm.amdgcn.workitem.id.x()			%lane = call i32 @llvm.amdgcn.workitem.id.x()
	%old = atomicrmw add i32 addrspace(3)* @local_var32, i32 %lane acq_rel			%old = atomicrmw add i32 addrspace(3)* @local_var32, i32 %lane acq_rel
	[...]
	;			;
	; GFX1164-LABEL: add_i32_varying_nouse:			; GFX1164-LABEL: add_i32_varying_nouse:
	; GFX1164: ; %bb.0: ; %entry			; GFX1164: ; %bb.0: ; %entry
	; GFX1164-NEXT: v_mov_b32_e32 v1, v0			; GFX1164-NEXT: v_mov_b32_e32 v1, v0
	; GFX1164-NEXT: s_not_b64 exec, exec			; GFX1164-NEXT: s_not_b64 exec, exec
	; GFX1164-NEXT: v_mov_b32_e32 v1, 0			; GFX1164-NEXT: v_mov_b32_e32 v1, 0
	; GFX1164-NEXT: s_not_b64 exec, exec			; GFX1164-NEXT: s_not_b64 exec, exec
	; GFX1164-NEXT: s_or_saveexec_b64 s[0:1], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[0:1], -1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_xmask:1 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1164-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_xmask:1 row_mask:0xf bank_mask:0xf bound_ctrl:1
	; GFX1164-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_xmask:2 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1164-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_xmask:2 row_mask:0xf bank_mask:0xf bound_ctrl:1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_xmask:4 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1164-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_xmask:4 row_mask:0xf bank_mask:0xf bound_ctrl:1
	; GFX1164-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_xmask:8 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1164-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_xmask:8 row_mask:0xf bank_mask:0xf bound_ctrl:1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_mov_b32_e32 v2, v1			; GFX1164-NEXT: v_mov_b32_e32 v2, v1
	; GFX1164-NEXT: v_permlanex16_b32 v2, v2, -1, -1			; GFX1164-NEXT: v_permlanex16_b32 v2, v2, -1, -1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_add_nc_u32_e32 v1, v1, v2			; GFX1164-NEXT: v_add_nc_u32_e32 v1, v1, v2
	; GFX1164-NEXT: v_permlane64_b32 v2, v1			; GFX1164-NEXT: v_permlane64_b32 v2, v1
	; GFX1164-NEXT: s_mov_b64 exec, s[0:1]			; GFX1164-NEXT: s_mov_b64 exec, s[0:1]
				; GFX1164-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
	; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0			; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX1164-NEXT: s_or_saveexec_b64 s[0:1], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[0:1], -1
	; GFX1164-NEXT: v_add_nc_u32_e32 v1, v1, v2			; GFX1164-NEXT: v_add_nc_u32_e32 v1, v1, v2
	; GFX1164-NEXT: s_mov_b64 exec, s[0:1]			; GFX1164-NEXT: s_mov_b64 exec, s[0:1]
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_2) | instid1(SALU_CYCLE_1)
	; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v3, exec_hi, v0			; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v3, exec_hi, v0
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_2)
	; GFX1164-NEXT: v_mov_b32_e32 v0, v1			; GFX1164-NEXT: v_mov_b32_e32 v0, v1
	; GFX1164-NEXT: s_mov_b64 s[0:1], exec			; GFX1164-NEXT: s_mov_b64 s[0:1], exec
	; GFX1164-NEXT: v_cmpx_eq_u32_e32 0, v3			; GFX1164-NEXT: v_cmpx_eq_u32_e32 0, v3
	; GFX1164-NEXT: s_cbranch_execz .LBB3_2			; GFX1164-NEXT: s_cbranch_execz .LBB3_2
	; GFX1164-NEXT: ; %bb.1:			; GFX1164-NEXT: ; %bb.1:
	; GFX1164-NEXT: v_mov_b32_e32 v3, 0			; GFX1164-NEXT: v_mov_b32_e32 v3, 0
	; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1164-NEXT: ds_add_u32 v3, v0			; GFX1164-NEXT: ds_add_u32 v3, v0
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_gl0_inv			; GFX1164-NEXT: buffer_gl0_inv
	; GFX1164-NEXT: .LBB3_2:			; GFX1164-NEXT: .LBB3_2:
	; GFX1164-NEXT: s_endpgm			; GFX1164-NEXT: s_endpgm
	;			;
	; GFX1132-LABEL: add_i32_varying_nouse:			; GFX1132-LABEL: add_i32_varying_nouse:
	; GFX1132: ; %bb.0: ; %entry			; GFX1132: ; %bb.0: ; %entry
	; GFX1132-NEXT: v_mov_b32_e32 v1, v0			; GFX1132-NEXT: v_mov_b32_e32 v1, v0
	; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1132-NEXT: v_mov_b32_e32 v1, 0			; GFX1132-NEXT: v_mov_b32_e32 v1, 0
	; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1132-NEXT: s_or_saveexec_b32 s0, -1			; GFX1132-NEXT: s_or_saveexec_b32 s0, -1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_xmask:1 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1132-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_xmask:1 row_mask:0xf bank_mask:0xf bound_ctrl:1
	; GFX1132-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_xmask:2 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1132-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_xmask:2 row_mask:0xf bank_mask:0xf bound_ctrl:1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_xmask:4 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1132-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_xmask:4 row_mask:0xf bank_mask:0xf bound_ctrl:1
	; GFX1132-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_xmask:8 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1132-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_xmask:8 row_mask:0xf bank_mask:0xf bound_ctrl:1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_mov_b32_e32 v2, v1			; GFX1132-NEXT: v_mov_b32_e32 v2, v1
	; GFX1132-NEXT: v_permlanex16_b32 v2, v2, -1, -1			; GFX1132-NEXT: v_permlanex16_b32 v2, v2, -1, -1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(SALU_CYCLE_1)
	; GFX1132-NEXT: v_add_nc_u32_e32 v1, v1, v2			; GFX1132-NEXT: v_add_nc_u32_e32 v1, v1, v2
	; GFX1132-NEXT: s_mov_b32 exec_lo, s0			; GFX1132-NEXT: s_mov_b32 exec_lo, s0
	; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v3, exec_lo, 0			; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v3, exec_lo, 0
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_2)
	; GFX1132-NEXT: v_mov_b32_e32 v0, v1			; GFX1132-NEXT: v_mov_b32_e32 v0, v1
	; GFX1132-NEXT: s_mov_b32 s0, exec_lo			; GFX1132-NEXT: s_mov_b32 s0, exec_lo
	; GFX1132-NEXT: v_cmpx_eq_u32_e32 0, v3			; GFX1132-NEXT: v_cmpx_eq_u32_e32 0, v3
	; GFX1132-NEXT: s_cbranch_execz .LBB3_2			; GFX1132-NEXT: s_cbranch_execz .LBB3_2
	; GFX1132-NEXT: ; %bb.1:			; GFX1132-NEXT: ; %bb.1:
	; GFX1132-NEXT: v_mov_b32_e32 v3, 0			; GFX1132-NEXT: v_mov_b32_e32 v3, 0
	; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0
	[174 lines hidden]
	; GFX1032-NEXT: s_endpgm			; GFX1032-NEXT: s_endpgm
	;			;
	; GFX1164-LABEL: add_i64_constant:			; GFX1164-LABEL: add_i64_constant:
	; GFX1164: ; %bb.0: ; %entry			; GFX1164: ; %bb.0: ; %entry
	; GFX1164-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX1164-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX1164-NEXT: s_mov_b64 s[4:5], exec			; GFX1164-NEXT: s_mov_b64 s[4:5], exec
	; GFX1164-NEXT: s_mov_b64 s[2:3], exec			; GFX1164-NEXT: s_mov_b64 s[2:3], exec
	; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, s4, 0			; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, s4, 0
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v2, s5, v0			; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v2, s5, v0
	; GFX1164-NEXT: ; implicit-def: $vgpr0_vgpr1			; GFX1164-NEXT: ; implicit-def: $vgpr0_vgpr1
	; GFX1164-NEXT: v_cmpx_eq_u32_e32 0, v2			; GFX1164-NEXT: v_cmpx_eq_u32_e32 0, v2
	; GFX1164-NEXT: s_cbranch_execz .LBB4_2			; GFX1164-NEXT: s_cbranch_execz .LBB4_2
	; GFX1164-NEXT: ; %bb.1:			; GFX1164-NEXT: ; %bb.1:
	; GFX1164-NEXT: s_bcnt1_i32_b64 s4, s[4:5]			; GFX1164-NEXT: s_bcnt1_i32_b64 s4, s[4:5]
	; GFX1164-NEXT: v_mov_b32_e32 v1, 0			; GFX1164-NEXT: v_mov_b32_e32 v1, 0
	; GFX1164-NEXT: s_mul_i32 s4, s4, 5			; GFX1164-NEXT: s_mul_i32 s4, s4, 5
				; GFX1164-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX1164-NEXT: v_mov_b32_e32 v0, s4			; GFX1164-NEXT: v_mov_b32_e32 v0, s4
	; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1164-NEXT: ds_add_rtn_u64 v[0:1], v1, v[0:1]			; GFX1164-NEXT: ds_add_rtn_u64 v[0:1], v1, v[0:1]
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_gl0_inv			; GFX1164-NEXT: buffer_gl0_inv
	; GFX1164-NEXT: .LBB4_2:			; GFX1164-NEXT: .LBB4_2:
	; GFX1164-NEXT: s_or_b64 exec, exec, s[2:3]			; GFX1164-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX1164-NEXT: v_readfirstlane_b32 s2, v0			; GFX1164-NEXT: v_readfirstlane_b32 s2, v0
	; GFX1164-NEXT: v_readfirstlane_b32 s3, v1			; GFX1164-NEXT: v_readfirstlane_b32 s3, v1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1164-NEXT: v_mad_u64_u32 v[0:1], null, v2, 5, s[2:3]			; GFX1164-NEXT: v_mad_u64_u32 v[0:1], null, v2, 5, s[2:3]
	; GFX1164-NEXT: s_mov_b32 s3, 0x31016000			; GFX1164-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1164-NEXT: s_mov_b32 s2, -1			; GFX1164-NEXT: s_mov_b32 s2, -1
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_store_b64 v[0:1], off, s[0:3], 0			; GFX1164-NEXT: buffer_store_b64 v[0:1], off, s[0:3], 0
	; GFX1164-NEXT: s_endpgm			; GFX1164-NEXT: s_endpgm
	;			;
	; GFX1132-LABEL: add_i64_constant:			; GFX1132-LABEL: add_i64_constant:
	; GFX1132: ; %bb.0: ; %entry			; GFX1132: ; %bb.0: ; %entry
	; GFX1132-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX1132-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX1132-NEXT: s_mov_b32 s3, exec_lo			; GFX1132-NEXT: s_mov_b32 s3, exec_lo
	; GFX1132-NEXT: s_mov_b32 s2, exec_lo			; GFX1132-NEXT: s_mov_b32 s2, exec_lo
	; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v2, s3, 0			; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v2, s3, 0
	; GFX1132-NEXT: ; implicit-def: $vgpr0_vgpr1			; GFX1132-NEXT: ; implicit-def: $vgpr0_vgpr1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1132-NEXT: v_cmpx_eq_u32_e32 0, v2			; GFX1132-NEXT: v_cmpx_eq_u32_e32 0, v2
	; GFX1132-NEXT: s_cbranch_execz .LBB4_2			; GFX1132-NEXT: s_cbranch_execz .LBB4_2
	; GFX1132-NEXT: ; %bb.1:			; GFX1132-NEXT: ; %bb.1:
	; GFX1132-NEXT: s_bcnt1_i32_b32 s3, s3			; GFX1132-NEXT: s_bcnt1_i32_b32 s3, s3
	; GFX1132-NEXT: v_mov_b32_e32 v1, 0			; GFX1132-NEXT: v_mov_b32_e32 v1, 0
	; GFX1132-NEXT: s_mul_i32 s3, s3, 5			; GFX1132-NEXT: s_mul_i32 s3, s3, 5
				; GFX1132-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX1132-NEXT: v_mov_b32_e32 v0, s3			; GFX1132-NEXT: v_mov_b32_e32 v0, s3
	; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1132-NEXT: ds_add_rtn_u64 v[0:1], v1, v[0:1]			; GFX1132-NEXT: ds_add_rtn_u64 v[0:1], v1, v[0:1]
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_gl0_inv			; GFX1132-NEXT: buffer_gl0_inv
	; GFX1132-NEXT: .LBB4_2:			; GFX1132-NEXT: .LBB4_2:
	; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s2			; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s2
	; GFX1132-NEXT: v_readfirstlane_b32 s2, v0			; GFX1132-NEXT: v_readfirstlane_b32 s2, v0
	; GFX1132-NEXT: v_readfirstlane_b32 s3, v1			; GFX1132-NEXT: v_readfirstlane_b32 s3, v1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1132-NEXT: v_mad_u64_u32 v[0:1], null, v2, 5, s[2:3]			; GFX1132-NEXT: v_mad_u64_u32 v[0:1], null, v2, 5, s[2:3]
	; GFX1132-NEXT: s_mov_b32 s3, 0x31016000			; GFX1132-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1132-NEXT: s_mov_b32 s2, -1			; GFX1132-NEXT: s_mov_b32 s2, -1
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_store_b64 v[0:1], off, s[0:3], 0			; GFX1132-NEXT: buffer_store_b64 v[0:1], off, s[0:3], 0
	; GFX1132-NEXT: s_endpgm			; GFX1132-NEXT: s_endpgm
	entry:			entry:
	%old = atomicrmw add i64 addrspace(3)* @local_var64, i64 5 acq_rel			%old = atomicrmw add i64 addrspace(3)* @local_var64, i64 5 acq_rel
	[201 lines hidden]
	; GFX1032-NEXT: s_endpgm			; GFX1032-NEXT: s_endpgm
	;			;
	; GFX1164-LABEL: add_i64_uniform:			; GFX1164-LABEL: add_i64_uniform:
	; GFX1164: ; %bb.0: ; %entry			; GFX1164: ; %bb.0: ; %entry
	; GFX1164-NEXT: s_load_b128 s[0:3], s[0:1], 0x24			; GFX1164-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
	; GFX1164-NEXT: s_mov_b64 s[6:7], exec			; GFX1164-NEXT: s_mov_b64 s[6:7], exec
	; GFX1164-NEXT: s_mov_b64 s[4:5], exec			; GFX1164-NEXT: s_mov_b64 s[4:5], exec
	; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0			; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v2, s7, v0			; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v2, s7, v0
	; GFX1164-NEXT: ; implicit-def: $vgpr0_vgpr1			; GFX1164-NEXT: ; implicit-def: $vgpr0_vgpr1
	; GFX1164-NEXT: v_cmpx_eq_u32_e32 0, v2			; GFX1164-NEXT: v_cmpx_eq_u32_e32 0, v2
	; GFX1164-NEXT: s_cbranch_execz .LBB5_2			; GFX1164-NEXT: s_cbranch_execz .LBB5_2
	; GFX1164-NEXT: ; %bb.1:			; GFX1164-NEXT: ; %bb.1:
	; GFX1164-NEXT: s_bcnt1_i32_b64 s6, s[6:7]			; GFX1164-NEXT: s_bcnt1_i32_b64 s6, s[6:7]
	; GFX1164-NEXT: v_mov_b32_e32 v3, 0			; GFX1164-NEXT: v_mov_b32_e32 v3, 0
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: s_mul_i32 s7, s3, s6			; GFX1164-NEXT: s_mul_i32 s7, s3, s6
	; GFX1164-NEXT: s_mul_hi_u32 s8, s2, s6			; GFX1164-NEXT: s_mul_hi_u32 s8, s2, s6
	; GFX1164-NEXT: s_mul_i32 s6, s2, s6			; GFX1164-NEXT: s_mul_i32 s6, s2, s6
	; GFX1164-NEXT: s_add_i32 s8, s8, s7			; GFX1164-NEXT: s_add_i32 s8, s8, s7
	; GFX1164-NEXT: v_mov_b32_e32 v0, s6			; GFX1164-NEXT: v_mov_b32_e32 v0, s6
	; GFX1164-NEXT: v_mov_b32_e32 v1, s8			; GFX1164-NEXT: v_mov_b32_e32 v1, s8
	; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1164-NEXT: ds_add_rtn_u64 v[0:1], v3, v[0:1]			; GFX1164-NEXT: ds_add_rtn_u64 v[0:1], v3, v[0:1]
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_gl0_inv			; GFX1164-NEXT: buffer_gl0_inv
	; GFX1164-NEXT: .LBB5_2:			; GFX1164-NEXT: .LBB5_2:
	; GFX1164-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX1164-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX1164-NEXT: v_readfirstlane_b32 s4, v0			; GFX1164-NEXT: v_readfirstlane_b32 s4, v0
	; GFX1164-NEXT: v_readfirstlane_b32 s5, v1			; GFX1164-NEXT: v_readfirstlane_b32 s5, v1
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_mad_u64_u32 v[0:1], null, s2, v2, s[4:5]			; GFX1164-NEXT: v_mad_u64_u32 v[0:1], null, s2, v2, s[4:5]
	; GFX1164-NEXT: s_mov_b32 s2, -1			; GFX1164-NEXT: s_mov_b32 s2, -1
	; GFX1164-NEXT: v_mad_u64_u32 v[3:4], null, s3, v2, v[1:2]			; GFX1164-NEXT: v_mad_u64_u32 v[3:4], null, s3, v2, v[1:2]
	; GFX1164-NEXT: s_mov_b32 s3, 0x31016000			; GFX1164-NEXT: s_mov_b32 s3, 0x31016000
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1164-NEXT: v_mov_b32_e32 v1, v3			; GFX1164-NEXT: v_mov_b32_e32 v1, v3
	; GFX1164-NEXT: buffer_store_b64 v[0:1], off, s[0:3], 0			; GFX1164-NEXT: buffer_store_b64 v[0:1], off, s[0:3], 0
	; GFX1164-NEXT: s_endpgm			; GFX1164-NEXT: s_endpgm
	;			;
	; GFX1132-LABEL: add_i64_uniform:			; GFX1132-LABEL: add_i64_uniform:
	; GFX1132: ; %bb.0: ; %entry			; GFX1132: ; %bb.0: ; %entry
	; GFX1132-NEXT: s_load_b128 s[0:3], s[0:1], 0x24			; GFX1132-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
	; GFX1132-NEXT: s_mov_b32 s5, exec_lo			; GFX1132-NEXT: s_mov_b32 s5, exec_lo
	; GFX1132-NEXT: s_mov_b32 s4, exec_lo			; GFX1132-NEXT: s_mov_b32 s4, exec_lo
	; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v2, s5, 0			; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v2, s5, 0
	; GFX1132-NEXT: ; implicit-def: $vgpr0_vgpr1			; GFX1132-NEXT: ; implicit-def: $vgpr0_vgpr1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1132-NEXT: v_cmpx_eq_u32_e32 0, v2			; GFX1132-NEXT: v_cmpx_eq_u32_e32 0, v2
	; GFX1132-NEXT: s_cbranch_execz .LBB5_2			; GFX1132-NEXT: s_cbranch_execz .LBB5_2
	; GFX1132-NEXT: ; %bb.1:			; GFX1132-NEXT: ; %bb.1:
	; GFX1132-NEXT: s_bcnt1_i32_b32 s5, s5			; GFX1132-NEXT: s_bcnt1_i32_b32 s5, s5
	; GFX1132-NEXT: v_mov_b32_e32 v3, 0			; GFX1132-NEXT: v_mov_b32_e32 v3, 0
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: s_mul_i32 s6, s3, s5			; GFX1132-NEXT: s_mul_i32 s6, s3, s5
	; GFX1132-NEXT: s_mul_hi_u32 s7, s2, s5			; GFX1132-NEXT: s_mul_hi_u32 s7, s2, s5
	; GFX1132-NEXT: s_mul_i32 s5, s2, s5			; GFX1132-NEXT: s_mul_i32 s5, s2, s5
	; GFX1132-NEXT: s_add_i32 s7, s7, s6			; GFX1132-NEXT: s_add_i32 s7, s7, s6
	; GFX1132-NEXT: v_mov_b32_e32 v0, s5			; GFX1132-NEXT: v_mov_b32_e32 v0, s5
	; GFX1132-NEXT: v_mov_b32_e32 v1, s7			; GFX1132-NEXT: v_mov_b32_e32 v1, s7
	; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1132-NEXT: ds_add_rtn_u64 v[0:1], v3, v[0:1]			; GFX1132-NEXT: ds_add_rtn_u64 v[0:1], v3, v[0:1]
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_gl0_inv			; GFX1132-NEXT: buffer_gl0_inv
	; GFX1132-NEXT: .LBB5_2:			; GFX1132-NEXT: .LBB5_2:
	; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s4			; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s4
	; GFX1132-NEXT: v_readfirstlane_b32 s4, v0			; GFX1132-NEXT: v_readfirstlane_b32 s4, v0
	; GFX1132-NEXT: v_readfirstlane_b32 s5, v1			; GFX1132-NEXT: v_readfirstlane_b32 s5, v1
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_mad_u64_u32 v[0:1], null, s2, v2, s[4:5]			; GFX1132-NEXT: v_mad_u64_u32 v[0:1], null, s2, v2, s[4:5]
	; GFX1132-NEXT: s_mov_b32 s2, -1			; GFX1132-NEXT: s_mov_b32 s2, -1
	; GFX1132-NEXT: v_mad_u64_u32 v[3:4], null, s3, v2, v[1:2]			; GFX1132-NEXT: v_mad_u64_u32 v[3:4], null, s3, v2, v[1:2]
	; GFX1132-NEXT: s_mov_b32 s3, 0x31016000			; GFX1132-NEXT: s_mov_b32 s3, 0x31016000
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1132-NEXT: v_mov_b32_e32 v1, v3			; GFX1132-NEXT: v_mov_b32_e32 v1, v3
	; GFX1132-NEXT: buffer_store_b64 v[0:1], off, s[0:3], 0			; GFX1132-NEXT: buffer_store_b64 v[0:1], off, s[0:3], 0
	; GFX1132-NEXT: s_endpgm			; GFX1132-NEXT: s_endpgm
	entry:			entry:
	%old = atomicrmw add i64 addrspace(3)* @local_var64, i64 %additive acq_rel			%old = atomicrmw add i64 addrspace(3)* @local_var64, i64 %additive acq_rel
	store i64 %old, i64 addrspace(1)* %out			store i64 %old, i64 addrspace(1)* %out
	ret void			ret void
	}			}
	[233 lines hidden]
	;			;
	; GFX1164-LABEL: sub_i32_constant:			; GFX1164-LABEL: sub_i32_constant:
	; GFX1164: ; %bb.0: ; %entry			; GFX1164: ; %bb.0: ; %entry
	; GFX1164-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX1164-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX1164-NEXT: s_mov_b64 s[2:3], exec			; GFX1164-NEXT: s_mov_b64 s[2:3], exec
	; GFX1164-NEXT: s_mov_b64 s[4:5], exec			; GFX1164-NEXT: s_mov_b64 s[4:5], exec
	; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0			; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0
	; GFX1164-NEXT: ; implicit-def: $vgpr1			; GFX1164-NEXT: ; implicit-def: $vgpr1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0			; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
	; GFX1164-NEXT: v_cmpx_eq_u32_e32 0, v0			; GFX1164-NEXT: v_cmpx_eq_u32_e32 0, v0
	; GFX1164-NEXT: s_cbranch_execz .LBB7_2			; GFX1164-NEXT: s_cbranch_execz .LBB7_2
	; GFX1164-NEXT: ; %bb.1:			; GFX1164-NEXT: ; %bb.1:
	; GFX1164-NEXT: s_bcnt1_i32_b64 s2, s[2:3]			; GFX1164-NEXT: s_bcnt1_i32_b64 s2, s[2:3]
	; GFX1164-NEXT: v_mov_b32_e32 v1, 0			; GFX1164-NEXT: v_mov_b32_e32 v1, 0
	; GFX1164-NEXT: s_mul_i32 s2, s2, 5			; GFX1164-NEXT: s_mul_i32 s2, s2, 5
				; GFX1164-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX1164-NEXT: v_mov_b32_e32 v2, s2			; GFX1164-NEXT: v_mov_b32_e32 v2, s2
	; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1164-NEXT: ds_sub_rtn_u32 v1, v1, v2			; GFX1164-NEXT: ds_sub_rtn_u32 v1, v1, v2
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_gl0_inv			; GFX1164-NEXT: buffer_gl0_inv
	; GFX1164-NEXT: .LBB7_2:			; GFX1164-NEXT: .LBB7_2:
	; GFX1164-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX1164-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX1164-NEXT: v_readfirstlane_b32 s2, v1			; GFX1164-NEXT: v_readfirstlane_b32 s2, v1
	; GFX1164-NEXT: v_mul_u32_u24_e32 v0, 5, v0			; GFX1164-NEXT: v_mul_u32_u24_e32 v0, 5, v0
	; GFX1164-NEXT: s_mov_b32 s3, 0x31016000			; GFX1164-NEXT: s_mov_b32 s3, 0x31016000
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1164-NEXT: v_sub_nc_u32_e32 v0, s2, v0			; GFX1164-NEXT: v_sub_nc_u32_e32 v0, s2, v0
	; GFX1164-NEXT: s_mov_b32 s2, -1			; GFX1164-NEXT: s_mov_b32 s2, -1
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_store_b32 v0, off, s[0:3], 0			; GFX1164-NEXT: buffer_store_b32 v0, off, s[0:3], 0
	; GFX1164-NEXT: s_endpgm			; GFX1164-NEXT: s_endpgm
	;			;
	; GFX1132-LABEL: sub_i32_constant:			; GFX1132-LABEL: sub_i32_constant:
	; GFX1132: ; %bb.0: ; %entry			; GFX1132: ; %bb.0: ; %entry
	; GFX1132-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX1132-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX1132-NEXT: s_mov_b32 s3, exec_lo			; GFX1132-NEXT: s_mov_b32 s3, exec_lo
	; GFX1132-NEXT: s_mov_b32 s2, exec_lo			; GFX1132-NEXT: s_mov_b32 s2, exec_lo
	; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v0, s3, 0			; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v0, s3, 0
	; GFX1132-NEXT: ; implicit-def: $vgpr1			; GFX1132-NEXT: ; implicit-def: $vgpr1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1132-NEXT: v_cmpx_eq_u32_e32 0, v0			; GFX1132-NEXT: v_cmpx_eq_u32_e32 0, v0
	; GFX1132-NEXT: s_cbranch_execz .LBB7_2			; GFX1132-NEXT: s_cbranch_execz .LBB7_2
	; GFX1132-NEXT: ; %bb.1:			; GFX1132-NEXT: ; %bb.1:
	; GFX1132-NEXT: s_bcnt1_i32_b32 s3, s3			; GFX1132-NEXT: s_bcnt1_i32_b32 s3, s3
	; GFX1132-NEXT: v_mov_b32_e32 v1, 0			; GFX1132-NEXT: v_mov_b32_e32 v1, 0
	; GFX1132-NEXT: s_mul_i32 s3, s3, 5			; GFX1132-NEXT: s_mul_i32 s3, s3, 5
				; GFX1132-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX1132-NEXT: v_mov_b32_e32 v2, s3			; GFX1132-NEXT: v_mov_b32_e32 v2, s3
	; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1132-NEXT: ds_sub_rtn_u32 v1, v1, v2			; GFX1132-NEXT: ds_sub_rtn_u32 v1, v1, v2
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_gl0_inv			; GFX1132-NEXT: buffer_gl0_inv
	; GFX1132-NEXT: .LBB7_2:			; GFX1132-NEXT: .LBB7_2:
	; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s2			; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s2
	; GFX1132-NEXT: v_readfirstlane_b32 s2, v1			; GFX1132-NEXT: v_readfirstlane_b32 s2, v1
	; GFX1132-NEXT: v_mul_u32_u24_e32 v0, 5, v0			; GFX1132-NEXT: v_mul_u32_u24_e32 v0, 5, v0
	; GFX1132-NEXT: s_mov_b32 s3, 0x31016000			; GFX1132-NEXT: s_mov_b32 s3, 0x31016000
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1132-NEXT: v_sub_nc_u32_e32 v0, s2, v0			; GFX1132-NEXT: v_sub_nc_u32_e32 v0, s2, v0
	; GFX1132-NEXT: s_mov_b32 s2, -1			; GFX1132-NEXT: s_mov_b32 s2, -1
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_store_b32 v0, off, s[0:3], 0			; GFX1132-NEXT: buffer_store_b32 v0, off, s[0:3], 0
	; GFX1132-NEXT: s_endpgm			; GFX1132-NEXT: s_endpgm
	entry:			entry:
	%old = atomicrmw sub i32 addrspace(3)* @local_var32, i32 5 acq_rel			%old = atomicrmw sub i32 addrspace(3)* @local_var32, i32 5 acq_rel
	store i32 %old, i32 addrspace(1)* %out			store i32 %old, i32 addrspace(1)* %out
	[171 lines hidden]
	; GFX1164: ; %bb.0: ; %entry			; GFX1164: ; %bb.0: ; %entry
	; GFX1164-NEXT: s_clause 0x1			; GFX1164-NEXT: s_clause 0x1
	; GFX1164-NEXT: s_load_b64 s[4:5], s[0:1], 0x24			; GFX1164-NEXT: s_load_b64 s[4:5], s[0:1], 0x24
	; GFX1164-NEXT: s_load_b32 s6, s[0:1], 0x2c			; GFX1164-NEXT: s_load_b32 s6, s[0:1], 0x2c
	; GFX1164-NEXT: s_mov_b64 s[2:3], exec			; GFX1164-NEXT: s_mov_b64 s[2:3], exec
	; GFX1164-NEXT: s_mov_b64 s[0:1], exec			; GFX1164-NEXT: s_mov_b64 s[0:1], exec
	; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0			; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0
	; GFX1164-NEXT: ; implicit-def: $vgpr1			; GFX1164-NEXT: ; implicit-def: $vgpr1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0			; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v0, s3, v0
	; GFX1164-NEXT: v_cmpx_eq_u32_e32 0, v0			; GFX1164-NEXT: v_cmpx_eq_u32_e32 0, v0
	; GFX1164-NEXT: s_cbranch_execz .LBB8_2			; GFX1164-NEXT: s_cbranch_execz .LBB8_2
	; GFX1164-NEXT: ; %bb.1:			; GFX1164-NEXT: ; %bb.1:
	; GFX1164-NEXT: s_bcnt1_i32_b64 s2, s[2:3]			; GFX1164-NEXT: s_bcnt1_i32_b64 s2, s[2:3]
	; GFX1164-NEXT: v_mov_b32_e32 v1, 0			; GFX1164-NEXT: v_mov_b32_e32 v1, 0
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: s_mul_i32 s2, s6, s2			; GFX1164-NEXT: s_mul_i32 s2, s6, s2
				; GFX1164-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX1164-NEXT: v_mov_b32_e32 v2, s2			; GFX1164-NEXT: v_mov_b32_e32 v2, s2
	; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1164-NEXT: ds_sub_rtn_u32 v1, v1, v2			; GFX1164-NEXT: ds_sub_rtn_u32 v1, v1, v2
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_gl0_inv			; GFX1164-NEXT: buffer_gl0_inv
	; GFX1164-NEXT: .LBB8_2:			; GFX1164-NEXT: .LBB8_2:
	; GFX1164-NEXT: s_or_b64 exec, exec, s[0:1]			; GFX1164-NEXT: s_or_b64 exec, exec, s[0:1]
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: v_mul_lo_u32 v0, s6, v0			; GFX1164-NEXT: v_mul_lo_u32 v0, s6, v0
	; GFX1164-NEXT: v_readfirstlane_b32 s0, v1			; GFX1164-NEXT: v_readfirstlane_b32 s0, v1
	; GFX1164-NEXT: s_mov_b32 s7, 0x31016000			; GFX1164-NEXT: s_mov_b32 s7, 0x31016000
	; GFX1164-NEXT: s_mov_b32 s6, -1			; GFX1164-NEXT: s_mov_b32 s6, -1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1164-NEXT: v_sub_nc_u32_e32 v0, s0, v0			; GFX1164-NEXT: v_sub_nc_u32_e32 v0, s0, v0
	; GFX1164-NEXT: buffer_store_b32 v0, off, s[4:7], 0			; GFX1164-NEXT: buffer_store_b32 v0, off, s[4:7], 0
	; GFX1164-NEXT: s_endpgm			; GFX1164-NEXT: s_endpgm
	;			;
	; GFX1132-LABEL: sub_i32_uniform:			; GFX1132-LABEL: sub_i32_uniform:
	; GFX1132: ; %bb.0: ; %entry			; GFX1132: ; %bb.0: ; %entry
	; GFX1132-NEXT: s_clause 0x1			; GFX1132-NEXT: s_clause 0x1
	; GFX1132-NEXT: s_load_b64 s[4:5], s[0:1], 0x24			; GFX1132-NEXT: s_load_b64 s[4:5], s[0:1], 0x24
	; GFX1132-NEXT: s_load_b32 s0, s[0:1], 0x2c			; GFX1132-NEXT: s_load_b32 s0, s[0:1], 0x2c
	; GFX1132-NEXT: s_mov_b32 s2, exec_lo			; GFX1132-NEXT: s_mov_b32 s2, exec_lo
	; GFX1132-NEXT: s_mov_b32 s1, exec_lo			; GFX1132-NEXT: s_mov_b32 s1, exec_lo
	; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0			; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v0, s2, 0
	; GFX1132-NEXT: ; implicit-def: $vgpr1			; GFX1132-NEXT: ; implicit-def: $vgpr1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1132-NEXT: v_cmpx_eq_u32_e32 0, v0			; GFX1132-NEXT: v_cmpx_eq_u32_e32 0, v0
	; GFX1132-NEXT: s_cbranch_execz .LBB8_2			; GFX1132-NEXT: s_cbranch_execz .LBB8_2
	; GFX1132-NEXT: ; %bb.1:			; GFX1132-NEXT: ; %bb.1:
	; GFX1132-NEXT: s_bcnt1_i32_b32 s2, s2			; GFX1132-NEXT: s_bcnt1_i32_b32 s2, s2
	; GFX1132-NEXT: v_mov_b32_e32 v1, 0			; GFX1132-NEXT: v_mov_b32_e32 v1, 0
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: s_mul_i32 s2, s0, s2			; GFX1132-NEXT: s_mul_i32 s2, s0, s2
				; GFX1132-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX1132-NEXT: v_mov_b32_e32 v2, s2			; GFX1132-NEXT: v_mov_b32_e32 v2, s2
	; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1132-NEXT: ds_sub_rtn_u32 v1, v1, v2			; GFX1132-NEXT: ds_sub_rtn_u32 v1, v1, v2
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_gl0_inv			; GFX1132-NEXT: buffer_gl0_inv
	; GFX1132-NEXT: .LBB8_2:			; GFX1132-NEXT: .LBB8_2:
	; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s1			; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s1
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: v_mul_lo_u32 v0, s0, v0			; GFX1132-NEXT: v_mul_lo_u32 v0, s0, v0
	; GFX1132-NEXT: v_readfirstlane_b32 s0, v1			; GFX1132-NEXT: v_readfirstlane_b32 s0, v1
	; GFX1132-NEXT: s_mov_b32 s7, 0x31016000			; GFX1132-NEXT: s_mov_b32 s7, 0x31016000
	; GFX1132-NEXT: s_mov_b32 s6, -1			; GFX1132-NEXT: s_mov_b32 s6, -1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1132-NEXT: v_sub_nc_u32_e32 v0, s0, v0			; GFX1132-NEXT: v_sub_nc_u32_e32 v0, s0, v0
	; GFX1132-NEXT: buffer_store_b32 v0, off, s[4:7], 0			; GFX1132-NEXT: buffer_store_b32 v0, off, s[4:7], 0
	; GFX1132-NEXT: s_endpgm			; GFX1132-NEXT: s_endpgm
	entry:			entry:
	%old = atomicrmw sub i32 addrspace(3)* @local_var32, i32 %subitive acq_rel			%old = atomicrmw sub i32 addrspace(3)* @local_var32, i32 %subitive acq_rel
	store i32 %old, i32 addrspace(1)* %out			store i32 %old, i32 addrspace(1)* %out
	ret void			ret void
	}			}
	[228 lines hidden]
	;			;
	; GFX1164-LABEL: sub_i32_varying:			; GFX1164-LABEL: sub_i32_varying:
	; GFX1164: ; %bb.0: ; %entry			; GFX1164: ; %bb.0: ; %entry
	; GFX1164-NEXT: v_mov_b32_e32 v1, v0			; GFX1164-NEXT: v_mov_b32_e32 v1, v0
	; GFX1164-NEXT: s_not_b64 exec, exec			; GFX1164-NEXT: s_not_b64 exec, exec
	; GFX1164-NEXT: v_mov_b32_e32 v1, 0			; GFX1164-NEXT: v_mov_b32_e32 v1, 0
	; GFX1164-NEXT: s_not_b64 exec, exec			; GFX1164-NEXT: s_not_b64 exec, exec
	; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
	; GFX1164-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1164-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:1
	; GFX1164-NEXT: v_mov_b32_e32 v3, 0			; GFX1164-NEXT: v_mov_b32_e32 v3, 0
	; GFX1164-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1164-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1164-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:1
	; GFX1164-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1164-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_mov_b32_e32 v2, v1			; GFX1164-NEXT: v_mov_b32_e32 v2, v1
	; GFX1164-NEXT: v_permlanex16_b32 v2, v2, -1, -1			; GFX1164-NEXT: v_permlanex16_b32 v2, v2, -1, -1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_add_nc_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf			; GFX1164-NEXT: v_add_nc_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1164-NEXT: v_readlane_b32 s4, v1, 31			; GFX1164-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_mov_b32_e32 v2, s4			; GFX1164-NEXT: v_mov_b32_e32 v2, s4
	; GFX1164-NEXT: v_add_nc_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf			; GFX1164-NEXT: v_add_nc_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1164-NEXT: v_readlane_b32 s4, v1, 15			; GFX1164-NEXT: v_readlane_b32 s4, v1, 15
	; GFX1164-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1164-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1164-NEXT: s_mov_b64 exec, s[2:3]			; GFX1164-NEXT: s_mov_b64 exec, s[2:3]
	; GFX1164-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX1164-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX1164-NEXT: v_readlane_b32 s5, v1, 31			; GFX1164-NEXT: v_readlane_b32 s5, v1, 31
	; GFX1164-NEXT: v_writelane_b32 v3, s4, 16			; GFX1164-NEXT: v_writelane_b32 v3, s4, 16
	; GFX1164-NEXT: s_mov_b64 exec, s[2:3]			; GFX1164-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1164-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0			; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX1164-NEXT: v_readlane_b32 s7, v1, 63			; GFX1164-NEXT: v_readlane_b32 s7, v1, 63
	; GFX1164-NEXT: v_readlane_b32 s6, v1, 47			; GFX1164-NEXT: v_readlane_b32 s6, v1, 47
	; GFX1164-NEXT: v_writelane_b32 v3, s5, 32			; GFX1164-NEXT: v_writelane_b32 v3, s5, 32
	; GFX1164-NEXT: s_mov_b64 exec, s[2:3]			; GFX1164-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1164-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(SKIP_3) | instid1(VALU_DEP_2)
	; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0			; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX1164-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX1164-NEXT: v_writelane_b32 v3, s6, 48			; GFX1164-NEXT: v_writelane_b32 v3, s6, 48
	; GFX1164-NEXT: s_mov_b64 exec, s[4:5]			; GFX1164-NEXT: s_mov_b64 exec, s[4:5]
	; GFX1164-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX1164-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX1164-NEXT: s_mov_b32 s2, -1			; GFX1164-NEXT: s_mov_b32 s2, -1
	; GFX1164-NEXT: ; implicit-def: $vgpr0			; GFX1164-NEXT: ; implicit-def: $vgpr0
	; GFX1164-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX1164-NEXT: s_and_saveexec_b64 s[4:5], vcc
	; GFX1164-NEXT: s_cbranch_execz .LBB9_2			; GFX1164-NEXT: s_cbranch_execz .LBB9_2
	; GFX1164-NEXT: ; %bb.1:			; GFX1164-NEXT: ; %bb.1:
	; GFX1164-NEXT: v_mov_b32_e32 v0, 0			; GFX1164-NEXT: v_mov_b32_e32 v0, 0
	; GFX1164-NEXT: v_mov_b32_e32 v4, s7			; GFX1164-NEXT: v_mov_b32_e32 v4, s7
	; GFX1164-NEXT: s_mov_b32 s3, s7			; GFX1164-NEXT: s_mov_b32 s3, s7
	; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1164-NEXT: ds_sub_rtn_u32 v0, v0, v4			; GFX1164-NEXT: ds_sub_rtn_u32 v0, v0, v4
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_gl0_inv			; GFX1164-NEXT: buffer_gl0_inv
	; GFX1164-NEXT: .LBB9_2:			; GFX1164-NEXT: .LBB9_2:
	; GFX1164-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX1164-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX1164-NEXT: v_readfirstlane_b32 s3, v0			; GFX1164-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1164-NEXT: v_mov_b32_e32 v0, v3			; GFX1164-NEXT: v_mov_b32_e32 v0, v3
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1164-NEXT: v_sub_nc_u32_e32 v0, s3, v0			; GFX1164-NEXT: v_sub_nc_u32_e32 v0, s3, v0
	; GFX1164-NEXT: s_mov_b32 s3, 0x31016000			; GFX1164-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_store_b32 v0, off, s[0:3], 0			; GFX1164-NEXT: buffer_store_b32 v0, off, s[0:3], 0
	; GFX1164-NEXT: s_endpgm			; GFX1164-NEXT: s_endpgm
	;			;
	; GFX1132-LABEL: sub_i32_varying:			; GFX1132-LABEL: sub_i32_varying:
	; GFX1132: ; %bb.0: ; %entry			; GFX1132: ; %bb.0: ; %entry
	; GFX1132-NEXT: v_mov_b32_e32 v1, v0			; GFX1132-NEXT: v_mov_b32_e32 v1, v0
	; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1132-NEXT: v_mov_b32_e32 v1, 0			; GFX1132-NEXT: v_mov_b32_e32 v1, 0
	; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1132-NEXT: s_or_saveexec_b32 s2, -1			; GFX1132-NEXT: s_or_saveexec_b32 s2, -1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1132-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:1
	; GFX1132-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1132-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1132-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:1
	; GFX1132-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1132-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_mov_b32_e32 v2, v1			; GFX1132-NEXT: v_mov_b32_e32 v2, v1
	; GFX1132-NEXT: v_permlanex16_b32 v2, v2, -1, -1			; GFX1132-NEXT: v_permlanex16_b32 v2, v2, -1, -1
	; GFX1132-NEXT: s_mov_b32 exec_lo, s2			; GFX1132-NEXT: s_mov_b32 exec_lo, s2
	; GFX1132-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX1132-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX1132-NEXT: s_or_saveexec_b32 s2, -1			; GFX1132-NEXT: s_or_saveexec_b32 s2, -1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
	; GFX1132-NEXT: v_add_nc_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf			; GFX1132-NEXT: v_add_nc_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1132-NEXT: v_mov_b32_e32 v3, 0			; GFX1132-NEXT: v_mov_b32_e32 v3, 0
	; GFX1132-NEXT: v_readlane_b32 s3, v1, 15			; GFX1132-NEXT: v_readlane_b32 s3, v1, 15
	; GFX1132-NEXT: v_readlane_b32 s4, v1, 31			; GFX1132-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(SALU_CYCLE_1)
	; GFX1132-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1132-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1132-NEXT: s_mov_b32 exec_lo, s2			; GFX1132-NEXT: s_mov_b32 exec_lo, s2
	; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0			; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX1132-NEXT: s_or_saveexec_b32 s2, -1			; GFX1132-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1132-NEXT: v_writelane_b32 v3, s3, 16			; GFX1132-NEXT: v_writelane_b32 v3, s3, 16
	; GFX1132-NEXT: s_mov_b32 exec_lo, s2			; GFX1132-NEXT: s_mov_b32 exec_lo, s2
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_2)
	; GFX1132-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0			; GFX1132-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
	; GFX1132-NEXT: s_mov_b32 s2, -1			; GFX1132-NEXT: s_mov_b32 s2, -1
	; GFX1132-NEXT: ; implicit-def: $vgpr0			; GFX1132-NEXT: ; implicit-def: $vgpr0
	; GFX1132-NEXT: s_and_saveexec_b32 s3, vcc_lo			; GFX1132-NEXT: s_and_saveexec_b32 s3, vcc_lo
	; GFX1132-NEXT: s_cbranch_execz .LBB9_2			; GFX1132-NEXT: s_cbranch_execz .LBB9_2
	; GFX1132-NEXT: ; %bb.1:			; GFX1132-NEXT: ; %bb.1:
	; GFX1132-NEXT: v_mov_b32_e32 v0, 0			; GFX1132-NEXT: v_mov_b32_e32 v0, 0
	; GFX1132-NEXT: v_mov_b32_e32 v4, s4			; GFX1132-NEXT: v_mov_b32_e32 v4, s4
	; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1132-NEXT: ds_sub_rtn_u32 v0, v0, v4			; GFX1132-NEXT: ds_sub_rtn_u32 v0, v0, v4
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_gl0_inv			; GFX1132-NEXT: buffer_gl0_inv
	; GFX1132-NEXT: .LBB9_2:			; GFX1132-NEXT: .LBB9_2:
	; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s3			; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s3
	; GFX1132-NEXT: v_readfirstlane_b32 s3, v0			; GFX1132-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1132-NEXT: v_mov_b32_e32 v0, v3			; GFX1132-NEXT: v_mov_b32_e32 v0, v3
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1132-NEXT: v_sub_nc_u32_e32 v0, s3, v0			; GFX1132-NEXT: v_sub_nc_u32_e32 v0, s3, v0
	; GFX1132-NEXT: s_mov_b32 s3, 0x31016000			; GFX1132-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_store_b32 v0, off, s[0:3], 0			; GFX1132-NEXT: buffer_store_b32 v0, off, s[0:3], 0
	; GFX1132-NEXT: s_endpgm			; GFX1132-NEXT: s_endpgm
	entry:			entry:
	%lane = call i32 @llvm.amdgcn.workitem.id.x()			%lane = call i32 @llvm.amdgcn.workitem.id.x()
	%old = atomicrmw sub i32 addrspace(3)* @local_var32, i32 %lane acq_rel			%old = atomicrmw sub i32 addrspace(3)* @local_var32, i32 %lane acq_rel
	[150 lines not shown]
	;			;
	; GFX1164-LABEL: sub_i32_varying_nouse:			; GFX1164-LABEL: sub_i32_varying_nouse:
	; GFX1164: ; %bb.0: ; %entry			; GFX1164: ; %bb.0: ; %entry
	; GFX1164-NEXT: v_mov_b32_e32 v1, v0			; GFX1164-NEXT: v_mov_b32_e32 v1, v0
	; GFX1164-NEXT: s_not_b64 exec, exec			; GFX1164-NEXT: s_not_b64 exec, exec
	; GFX1164-NEXT: v_mov_b32_e32 v1, 0			; GFX1164-NEXT: v_mov_b32_e32 v1, 0
	; GFX1164-NEXT: s_not_b64 exec, exec			; GFX1164-NEXT: s_not_b64 exec, exec
	; GFX1164-NEXT: s_or_saveexec_b64 s[0:1], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[0:1], -1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_xmask:1 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1164-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_xmask:1 row_mask:0xf bank_mask:0xf bound_ctrl:1
	; GFX1164-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_xmask:2 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1164-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_xmask:2 row_mask:0xf bank_mask:0xf bound_ctrl:1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_xmask:4 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1164-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_xmask:4 row_mask:0xf bank_mask:0xf bound_ctrl:1
	; GFX1164-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_xmask:8 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1164-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_xmask:8 row_mask:0xf bank_mask:0xf bound_ctrl:1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_mov_b32_e32 v2, v1			; GFX1164-NEXT: v_mov_b32_e32 v2, v1
	; GFX1164-NEXT: v_permlanex16_b32 v2, v2, -1, -1			; GFX1164-NEXT: v_permlanex16_b32 v2, v2, -1, -1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_add_nc_u32_e32 v1, v1, v2			; GFX1164-NEXT: v_add_nc_u32_e32 v1, v1, v2
	; GFX1164-NEXT: v_permlane64_b32 v2, v1			; GFX1164-NEXT: v_permlane64_b32 v2, v1
	; GFX1164-NEXT: s_mov_b64 exec, s[0:1]			; GFX1164-NEXT: s_mov_b64 exec, s[0:1]
				; GFX1164-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
	; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0			; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX1164-NEXT: s_or_saveexec_b64 s[0:1], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[0:1], -1
	; GFX1164-NEXT: v_add_nc_u32_e32 v1, v1, v2			; GFX1164-NEXT: v_add_nc_u32_e32 v1, v1, v2
	; GFX1164-NEXT: s_mov_b64 exec, s[0:1]			; GFX1164-NEXT: s_mov_b64 exec, s[0:1]
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_2) | instid1(SALU_CYCLE_1)
	; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v3, exec_hi, v0			; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v3, exec_hi, v0
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_2)
	; GFX1164-NEXT: v_mov_b32_e32 v0, v1			; GFX1164-NEXT: v_mov_b32_e32 v0, v1
	; GFX1164-NEXT: s_mov_b64 s[0:1], exec			; GFX1164-NEXT: s_mov_b64 s[0:1], exec
	; GFX1164-NEXT: v_cmpx_eq_u32_e32 0, v3			; GFX1164-NEXT: v_cmpx_eq_u32_e32 0, v3
	; GFX1164-NEXT: s_cbranch_execz .LBB10_2			; GFX1164-NEXT: s_cbranch_execz .LBB10_2
	; GFX1164-NEXT: ; %bb.1:			; GFX1164-NEXT: ; %bb.1:
	; GFX1164-NEXT: v_mov_b32_e32 v3, 0			; GFX1164-NEXT: v_mov_b32_e32 v3, 0
	; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1164-NEXT: ds_sub_u32 v3, v0			; GFX1164-NEXT: ds_sub_u32 v3, v0
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_gl0_inv			; GFX1164-NEXT: buffer_gl0_inv
	; GFX1164-NEXT: .LBB10_2:			; GFX1164-NEXT: .LBB10_2:
	; GFX1164-NEXT: s_endpgm			; GFX1164-NEXT: s_endpgm
	;			;
	; GFX1132-LABEL: sub_i32_varying_nouse:			; GFX1132-LABEL: sub_i32_varying_nouse:
	; GFX1132: ; %bb.0: ; %entry			; GFX1132: ; %bb.0: ; %entry
	; GFX1132-NEXT: v_mov_b32_e32 v1, v0			; GFX1132-NEXT: v_mov_b32_e32 v1, v0
	; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1132-NEXT: v_mov_b32_e32 v1, 0			; GFX1132-NEXT: v_mov_b32_e32 v1, 0
	; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1132-NEXT: s_or_saveexec_b32 s0, -1			; GFX1132-NEXT: s_or_saveexec_b32 s0, -1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_xmask:1 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1132-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_xmask:1 row_mask:0xf bank_mask:0xf bound_ctrl:1
	; GFX1132-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_xmask:2 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1132-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_xmask:2 row_mask:0xf bank_mask:0xf bound_ctrl:1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_xmask:4 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1132-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_xmask:4 row_mask:0xf bank_mask:0xf bound_ctrl:1
	; GFX1132-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_xmask:8 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1132-NEXT: v_add_nc_u32_dpp v1, v1, v1 row_xmask:8 row_mask:0xf bank_mask:0xf bound_ctrl:1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_mov_b32_e32 v2, v1			; GFX1132-NEXT: v_mov_b32_e32 v2, v1
	; GFX1132-NEXT: v_permlanex16_b32 v2, v2, -1, -1			; GFX1132-NEXT: v_permlanex16_b32 v2, v2, -1, -1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(SALU_CYCLE_1)
	; GFX1132-NEXT: v_add_nc_u32_e32 v1, v1, v2			; GFX1132-NEXT: v_add_nc_u32_e32 v1, v1, v2
	; GFX1132-NEXT: s_mov_b32 exec_lo, s0			; GFX1132-NEXT: s_mov_b32 exec_lo, s0
	; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v3, exec_lo, 0			; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v3, exec_lo, 0
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_2)
	; GFX1132-NEXT: v_mov_b32_e32 v0, v1			; GFX1132-NEXT: v_mov_b32_e32 v0, v1
	; GFX1132-NEXT: s_mov_b32 s0, exec_lo			; GFX1132-NEXT: s_mov_b32 s0, exec_lo
	; GFX1132-NEXT: v_cmpx_eq_u32_e32 0, v3			; GFX1132-NEXT: v_cmpx_eq_u32_e32 0, v3
	; GFX1132-NEXT: s_cbranch_execz .LBB10_2			; GFX1132-NEXT: s_cbranch_execz .LBB10_2
	; GFX1132-NEXT: ; %bb.1:			; GFX1132-NEXT: ; %bb.1:
	; GFX1132-NEXT: v_mov_b32_e32 v3, 0			; GFX1132-NEXT: v_mov_b32_e32 v3, 0
	; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0
	[182 lines not shown]
	; GFX1032-NEXT: s_endpgm			; GFX1032-NEXT: s_endpgm
	;			;
	; GFX1164-LABEL: sub_i64_constant:			; GFX1164-LABEL: sub_i64_constant:
	; GFX1164: ; %bb.0: ; %entry			; GFX1164: ; %bb.0: ; %entry
	; GFX1164-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX1164-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX1164-NEXT: s_mov_b64 s[4:5], exec			; GFX1164-NEXT: s_mov_b64 s[4:5], exec
	; GFX1164-NEXT: s_mov_b64 s[2:3], exec			; GFX1164-NEXT: s_mov_b64 s[2:3], exec
	; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, s4, 0			; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, s4, 0
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v2, s5, v0			; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v2, s5, v0
	; GFX1164-NEXT: ; implicit-def: $vgpr0_vgpr1			; GFX1164-NEXT: ; implicit-def: $vgpr0_vgpr1
	; GFX1164-NEXT: v_cmpx_eq_u32_e32 0, v2			; GFX1164-NEXT: v_cmpx_eq_u32_e32 0, v2
	; GFX1164-NEXT: s_cbranch_execz .LBB11_2			; GFX1164-NEXT: s_cbranch_execz .LBB11_2
	; GFX1164-NEXT: ; %bb.1:			; GFX1164-NEXT: ; %bb.1:
	; GFX1164-NEXT: s_bcnt1_i32_b64 s4, s[4:5]			; GFX1164-NEXT: s_bcnt1_i32_b64 s4, s[4:5]
	; GFX1164-NEXT: v_mov_b32_e32 v1, 0			; GFX1164-NEXT: v_mov_b32_e32 v1, 0
	; GFX1164-NEXT: s_mul_i32 s4, s4, 5			; GFX1164-NEXT: s_mul_i32 s4, s4, 5
				; GFX1164-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX1164-NEXT: v_mov_b32_e32 v0, s4			; GFX1164-NEXT: v_mov_b32_e32 v0, s4
	; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1164-NEXT: ds_sub_rtn_u64 v[0:1], v1, v[0:1]			; GFX1164-NEXT: ds_sub_rtn_u64 v[0:1], v1, v[0:1]
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_gl0_inv			; GFX1164-NEXT: buffer_gl0_inv
	; GFX1164-NEXT: .LBB11_2:			; GFX1164-NEXT: .LBB11_2:
	; GFX1164-NEXT: s_or_b64 exec, exec, s[2:3]			; GFX1164-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX1164-NEXT: v_readfirstlane_b32 s2, v0			; GFX1164-NEXT: v_readfirstlane_b32 s2, v0
	; GFX1164-NEXT: v_mul_u32_u24_e32 v0, 5, v2			; GFX1164-NEXT: v_mul_u32_u24_e32 v0, 5, v2
	; GFX1164-NEXT: v_readfirstlane_b32 s3, v1			; GFX1164-NEXT: v_readfirstlane_b32 s3, v1
	; GFX1164-NEXT: v_mul_hi_u32_u24_e32 v1, 5, v2			; GFX1164-NEXT: v_mul_hi_u32_u24_e32 v1, 5, v2
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)
	; GFX1164-NEXT: v_sub_co_u32 v0, vcc, s2, v0			; GFX1164-NEXT: v_sub_co_u32 v0, vcc, s2, v0
	; GFX1164-NEXT: v_sub_co_ci_u32_e32 v1, vcc, s3, v1, vcc			; GFX1164-NEXT: v_sub_co_ci_u32_e32 v1, vcc, s3, v1, vcc
	; GFX1164-NEXT: s_mov_b32 s3, 0x31016000			; GFX1164-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1164-NEXT: s_mov_b32 s2, -1			; GFX1164-NEXT: s_mov_b32 s2, -1
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_store_b64 v[0:1], off, s[0:3], 0			; GFX1164-NEXT: buffer_store_b64 v[0:1], off, s[0:3], 0
	; GFX1164-NEXT: s_endpgm			; GFX1164-NEXT: s_endpgm
	;			;
	; GFX1132-LABEL: sub_i64_constant:			; GFX1132-LABEL: sub_i64_constant:
	; GFX1132: ; %bb.0: ; %entry			; GFX1132: ; %bb.0: ; %entry
	; GFX1132-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX1132-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX1132-NEXT: s_mov_b32 s3, exec_lo			; GFX1132-NEXT: s_mov_b32 s3, exec_lo
	; GFX1132-NEXT: s_mov_b32 s2, exec_lo			; GFX1132-NEXT: s_mov_b32 s2, exec_lo
	; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v2, s3, 0			; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v2, s3, 0
	; GFX1132-NEXT: ; implicit-def: $vgpr0_vgpr1			; GFX1132-NEXT: ; implicit-def: $vgpr0_vgpr1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1132-NEXT: v_cmpx_eq_u32_e32 0, v2			; GFX1132-NEXT: v_cmpx_eq_u32_e32 0, v2
	; GFX1132-NEXT: s_cbranch_execz .LBB11_2			; GFX1132-NEXT: s_cbranch_execz .LBB11_2
	; GFX1132-NEXT: ; %bb.1:			; GFX1132-NEXT: ; %bb.1:
	; GFX1132-NEXT: s_bcnt1_i32_b32 s3, s3			; GFX1132-NEXT: s_bcnt1_i32_b32 s3, s3
	; GFX1132-NEXT: v_mov_b32_e32 v1, 0			; GFX1132-NEXT: v_mov_b32_e32 v1, 0
	; GFX1132-NEXT: s_mul_i32 s3, s3, 5			; GFX1132-NEXT: s_mul_i32 s3, s3, 5
				; GFX1132-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX1132-NEXT: v_mov_b32_e32 v0, s3			; GFX1132-NEXT: v_mov_b32_e32 v0, s3
	; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1132-NEXT: ds_sub_rtn_u64 v[0:1], v1, v[0:1]			; GFX1132-NEXT: ds_sub_rtn_u64 v[0:1], v1, v[0:1]
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_gl0_inv			; GFX1132-NEXT: buffer_gl0_inv
	; GFX1132-NEXT: .LBB11_2:			; GFX1132-NEXT: .LBB11_2:
	; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s2			; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s2
	; GFX1132-NEXT: v_readfirstlane_b32 s2, v0			; GFX1132-NEXT: v_readfirstlane_b32 s2, v0
	; GFX1132-NEXT: v_mul_u32_u24_e32 v0, 5, v2			; GFX1132-NEXT: v_mul_u32_u24_e32 v0, 5, v2
	; GFX1132-NEXT: v_readfirstlane_b32 s3, v1			; GFX1132-NEXT: v_readfirstlane_b32 s3, v1
	; GFX1132-NEXT: v_mul_hi_u32_u24_e32 v1, 5, v2			; GFX1132-NEXT: v_mul_hi_u32_u24_e32 v1, 5, v2
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)
	; GFX1132-NEXT: v_sub_co_u32 v0, vcc_lo, s2, v0			; GFX1132-NEXT: v_sub_co_u32 v0, vcc_lo, s2, v0
	; GFX1132-NEXT: v_sub_co_ci_u32_e32 v1, vcc_lo, s3, v1, vcc_lo			; GFX1132-NEXT: v_sub_co_ci_u32_e32 v1, vcc_lo, s3, v1, vcc_lo
	; GFX1132-NEXT: s_mov_b32 s3, 0x31016000			; GFX1132-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1132-NEXT: s_mov_b32 s2, -1			; GFX1132-NEXT: s_mov_b32 s2, -1
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_store_b64 v[0:1], off, s[0:3], 0			; GFX1132-NEXT: buffer_store_b64 v[0:1], off, s[0:3], 0
	; GFX1132-NEXT: s_endpgm			; GFX1132-NEXT: s_endpgm
	entry:			entry:
	[211 lines not shown]
	; GFX1032-NEXT: s_endpgm			; GFX1032-NEXT: s_endpgm
	;			;
	; GFX1164-LABEL: sub_i64_uniform:			; GFX1164-LABEL: sub_i64_uniform:
	; GFX1164: ; %bb.0: ; %entry			; GFX1164: ; %bb.0: ; %entry
	; GFX1164-NEXT: s_load_b128 s[0:3], s[0:1], 0x24			; GFX1164-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
	; GFX1164-NEXT: s_mov_b64 s[6:7], exec			; GFX1164-NEXT: s_mov_b64 s[6:7], exec
	; GFX1164-NEXT: s_mov_b64 s[4:5], exec			; GFX1164-NEXT: s_mov_b64 s[4:5], exec
	; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0			; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, s6, 0
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v2, s7, v0			; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v2, s7, v0
	; GFX1164-NEXT: ; implicit-def: $vgpr0_vgpr1			; GFX1164-NEXT: ; implicit-def: $vgpr0_vgpr1
	; GFX1164-NEXT: v_cmpx_eq_u32_e32 0, v2			; GFX1164-NEXT: v_cmpx_eq_u32_e32 0, v2
	; GFX1164-NEXT: s_cbranch_execz .LBB12_2			; GFX1164-NEXT: s_cbranch_execz .LBB12_2
	; GFX1164-NEXT: ; %bb.1:			; GFX1164-NEXT: ; %bb.1:
	; GFX1164-NEXT: s_bcnt1_i32_b64 s6, s[6:7]			; GFX1164-NEXT: s_bcnt1_i32_b64 s6, s[6:7]
	; GFX1164-NEXT: v_mov_b32_e32 v3, 0			; GFX1164-NEXT: v_mov_b32_e32 v3, 0
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	[14 lines not shown]
	; GFX1164-NEXT: v_mad_u64_u32 v[3:4], null, s2, v2, 0			; GFX1164-NEXT: v_mad_u64_u32 v[3:4], null, s2, v2, 0
	; GFX1164-NEXT: v_readfirstlane_b32 s2, v0			; GFX1164-NEXT: v_readfirstlane_b32 s2, v0
	; GFX1164-NEXT: v_readfirstlane_b32 s4, v1			; GFX1164-NEXT: v_readfirstlane_b32 s4, v1
	; GFX1164-NEXT: s_waitcnt_depctr 0xfff			; GFX1164-NEXT: s_waitcnt_depctr 0xfff
	; GFX1164-NEXT: v_mad_u64_u32 v[5:6], null, s3, v2, v[4:5]			; GFX1164-NEXT: v_mad_u64_u32 v[5:6], null, s3, v2, v[4:5]
	; GFX1164-NEXT: v_sub_co_u32 v0, vcc, s2, v3			; GFX1164-NEXT: v_sub_co_u32 v0, vcc, s2, v3
	; GFX1164-NEXT: s_mov_b32 s3, 0x31016000			; GFX1164-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1164-NEXT: s_mov_b32 s2, -1			; GFX1164-NEXT: s_mov_b32 s2, -1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_mov_b32_e32 v1, v5			; GFX1164-NEXT: v_mov_b32_e32 v1, v5
	; GFX1164-NEXT: v_sub_co_ci_u32_e32 v1, vcc, s4, v1, vcc			; GFX1164-NEXT: v_sub_co_ci_u32_e32 v1, vcc, s4, v1, vcc
	; GFX1164-NEXT: buffer_store_b64 v[0:1], off, s[0:3], 0			; GFX1164-NEXT: buffer_store_b64 v[0:1], off, s[0:3], 0
	; GFX1164-NEXT: s_endpgm			; GFX1164-NEXT: s_endpgm
	;			;
	; GFX1132-LABEL: sub_i64_uniform:			; GFX1132-LABEL: sub_i64_uniform:
	; GFX1132: ; %bb.0: ; %entry			; GFX1132: ; %bb.0: ; %entry
	; GFX1132-NEXT: s_load_b128 s[0:3], s[0:1], 0x24			; GFX1132-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
	; GFX1132-NEXT: s_mov_b32 s5, exec_lo			; GFX1132-NEXT: s_mov_b32 s5, exec_lo
	; GFX1132-NEXT: s_mov_b32 s4, exec_lo			; GFX1132-NEXT: s_mov_b32 s4, exec_lo
	; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v2, s5, 0			; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v2, s5, 0
	; GFX1132-NEXT: ; implicit-def: $vgpr0_vgpr1			; GFX1132-NEXT: ; implicit-def: $vgpr0_vgpr1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1132-NEXT: v_cmpx_eq_u32_e32 0, v2			; GFX1132-NEXT: v_cmpx_eq_u32_e32 0, v2
	; GFX1132-NEXT: s_cbranch_execz .LBB12_2			; GFX1132-NEXT: s_cbranch_execz .LBB12_2
	; GFX1132-NEXT: ; %bb.1:			; GFX1132-NEXT: ; %bb.1:
	; GFX1132-NEXT: s_bcnt1_i32_b32 s5, s5			; GFX1132-NEXT: s_bcnt1_i32_b32 s5, s5
	; GFX1132-NEXT: v_mov_b32_e32 v3, 0			; GFX1132-NEXT: v_mov_b32_e32 v3, 0
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: s_mul_i32 s6, s3, s5			; GFX1132-NEXT: s_mul_i32 s6, s3, s5
	; GFX1132-NEXT: s_mul_hi_u32 s7, s2, s5			; GFX1132-NEXT: s_mul_hi_u32 s7, s2, s5
	; GFX1132-NEXT: s_mul_i32 s5, s2, s5			; GFX1132-NEXT: s_mul_i32 s5, s2, s5
	; GFX1132-NEXT: s_add_i32 s7, s7, s6			; GFX1132-NEXT: s_add_i32 s7, s7, s6
	; GFX1132-NEXT: v_mov_b32_e32 v0, s5			; GFX1132-NEXT: v_mov_b32_e32 v0, s5
	; GFX1132-NEXT: v_mov_b32_e32 v1, s7			; GFX1132-NEXT: v_mov_b32_e32 v1, s7
	; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1132-NEXT: ds_sub_rtn_u64 v[0:1], v3, v[0:1]			; GFX1132-NEXT: ds_sub_rtn_u64 v[0:1], v3, v[0:1]
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_gl0_inv			; GFX1132-NEXT: buffer_gl0_inv
	; GFX1132-NEXT: .LBB12_2:			; GFX1132-NEXT: .LBB12_2:
	; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s4			; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s4
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: v_mad_u64_u32 v[3:4], null, s2, v2, 0			; GFX1132-NEXT: v_mad_u64_u32 v[3:4], null, s2, v2, 0
	; GFX1132-NEXT: v_readfirstlane_b32 s2, v0			; GFX1132-NEXT: v_readfirstlane_b32 s2, v0
	; GFX1132-NEXT: v_readfirstlane_b32 s4, v1			; GFX1132-NEXT: v_readfirstlane_b32 s4, v1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_3)
	; GFX1132-NEXT: v_mad_u64_u32 v[5:6], null, s3, v2, v[4:5]			; GFX1132-NEXT: v_mad_u64_u32 v[5:6], null, s3, v2, v[4:5]
	; GFX1132-NEXT: v_sub_co_u32 v0, vcc_lo, s2, v3			; GFX1132-NEXT: v_sub_co_u32 v0, vcc_lo, s2, v3
	; GFX1132-NEXT: s_mov_b32 s3, 0x31016000			; GFX1132-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1132-NEXT: s_mov_b32 s2, -1			; GFX1132-NEXT: s_mov_b32 s2, -1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_mov_b32_e32 v1, v5			; GFX1132-NEXT: v_mov_b32_e32 v1, v5
	; GFX1132-NEXT: v_sub_co_ci_u32_e32 v1, vcc_lo, s4, v1, vcc_lo			; GFX1132-NEXT: v_sub_co_ci_u32_e32 v1, vcc_lo, s4, v1, vcc_lo
	; GFX1132-NEXT: buffer_store_b64 v[0:1], off, s[0:3], 0			; GFX1132-NEXT: buffer_store_b64 v[0:1], off, s[0:3], 0
	; GFX1132-NEXT: s_endpgm			; GFX1132-NEXT: s_endpgm
	entry:			entry:
	%old = atomicrmw sub i64 addrspace(3)* @local_var64, i64 %subitive acq_rel			%old = atomicrmw sub i64 addrspace(3)* @local_var64, i64 %subitive acq_rel
	store i64 %old, i64 addrspace(1)* %out			store i64 %old, i64 addrspace(1)* %out
	ret void			ret void
	[305 lines not shown]
	;			;
	; GFX1164-LABEL: and_i32_varying:			; GFX1164-LABEL: and_i32_varying:
	; GFX1164: ; %bb.0: ; %entry			; GFX1164: ; %bb.0: ; %entry
	; GFX1164-NEXT: v_mov_b32_e32 v1, v0			; GFX1164-NEXT: v_mov_b32_e32 v1, v0
	; GFX1164-NEXT: s_not_b64 exec, exec			; GFX1164-NEXT: s_not_b64 exec, exec
	; GFX1164-NEXT: v_mov_b32_e32 v1, -1			; GFX1164-NEXT: v_mov_b32_e32 v1, -1
	; GFX1164-NEXT: s_not_b64 exec, exec			; GFX1164-NEXT: s_not_b64 exec, exec
	; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
	; GFX1164-NEXT: v_and_b32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1164-NEXT: v_and_b32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1164-NEXT: v_mov_b32_e32 v3, -1			; GFX1164-NEXT: v_mov_b32_e32 v3, -1
	; GFX1164-NEXT: v_and_b32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf			; GFX1164-NEXT: v_and_b32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_and_b32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf			; GFX1164-NEXT: v_and_b32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf
	; GFX1164-NEXT: v_and_b32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf			; GFX1164-NEXT: v_and_b32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_mov_b32_e32 v2, v1			; GFX1164-NEXT: v_mov_b32_e32 v2, v1
	; GFX1164-NEXT: v_permlanex16_b32 v2, v2, -1, -1			; GFX1164-NEXT: v_permlanex16_b32 v2, v2, -1, -1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_and_b32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf			; GFX1164-NEXT: v_and_b32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1164-NEXT: v_readlane_b32 s4, v1, 31			; GFX1164-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_mov_b32_e32 v2, s4			; GFX1164-NEXT: v_mov_b32_e32 v2, s4
	; GFX1164-NEXT: v_and_b32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf			; GFX1164-NEXT: v_and_b32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1164-NEXT: v_readlane_b32 s4, v1, 15			; GFX1164-NEXT: v_readlane_b32 s4, v1, 15
	; GFX1164-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1164-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1164-NEXT: s_mov_b64 exec, s[2:3]			; GFX1164-NEXT: s_mov_b64 exec, s[2:3]
	; GFX1164-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX1164-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX1164-NEXT: v_readlane_b32 s5, v1, 31			; GFX1164-NEXT: v_readlane_b32 s5, v1, 31
	; GFX1164-NEXT: v_writelane_b32 v3, s4, 16			; GFX1164-NEXT: v_writelane_b32 v3, s4, 16
	; GFX1164-NEXT: s_mov_b64 exec, s[2:3]			; GFX1164-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1164-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0			; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX1164-NEXT: v_readlane_b32 s7, v1, 63			; GFX1164-NEXT: v_readlane_b32 s7, v1, 63
	; GFX1164-NEXT: v_readlane_b32 s6, v1, 47			; GFX1164-NEXT: v_readlane_b32 s6, v1, 47
	; GFX1164-NEXT: v_writelane_b32 v3, s5, 32			; GFX1164-NEXT: v_writelane_b32 v3, s5, 32
	; GFX1164-NEXT: s_mov_b64 exec, s[2:3]			; GFX1164-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1164-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(SKIP_3) | instid1(VALU_DEP_2)
	; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0			; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX1164-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX1164-NEXT: v_writelane_b32 v3, s6, 48			; GFX1164-NEXT: v_writelane_b32 v3, s6, 48
	; GFX1164-NEXT: s_mov_b64 exec, s[4:5]			; GFX1164-NEXT: s_mov_b64 exec, s[4:5]
	; GFX1164-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX1164-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX1164-NEXT: s_mov_b32 s2, -1			; GFX1164-NEXT: s_mov_b32 s2, -1
	; GFX1164-NEXT: ; implicit-def: $vgpr0			; GFX1164-NEXT: ; implicit-def: $vgpr0
	; GFX1164-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX1164-NEXT: s_and_saveexec_b64 s[4:5], vcc
	; GFX1164-NEXT: s_cbranch_execz .LBB14_2			; GFX1164-NEXT: s_cbranch_execz .LBB14_2
	; GFX1164-NEXT: ; %bb.1:			; GFX1164-NEXT: ; %bb.1:
	; GFX1164-NEXT: v_mov_b32_e32 v0, 0			; GFX1164-NEXT: v_mov_b32_e32 v0, 0
	; GFX1164-NEXT: v_mov_b32_e32 v4, s7			; GFX1164-NEXT: v_mov_b32_e32 v4, s7
	; GFX1164-NEXT: s_mov_b32 s3, s7			; GFX1164-NEXT: s_mov_b32 s3, s7
	; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1164-NEXT: ds_and_rtn_b32 v0, v0, v4			; GFX1164-NEXT: ds_and_rtn_b32 v0, v0, v4
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_gl0_inv			; GFX1164-NEXT: buffer_gl0_inv
	; GFX1164-NEXT: .LBB14_2:			; GFX1164-NEXT: .LBB14_2:
	; GFX1164-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX1164-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX1164-NEXT: v_readfirstlane_b32 s3, v0			; GFX1164-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1164-NEXT: v_mov_b32_e32 v0, v3			; GFX1164-NEXT: v_mov_b32_e32 v0, v3
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1164-NEXT: v_and_b32_e32 v0, s3, v0			; GFX1164-NEXT: v_and_b32_e32 v0, s3, v0
	; GFX1164-NEXT: s_mov_b32 s3, 0x31016000			; GFX1164-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_store_b32 v0, off, s[0:3], 0			; GFX1164-NEXT: buffer_store_b32 v0, off, s[0:3], 0
	; GFX1164-NEXT: s_endpgm			; GFX1164-NEXT: s_endpgm
	;			;
	; GFX1132-LABEL: and_i32_varying:			; GFX1132-LABEL: and_i32_varying:
	; GFX1132: ; %bb.0: ; %entry			; GFX1132: ; %bb.0: ; %entry
	; GFX1132-NEXT: v_mov_b32_e32 v1, v0			; GFX1132-NEXT: v_mov_b32_e32 v1, v0
	; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1132-NEXT: v_mov_b32_e32 v1, -1			; GFX1132-NEXT: v_mov_b32_e32 v1, -1
	; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1132-NEXT: s_or_saveexec_b32 s2, -1			; GFX1132-NEXT: s_or_saveexec_b32 s2, -1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_and_b32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1132-NEXT: v_and_b32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1132-NEXT: v_and_b32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf			; GFX1132-NEXT: v_and_b32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_and_b32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf			; GFX1132-NEXT: v_and_b32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf
	; GFX1132-NEXT: v_and_b32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf			; GFX1132-NEXT: v_and_b32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_mov_b32_e32 v2, v1			; GFX1132-NEXT: v_mov_b32_e32 v2, v1
	; GFX1132-NEXT: v_permlanex16_b32 v2, v2, -1, -1			; GFX1132-NEXT: v_permlanex16_b32 v2, v2, -1, -1
	; GFX1132-NEXT: s_mov_b32 exec_lo, s2			; GFX1132-NEXT: s_mov_b32 exec_lo, s2
	; GFX1132-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX1132-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX1132-NEXT: s_or_saveexec_b32 s2, -1			; GFX1132-NEXT: s_or_saveexec_b32 s2, -1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
	; GFX1132-NEXT: v_and_b32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf			; GFX1132-NEXT: v_and_b32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1132-NEXT: v_mov_b32_e32 v3, -1			; GFX1132-NEXT: v_mov_b32_e32 v3, -1
	; GFX1132-NEXT: v_readlane_b32 s3, v1, 15			; GFX1132-NEXT: v_readlane_b32 s3, v1, 15
	; GFX1132-NEXT: v_readlane_b32 s4, v1, 31			; GFX1132-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(SALU_CYCLE_1)
	; GFX1132-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1132-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1132-NEXT: s_mov_b32 exec_lo, s2			; GFX1132-NEXT: s_mov_b32 exec_lo, s2
	; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0			; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX1132-NEXT: s_or_saveexec_b32 s2, -1			; GFX1132-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1132-NEXT: v_writelane_b32 v3, s3, 16			; GFX1132-NEXT: v_writelane_b32 v3, s3, 16
	; GFX1132-NEXT: s_mov_b32 exec_lo, s2			; GFX1132-NEXT: s_mov_b32 exec_lo, s2
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_2)
	; GFX1132-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0			; GFX1132-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
	; GFX1132-NEXT: s_mov_b32 s2, -1			; GFX1132-NEXT: s_mov_b32 s2, -1
	; GFX1132-NEXT: ; implicit-def: $vgpr0			; GFX1132-NEXT: ; implicit-def: $vgpr0
	; GFX1132-NEXT: s_and_saveexec_b32 s3, vcc_lo			; GFX1132-NEXT: s_and_saveexec_b32 s3, vcc_lo
	; GFX1132-NEXT: s_cbranch_execz .LBB14_2			; GFX1132-NEXT: s_cbranch_execz .LBB14_2
	; GFX1132-NEXT: ; %bb.1:			; GFX1132-NEXT: ; %bb.1:
	; GFX1132-NEXT: v_mov_b32_e32 v0, 0			; GFX1132-NEXT: v_mov_b32_e32 v0, 0
	; GFX1132-NEXT: v_mov_b32_e32 v4, s4			; GFX1132-NEXT: v_mov_b32_e32 v4, s4
	; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1132-NEXT: ds_and_rtn_b32 v0, v0, v4			; GFX1132-NEXT: ds_and_rtn_b32 v0, v0, v4
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_gl0_inv			; GFX1132-NEXT: buffer_gl0_inv
	; GFX1132-NEXT: .LBB14_2:			; GFX1132-NEXT: .LBB14_2:
	; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s3			; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s3
	; GFX1132-NEXT: v_readfirstlane_b32 s3, v0			; GFX1132-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1132-NEXT: v_mov_b32_e32 v0, v3			; GFX1132-NEXT: v_mov_b32_e32 v0, v3
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1132-NEXT: v_and_b32_e32 v0, s3, v0			; GFX1132-NEXT: v_and_b32_e32 v0, s3, v0
	; GFX1132-NEXT: s_mov_b32 s3, 0x31016000			; GFX1132-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_store_b32 v0, off, s[0:3], 0			; GFX1132-NEXT: buffer_store_b32 v0, off, s[0:3], 0
	; GFX1132-NEXT: s_endpgm			; GFX1132-NEXT: s_endpgm
	entry:			entry:
	%lane = call i32 @llvm.amdgcn.workitem.id.x()			%lane = call i32 @llvm.amdgcn.workitem.id.x()
	%old = atomicrmw and i32 addrspace(3)* @local_var32, i32 %lane acq_rel			%old = atomicrmw and i32 addrspace(3)* @local_var32, i32 %lane acq_rel
	;			;
	; GFX1164-LABEL: or_i32_varying:			; GFX1164-LABEL: or_i32_varying:
	; GFX1164: ; %bb.0: ; %entry			; GFX1164: ; %bb.0: ; %entry
	; GFX1164-NEXT: v_mov_b32_e32 v1, v0			; GFX1164-NEXT: v_mov_b32_e32 v1, v0
	; GFX1164-NEXT: s_not_b64 exec, exec			; GFX1164-NEXT: s_not_b64 exec, exec
	; GFX1164-NEXT: v_mov_b32_e32 v1, 0			; GFX1164-NEXT: v_mov_b32_e32 v1, 0
	; GFX1164-NEXT: s_not_b64 exec, exec			; GFX1164-NEXT: s_not_b64 exec, exec
	; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
	; GFX1164-NEXT: v_or_b32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1164-NEXT: v_or_b32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:1
	; GFX1164-NEXT: v_mov_b32_e32 v3, 0			; GFX1164-NEXT: v_mov_b32_e32 v3, 0
	; GFX1164-NEXT: v_or_b32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1164-NEXT: v_or_b32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_or_b32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1164-NEXT: v_or_b32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:1
	; GFX1164-NEXT: v_or_b32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1164-NEXT: v_or_b32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_mov_b32_e32 v2, v1			; GFX1164-NEXT: v_mov_b32_e32 v2, v1
	; GFX1164-NEXT: v_permlanex16_b32 v2, v2, -1, -1			; GFX1164-NEXT: v_permlanex16_b32 v2, v2, -1, -1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_or_b32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf			; GFX1164-NEXT: v_or_b32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1164-NEXT: v_readlane_b32 s4, v1, 31			; GFX1164-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_mov_b32_e32 v2, s4			; GFX1164-NEXT: v_mov_b32_e32 v2, s4
	; GFX1164-NEXT: v_or_b32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf			; GFX1164-NEXT: v_or_b32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1164-NEXT: v_readlane_b32 s4, v1, 15			; GFX1164-NEXT: v_readlane_b32 s4, v1, 15
	; GFX1164-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1164-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1164-NEXT: s_mov_b64 exec, s[2:3]			; GFX1164-NEXT: s_mov_b64 exec, s[2:3]
	; GFX1164-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX1164-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX1164-NEXT: v_readlane_b32 s5, v1, 31			; GFX1164-NEXT: v_readlane_b32 s5, v1, 31
	; GFX1164-NEXT: v_writelane_b32 v3, s4, 16			; GFX1164-NEXT: v_writelane_b32 v3, s4, 16
	; GFX1164-NEXT: s_mov_b64 exec, s[2:3]			; GFX1164-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1164-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0			; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX1164-NEXT: v_readlane_b32 s7, v1, 63			; GFX1164-NEXT: v_readlane_b32 s7, v1, 63
	; GFX1164-NEXT: v_readlane_b32 s6, v1, 47			; GFX1164-NEXT: v_readlane_b32 s6, v1, 47
	; GFX1164-NEXT: v_writelane_b32 v3, s5, 32			; GFX1164-NEXT: v_writelane_b32 v3, s5, 32
	; GFX1164-NEXT: s_mov_b64 exec, s[2:3]			; GFX1164-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1164-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(SKIP_3) | instid1(VALU_DEP_2)
	; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0			; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX1164-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX1164-NEXT: v_writelane_b32 v3, s6, 48			; GFX1164-NEXT: v_writelane_b32 v3, s6, 48
	; GFX1164-NEXT: s_mov_b64 exec, s[4:5]			; GFX1164-NEXT: s_mov_b64 exec, s[4:5]
	; GFX1164-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX1164-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX1164-NEXT: s_mov_b32 s2, -1			; GFX1164-NEXT: s_mov_b32 s2, -1
	; GFX1164-NEXT: ; implicit-def: $vgpr0			; GFX1164-NEXT: ; implicit-def: $vgpr0
	; GFX1164-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX1164-NEXT: s_and_saveexec_b64 s[4:5], vcc
	; GFX1164-NEXT: s_cbranch_execz .LBB15_2			; GFX1164-NEXT: s_cbranch_execz .LBB15_2
	; GFX1164-NEXT: ; %bb.1:			; GFX1164-NEXT: ; %bb.1:
	; GFX1164-NEXT: v_mov_b32_e32 v0, 0			; GFX1164-NEXT: v_mov_b32_e32 v0, 0
	; GFX1164-NEXT: v_mov_b32_e32 v4, s7			; GFX1164-NEXT: v_mov_b32_e32 v4, s7
	; GFX1164-NEXT: s_mov_b32 s3, s7			; GFX1164-NEXT: s_mov_b32 s3, s7
	; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1164-NEXT: ds_or_rtn_b32 v0, v0, v4			; GFX1164-NEXT: ds_or_rtn_b32 v0, v0, v4
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_gl0_inv			; GFX1164-NEXT: buffer_gl0_inv
	; GFX1164-NEXT: .LBB15_2:			; GFX1164-NEXT: .LBB15_2:
	; GFX1164-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX1164-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX1164-NEXT: v_readfirstlane_b32 s3, v0			; GFX1164-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1164-NEXT: v_mov_b32_e32 v0, v3			; GFX1164-NEXT: v_mov_b32_e32 v0, v3
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1164-NEXT: v_or_b32_e32 v0, s3, v0			; GFX1164-NEXT: v_or_b32_e32 v0, s3, v0
	; GFX1164-NEXT: s_mov_b32 s3, 0x31016000			; GFX1164-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_store_b32 v0, off, s[0:3], 0			; GFX1164-NEXT: buffer_store_b32 v0, off, s[0:3], 0
	; GFX1164-NEXT: s_endpgm			; GFX1164-NEXT: s_endpgm
	;			;
	; GFX1132-LABEL: or_i32_varying:			; GFX1132-LABEL: or_i32_varying:
	; GFX1132: ; %bb.0: ; %entry			; GFX1132: ; %bb.0: ; %entry
	; GFX1132-NEXT: v_mov_b32_e32 v1, v0			; GFX1132-NEXT: v_mov_b32_e32 v1, v0
	; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1132-NEXT: v_mov_b32_e32 v1, 0			; GFX1132-NEXT: v_mov_b32_e32 v1, 0
	; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1132-NEXT: s_or_saveexec_b32 s2, -1			; GFX1132-NEXT: s_or_saveexec_b32 s2, -1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_or_b32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1132-NEXT: v_or_b32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:1
	; GFX1132-NEXT: v_or_b32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1132-NEXT: v_or_b32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_or_b32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1132-NEXT: v_or_b32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:1
	; GFX1132-NEXT: v_or_b32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1132-NEXT: v_or_b32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_mov_b32_e32 v2, v1			; GFX1132-NEXT: v_mov_b32_e32 v2, v1
	; GFX1132-NEXT: v_permlanex16_b32 v2, v2, -1, -1			; GFX1132-NEXT: v_permlanex16_b32 v2, v2, -1, -1
	; GFX1132-NEXT: s_mov_b32 exec_lo, s2			; GFX1132-NEXT: s_mov_b32 exec_lo, s2
	; GFX1132-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX1132-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX1132-NEXT: s_or_saveexec_b32 s2, -1			; GFX1132-NEXT: s_or_saveexec_b32 s2, -1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
	; GFX1132-NEXT: v_or_b32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf			; GFX1132-NEXT: v_or_b32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1132-NEXT: v_mov_b32_e32 v3, 0			; GFX1132-NEXT: v_mov_b32_e32 v3, 0
	; GFX1132-NEXT: v_readlane_b32 s3, v1, 15			; GFX1132-NEXT: v_readlane_b32 s3, v1, 15
	; GFX1132-NEXT: v_readlane_b32 s4, v1, 31			; GFX1132-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(SALU_CYCLE_1)
	; GFX1132-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1132-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1132-NEXT: s_mov_b32 exec_lo, s2			; GFX1132-NEXT: s_mov_b32 exec_lo, s2
	; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0			; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX1132-NEXT: s_or_saveexec_b32 s2, -1			; GFX1132-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1132-NEXT: v_writelane_b32 v3, s3, 16			; GFX1132-NEXT: v_writelane_b32 v3, s3, 16
	; GFX1132-NEXT: s_mov_b32 exec_lo, s2			; GFX1132-NEXT: s_mov_b32 exec_lo, s2
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_2)
	; GFX1132-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0			; GFX1132-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
	; GFX1132-NEXT: s_mov_b32 s2, -1			; GFX1132-NEXT: s_mov_b32 s2, -1
	; GFX1132-NEXT: ; implicit-def: $vgpr0			; GFX1132-NEXT: ; implicit-def: $vgpr0
	; GFX1132-NEXT: s_and_saveexec_b32 s3, vcc_lo			; GFX1132-NEXT: s_and_saveexec_b32 s3, vcc_lo
	; GFX1132-NEXT: s_cbranch_execz .LBB15_2			; GFX1132-NEXT: s_cbranch_execz .LBB15_2
	; GFX1132-NEXT: ; %bb.1:			; GFX1132-NEXT: ; %bb.1:
	; GFX1132-NEXT: v_mov_b32_e32 v0, 0			; GFX1132-NEXT: v_mov_b32_e32 v0, 0
	; GFX1132-NEXT: v_mov_b32_e32 v4, s4			; GFX1132-NEXT: v_mov_b32_e32 v4, s4
	; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1132-NEXT: ds_or_rtn_b32 v0, v0, v4			; GFX1132-NEXT: ds_or_rtn_b32 v0, v0, v4
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_gl0_inv			; GFX1132-NEXT: buffer_gl0_inv
	; GFX1132-NEXT: .LBB15_2:			; GFX1132-NEXT: .LBB15_2:
	; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s3			; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s3
	; GFX1132-NEXT: v_readfirstlane_b32 s3, v0			; GFX1132-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1132-NEXT: v_mov_b32_e32 v0, v3			; GFX1132-NEXT: v_mov_b32_e32 v0, v3
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1132-NEXT: v_or_b32_e32 v0, s3, v0			; GFX1132-NEXT: v_or_b32_e32 v0, s3, v0
	; GFX1132-NEXT: s_mov_b32 s3, 0x31016000			; GFX1132-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_store_b32 v0, off, s[0:3], 0			; GFX1132-NEXT: buffer_store_b32 v0, off, s[0:3], 0
	; GFX1132-NEXT: s_endpgm			; GFX1132-NEXT: s_endpgm
	entry:			entry:
	%lane = call i32 @llvm.amdgcn.workitem.id.x()			%lane = call i32 @llvm.amdgcn.workitem.id.x()
	%old = atomicrmw or i32 addrspace(3)* @local_var32, i32 %lane acq_rel			%old = atomicrmw or i32 addrspace(3)* @local_var32, i32 %lane acq_rel
	;			;
	; GFX1164-LABEL: xor_i32_varying:			; GFX1164-LABEL: xor_i32_varying:
	; GFX1164: ; %bb.0: ; %entry			; GFX1164: ; %bb.0: ; %entry
	; GFX1164-NEXT: v_mov_b32_e32 v1, v0			; GFX1164-NEXT: v_mov_b32_e32 v1, v0
	; GFX1164-NEXT: s_not_b64 exec, exec			; GFX1164-NEXT: s_not_b64 exec, exec
	; GFX1164-NEXT: v_mov_b32_e32 v1, 0			; GFX1164-NEXT: v_mov_b32_e32 v1, 0
	; GFX1164-NEXT: s_not_b64 exec, exec			; GFX1164-NEXT: s_not_b64 exec, exec
	; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
	; GFX1164-NEXT: v_xor_b32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1164-NEXT: v_xor_b32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:1
	; GFX1164-NEXT: v_mov_b32_e32 v3, 0			; GFX1164-NEXT: v_mov_b32_e32 v3, 0
	; GFX1164-NEXT: v_xor_b32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1164-NEXT: v_xor_b32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_xor_b32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1164-NEXT: v_xor_b32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:1
	; GFX1164-NEXT: v_xor_b32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1164-NEXT: v_xor_b32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_mov_b32_e32 v2, v1			; GFX1164-NEXT: v_mov_b32_e32 v2, v1
	; GFX1164-NEXT: v_permlanex16_b32 v2, v2, -1, -1			; GFX1164-NEXT: v_permlanex16_b32 v2, v2, -1, -1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_xor_b32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf			; GFX1164-NEXT: v_xor_b32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1164-NEXT: v_readlane_b32 s4, v1, 31			; GFX1164-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_mov_b32_e32 v2, s4			; GFX1164-NEXT: v_mov_b32_e32 v2, s4
	; GFX1164-NEXT: v_xor_b32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf			; GFX1164-NEXT: v_xor_b32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1164-NEXT: v_readlane_b32 s4, v1, 15			; GFX1164-NEXT: v_readlane_b32 s4, v1, 15
	; GFX1164-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1164-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1164-NEXT: s_mov_b64 exec, s[2:3]			; GFX1164-NEXT: s_mov_b64 exec, s[2:3]
	; GFX1164-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX1164-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX1164-NEXT: v_readlane_b32 s5, v1, 31			; GFX1164-NEXT: v_readlane_b32 s5, v1, 31
	; GFX1164-NEXT: v_writelane_b32 v3, s4, 16			; GFX1164-NEXT: v_writelane_b32 v3, s4, 16
	; GFX1164-NEXT: s_mov_b64 exec, s[2:3]			; GFX1164-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1164-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0			; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX1164-NEXT: v_readlane_b32 s7, v1, 63			; GFX1164-NEXT: v_readlane_b32 s7, v1, 63
	; GFX1164-NEXT: v_readlane_b32 s6, v1, 47			; GFX1164-NEXT: v_readlane_b32 s6, v1, 47
	; GFX1164-NEXT: v_writelane_b32 v3, s5, 32			; GFX1164-NEXT: v_writelane_b32 v3, s5, 32
	; GFX1164-NEXT: s_mov_b64 exec, s[2:3]			; GFX1164-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1164-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(SKIP_3) | instid1(VALU_DEP_2)
	; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0			; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX1164-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX1164-NEXT: v_writelane_b32 v3, s6, 48			; GFX1164-NEXT: v_writelane_b32 v3, s6, 48
	; GFX1164-NEXT: s_mov_b64 exec, s[4:5]			; GFX1164-NEXT: s_mov_b64 exec, s[4:5]
	; GFX1164-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX1164-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX1164-NEXT: s_mov_b32 s2, -1			; GFX1164-NEXT: s_mov_b32 s2, -1
	; GFX1164-NEXT: ; implicit-def: $vgpr0			; GFX1164-NEXT: ; implicit-def: $vgpr0
	; GFX1164-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX1164-NEXT: s_and_saveexec_b64 s[4:5], vcc
	; GFX1164-NEXT: s_cbranch_execz .LBB16_2			; GFX1164-NEXT: s_cbranch_execz .LBB16_2
	; GFX1164-NEXT: ; %bb.1:			; GFX1164-NEXT: ; %bb.1:
	; GFX1164-NEXT: v_mov_b32_e32 v0, 0			; GFX1164-NEXT: v_mov_b32_e32 v0, 0
	; GFX1164-NEXT: v_mov_b32_e32 v4, s7			; GFX1164-NEXT: v_mov_b32_e32 v4, s7
	; GFX1164-NEXT: s_mov_b32 s3, s7			; GFX1164-NEXT: s_mov_b32 s3, s7
	; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1164-NEXT: ds_xor_rtn_b32 v0, v0, v4			; GFX1164-NEXT: ds_xor_rtn_b32 v0, v0, v4
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_gl0_inv			; GFX1164-NEXT: buffer_gl0_inv
	; GFX1164-NEXT: .LBB16_2:			; GFX1164-NEXT: .LBB16_2:
	; GFX1164-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX1164-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX1164-NEXT: v_readfirstlane_b32 s3, v0			; GFX1164-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1164-NEXT: v_mov_b32_e32 v0, v3			; GFX1164-NEXT: v_mov_b32_e32 v0, v3
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1164-NEXT: v_xor_b32_e32 v0, s3, v0			; GFX1164-NEXT: v_xor_b32_e32 v0, s3, v0
	; GFX1164-NEXT: s_mov_b32 s3, 0x31016000			; GFX1164-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_store_b32 v0, off, s[0:3], 0			; GFX1164-NEXT: buffer_store_b32 v0, off, s[0:3], 0
	; GFX1164-NEXT: s_endpgm			; GFX1164-NEXT: s_endpgm
	;			;
	; GFX1132-LABEL: xor_i32_varying:			; GFX1132-LABEL: xor_i32_varying:
	; GFX1132: ; %bb.0: ; %entry			; GFX1132: ; %bb.0: ; %entry
	; GFX1132-NEXT: v_mov_b32_e32 v1, v0			; GFX1132-NEXT: v_mov_b32_e32 v1, v0
	; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1132-NEXT: v_mov_b32_e32 v1, 0			; GFX1132-NEXT: v_mov_b32_e32 v1, 0
	; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1132-NEXT: s_or_saveexec_b32 s2, -1			; GFX1132-NEXT: s_or_saveexec_b32 s2, -1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_xor_b32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1132-NEXT: v_xor_b32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:1
	; GFX1132-NEXT: v_xor_b32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1132-NEXT: v_xor_b32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_xor_b32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1132-NEXT: v_xor_b32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:1
	; GFX1132-NEXT: v_xor_b32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1132-NEXT: v_xor_b32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_mov_b32_e32 v2, v1			; GFX1132-NEXT: v_mov_b32_e32 v2, v1
	; GFX1132-NEXT: v_permlanex16_b32 v2, v2, -1, -1			; GFX1132-NEXT: v_permlanex16_b32 v2, v2, -1, -1
	; GFX1132-NEXT: s_mov_b32 exec_lo, s2			; GFX1132-NEXT: s_mov_b32 exec_lo, s2
	; GFX1132-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX1132-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX1132-NEXT: s_or_saveexec_b32 s2, -1			; GFX1132-NEXT: s_or_saveexec_b32 s2, -1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
	; GFX1132-NEXT: v_xor_b32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf			; GFX1132-NEXT: v_xor_b32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1132-NEXT: v_mov_b32_e32 v3, 0			; GFX1132-NEXT: v_mov_b32_e32 v3, 0
	; GFX1132-NEXT: v_readlane_b32 s3, v1, 15			; GFX1132-NEXT: v_readlane_b32 s3, v1, 15
	; GFX1132-NEXT: v_readlane_b32 s4, v1, 31			; GFX1132-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(SALU_CYCLE_1)
	; GFX1132-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1132-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1132-NEXT: s_mov_b32 exec_lo, s2			; GFX1132-NEXT: s_mov_b32 exec_lo, s2
	; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0			; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX1132-NEXT: s_or_saveexec_b32 s2, -1			; GFX1132-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1132-NEXT: v_writelane_b32 v3, s3, 16			; GFX1132-NEXT: v_writelane_b32 v3, s3, 16
	; GFX1132-NEXT: s_mov_b32 exec_lo, s2			; GFX1132-NEXT: s_mov_b32 exec_lo, s2
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_2)
	; GFX1132-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0			; GFX1132-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
	; GFX1132-NEXT: s_mov_b32 s2, -1			; GFX1132-NEXT: s_mov_b32 s2, -1
	; GFX1132-NEXT: ; implicit-def: $vgpr0			; GFX1132-NEXT: ; implicit-def: $vgpr0
	; GFX1132-NEXT: s_and_saveexec_b32 s3, vcc_lo			; GFX1132-NEXT: s_and_saveexec_b32 s3, vcc_lo
	; GFX1132-NEXT: s_cbranch_execz .LBB16_2			; GFX1132-NEXT: s_cbranch_execz .LBB16_2
	; GFX1132-NEXT: ; %bb.1:			; GFX1132-NEXT: ; %bb.1:
	; GFX1132-NEXT: v_mov_b32_e32 v0, 0			; GFX1132-NEXT: v_mov_b32_e32 v0, 0
	; GFX1132-NEXT: v_mov_b32_e32 v4, s4			; GFX1132-NEXT: v_mov_b32_e32 v4, s4
	; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1132-NEXT: ds_xor_rtn_b32 v0, v0, v4			; GFX1132-NEXT: ds_xor_rtn_b32 v0, v0, v4
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_gl0_inv			; GFX1132-NEXT: buffer_gl0_inv
	; GFX1132-NEXT: .LBB16_2:			; GFX1132-NEXT: .LBB16_2:
	; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s3			; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s3
	; GFX1132-NEXT: v_readfirstlane_b32 s3, v0			; GFX1132-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1132-NEXT: v_mov_b32_e32 v0, v3			; GFX1132-NEXT: v_mov_b32_e32 v0, v3
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1132-NEXT: v_xor_b32_e32 v0, s3, v0			; GFX1132-NEXT: v_xor_b32_e32 v0, s3, v0
	; GFX1132-NEXT: s_mov_b32 s3, 0x31016000			; GFX1132-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_store_b32 v0, off, s[0:3], 0			; GFX1132-NEXT: buffer_store_b32 v0, off, s[0:3], 0
	; GFX1132-NEXT: s_endpgm			; GFX1132-NEXT: s_endpgm
	entry:			entry:
	%lane = call i32 @llvm.amdgcn.workitem.id.x()			%lane = call i32 @llvm.amdgcn.workitem.id.x()
	%old = atomicrmw xor i32 addrspace(3)* @local_var32, i32 %lane acq_rel			%old = atomicrmw xor i32 addrspace(3)* @local_var32, i32 %lane acq_rel
	;			;
	; GFX1164-LABEL: max_i32_varying:			; GFX1164-LABEL: max_i32_varying:
	; GFX1164: ; %bb.0: ; %entry			; GFX1164: ; %bb.0: ; %entry
	; GFX1164-NEXT: v_mov_b32_e32 v1, v0			; GFX1164-NEXT: v_mov_b32_e32 v1, v0
	; GFX1164-NEXT: s_not_b64 exec, exec			; GFX1164-NEXT: s_not_b64 exec, exec
	; GFX1164-NEXT: v_bfrev_b32_e32 v1, 1			; GFX1164-NEXT: v_bfrev_b32_e32 v1, 1
	; GFX1164-NEXT: s_not_b64 exec, exec			; GFX1164-NEXT: s_not_b64 exec, exec
	; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
	; GFX1164-NEXT: v_max_i32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1164-NEXT: v_max_i32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1164-NEXT: v_bfrev_b32_e32 v3, 1			; GFX1164-NEXT: v_bfrev_b32_e32 v3, 1
	; GFX1164-NEXT: v_max_i32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf			; GFX1164-NEXT: v_max_i32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_max_i32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf			; GFX1164-NEXT: v_max_i32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf
	; GFX1164-NEXT: v_max_i32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf			; GFX1164-NEXT: v_max_i32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_mov_b32_e32 v2, v1			; GFX1164-NEXT: v_mov_b32_e32 v2, v1
	; GFX1164-NEXT: v_permlanex16_b32 v2, v2, -1, -1			; GFX1164-NEXT: v_permlanex16_b32 v2, v2, -1, -1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_max_i32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf			; GFX1164-NEXT: v_max_i32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1164-NEXT: v_readlane_b32 s4, v1, 31			; GFX1164-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_mov_b32_e32 v2, s4			; GFX1164-NEXT: v_mov_b32_e32 v2, s4
	; GFX1164-NEXT: v_max_i32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf			; GFX1164-NEXT: v_max_i32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1164-NEXT: v_readlane_b32 s4, v1, 15			; GFX1164-NEXT: v_readlane_b32 s4, v1, 15
	; GFX1164-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1164-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1164-NEXT: s_mov_b64 exec, s[2:3]			; GFX1164-NEXT: s_mov_b64 exec, s[2:3]
	; GFX1164-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX1164-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX1164-NEXT: v_readlane_b32 s5, v1, 31			; GFX1164-NEXT: v_readlane_b32 s5, v1, 31
	; GFX1164-NEXT: v_writelane_b32 v3, s4, 16			; GFX1164-NEXT: v_writelane_b32 v3, s4, 16
	; GFX1164-NEXT: s_mov_b64 exec, s[2:3]			; GFX1164-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1164-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0			; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX1164-NEXT: v_readlane_b32 s7, v1, 63			; GFX1164-NEXT: v_readlane_b32 s7, v1, 63
	; GFX1164-NEXT: v_readlane_b32 s6, v1, 47			; GFX1164-NEXT: v_readlane_b32 s6, v1, 47
	; GFX1164-NEXT: v_writelane_b32 v3, s5, 32			; GFX1164-NEXT: v_writelane_b32 v3, s5, 32
	; GFX1164-NEXT: s_mov_b64 exec, s[2:3]			; GFX1164-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1164-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(SKIP_3) | instid1(VALU_DEP_2)
	; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0			; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX1164-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX1164-NEXT: v_writelane_b32 v3, s6, 48			; GFX1164-NEXT: v_writelane_b32 v3, s6, 48
	; GFX1164-NEXT: s_mov_b64 exec, s[4:5]			; GFX1164-NEXT: s_mov_b64 exec, s[4:5]
	; GFX1164-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX1164-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX1164-NEXT: s_mov_b32 s2, -1			; GFX1164-NEXT: s_mov_b32 s2, -1
	; GFX1164-NEXT: ; implicit-def: $vgpr0			; GFX1164-NEXT: ; implicit-def: $vgpr0
	; GFX1164-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX1164-NEXT: s_and_saveexec_b64 s[4:5], vcc
	; GFX1164-NEXT: s_cbranch_execz .LBB17_2			; GFX1164-NEXT: s_cbranch_execz .LBB17_2
	; GFX1164-NEXT: ; %bb.1:			; GFX1164-NEXT: ; %bb.1:
	; GFX1164-NEXT: v_mov_b32_e32 v0, 0			; GFX1164-NEXT: v_mov_b32_e32 v0, 0
	; GFX1164-NEXT: v_mov_b32_e32 v4, s7			; GFX1164-NEXT: v_mov_b32_e32 v4, s7
	; GFX1164-NEXT: s_mov_b32 s3, s7			; GFX1164-NEXT: s_mov_b32 s3, s7
	; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1164-NEXT: ds_max_rtn_i32 v0, v0, v4			; GFX1164-NEXT: ds_max_rtn_i32 v0, v0, v4
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_gl0_inv			; GFX1164-NEXT: buffer_gl0_inv
	; GFX1164-NEXT: .LBB17_2:			; GFX1164-NEXT: .LBB17_2:
	; GFX1164-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX1164-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX1164-NEXT: v_readfirstlane_b32 s3, v0			; GFX1164-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1164-NEXT: v_mov_b32_e32 v0, v3			; GFX1164-NEXT: v_mov_b32_e32 v0, v3
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1164-NEXT: v_max_i32_e32 v0, s3, v0			; GFX1164-NEXT: v_max_i32_e32 v0, s3, v0
	; GFX1164-NEXT: s_mov_b32 s3, 0x31016000			; GFX1164-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_store_b32 v0, off, s[0:3], 0			; GFX1164-NEXT: buffer_store_b32 v0, off, s[0:3], 0
	; GFX1164-NEXT: s_endpgm			; GFX1164-NEXT: s_endpgm
	;			;
	; GFX1132-LABEL: max_i32_varying:			; GFX1132-LABEL: max_i32_varying:
	; GFX1132: ; %bb.0: ; %entry			; GFX1132: ; %bb.0: ; %entry
	; GFX1132-NEXT: v_mov_b32_e32 v1, v0			; GFX1132-NEXT: v_mov_b32_e32 v1, v0
	; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1132-NEXT: v_bfrev_b32_e32 v1, 1			; GFX1132-NEXT: v_bfrev_b32_e32 v1, 1
	; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1132-NEXT: s_or_saveexec_b32 s2, -1			; GFX1132-NEXT: s_or_saveexec_b32 s2, -1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_max_i32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1132-NEXT: v_max_i32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1132-NEXT: v_max_i32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf			; GFX1132-NEXT: v_max_i32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_max_i32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf			; GFX1132-NEXT: v_max_i32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf
	; GFX1132-NEXT: v_max_i32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf			; GFX1132-NEXT: v_max_i32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_mov_b32_e32 v2, v1			; GFX1132-NEXT: v_mov_b32_e32 v2, v1
	; GFX1132-NEXT: v_permlanex16_b32 v2, v2, -1, -1			; GFX1132-NEXT: v_permlanex16_b32 v2, v2, -1, -1
	; GFX1132-NEXT: s_mov_b32 exec_lo, s2			; GFX1132-NEXT: s_mov_b32 exec_lo, s2
	; GFX1132-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX1132-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX1132-NEXT: s_or_saveexec_b32 s2, -1			; GFX1132-NEXT: s_or_saveexec_b32 s2, -1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
	; GFX1132-NEXT: v_max_i32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf			; GFX1132-NEXT: v_max_i32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1132-NEXT: v_bfrev_b32_e32 v3, 1			; GFX1132-NEXT: v_bfrev_b32_e32 v3, 1
	; GFX1132-NEXT: v_readlane_b32 s3, v1, 15			; GFX1132-NEXT: v_readlane_b32 s3, v1, 15
	; GFX1132-NEXT: v_readlane_b32 s4, v1, 31			; GFX1132-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(SALU_CYCLE_1)
	; GFX1132-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1132-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1132-NEXT: s_mov_b32 exec_lo, s2			; GFX1132-NEXT: s_mov_b32 exec_lo, s2
	; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0			; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX1132-NEXT: s_or_saveexec_b32 s2, -1			; GFX1132-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1132-NEXT: v_writelane_b32 v3, s3, 16			; GFX1132-NEXT: v_writelane_b32 v3, s3, 16
	; GFX1132-NEXT: s_mov_b32 exec_lo, s2			; GFX1132-NEXT: s_mov_b32 exec_lo, s2
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_2)
	; GFX1132-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0			; GFX1132-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
	; GFX1132-NEXT: s_mov_b32 s2, -1			; GFX1132-NEXT: s_mov_b32 s2, -1
	; GFX1132-NEXT: ; implicit-def: $vgpr0			; GFX1132-NEXT: ; implicit-def: $vgpr0
	; GFX1132-NEXT: s_and_saveexec_b32 s3, vcc_lo			; GFX1132-NEXT: s_and_saveexec_b32 s3, vcc_lo
	; GFX1132-NEXT: s_cbranch_execz .LBB17_2			; GFX1132-NEXT: s_cbranch_execz .LBB17_2
	; GFX1132-NEXT: ; %bb.1:			; GFX1132-NEXT: ; %bb.1:
	; GFX1132-NEXT: v_mov_b32_e32 v0, 0			; GFX1132-NEXT: v_mov_b32_e32 v0, 0
	; GFX1132-NEXT: v_mov_b32_e32 v4, s4			; GFX1132-NEXT: v_mov_b32_e32 v4, s4
	; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1132-NEXT: ds_max_rtn_i32 v0, v0, v4			; GFX1132-NEXT: ds_max_rtn_i32 v0, v0, v4
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_gl0_inv			; GFX1132-NEXT: buffer_gl0_inv
	; GFX1132-NEXT: .LBB17_2:			; GFX1132-NEXT: .LBB17_2:
	; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s3			; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s3
	; GFX1132-NEXT: v_readfirstlane_b32 s3, v0			; GFX1132-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1132-NEXT: v_mov_b32_e32 v0, v3			; GFX1132-NEXT: v_mov_b32_e32 v0, v3
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1132-NEXT: v_max_i32_e32 v0, s3, v0			; GFX1132-NEXT: v_max_i32_e32 v0, s3, v0
	; GFX1132-NEXT: s_mov_b32 s3, 0x31016000			; GFX1132-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_store_b32 v0, off, s[0:3], 0			; GFX1132-NEXT: buffer_store_b32 v0, off, s[0:3], 0
	; GFX1132-NEXT: s_endpgm			; GFX1132-NEXT: s_endpgm
	entry:			entry:
	%lane = call i32 @llvm.amdgcn.workitem.id.x()			%lane = call i32 @llvm.amdgcn.workitem.id.x()
	%old = atomicrmw max i32 addrspace(3)* @local_var32, i32 %lane acq_rel			%old = atomicrmw max i32 addrspace(3)* @local_var32, i32 %lane acq_rel
	; GFX1032-NEXT: s_waitcnt lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1032-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0			; GFX1032-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
	; GFX1032-NEXT: s_endpgm			; GFX1032-NEXT: s_endpgm
	;			;
	; GFX1164-LABEL: max_i64_constant:			; GFX1164-LABEL: max_i64_constant:
	; GFX1164: ; %bb.0: ; %entry			; GFX1164: ; %bb.0: ; %entry
	; GFX1164-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX1164-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0			; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0			; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX1164-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX1164-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX1164-NEXT: ; implicit-def: $vgpr0_vgpr1			; GFX1164-NEXT: ; implicit-def: $vgpr0_vgpr1
	; GFX1164-NEXT: s_and_saveexec_b64 s[2:3], vcc			; GFX1164-NEXT: s_and_saveexec_b64 s[2:3], vcc
	; GFX1164-NEXT: s_cbranch_execz .LBB18_2			; GFX1164-NEXT: s_cbranch_execz .LBB18_2
	; GFX1164-NEXT: ; %bb.1:			; GFX1164-NEXT: ; %bb.1:
	; GFX1164-NEXT: v_mov_b32_e32 v0, 5			; GFX1164-NEXT: v_mov_b32_e32 v0, 5
	; GFX1164-NEXT: v_mov_b32_e32 v1, 0			; GFX1164-NEXT: v_mov_b32_e32 v1, 0
	; GFX1164-NEXT: v_mov_b32_e32 v2, 0			; GFX1164-NEXT: v_mov_b32_e32 v2, 0
	; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1164-NEXT: ds_max_rtn_i64 v[0:1], v2, v[0:1]			; GFX1164-NEXT: ds_max_rtn_i64 v[0:1], v2, v[0:1]
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_gl0_inv			; GFX1164-NEXT: buffer_gl0_inv
	; GFX1164-NEXT: .LBB18_2:			; GFX1164-NEXT: .LBB18_2:
	; GFX1164-NEXT: s_or_b64 exec, exec, s[2:3]			; GFX1164-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX1164-NEXT: v_readfirstlane_b32 s2, v0			; GFX1164-NEXT: v_readfirstlane_b32 s2, v0
	; GFX1164-NEXT: v_readfirstlane_b32 s3, v1			; GFX1164-NEXT: v_readfirstlane_b32 s3, v1
	; GFX1164-NEXT: v_cndmask_b32_e64 v1, 0, 0x80000000, vcc			; GFX1164-NEXT: v_cndmask_b32_e64 v1, 0, 0x80000000, vcc
	; GFX1164-NEXT: v_cndmask_b32_e64 v0, 5, 0, vcc			; GFX1164-NEXT: v_cndmask_b32_e64 v0, 5, 0, vcc
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1164-NEXT: v_cmp_gt_i64_e32 vcc, s[2:3], v[0:1]			; GFX1164-NEXT: v_cmp_gt_i64_e32 vcc, s[2:3], v[0:1]
	; GFX1164-NEXT: v_cndmask_b32_e64 v1, v1, s3, vcc			; GFX1164-NEXT: v_cndmask_b32_e64 v1, v1, s3, vcc
	; GFX1164-NEXT: v_cndmask_b32_e64 v0, v0, s2, vcc			; GFX1164-NEXT: v_cndmask_b32_e64 v0, v0, s2, vcc
	; GFX1164-NEXT: s_mov_b32 s3, 0x31016000			; GFX1164-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1164-NEXT: s_mov_b32 s2, -1			; GFX1164-NEXT: s_mov_b32 s2, -1
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_store_b64 v[0:1], off, s[0:3], 0			; GFX1164-NEXT: buffer_store_b64 v[0:1], off, s[0:3], 0
	; GFX1164-NEXT: s_endpgm			; GFX1164-NEXT: s_endpgm
	;			;
	; GFX1132-LABEL: max_i64_constant:			; GFX1132-LABEL: max_i64_constant:
	; GFX1132: ; %bb.0: ; %entry			; GFX1132: ; %bb.0: ; %entry
	; GFX1132-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX1132-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0			; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1132-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0			; GFX1132-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
	; GFX1132-NEXT: ; implicit-def: $vgpr0_vgpr1			; GFX1132-NEXT: ; implicit-def: $vgpr0_vgpr1
	; GFX1132-NEXT: s_and_saveexec_b32 s2, vcc_lo			; GFX1132-NEXT: s_and_saveexec_b32 s2, vcc_lo
	; GFX1132-NEXT: s_cbranch_execz .LBB18_2			; GFX1132-NEXT: s_cbranch_execz .LBB18_2
	; GFX1132-NEXT: ; %bb.1:			; GFX1132-NEXT: ; %bb.1:
	; GFX1132-NEXT: v_mov_b32_e32 v0, 5			; GFX1132-NEXT: v_mov_b32_e32 v0, 5
	; GFX1132-NEXT: v_mov_b32_e32 v1, 0			; GFX1132-NEXT: v_mov_b32_e32 v1, 0
	; GFX1132-NEXT: v_mov_b32_e32 v2, 0			; GFX1132-NEXT: v_mov_b32_e32 v2, 0
	; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1132-NEXT: ds_max_rtn_i64 v[0:1], v2, v[0:1]			; GFX1132-NEXT: ds_max_rtn_i64 v[0:1], v2, v[0:1]
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_gl0_inv			; GFX1132-NEXT: buffer_gl0_inv
	; GFX1132-NEXT: .LBB18_2:			; GFX1132-NEXT: .LBB18_2:
	; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s2			; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s2
	; GFX1132-NEXT: v_readfirstlane_b32 s2, v0			; GFX1132-NEXT: v_readfirstlane_b32 s2, v0
	; GFX1132-NEXT: v_readfirstlane_b32 s3, v1			; GFX1132-NEXT: v_readfirstlane_b32 s3, v1
	; GFX1132-NEXT: v_cndmask_b32_e64 v1, 0, 0x80000000, vcc_lo			; GFX1132-NEXT: v_cndmask_b32_e64 v1, 0, 0x80000000, vcc_lo
	; GFX1132-NEXT: v_cndmask_b32_e64 v0, 5, 0, vcc_lo			; GFX1132-NEXT: v_cndmask_b32_e64 v0, 5, 0, vcc_lo
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1132-NEXT: v_cmp_gt_i64_e32 vcc_lo, s[2:3], v[0:1]			; GFX1132-NEXT: v_cmp_gt_i64_e32 vcc_lo, s[2:3], v[0:1]
	; GFX1132-NEXT: v_cndmask_b32_e64 v1, v1, s3, vcc_lo			; GFX1132-NEXT: v_cndmask_b32_e64 v1, v1, s3, vcc_lo
	; GFX1132-NEXT: v_cndmask_b32_e64 v0, v0, s2, vcc_lo			; GFX1132-NEXT: v_cndmask_b32_e64 v0, v0, s2, vcc_lo
	; GFX1132-NEXT: s_mov_b32 s3, 0x31016000			; GFX1132-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1132-NEXT: s_mov_b32 s2, -1			; GFX1132-NEXT: s_mov_b32 s2, -1
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_store_b64 v[0:1], off, s[0:3], 0			; GFX1132-NEXT: buffer_store_b64 v[0:1], off, s[0:3], 0
	; GFX1132-NEXT: s_endpgm			; GFX1132-NEXT: s_endpgm
	;			;
	; GFX1164-LABEL: min_i32_varying:			; GFX1164-LABEL: min_i32_varying:
	; GFX1164: ; %bb.0: ; %entry			; GFX1164: ; %bb.0: ; %entry
	; GFX1164-NEXT: v_mov_b32_e32 v1, v0			; GFX1164-NEXT: v_mov_b32_e32 v1, v0
	; GFX1164-NEXT: s_not_b64 exec, exec			; GFX1164-NEXT: s_not_b64 exec, exec
	; GFX1164-NEXT: v_bfrev_b32_e32 v1, -2			; GFX1164-NEXT: v_bfrev_b32_e32 v1, -2
	; GFX1164-NEXT: s_not_b64 exec, exec			; GFX1164-NEXT: s_not_b64 exec, exec
	; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
	; GFX1164-NEXT: v_min_i32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1164-NEXT: v_min_i32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1164-NEXT: v_bfrev_b32_e32 v3, -2			; GFX1164-NEXT: v_bfrev_b32_e32 v3, -2
	; GFX1164-NEXT: v_min_i32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf			; GFX1164-NEXT: v_min_i32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_min_i32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf			; GFX1164-NEXT: v_min_i32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf
	; GFX1164-NEXT: v_min_i32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf			; GFX1164-NEXT: v_min_i32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_mov_b32_e32 v2, v1			; GFX1164-NEXT: v_mov_b32_e32 v2, v1
	; GFX1164-NEXT: v_permlanex16_b32 v2, v2, -1, -1			; GFX1164-NEXT: v_permlanex16_b32 v2, v2, -1, -1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_min_i32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf			; GFX1164-NEXT: v_min_i32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1164-NEXT: v_readlane_b32 s4, v1, 31			; GFX1164-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_mov_b32_e32 v2, s4			; GFX1164-NEXT: v_mov_b32_e32 v2, s4
	; GFX1164-NEXT: v_min_i32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf			; GFX1164-NEXT: v_min_i32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1164-NEXT: v_readlane_b32 s4, v1, 15			; GFX1164-NEXT: v_readlane_b32 s4, v1, 15
	; GFX1164-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1164-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1164-NEXT: s_mov_b64 exec, s[2:3]			; GFX1164-NEXT: s_mov_b64 exec, s[2:3]
	; GFX1164-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX1164-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX1164-NEXT: v_readlane_b32 s5, v1, 31			; GFX1164-NEXT: v_readlane_b32 s5, v1, 31
	; GFX1164-NEXT: v_writelane_b32 v3, s4, 16			; GFX1164-NEXT: v_writelane_b32 v3, s4, 16
	; GFX1164-NEXT: s_mov_b64 exec, s[2:3]			; GFX1164-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1164-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0			; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX1164-NEXT: v_readlane_b32 s7, v1, 63			; GFX1164-NEXT: v_readlane_b32 s7, v1, 63
	; GFX1164-NEXT: v_readlane_b32 s6, v1, 47			; GFX1164-NEXT: v_readlane_b32 s6, v1, 47
	; GFX1164-NEXT: v_writelane_b32 v3, s5, 32			; GFX1164-NEXT: v_writelane_b32 v3, s5, 32
	; GFX1164-NEXT: s_mov_b64 exec, s[2:3]			; GFX1164-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1164-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(SKIP_3) | instid1(VALU_DEP_2)
	; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0			; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX1164-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX1164-NEXT: v_writelane_b32 v3, s6, 48			; GFX1164-NEXT: v_writelane_b32 v3, s6, 48
	; GFX1164-NEXT: s_mov_b64 exec, s[4:5]			; GFX1164-NEXT: s_mov_b64 exec, s[4:5]
	; GFX1164-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX1164-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX1164-NEXT: s_mov_b32 s2, -1			; GFX1164-NEXT: s_mov_b32 s2, -1
	; GFX1164-NEXT: ; implicit-def: $vgpr0			; GFX1164-NEXT: ; implicit-def: $vgpr0
	; GFX1164-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX1164-NEXT: s_and_saveexec_b64 s[4:5], vcc
	; GFX1164-NEXT: s_cbranch_execz .LBB19_2			; GFX1164-NEXT: s_cbranch_execz .LBB19_2
	; GFX1164-NEXT: ; %bb.1:			; GFX1164-NEXT: ; %bb.1:
	; GFX1164-NEXT: v_mov_b32_e32 v0, 0			; GFX1164-NEXT: v_mov_b32_e32 v0, 0
	; GFX1164-NEXT: v_mov_b32_e32 v4, s7			; GFX1164-NEXT: v_mov_b32_e32 v4, s7
	; GFX1164-NEXT: s_mov_b32 s3, s7			; GFX1164-NEXT: s_mov_b32 s3, s7
	; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1164-NEXT: ds_min_rtn_i32 v0, v0, v4			; GFX1164-NEXT: ds_min_rtn_i32 v0, v0, v4
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_gl0_inv			; GFX1164-NEXT: buffer_gl0_inv
	; GFX1164-NEXT: .LBB19_2:			; GFX1164-NEXT: .LBB19_2:
	; GFX1164-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX1164-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX1164-NEXT: v_readfirstlane_b32 s3, v0			; GFX1164-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1164-NEXT: v_mov_b32_e32 v0, v3			; GFX1164-NEXT: v_mov_b32_e32 v0, v3
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1164-NEXT: v_min_i32_e32 v0, s3, v0			; GFX1164-NEXT: v_min_i32_e32 v0, s3, v0
	; GFX1164-NEXT: s_mov_b32 s3, 0x31016000			; GFX1164-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_store_b32 v0, off, s[0:3], 0			; GFX1164-NEXT: buffer_store_b32 v0, off, s[0:3], 0
	; GFX1164-NEXT: s_endpgm			; GFX1164-NEXT: s_endpgm
	;			;
	; GFX1132-LABEL: min_i32_varying:			; GFX1132-LABEL: min_i32_varying:
	; GFX1132: ; %bb.0: ; %entry			; GFX1132: ; %bb.0: ; %entry
	; GFX1132-NEXT: v_mov_b32_e32 v1, v0			; GFX1132-NEXT: v_mov_b32_e32 v1, v0
	; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1132-NEXT: v_bfrev_b32_e32 v1, -2			; GFX1132-NEXT: v_bfrev_b32_e32 v1, -2
	; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1132-NEXT: s_or_saveexec_b32 s2, -1			; GFX1132-NEXT: s_or_saveexec_b32 s2, -1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_min_i32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1132-NEXT: v_min_i32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1132-NEXT: v_min_i32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf			; GFX1132-NEXT: v_min_i32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_min_i32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf			; GFX1132-NEXT: v_min_i32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf
	; GFX1132-NEXT: v_min_i32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf			; GFX1132-NEXT: v_min_i32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_mov_b32_e32 v2, v1			; GFX1132-NEXT: v_mov_b32_e32 v2, v1
	; GFX1132-NEXT: v_permlanex16_b32 v2, v2, -1, -1			; GFX1132-NEXT: v_permlanex16_b32 v2, v2, -1, -1
	; GFX1132-NEXT: s_mov_b32 exec_lo, s2			; GFX1132-NEXT: s_mov_b32 exec_lo, s2
	; GFX1132-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX1132-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX1132-NEXT: s_or_saveexec_b32 s2, -1			; GFX1132-NEXT: s_or_saveexec_b32 s2, -1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
	; GFX1132-NEXT: v_min_i32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf			; GFX1132-NEXT: v_min_i32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1132-NEXT: v_bfrev_b32_e32 v3, -2			; GFX1132-NEXT: v_bfrev_b32_e32 v3, -2
	; GFX1132-NEXT: v_readlane_b32 s3, v1, 15			; GFX1132-NEXT: v_readlane_b32 s3, v1, 15
	; GFX1132-NEXT: v_readlane_b32 s4, v1, 31			; GFX1132-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(SALU_CYCLE_1)
	; GFX1132-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1132-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1132-NEXT: s_mov_b32 exec_lo, s2			; GFX1132-NEXT: s_mov_b32 exec_lo, s2
	; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0			; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX1132-NEXT: s_or_saveexec_b32 s2, -1			; GFX1132-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1132-NEXT: v_writelane_b32 v3, s3, 16			; GFX1132-NEXT: v_writelane_b32 v3, s3, 16
	; GFX1132-NEXT: s_mov_b32 exec_lo, s2			; GFX1132-NEXT: s_mov_b32 exec_lo, s2
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_2)
	; GFX1132-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0			; GFX1132-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
	; GFX1132-NEXT: s_mov_b32 s2, -1			; GFX1132-NEXT: s_mov_b32 s2, -1
	; GFX1132-NEXT: ; implicit-def: $vgpr0			; GFX1132-NEXT: ; implicit-def: $vgpr0
	; GFX1132-NEXT: s_and_saveexec_b32 s3, vcc_lo			; GFX1132-NEXT: s_and_saveexec_b32 s3, vcc_lo
	; GFX1132-NEXT: s_cbranch_execz .LBB19_2			; GFX1132-NEXT: s_cbranch_execz .LBB19_2
	; GFX1132-NEXT: ; %bb.1:			; GFX1132-NEXT: ; %bb.1:
	; GFX1132-NEXT: v_mov_b32_e32 v0, 0			; GFX1132-NEXT: v_mov_b32_e32 v0, 0
	; GFX1132-NEXT: v_mov_b32_e32 v4, s4			; GFX1132-NEXT: v_mov_b32_e32 v4, s4
	; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1132-NEXT: ds_min_rtn_i32 v0, v0, v4			; GFX1132-NEXT: ds_min_rtn_i32 v0, v0, v4
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_gl0_inv			; GFX1132-NEXT: buffer_gl0_inv
	; GFX1132-NEXT: .LBB19_2:			; GFX1132-NEXT: .LBB19_2:
	; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s3			; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s3
	; GFX1132-NEXT: v_readfirstlane_b32 s3, v0			; GFX1132-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1132-NEXT: v_mov_b32_e32 v0, v3			; GFX1132-NEXT: v_mov_b32_e32 v0, v3
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1132-NEXT: v_min_i32_e32 v0, s3, v0			; GFX1132-NEXT: v_min_i32_e32 v0, s3, v0
	; GFX1132-NEXT: s_mov_b32 s3, 0x31016000			; GFX1132-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_store_b32 v0, off, s[0:3], 0			; GFX1132-NEXT: buffer_store_b32 v0, off, s[0:3], 0
	; GFX1132-NEXT: s_endpgm			; GFX1132-NEXT: s_endpgm
	entry:			entry:
	%lane = call i32 @llvm.amdgcn.workitem.id.x()			%lane = call i32 @llvm.amdgcn.workitem.id.x()
	%old = atomicrmw min i32 addrspace(3)* @local_var32, i32 %lane acq_rel			%old = atomicrmw min i32 addrspace(3)* @local_var32, i32 %lane acq_rel
	; GFX1032-NEXT: s_waitcnt lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1032-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0			; GFX1032-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
	; GFX1032-NEXT: s_endpgm			; GFX1032-NEXT: s_endpgm
	;			;
	; GFX1164-LABEL: min_i64_constant:			; GFX1164-LABEL: min_i64_constant:
	; GFX1164: ; %bb.0: ; %entry			; GFX1164: ; %bb.0: ; %entry
	; GFX1164-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX1164-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0			; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0			; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX1164-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX1164-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX1164-NEXT: ; implicit-def: $vgpr0_vgpr1			; GFX1164-NEXT: ; implicit-def: $vgpr0_vgpr1
	; GFX1164-NEXT: s_and_saveexec_b64 s[2:3], vcc			; GFX1164-NEXT: s_and_saveexec_b64 s[2:3], vcc
	; GFX1164-NEXT: s_cbranch_execz .LBB20_2			; GFX1164-NEXT: s_cbranch_execz .LBB20_2
	; GFX1164-NEXT: ; %bb.1:			; GFX1164-NEXT: ; %bb.1:
	; GFX1164-NEXT: v_mov_b32_e32 v0, 5			; GFX1164-NEXT: v_mov_b32_e32 v0, 5
	; GFX1164-NEXT: v_mov_b32_e32 v1, 0			; GFX1164-NEXT: v_mov_b32_e32 v1, 0
	; GFX1164-NEXT: v_mov_b32_e32 v2, 0			; GFX1164-NEXT: v_mov_b32_e32 v2, 0
	; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1164-NEXT: ds_min_rtn_i64 v[0:1], v2, v[0:1]			; GFX1164-NEXT: ds_min_rtn_i64 v[0:1], v2, v[0:1]
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_gl0_inv			; GFX1164-NEXT: buffer_gl0_inv
	; GFX1164-NEXT: .LBB20_2:			; GFX1164-NEXT: .LBB20_2:
	; GFX1164-NEXT: s_or_b64 exec, exec, s[2:3]			; GFX1164-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX1164-NEXT: v_readfirstlane_b32 s2, v0			; GFX1164-NEXT: v_readfirstlane_b32 s2, v0
	; GFX1164-NEXT: v_readfirstlane_b32 s3, v1			; GFX1164-NEXT: v_readfirstlane_b32 s3, v1
	; GFX1164-NEXT: v_cndmask_b32_e64 v1, 0, 0x7fffffff, vcc			; GFX1164-NEXT: v_cndmask_b32_e64 v1, 0, 0x7fffffff, vcc
	; GFX1164-NEXT: v_cndmask_b32_e64 v0, 5, -1, vcc			; GFX1164-NEXT: v_cndmask_b32_e64 v0, 5, -1, vcc
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1164-NEXT: v_cmp_lt_i64_e32 vcc, s[2:3], v[0:1]			; GFX1164-NEXT: v_cmp_lt_i64_e32 vcc, s[2:3], v[0:1]
	; GFX1164-NEXT: v_cndmask_b32_e64 v1, v1, s3, vcc			; GFX1164-NEXT: v_cndmask_b32_e64 v1, v1, s3, vcc
	; GFX1164-NEXT: v_cndmask_b32_e64 v0, v0, s2, vcc			; GFX1164-NEXT: v_cndmask_b32_e64 v0, v0, s2, vcc
	; GFX1164-NEXT: s_mov_b32 s2, -1			; GFX1164-NEXT: s_mov_b32 s2, -1
	; GFX1164-NEXT: s_mov_b32 s3, 0x31016000			; GFX1164-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_store_b64 v[0:1], off, s[0:3], 0			; GFX1164-NEXT: buffer_store_b64 v[0:1], off, s[0:3], 0
	; GFX1164-NEXT: s_endpgm			; GFX1164-NEXT: s_endpgm
	;			;
	; GFX1132-LABEL: min_i64_constant:			; GFX1132-LABEL: min_i64_constant:
	; GFX1132: ; %bb.0: ; %entry			; GFX1132: ; %bb.0: ; %entry
	; GFX1132-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX1132-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0			; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1132-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0			; GFX1132-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
	; GFX1132-NEXT: ; implicit-def: $vgpr0_vgpr1			; GFX1132-NEXT: ; implicit-def: $vgpr0_vgpr1
	; GFX1132-NEXT: s_and_saveexec_b32 s2, vcc_lo			; GFX1132-NEXT: s_and_saveexec_b32 s2, vcc_lo
	; GFX1132-NEXT: s_cbranch_execz .LBB20_2			; GFX1132-NEXT: s_cbranch_execz .LBB20_2
	; GFX1132-NEXT: ; %bb.1:			; GFX1132-NEXT: ; %bb.1:
	; GFX1132-NEXT: v_mov_b32_e32 v0, 5			; GFX1132-NEXT: v_mov_b32_e32 v0, 5
	; GFX1132-NEXT: v_mov_b32_e32 v1, 0			; GFX1132-NEXT: v_mov_b32_e32 v1, 0
	; GFX1132-NEXT: v_mov_b32_e32 v2, 0			; GFX1132-NEXT: v_mov_b32_e32 v2, 0
	; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1132-NEXT: ds_min_rtn_i64 v[0:1], v2, v[0:1]			; GFX1132-NEXT: ds_min_rtn_i64 v[0:1], v2, v[0:1]
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_gl0_inv			; GFX1132-NEXT: buffer_gl0_inv
	; GFX1132-NEXT: .LBB20_2:			; GFX1132-NEXT: .LBB20_2:
	; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s2			; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s2
	; GFX1132-NEXT: v_readfirstlane_b32 s2, v0			; GFX1132-NEXT: v_readfirstlane_b32 s2, v0
	; GFX1132-NEXT: v_readfirstlane_b32 s3, v1			; GFX1132-NEXT: v_readfirstlane_b32 s3, v1
	; GFX1132-NEXT: v_cndmask_b32_e64 v1, 0, 0x7fffffff, vcc_lo			; GFX1132-NEXT: v_cndmask_b32_e64 v1, 0, 0x7fffffff, vcc_lo
	; GFX1132-NEXT: v_cndmask_b32_e64 v0, 5, -1, vcc_lo			; GFX1132-NEXT: v_cndmask_b32_e64 v0, 5, -1, vcc_lo
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1132-NEXT: v_cmp_lt_i64_e32 vcc_lo, s[2:3], v[0:1]			; GFX1132-NEXT: v_cmp_lt_i64_e32 vcc_lo, s[2:3], v[0:1]
	; GFX1132-NEXT: v_cndmask_b32_e64 v1, v1, s3, vcc_lo			; GFX1132-NEXT: v_cndmask_b32_e64 v1, v1, s3, vcc_lo
	; GFX1132-NEXT: v_cndmask_b32_e64 v0, v0, s2, vcc_lo			; GFX1132-NEXT: v_cndmask_b32_e64 v0, v0, s2, vcc_lo
	; GFX1132-NEXT: s_mov_b32 s2, -1			; GFX1132-NEXT: s_mov_b32 s2, -1
	; GFX1132-NEXT: s_mov_b32 s3, 0x31016000			; GFX1132-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_store_b64 v[0:1], off, s[0:3], 0			; GFX1132-NEXT: buffer_store_b64 v[0:1], off, s[0:3], 0
	; GFX1132-NEXT: s_endpgm			; GFX1132-NEXT: s_endpgm
	;			;
	; GFX1164-LABEL: umax_i32_varying:			; GFX1164-LABEL: umax_i32_varying:
	; GFX1164: ; %bb.0: ; %entry			; GFX1164: ; %bb.0: ; %entry
	; GFX1164-NEXT: v_mov_b32_e32 v1, v0			; GFX1164-NEXT: v_mov_b32_e32 v1, v0
	; GFX1164-NEXT: s_not_b64 exec, exec			; GFX1164-NEXT: s_not_b64 exec, exec
	; GFX1164-NEXT: v_mov_b32_e32 v1, 0			; GFX1164-NEXT: v_mov_b32_e32 v1, 0
	; GFX1164-NEXT: s_not_b64 exec, exec			; GFX1164-NEXT: s_not_b64 exec, exec
	; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
	; GFX1164-NEXT: v_max_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1164-NEXT: v_max_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:1
	; GFX1164-NEXT: v_mov_b32_e32 v3, 0			; GFX1164-NEXT: v_mov_b32_e32 v3, 0
	; GFX1164-NEXT: v_max_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1164-NEXT: v_max_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_max_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1164-NEXT: v_max_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:1
	; GFX1164-NEXT: v_max_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1164-NEXT: v_max_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_mov_b32_e32 v2, v1			; GFX1164-NEXT: v_mov_b32_e32 v2, v1
	; GFX1164-NEXT: v_permlanex16_b32 v2, v2, -1, -1			; GFX1164-NEXT: v_permlanex16_b32 v2, v2, -1, -1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_max_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf			; GFX1164-NEXT: v_max_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1164-NEXT: v_readlane_b32 s4, v1, 31			; GFX1164-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_mov_b32_e32 v2, s4			; GFX1164-NEXT: v_mov_b32_e32 v2, s4
	; GFX1164-NEXT: v_max_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf			; GFX1164-NEXT: v_max_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1164-NEXT: v_readlane_b32 s4, v1, 15			; GFX1164-NEXT: v_readlane_b32 s4, v1, 15
	; GFX1164-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1164-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1164-NEXT: s_mov_b64 exec, s[2:3]			; GFX1164-NEXT: s_mov_b64 exec, s[2:3]
	; GFX1164-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX1164-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX1164-NEXT: v_readlane_b32 s5, v1, 31			; GFX1164-NEXT: v_readlane_b32 s5, v1, 31
	; GFX1164-NEXT: v_writelane_b32 v3, s4, 16			; GFX1164-NEXT: v_writelane_b32 v3, s4, 16
	; GFX1164-NEXT: s_mov_b64 exec, s[2:3]			; GFX1164-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1164-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0			; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX1164-NEXT: v_readlane_b32 s7, v1, 63			; GFX1164-NEXT: v_readlane_b32 s7, v1, 63
	; GFX1164-NEXT: v_readlane_b32 s6, v1, 47			; GFX1164-NEXT: v_readlane_b32 s6, v1, 47
	; GFX1164-NEXT: v_writelane_b32 v3, s5, 32			; GFX1164-NEXT: v_writelane_b32 v3, s5, 32
	; GFX1164-NEXT: s_mov_b64 exec, s[2:3]			; GFX1164-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1164-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(SKIP_3) | instid1(VALU_DEP_2)
	; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0			; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX1164-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX1164-NEXT: v_writelane_b32 v3, s6, 48			; GFX1164-NEXT: v_writelane_b32 v3, s6, 48
	; GFX1164-NEXT: s_mov_b64 exec, s[4:5]			; GFX1164-NEXT: s_mov_b64 exec, s[4:5]
	; GFX1164-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX1164-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX1164-NEXT: s_mov_b32 s2, -1			; GFX1164-NEXT: s_mov_b32 s2, -1
	; GFX1164-NEXT: ; implicit-def: $vgpr0			; GFX1164-NEXT: ; implicit-def: $vgpr0
	; GFX1164-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX1164-NEXT: s_and_saveexec_b64 s[4:5], vcc
	; GFX1164-NEXT: s_cbranch_execz .LBB21_2			; GFX1164-NEXT: s_cbranch_execz .LBB21_2
	; GFX1164-NEXT: ; %bb.1:			; GFX1164-NEXT: ; %bb.1:
	; GFX1164-NEXT: v_mov_b32_e32 v0, 0			; GFX1164-NEXT: v_mov_b32_e32 v0, 0
	; GFX1164-NEXT: v_mov_b32_e32 v4, s7			; GFX1164-NEXT: v_mov_b32_e32 v4, s7
	; GFX1164-NEXT: s_mov_b32 s3, s7			; GFX1164-NEXT: s_mov_b32 s3, s7
	; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1164-NEXT: ds_max_rtn_u32 v0, v0, v4			; GFX1164-NEXT: ds_max_rtn_u32 v0, v0, v4
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_gl0_inv			; GFX1164-NEXT: buffer_gl0_inv
	; GFX1164-NEXT: .LBB21_2:			; GFX1164-NEXT: .LBB21_2:
	; GFX1164-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX1164-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX1164-NEXT: v_readfirstlane_b32 s3, v0			; GFX1164-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1164-NEXT: v_mov_b32_e32 v0, v3			; GFX1164-NEXT: v_mov_b32_e32 v0, v3
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1164-NEXT: v_max_u32_e32 v0, s3, v0			; GFX1164-NEXT: v_max_u32_e32 v0, s3, v0
	; GFX1164-NEXT: s_mov_b32 s3, 0x31016000			; GFX1164-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_store_b32 v0, off, s[0:3], 0			; GFX1164-NEXT: buffer_store_b32 v0, off, s[0:3], 0
	; GFX1164-NEXT: s_endpgm			; GFX1164-NEXT: s_endpgm
	;			;
	; GFX1132-LABEL: umax_i32_varying:			; GFX1132-LABEL: umax_i32_varying:
	; GFX1132: ; %bb.0: ; %entry			; GFX1132: ; %bb.0: ; %entry
	; GFX1132-NEXT: v_mov_b32_e32 v1, v0			; GFX1132-NEXT: v_mov_b32_e32 v1, v0
	; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1132-NEXT: v_mov_b32_e32 v1, 0			; GFX1132-NEXT: v_mov_b32_e32 v1, 0
	; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1132-NEXT: s_or_saveexec_b32 s2, -1			; GFX1132-NEXT: s_or_saveexec_b32 s2, -1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_max_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1132-NEXT: v_max_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:1
	; GFX1132-NEXT: v_max_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1132-NEXT: v_max_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_max_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1132-NEXT: v_max_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf bound_ctrl:1
	; GFX1132-NEXT: v_max_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:1			; GFX1132-NEXT: v_max_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf bound_ctrl:1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_mov_b32_e32 v2, v1			; GFX1132-NEXT: v_mov_b32_e32 v2, v1
	; GFX1132-NEXT: v_permlanex16_b32 v2, v2, -1, -1			; GFX1132-NEXT: v_permlanex16_b32 v2, v2, -1, -1
	; GFX1132-NEXT: s_mov_b32 exec_lo, s2			; GFX1132-NEXT: s_mov_b32 exec_lo, s2
	; GFX1132-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX1132-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX1132-NEXT: s_or_saveexec_b32 s2, -1			; GFX1132-NEXT: s_or_saveexec_b32 s2, -1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
	; GFX1132-NEXT: v_max_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf			; GFX1132-NEXT: v_max_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1132-NEXT: v_mov_b32_e32 v3, 0			; GFX1132-NEXT: v_mov_b32_e32 v3, 0
	; GFX1132-NEXT: v_readlane_b32 s3, v1, 15			; GFX1132-NEXT: v_readlane_b32 s3, v1, 15
	; GFX1132-NEXT: v_readlane_b32 s4, v1, 31			; GFX1132-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(SALU_CYCLE_1)
	; GFX1132-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1132-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1132-NEXT: s_mov_b32 exec_lo, s2			; GFX1132-NEXT: s_mov_b32 exec_lo, s2
	; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0			; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX1132-NEXT: s_or_saveexec_b32 s2, -1			; GFX1132-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1132-NEXT: v_writelane_b32 v3, s3, 16			; GFX1132-NEXT: v_writelane_b32 v3, s3, 16
	; GFX1132-NEXT: s_mov_b32 exec_lo, s2			; GFX1132-NEXT: s_mov_b32 exec_lo, s2
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_2)
	; GFX1132-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0			; GFX1132-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
	; GFX1132-NEXT: s_mov_b32 s2, -1			; GFX1132-NEXT: s_mov_b32 s2, -1
	; GFX1132-NEXT: ; implicit-def: $vgpr0			; GFX1132-NEXT: ; implicit-def: $vgpr0
	; GFX1132-NEXT: s_and_saveexec_b32 s3, vcc_lo			; GFX1132-NEXT: s_and_saveexec_b32 s3, vcc_lo
	; GFX1132-NEXT: s_cbranch_execz .LBB21_2			; GFX1132-NEXT: s_cbranch_execz .LBB21_2
	; GFX1132-NEXT: ; %bb.1:			; GFX1132-NEXT: ; %bb.1:
	; GFX1132-NEXT: v_mov_b32_e32 v0, 0			; GFX1132-NEXT: v_mov_b32_e32 v0, 0
	; GFX1132-NEXT: v_mov_b32_e32 v4, s4			; GFX1132-NEXT: v_mov_b32_e32 v4, s4
	; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1132-NEXT: ds_max_rtn_u32 v0, v0, v4			; GFX1132-NEXT: ds_max_rtn_u32 v0, v0, v4
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_gl0_inv			; GFX1132-NEXT: buffer_gl0_inv
	; GFX1132-NEXT: .LBB21_2:			; GFX1132-NEXT: .LBB21_2:
	; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s3			; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s3
	; GFX1132-NEXT: v_readfirstlane_b32 s3, v0			; GFX1132-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1132-NEXT: v_mov_b32_e32 v0, v3			; GFX1132-NEXT: v_mov_b32_e32 v0, v3
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1132-NEXT: v_max_u32_e32 v0, s3, v0			; GFX1132-NEXT: v_max_u32_e32 v0, s3, v0
	; GFX1132-NEXT: s_mov_b32 s3, 0x31016000			; GFX1132-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_store_b32 v0, off, s[0:3], 0			; GFX1132-NEXT: buffer_store_b32 v0, off, s[0:3], 0
	; GFX1132-NEXT: s_endpgm			; GFX1132-NEXT: s_endpgm
	entry:			entry:
	%lane = call i32 @llvm.amdgcn.workitem.id.x()			%lane = call i32 @llvm.amdgcn.workitem.id.x()
	%old = atomicrmw umax i32 addrspace(3)* @local_var32, i32 %lane acq_rel			%old = atomicrmw umax i32 addrspace(3)* @local_var32, i32 %lane acq_rel
	(171 lines not shown)
	; GFX1032-NEXT: s_waitcnt lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1032-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0			; GFX1032-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
	; GFX1032-NEXT: s_endpgm			; GFX1032-NEXT: s_endpgm
	;			;
	; GFX1164-LABEL: umax_i64_constant:			; GFX1164-LABEL: umax_i64_constant:
	; GFX1164: ; %bb.0: ; %entry			; GFX1164: ; %bb.0: ; %entry
	; GFX1164-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX1164-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0			; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0			; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX1164-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX1164-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX1164-NEXT: ; implicit-def: $vgpr0_vgpr1			; GFX1164-NEXT: ; implicit-def: $vgpr0_vgpr1
	; GFX1164-NEXT: s_and_saveexec_b64 s[2:3], vcc			; GFX1164-NEXT: s_and_saveexec_b64 s[2:3], vcc
	; GFX1164-NEXT: s_cbranch_execz .LBB22_2			; GFX1164-NEXT: s_cbranch_execz .LBB22_2
	; GFX1164-NEXT: ; %bb.1:			; GFX1164-NEXT: ; %bb.1:
	; GFX1164-NEXT: v_mov_b32_e32 v0, 5			; GFX1164-NEXT: v_mov_b32_e32 v0, 5
	; GFX1164-NEXT: v_mov_b32_e32 v1, 0			; GFX1164-NEXT: v_mov_b32_e32 v1, 0
	; GFX1164-NEXT: v_mov_b32_e32 v2, 0			; GFX1164-NEXT: v_mov_b32_e32 v2, 0
	; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1164-NEXT: ds_max_rtn_u64 v[0:1], v2, v[0:1]			; GFX1164-NEXT: ds_max_rtn_u64 v[0:1], v2, v[0:1]
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_gl0_inv			; GFX1164-NEXT: buffer_gl0_inv
	; GFX1164-NEXT: .LBB22_2:			; GFX1164-NEXT: .LBB22_2:
	; GFX1164-NEXT: s_or_b64 exec, exec, s[2:3]			; GFX1164-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX1164-NEXT: v_readfirstlane_b32 s2, v0			; GFX1164-NEXT: v_readfirstlane_b32 s2, v0
	; GFX1164-NEXT: v_readfirstlane_b32 s3, v1			; GFX1164-NEXT: v_readfirstlane_b32 s3, v1
	; GFX1164-NEXT: v_mov_b32_e32 v1, 0			; GFX1164-NEXT: v_mov_b32_e32 v1, 0
	; GFX1164-NEXT: v_cndmask_b32_e64 v0, 5, 0, vcc			; GFX1164-NEXT: v_cndmask_b32_e64 v0, 5, 0, vcc
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1164-NEXT: v_cmp_gt_u64_e32 vcc, s[2:3], v[0:1]			; GFX1164-NEXT: v_cmp_gt_u64_e32 vcc, s[2:3], v[0:1]
	; GFX1164-NEXT: v_cndmask_b32_e64 v0, v0, s2, vcc			; GFX1164-NEXT: v_cndmask_b32_e64 v0, v0, s2, vcc
	; GFX1164-NEXT: v_cndmask_b32_e64 v1, 0, s3, vcc			; GFX1164-NEXT: v_cndmask_b32_e64 v1, 0, s3, vcc
	; GFX1164-NEXT: s_mov_b32 s3, 0x31016000			; GFX1164-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1164-NEXT: s_mov_b32 s2, -1			; GFX1164-NEXT: s_mov_b32 s2, -1
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_store_b64 v[0:1], off, s[0:3], 0			; GFX1164-NEXT: buffer_store_b64 v[0:1], off, s[0:3], 0
	; GFX1164-NEXT: s_endpgm			; GFX1164-NEXT: s_endpgm
	;			;
	; GFX1132-LABEL: umax_i64_constant:			; GFX1132-LABEL: umax_i64_constant:
	; GFX1132: ; %bb.0: ; %entry			; GFX1132: ; %bb.0: ; %entry
	; GFX1132-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX1132-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0			; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1132-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0			; GFX1132-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
	; GFX1132-NEXT: ; implicit-def: $vgpr0_vgpr1			; GFX1132-NEXT: ; implicit-def: $vgpr0_vgpr1
	; GFX1132-NEXT: s_and_saveexec_b32 s2, vcc_lo			; GFX1132-NEXT: s_and_saveexec_b32 s2, vcc_lo
	; GFX1132-NEXT: s_cbranch_execz .LBB22_2			; GFX1132-NEXT: s_cbranch_execz .LBB22_2
	; GFX1132-NEXT: ; %bb.1:			; GFX1132-NEXT: ; %bb.1:
	; GFX1132-NEXT: v_mov_b32_e32 v0, 5			; GFX1132-NEXT: v_mov_b32_e32 v0, 5
	; GFX1132-NEXT: v_mov_b32_e32 v1, 0			; GFX1132-NEXT: v_mov_b32_e32 v1, 0
	; GFX1132-NEXT: v_mov_b32_e32 v2, 0			; GFX1132-NEXT: v_mov_b32_e32 v2, 0
	; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1132-NEXT: ds_max_rtn_u64 v[0:1], v2, v[0:1]			; GFX1132-NEXT: ds_max_rtn_u64 v[0:1], v2, v[0:1]
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_gl0_inv			; GFX1132-NEXT: buffer_gl0_inv
	; GFX1132-NEXT: .LBB22_2:			; GFX1132-NEXT: .LBB22_2:
	; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s2			; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s2
	; GFX1132-NEXT: v_readfirstlane_b32 s2, v0			; GFX1132-NEXT: v_readfirstlane_b32 s2, v0
	; GFX1132-NEXT: v_readfirstlane_b32 s3, v1			; GFX1132-NEXT: v_readfirstlane_b32 s3, v1
	; GFX1132-NEXT: v_mov_b32_e32 v1, 0			; GFX1132-NEXT: v_mov_b32_e32 v1, 0
	; GFX1132-NEXT: v_cndmask_b32_e64 v0, 5, 0, vcc_lo			; GFX1132-NEXT: v_cndmask_b32_e64 v0, 5, 0, vcc_lo
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1132-NEXT: v_cmp_gt_u64_e32 vcc_lo, s[2:3], v[0:1]			; GFX1132-NEXT: v_cmp_gt_u64_e32 vcc_lo, s[2:3], v[0:1]
	; GFX1132-NEXT: v_cndmask_b32_e64 v0, v0, s2, vcc_lo			; GFX1132-NEXT: v_cndmask_b32_e64 v0, v0, s2, vcc_lo
	; GFX1132-NEXT: v_cndmask_b32_e64 v1, 0, s3, vcc_lo			; GFX1132-NEXT: v_cndmask_b32_e64 v1, 0, s3, vcc_lo
	; GFX1132-NEXT: s_mov_b32 s3, 0x31016000			; GFX1132-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1132-NEXT: s_mov_b32 s2, -1			; GFX1132-NEXT: s_mov_b32 s2, -1
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_store_b64 v[0:1], off, s[0:3], 0			; GFX1132-NEXT: buffer_store_b64 v[0:1], off, s[0:3], 0
	; GFX1132-NEXT: s_endpgm			; GFX1132-NEXT: s_endpgm
	(233 lines not shown)
	;			;
	; GFX1164-LABEL: umin_i32_varying:			; GFX1164-LABEL: umin_i32_varying:
	; GFX1164: ; %bb.0: ; %entry			; GFX1164: ; %bb.0: ; %entry
	; GFX1164-NEXT: v_mov_b32_e32 v1, v0			; GFX1164-NEXT: v_mov_b32_e32 v1, v0
	; GFX1164-NEXT: s_not_b64 exec, exec			; GFX1164-NEXT: s_not_b64 exec, exec
	; GFX1164-NEXT: v_mov_b32_e32 v1, -1			; GFX1164-NEXT: v_mov_b32_e32 v1, -1
	; GFX1164-NEXT: s_not_b64 exec, exec			; GFX1164-NEXT: s_not_b64 exec, exec
	; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
	; GFX1164-NEXT: v_min_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1164-NEXT: v_min_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1164-NEXT: v_mov_b32_e32 v3, -1			; GFX1164-NEXT: v_mov_b32_e32 v3, -1
	; GFX1164-NEXT: v_min_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf			; GFX1164-NEXT: v_min_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_min_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf			; GFX1164-NEXT: v_min_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf
	; GFX1164-NEXT: v_min_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf			; GFX1164-NEXT: v_min_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_mov_b32_e32 v2, v1			; GFX1164-NEXT: v_mov_b32_e32 v2, v1
	; GFX1164-NEXT: v_permlanex16_b32 v2, v2, -1, -1			; GFX1164-NEXT: v_permlanex16_b32 v2, v2, -1, -1
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_min_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf			; GFX1164-NEXT: v_min_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1164-NEXT: v_readlane_b32 s4, v1, 31			; GFX1164-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_mov_b32_e32 v2, s4			; GFX1164-NEXT: v_mov_b32_e32 v2, s4
	; GFX1164-NEXT: v_min_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf			; GFX1164-NEXT: v_min_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xc bank_mask:0xf
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1164-NEXT: v_readlane_b32 s4, v1, 15			; GFX1164-NEXT: v_readlane_b32 s4, v1, 15
	; GFX1164-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1164-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1164-NEXT: s_mov_b64 exec, s[2:3]			; GFX1164-NEXT: s_mov_b64 exec, s[2:3]
	; GFX1164-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX1164-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX1164-NEXT: v_readlane_b32 s5, v1, 31			; GFX1164-NEXT: v_readlane_b32 s5, v1, 31
	; GFX1164-NEXT: v_writelane_b32 v3, s4, 16			; GFX1164-NEXT: v_writelane_b32 v3, s4, 16
	; GFX1164-NEXT: s_mov_b64 exec, s[2:3]			; GFX1164-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1164-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0			; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[2:3], -1
	; GFX1164-NEXT: v_readlane_b32 s7, v1, 63			; GFX1164-NEXT: v_readlane_b32 s7, v1, 63
	; GFX1164-NEXT: v_readlane_b32 s6, v1, 47			; GFX1164-NEXT: v_readlane_b32 s6, v1, 47
	; GFX1164-NEXT: v_writelane_b32 v3, s5, 32			; GFX1164-NEXT: v_writelane_b32 v3, s5, 32
	; GFX1164-NEXT: s_mov_b64 exec, s[2:3]			; GFX1164-NEXT: s_mov_b64 exec, s[2:3]
				; GFX1164-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(SKIP_3) | instid1(VALU_DEP_2)
	; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0			; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX1164-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX1164-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX1164-NEXT: v_writelane_b32 v3, s6, 48			; GFX1164-NEXT: v_writelane_b32 v3, s6, 48
	; GFX1164-NEXT: s_mov_b64 exec, s[4:5]			; GFX1164-NEXT: s_mov_b64 exec, s[4:5]
	; GFX1164-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX1164-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX1164-NEXT: s_mov_b32 s2, -1			; GFX1164-NEXT: s_mov_b32 s2, -1
	; GFX1164-NEXT: ; implicit-def: $vgpr0			; GFX1164-NEXT: ; implicit-def: $vgpr0
	; GFX1164-NEXT: s_and_saveexec_b64 s[4:5], vcc			; GFX1164-NEXT: s_and_saveexec_b64 s[4:5], vcc
	; GFX1164-NEXT: s_cbranch_execz .LBB23_2			; GFX1164-NEXT: s_cbranch_execz .LBB23_2
	; GFX1164-NEXT: ; %bb.1:			; GFX1164-NEXT: ; %bb.1:
	; GFX1164-NEXT: v_mov_b32_e32 v0, 0			; GFX1164-NEXT: v_mov_b32_e32 v0, 0
	; GFX1164-NEXT: v_mov_b32_e32 v4, s7			; GFX1164-NEXT: v_mov_b32_e32 v4, s7
	; GFX1164-NEXT: s_mov_b32 s3, s7			; GFX1164-NEXT: s_mov_b32 s3, s7
	; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1164-NEXT: ds_min_rtn_u32 v0, v0, v4			; GFX1164-NEXT: ds_min_rtn_u32 v0, v0, v4
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_gl0_inv			; GFX1164-NEXT: buffer_gl0_inv
	; GFX1164-NEXT: .LBB23_2:			; GFX1164-NEXT: .LBB23_2:
	; GFX1164-NEXT: s_or_b64 exec, exec, s[4:5]			; GFX1164-NEXT: s_or_b64 exec, exec, s[4:5]
	; GFX1164-NEXT: v_readfirstlane_b32 s3, v0			; GFX1164-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1164-NEXT: v_mov_b32_e32 v0, v3			; GFX1164-NEXT: v_mov_b32_e32 v0, v3
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1164-NEXT: v_min_u32_e32 v0, s3, v0			; GFX1164-NEXT: v_min_u32_e32 v0, s3, v0
	; GFX1164-NEXT: s_mov_b32 s3, 0x31016000			; GFX1164-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_store_b32 v0, off, s[0:3], 0			; GFX1164-NEXT: buffer_store_b32 v0, off, s[0:3], 0
	; GFX1164-NEXT: s_endpgm			; GFX1164-NEXT: s_endpgm
	;			;
	; GFX1132-LABEL: umin_i32_varying:			; GFX1132-LABEL: umin_i32_varying:
	; GFX1132: ; %bb.0: ; %entry			; GFX1132: ; %bb.0: ; %entry
	; GFX1132-NEXT: v_mov_b32_e32 v1, v0			; GFX1132-NEXT: v_mov_b32_e32 v1, v0
	; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1132-NEXT: v_mov_b32_e32 v1, -1			; GFX1132-NEXT: v_mov_b32_e32 v1, -1
	; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo			; GFX1132-NEXT: s_not_b32 exec_lo, exec_lo
	; GFX1132-NEXT: s_or_saveexec_b32 s2, -1			; GFX1132-NEXT: s_or_saveexec_b32 s2, -1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_min_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1132-NEXT: v_min_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1132-NEXT: v_min_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf			; GFX1132-NEXT: v_min_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_min_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf			; GFX1132-NEXT: v_min_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xf
	; GFX1132-NEXT: v_min_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf			; GFX1132-NEXT: v_min_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xf
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1132-NEXT: v_mov_b32_e32 v2, v1			; GFX1132-NEXT: v_mov_b32_e32 v2, v1
	; GFX1132-NEXT: v_permlanex16_b32 v2, v2, -1, -1			; GFX1132-NEXT: v_permlanex16_b32 v2, v2, -1, -1
	; GFX1132-NEXT: s_mov_b32 exec_lo, s2			; GFX1132-NEXT: s_mov_b32 exec_lo, s2
	; GFX1132-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX1132-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX1132-NEXT: s_or_saveexec_b32 s2, -1			; GFX1132-NEXT: s_or_saveexec_b32 s2, -1
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
	; GFX1132-NEXT: v_min_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf			; GFX1132-NEXT: v_min_u32_dpp v1, v2, v1 quad_perm:[0,1,2,3] row_mask:0xa bank_mask:0xf
	; GFX1132-NEXT: v_mov_b32_e32 v3, -1			; GFX1132-NEXT: v_mov_b32_e32 v3, -1
	; GFX1132-NEXT: v_readlane_b32 s3, v1, 15			; GFX1132-NEXT: v_readlane_b32 s3, v1, 15
	; GFX1132-NEXT: v_readlane_b32 s4, v1, 31			; GFX1132-NEXT: v_readlane_b32 s4, v1, 31
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(SALU_CYCLE_1)
	; GFX1132-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf			; GFX1132-NEXT: v_mov_b32_dpp v3, v1 row_shr:1 row_mask:0xf bank_mask:0xf
	; GFX1132-NEXT: s_mov_b32 exec_lo, s2			; GFX1132-NEXT: s_mov_b32 exec_lo, s2
	; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0			; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
	; GFX1132-NEXT: s_or_saveexec_b32 s2, -1			; GFX1132-NEXT: s_or_saveexec_b32 s2, -1
	; GFX1132-NEXT: v_writelane_b32 v3, s3, 16			; GFX1132-NEXT: v_writelane_b32 v3, s3, 16
	; GFX1132-NEXT: s_mov_b32 exec_lo, s2			; GFX1132-NEXT: s_mov_b32 exec_lo, s2
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_2)
	; GFX1132-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0			; GFX1132-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
	; GFX1132-NEXT: s_mov_b32 s2, -1			; GFX1132-NEXT: s_mov_b32 s2, -1
	; GFX1132-NEXT: ; implicit-def: $vgpr0			; GFX1132-NEXT: ; implicit-def: $vgpr0
	; GFX1132-NEXT: s_and_saveexec_b32 s3, vcc_lo			; GFX1132-NEXT: s_and_saveexec_b32 s3, vcc_lo
	; GFX1132-NEXT: s_cbranch_execz .LBB23_2			; GFX1132-NEXT: s_cbranch_execz .LBB23_2
	; GFX1132-NEXT: ; %bb.1:			; GFX1132-NEXT: ; %bb.1:
	; GFX1132-NEXT: v_mov_b32_e32 v0, 0			; GFX1132-NEXT: v_mov_b32_e32 v0, 0
	; GFX1132-NEXT: v_mov_b32_e32 v4, s4			; GFX1132-NEXT: v_mov_b32_e32 v4, s4
	; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1132-NEXT: ds_min_rtn_u32 v0, v0, v4			; GFX1132-NEXT: ds_min_rtn_u32 v0, v0, v4
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_gl0_inv			; GFX1132-NEXT: buffer_gl0_inv
	; GFX1132-NEXT: .LBB23_2:			; GFX1132-NEXT: .LBB23_2:
	; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s3			; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s3
	; GFX1132-NEXT: v_readfirstlane_b32 s3, v0			; GFX1132-NEXT: v_readfirstlane_b32 s3, v0
	; GFX1132-NEXT: v_mov_b32_e32 v0, v3			; GFX1132-NEXT: v_mov_b32_e32 v0, v3
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1132-NEXT: v_min_u32_e32 v0, s3, v0			; GFX1132-NEXT: v_min_u32_e32 v0, s3, v0
	; GFX1132-NEXT: s_mov_b32 s3, 0x31016000			; GFX1132-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_store_b32 v0, off, s[0:3], 0			; GFX1132-NEXT: buffer_store_b32 v0, off, s[0:3], 0
	; GFX1132-NEXT: s_endpgm			; GFX1132-NEXT: s_endpgm
	entry:			entry:
	%lane = call i32 @llvm.amdgcn.workitem.id.x()			%lane = call i32 @llvm.amdgcn.workitem.id.x()
	%old = atomicrmw umin i32 addrspace(3)* @local_var32, i32 %lane acq_rel			%old = atomicrmw umin i32 addrspace(3)* @local_var32, i32 %lane acq_rel
	(171 lines not shown)
	; GFX1032-NEXT: s_waitcnt lgkmcnt(0)			; GFX1032-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1032-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0			; GFX1032-NEXT: buffer_store_dwordx2 v[0:1], off, s[0:3], 0
	; GFX1032-NEXT: s_endpgm			; GFX1032-NEXT: s_endpgm
	;			;
	; GFX1164-LABEL: umin_i64_constant:			; GFX1164-LABEL: umin_i64_constant:
	; GFX1164: ; %bb.0: ; %entry			; GFX1164: ; %bb.0: ; %entry
	; GFX1164-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX1164-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0			; GFX1164-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0			; GFX1164-NEXT: v_mbcnt_hi_u32_b32 v0, exec_hi, v0
	; GFX1164-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; GFX1164-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
	; GFX1164-NEXT: ; implicit-def: $vgpr0_vgpr1			; GFX1164-NEXT: ; implicit-def: $vgpr0_vgpr1
	; GFX1164-NEXT: s_and_saveexec_b64 s[2:3], vcc			; GFX1164-NEXT: s_and_saveexec_b64 s[2:3], vcc
	; GFX1164-NEXT: s_cbranch_execz .LBB24_2			; GFX1164-NEXT: s_cbranch_execz .LBB24_2
	; GFX1164-NEXT: ; %bb.1:			; GFX1164-NEXT: ; %bb.1:
	; GFX1164-NEXT: v_mov_b32_e32 v0, 5			; GFX1164-NEXT: v_mov_b32_e32 v0, 5
	; GFX1164-NEXT: v_mov_b32_e32 v1, 0			; GFX1164-NEXT: v_mov_b32_e32 v1, 0
	; GFX1164-NEXT: v_mov_b32_e32 v2, 0			; GFX1164-NEXT: v_mov_b32_e32 v2, 0
	; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1164-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1164-NEXT: ds_min_rtn_u64 v[0:1], v2, v[0:1]			; GFX1164-NEXT: ds_min_rtn_u64 v[0:1], v2, v[0:1]
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_gl0_inv			; GFX1164-NEXT: buffer_gl0_inv
	; GFX1164-NEXT: .LBB24_2:			; GFX1164-NEXT: .LBB24_2:
	; GFX1164-NEXT: s_or_b64 exec, exec, s[2:3]			; GFX1164-NEXT: s_or_b64 exec, exec, s[2:3]
	; GFX1164-NEXT: v_readfirstlane_b32 s2, v0			; GFX1164-NEXT: v_readfirstlane_b32 s2, v0
	; GFX1164-NEXT: v_readfirstlane_b32 s3, v1			; GFX1164-NEXT: v_readfirstlane_b32 s3, v1
	; GFX1164-NEXT: v_cndmask_b32_e64 v1, 0, -1, vcc			; GFX1164-NEXT: v_cndmask_b32_e64 v1, 0, -1, vcc
	; GFX1164-NEXT: v_cndmask_b32_e64 v0, 5, -1, vcc			; GFX1164-NEXT: v_cndmask_b32_e64 v0, 5, -1, vcc
				; GFX1164-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1164-NEXT: v_cmp_lt_u64_e32 vcc, s[2:3], v[0:1]			; GFX1164-NEXT: v_cmp_lt_u64_e32 vcc, s[2:3], v[0:1]
	; GFX1164-NEXT: v_cndmask_b32_e64 v1, v1, s3, vcc			; GFX1164-NEXT: v_cndmask_b32_e64 v1, v1, s3, vcc
	; GFX1164-NEXT: v_cndmask_b32_e64 v0, v0, s2, vcc			; GFX1164-NEXT: v_cndmask_b32_e64 v0, v0, s2, vcc
	; GFX1164-NEXT: s_mov_b32 s2, -1			; GFX1164-NEXT: s_mov_b32 s2, -1
	; GFX1164-NEXT: s_mov_b32 s3, 0x31016000			; GFX1164-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1164-NEXT: s_waitcnt lgkmcnt(0)			; GFX1164-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1164-NEXT: buffer_store_b64 v[0:1], off, s[0:3], 0			; GFX1164-NEXT: buffer_store_b64 v[0:1], off, s[0:3], 0
	; GFX1164-NEXT: s_endpgm			; GFX1164-NEXT: s_endpgm
	;			;
	; GFX1132-LABEL: umin_i64_constant:			; GFX1132-LABEL: umin_i64_constant:
	; GFX1132: ; %bb.0: ; %entry			; GFX1132: ; %bb.0: ; %entry
	; GFX1132-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX1132-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0			; GFX1132-NEXT: v_mbcnt_lo_u32_b32 v0, exec_lo, 0
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1132-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0			; GFX1132-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
	; GFX1132-NEXT: ; implicit-def: $vgpr0_vgpr1			; GFX1132-NEXT: ; implicit-def: $vgpr0_vgpr1
	; GFX1132-NEXT: s_and_saveexec_b32 s2, vcc_lo			; GFX1132-NEXT: s_and_saveexec_b32 s2, vcc_lo
	; GFX1132-NEXT: s_cbranch_execz .LBB24_2			; GFX1132-NEXT: s_cbranch_execz .LBB24_2
	; GFX1132-NEXT: ; %bb.1:			; GFX1132-NEXT: ; %bb.1:
	; GFX1132-NEXT: v_mov_b32_e32 v0, 5			; GFX1132-NEXT: v_mov_b32_e32 v0, 5
	; GFX1132-NEXT: v_mov_b32_e32 v1, 0			; GFX1132-NEXT: v_mov_b32_e32 v1, 0
	; GFX1132-NEXT: v_mov_b32_e32 v2, 0			; GFX1132-NEXT: v_mov_b32_e32 v2, 0
	; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1132-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1132-NEXT: ds_min_rtn_u64 v[0:1], v2, v[0:1]			; GFX1132-NEXT: ds_min_rtn_u64 v[0:1], v2, v[0:1]
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_gl0_inv			; GFX1132-NEXT: buffer_gl0_inv
	; GFX1132-NEXT: .LBB24_2:			; GFX1132-NEXT: .LBB24_2:
	; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s2			; GFX1132-NEXT: s_or_b32 exec_lo, exec_lo, s2
	; GFX1132-NEXT: v_readfirstlane_b32 s2, v0			; GFX1132-NEXT: v_readfirstlane_b32 s2, v0
	; GFX1132-NEXT: v_readfirstlane_b32 s3, v1			; GFX1132-NEXT: v_readfirstlane_b32 s3, v1
	; GFX1132-NEXT: v_cndmask_b32_e64 v1, 0, -1, vcc_lo			; GFX1132-NEXT: v_cndmask_b32_e64 v1, 0, -1, vcc_lo
	; GFX1132-NEXT: v_cndmask_b32_e64 v0, 5, -1, vcc_lo			; GFX1132-NEXT: v_cndmask_b32_e64 v0, 5, -1, vcc_lo
				; GFX1132-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX1132-NEXT: v_cmp_lt_u64_e32 vcc_lo, s[2:3], v[0:1]			; GFX1132-NEXT: v_cmp_lt_u64_e32 vcc_lo, s[2:3], v[0:1]
	; GFX1132-NEXT: v_cndmask_b32_e64 v1, v1, s3, vcc_lo			; GFX1132-NEXT: v_cndmask_b32_e64 v1, v1, s3, vcc_lo
	; GFX1132-NEXT: v_cndmask_b32_e64 v0, v0, s2, vcc_lo			; GFX1132-NEXT: v_cndmask_b32_e64 v0, v0, s2, vcc_lo
	; GFX1132-NEXT: s_mov_b32 s2, -1			; GFX1132-NEXT: s_mov_b32 s2, -1
	; GFX1132-NEXT: s_mov_b32 s3, 0x31016000			; GFX1132-NEXT: s_mov_b32 s3, 0x31016000
	; GFX1132-NEXT: s_waitcnt lgkmcnt(0)			; GFX1132-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1132-NEXT: buffer_store_b64 v[0:1], off, s[0:3], 0			; GFX1132-NEXT: buffer_store_b64 v[0:1], off, s[0:3], 0
	; GFX1132-NEXT: s_endpgm			; GFX1132-NEXT: s_endpgm
	entry:			entry:
	%old = atomicrmw umin i64 addrspace(3)* @local_var64, i64 5 acq_rel			%old = atomicrmw umin i64 addrspace(3)* @local_var64, i64 5 acq_rel
	store i64 %old, i64 addrspace(1)* %out			store i64 %old, i64 addrspace(1)* %out
	ret void			ret void
	}			}

llvm/test/CodeGen/AMDGPU/cluster_stores.ll

	Show First 20 Lines • Show All 470 Lines • ▼ Show 20 Lines
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: cluster_image_sample:			; GFX11-LABEL: cluster_image_sample:
	; GFX11: ; %bb.0: ; %entry			; GFX11: ; %bb.0: ; %entry
	; GFX11-NEXT: v_cvt_f32_i32_e32 v8, v0			; GFX11-NEXT: v_cvt_f32_i32_e32 v8, v0
	; GFX11-NEXT: v_cvt_f32_i32_e32 v9, v1			; GFX11-NEXT: v_cvt_f32_i32_e32 v9, v1
	; GFX11-NEXT: v_mov_b32_e32 v4, 0			; GFX11-NEXT: v_mov_b32_e32 v4, 0
	; GFX11-NEXT: v_mov_b32_e32 v10, 1.0			; GFX11-NEXT: v_mov_b32_e32 v10, 1.0
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
	; GFX11-NEXT: v_add_f32_e32 v2, 1.0, v8			; GFX11-NEXT: v_add_f32_e32 v2, 1.0, v8
	; GFX11-NEXT: v_add_f32_e32 v3, 1.0, v9			; GFX11-NEXT: v_add_f32_e32 v3, 1.0, v9
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4)
	; GFX11-NEXT: v_mov_b32_e32 v5, v4			; GFX11-NEXT: v_mov_b32_e32 v5, v4
	; GFX11-NEXT: v_mov_b32_e32 v6, v4			; GFX11-NEXT: v_mov_b32_e32 v6, v4
	; GFX11-NEXT: v_mov_b32_e32 v7, v4			; GFX11-NEXT: v_mov_b32_e32 v7, v4
	; GFX11-NEXT: v_add_f32_e32 v8, 2.0, v8			; GFX11-NEXT: v_add_f32_e32 v8, 2.0, v8
	; GFX11-NEXT: v_add_f32_e32 v9, 2.0, v9			; GFX11-NEXT: v_add_f32_e32 v9, 2.0, v9
	; GFX11-NEXT: v_mov_b32_e32 v11, v10			; GFX11-NEXT: v_mov_b32_e32 v11, v10
	; GFX11-NEXT: v_mov_b32_e32 v12, v10			; GFX11-NEXT: v_mov_b32_e32 v12, v10
	; GFX11-NEXT: v_mov_b32_e32 v13, v10			; GFX11-NEXT: v_mov_b32_e32 v13, v10
	Show All 28 Lines

llvm/test/CodeGen/AMDGPU/dual-source-blend-export.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck %s -check-prefix=GCN			; RUN: llc -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck %s -check-prefix=GCN

	; This is a slightly modified IR from real case to make it concise.			; This is a slightly modified IR from real case to make it concise.
	define amdgpu_ps void @_amdgpu_ps_main(i32 inreg %PrimMask, <2 x float> %InterpCenter) #0 {			define amdgpu_ps void @_amdgpu_ps_main(i32 inreg %PrimMask, <2 x float> %InterpCenter) #0 {
	; GCN-LABEL: _amdgpu_ps_main:			; GCN-LABEL: _amdgpu_ps_main:
	; GCN: ; %bb.0: ; %.entry			; GCN: ; %bb.0: ; %.entry
	; GCN-NEXT: s_mov_b32 s1, exec_lo			; GCN-NEXT: s_mov_b32 s1, exec_lo
	; GCN-NEXT: s_wqm_b32 exec_lo, exec_lo			; GCN-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GCN-NEXT: s_mov_b32 m0, s0			; GCN-NEXT: s_mov_b32 m0, s0
	; GCN-NEXT: v_mov_b32_e32 v2, v0			; GCN-NEXT: v_mov_b32_e32 v2, v0
	; GCN-NEXT: lds_param_load v3, attr1.x wait_vdst:15			; GCN-NEXT: lds_param_load v3, attr1.x wait_vdst:15
	; GCN-NEXT: lds_param_load v4, attr1.y wait_vdst:15			; GCN-NEXT: lds_param_load v4, attr1.y wait_vdst:15
	; GCN-NEXT: lds_param_load v5, attr1.z wait_vdst:15			; GCN-NEXT: lds_param_load v5, attr1.z wait_vdst:15
	; GCN-NEXT: lds_param_load v6, attr1.w wait_vdst:15			; GCN-NEXT: lds_param_load v6, attr1.w wait_vdst:15
	; GCN-NEXT: v_mbcnt_lo_u32_b32 v7, -1, 0			; GCN-NEXT: v_mbcnt_lo_u32_b32 v7, -1, 0
				; GCN-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GCN-NEXT: v_mbcnt_hi_u32_b32 v7, -1, v7			; GCN-NEXT: v_mbcnt_hi_u32_b32 v7, -1, v7
	; GCN-NEXT: v_and_b32_e32 v7, 1, v7			; GCN-NEXT: v_and_b32_e32 v7, 1, v7
				; GCN-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_4) | instid1(VALU_DEP_4)
	; GCN-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v7			; GCN-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v7
	; GCN-NEXT: v_interp_p10_f32 v8, v4, v2, v4 wait_exp:2			; GCN-NEXT: v_interp_p10_f32 v8, v4, v2, v4 wait_exp:2
	; GCN-NEXT: v_interp_p10_f32 v10, v5, v2, v5 wait_exp:1			; GCN-NEXT: v_interp_p10_f32 v10, v5, v2, v5 wait_exp:1
	; GCN-NEXT: v_interp_p10_f32 v9, v6, v2, v6			; GCN-NEXT: v_interp_p10_f32 v9, v6, v2, v6
	; GCN-NEXT: v_interp_p10_f32 v2, v3, v2, v3 wait_exp:7			; GCN-NEXT: v_interp_p10_f32 v2, v3, v2, v3 wait_exp:7
	; GCN-NEXT: v_interp_p2_f32 v4, v4, v1, v8 wait_exp:7			; GCN-NEXT: v_interp_p2_f32 v4, v4, v1, v8 wait_exp:7
				; GCN-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
	; GCN-NEXT: v_interp_p2_f32 v5, v5, v1, v10 wait_exp:7			; GCN-NEXT: v_interp_p2_f32 v5, v5, v1, v10 wait_exp:7
	; GCN-NEXT: v_interp_p2_f32 v6, v6, v1, v9 wait_exp:7			; GCN-NEXT: v_interp_p2_f32 v6, v6, v1, v9 wait_exp:7
				; GCN-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
	; GCN-NEXT: v_interp_p2_f32 v2, v3, v1, v2 wait_exp:7			; GCN-NEXT: v_interp_p2_f32 v2, v3, v1, v2 wait_exp:7
	; GCN-NEXT: v_mov_b32_dpp v4, v4 dpp8:[1,0,3,2,5,4,7,6]			; GCN-NEXT: v_mov_b32_dpp v4, v4 dpp8:[1,0,3,2,5,4,7,6]
				; GCN-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)
	; GCN-NEXT: v_mov_b32_dpp v6, v6 dpp8:[1,0,3,2,5,4,7,6]			; GCN-NEXT: v_mov_b32_dpp v6, v6 dpp8:[1,0,3,2,5,4,7,6]
	; GCN-NEXT: v_cndmask_b32_e32 v3, v4, v5, vcc_lo			; GCN-NEXT: v_cndmask_b32_e32 v3, v4, v5, vcc_lo
	; GCN-NEXT: v_cndmask_b32_e32 v4, v5, v4, vcc_lo			; GCN-NEXT: v_cndmask_b32_e32 v4, v5, v4, vcc_lo
				; GCN-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_3)
	; GCN-NEXT: v_cndmask_b32_e32 v5, v2, v6, vcc_lo			; GCN-NEXT: v_cndmask_b32_e32 v5, v2, v6, vcc_lo
	; GCN-NEXT: v_cndmask_b32_e32 v2, v6, v2, vcc_lo			; GCN-NEXT: v_cndmask_b32_e32 v2, v6, v2, vcc_lo
	; GCN-NEXT: v_mov_b32_dpp v4, v4 dpp8:[1,0,3,2,5,4,7,6]			; GCN-NEXT: v_mov_b32_dpp v4, v4 dpp8:[1,0,3,2,5,4,7,6]
				; GCN-NEXT: s_delay_alu instid0(VALU_DEP_3)
	; GCN-NEXT: v_mov_b32_dpp v5, v5 dpp8:[1,0,3,2,5,4,7,6]			; GCN-NEXT: v_mov_b32_dpp v5, v5 dpp8:[1,0,3,2,5,4,7,6]
	; GCN-NEXT: s_mov_b32 exec_lo, s1			; GCN-NEXT: s_mov_b32 exec_lo, s1
	; GCN-NEXT: exp dual_src_blend0 v3, v2, off, off			; GCN-NEXT: exp dual_src_blend0 v3, v2, off, off
	; GCN-NEXT: exp dual_src_blend1 v4, v5, off, off done			; GCN-NEXT: exp dual_src_blend1 v4, v5, off, off done
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	.entry:			.entry:
	%InterpCenter.i0 = extractelement <2 x float> %InterpCenter, i64 0			%InterpCenter.i0 = extractelement <2 x float> %InterpCenter, i64 0
	%InterpCenter.i1 = extractelement <2 x float> %InterpCenter, i64 1			%InterpCenter.i1 = extractelement <2 x float> %InterpCenter, i64 1
	▲ Show 20 Lines • Show All 57 Lines • Show Last 20 Lines

llvm/test/CodeGen/AMDGPU/flat-scratch-svs.ll

	Show First 20 Lines • Show All 61 Lines • ▼ Show 20 Lines
	;			;
	; GFX11-GISEL-LABEL: soff1_voff1:			; GFX11-GISEL-LABEL: soff1_voff1:
	; GFX11-GISEL: ; %bb.0: ; %bb			; GFX11-GISEL: ; %bb.0: ; %bb
	; GFX11-GISEL-NEXT: s_load_b32 s0, s[0:1], 0x24			; GFX11-GISEL-NEXT: s_load_b32 s0, s[0:1], 0x24
	; GFX11-GISEL-NEXT: v_mov_b32_e32 v2, 1			; GFX11-GISEL-NEXT: v_mov_b32_e32 v2, 1
	; GFX11-GISEL-NEXT: v_mov_b32_e32 v3, 4			; GFX11-GISEL-NEXT: v_mov_b32_e32 v3, 4
	; GFX11-GISEL-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-GISEL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-GISEL-NEXT: v_add_nc_u32_e64 v1, s0, 4			; GFX11-GISEL-NEXT: v_add_nc_u32_e64 v1, s0, 4
				; GFX11-GISEL-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-GISEL-NEXT: v_add_nc_u32_e32 v0, v1, v0			; GFX11-GISEL-NEXT: v_add_nc_u32_e32 v0, v1, v0
	; GFX11-GISEL-NEXT: v_mov_b32_e32 v1, 2			; GFX11-GISEL-NEXT: v_mov_b32_e32 v1, 2
	; GFX11-GISEL-NEXT: scratch_store_b8 v0, v2, off offset:1 dlc			; GFX11-GISEL-NEXT: scratch_store_b8 v0, v2, off offset:1 dlc
	; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-GISEL-NEXT: scratch_store_b8 v0, v1, off offset:2 dlc			; GFX11-GISEL-NEXT: scratch_store_b8 v0, v1, off offset:2 dlc
	; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-GISEL-NEXT: scratch_store_b8 v0, v3, off offset:4 dlc			; GFX11-GISEL-NEXT: scratch_store_b8 v0, v3, off offset:4 dlc
	; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
	▲ Show 20 Lines • Show All 71 Lines • ▼ Show 20 Lines
	; GFX11-GISEL-LABEL: soff1_voff2:			; GFX11-GISEL-LABEL: soff1_voff2:
	; GFX11-GISEL: ; %bb.0: ; %bb			; GFX11-GISEL: ; %bb.0: ; %bb
	; GFX11-GISEL-NEXT: s_load_b32 s0, s[0:1], 0x24			; GFX11-GISEL-NEXT: s_load_b32 s0, s[0:1], 0x24
	; GFX11-GISEL-NEXT: v_lshlrev_b32_e32 v0, 1, v0			; GFX11-GISEL-NEXT: v_lshlrev_b32_e32 v0, 1, v0
	; GFX11-GISEL-NEXT: v_mov_b32_e32 v2, 1			; GFX11-GISEL-NEXT: v_mov_b32_e32 v2, 1
	; GFX11-GISEL-NEXT: v_mov_b32_e32 v3, 4			; GFX11-GISEL-NEXT: v_mov_b32_e32 v3, 4
	; GFX11-GISEL-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-GISEL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-GISEL-NEXT: v_add_nc_u32_e64 v1, s0, 4			; GFX11-GISEL-NEXT: v_add_nc_u32_e64 v1, s0, 4
				; GFX11-GISEL-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-GISEL-NEXT: v_add_nc_u32_e32 v0, v1, v0			; GFX11-GISEL-NEXT: v_add_nc_u32_e32 v0, v1, v0
	; GFX11-GISEL-NEXT: v_mov_b32_e32 v1, 2			; GFX11-GISEL-NEXT: v_mov_b32_e32 v1, 2
	; GFX11-GISEL-NEXT: scratch_store_b8 v0, v2, off offset:1 dlc			; GFX11-GISEL-NEXT: scratch_store_b8 v0, v2, off offset:1 dlc
	; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-GISEL-NEXT: scratch_store_b8 v0, v1, off offset:2 dlc			; GFX11-GISEL-NEXT: scratch_store_b8 v0, v1, off offset:2 dlc
	; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-GISEL-NEXT: scratch_store_b8 v0, v3, off offset:4 dlc			; GFX11-GISEL-NEXT: scratch_store_b8 v0, v3, off offset:4 dlc
	; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
	▲ Show 20 Lines • Show All 71 Lines • ▼ Show 20 Lines
	; GFX11-GISEL-LABEL: soff1_voff4:			; GFX11-GISEL-LABEL: soff1_voff4:
	; GFX11-GISEL: ; %bb.0: ; %bb			; GFX11-GISEL: ; %bb.0: ; %bb
	; GFX11-GISEL-NEXT: s_load_b32 s0, s[0:1], 0x24			; GFX11-GISEL-NEXT: s_load_b32 s0, s[0:1], 0x24
	; GFX11-GISEL-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX11-GISEL-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX11-GISEL-NEXT: v_mov_b32_e32 v2, 1			; GFX11-GISEL-NEXT: v_mov_b32_e32 v2, 1
	; GFX11-GISEL-NEXT: v_mov_b32_e32 v3, 4			; GFX11-GISEL-NEXT: v_mov_b32_e32 v3, 4
	; GFX11-GISEL-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-GISEL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-GISEL-NEXT: v_add_nc_u32_e64 v1, s0, 4			; GFX11-GISEL-NEXT: v_add_nc_u32_e64 v1, s0, 4
				; GFX11-GISEL-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-GISEL-NEXT: v_add_nc_u32_e32 v0, v1, v0			; GFX11-GISEL-NEXT: v_add_nc_u32_e32 v0, v1, v0
	; GFX11-GISEL-NEXT: v_mov_b32_e32 v1, 2			; GFX11-GISEL-NEXT: v_mov_b32_e32 v1, 2
	; GFX11-GISEL-NEXT: scratch_store_b8 v0, v2, off offset:1 dlc			; GFX11-GISEL-NEXT: scratch_store_b8 v0, v2, off offset:1 dlc
	; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-GISEL-NEXT: scratch_store_b8 v0, v1, off offset:2 dlc			; GFX11-GISEL-NEXT: scratch_store_b8 v0, v1, off offset:2 dlc
	; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-GISEL-NEXT: scratch_store_b8 v0, v3, off offset:4 dlc			; GFX11-GISEL-NEXT: scratch_store_b8 v0, v3, off offset:4 dlc
	; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
	▲ Show 20 Lines • Show All 54 Lines • ▼ Show 20 Lines
	; GFX11-SDAG-LABEL: soff2_voff1:			; GFX11-SDAG-LABEL: soff2_voff1:
	; GFX11-SDAG: ; %bb.0: ; %bb			; GFX11-SDAG: ; %bb.0: ; %bb
	; GFX11-SDAG-NEXT: s_load_b32 s0, s[0:1], 0x24			; GFX11-SDAG-NEXT: s_load_b32 s0, s[0:1], 0x24
	; GFX11-SDAG-NEXT: v_mov_b32_e32 v1, 1			; GFX11-SDAG-NEXT: v_mov_b32_e32 v1, 1
	; GFX11-SDAG-NEXT: v_mov_b32_e32 v2, 2			; GFX11-SDAG-NEXT: v_mov_b32_e32 v2, 2
	; GFX11-SDAG-NEXT: v_mov_b32_e32 v3, 4			; GFX11-SDAG-NEXT: v_mov_b32_e32 v3, 4
	; GFX11-SDAG-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-SDAG-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-SDAG-NEXT: s_lshl_b32 s0, s0, 1			; GFX11-SDAG-NEXT: s_lshl_b32 s0, s0, 1
				; GFX11-SDAG-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-SDAG-NEXT: v_add3_u32 v0, 4, s0, v0			; GFX11-SDAG-NEXT: v_add3_u32 v0, 4, s0, v0
	; GFX11-SDAG-NEXT: scratch_store_b8 v0, v1, off offset:1 dlc			; GFX11-SDAG-NEXT: scratch_store_b8 v0, v1, off offset:1 dlc
	; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-SDAG-NEXT: scratch_store_b8 v0, v2, off offset:2 dlc			; GFX11-SDAG-NEXT: scratch_store_b8 v0, v2, off offset:2 dlc
	; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-SDAG-NEXT: scratch_store_b8 v0, v3, off offset:4 dlc			; GFX11-SDAG-NEXT: scratch_store_b8 v0, v3, off offset:4 dlc
	; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-SDAG-NEXT: s_endpgm			; GFX11-SDAG-NEXT: s_endpgm
	;			;
	; GFX11-GISEL-LABEL: soff2_voff1:			; GFX11-GISEL-LABEL: soff2_voff1:
	; GFX11-GISEL: ; %bb.0: ; %bb			; GFX11-GISEL: ; %bb.0: ; %bb
	; GFX11-GISEL-NEXT: s_load_b32 s0, s[0:1], 0x24			; GFX11-GISEL-NEXT: s_load_b32 s0, s[0:1], 0x24
	; GFX11-GISEL-NEXT: v_mov_b32_e32 v2, 1			; GFX11-GISEL-NEXT: v_mov_b32_e32 v2, 1
	; GFX11-GISEL-NEXT: v_mov_b32_e32 v3, 4			; GFX11-GISEL-NEXT: v_mov_b32_e32 v3, 4
	; GFX11-GISEL-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-GISEL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-GISEL-NEXT: s_lshl_b32 s0, s0, 1			; GFX11-GISEL-NEXT: s_lshl_b32 s0, s0, 1
				; GFX11-GISEL-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-GISEL-NEXT: v_add_nc_u32_e64 v1, s0, 4			; GFX11-GISEL-NEXT: v_add_nc_u32_e64 v1, s0, 4
	; GFX11-GISEL-NEXT: v_add_nc_u32_e32 v0, v1, v0			; GFX11-GISEL-NEXT: v_add_nc_u32_e32 v0, v1, v0
	; GFX11-GISEL-NEXT: v_mov_b32_e32 v1, 2			; GFX11-GISEL-NEXT: v_mov_b32_e32 v1, 2
	; GFX11-GISEL-NEXT: scratch_store_b8 v0, v2, off offset:1 dlc			; GFX11-GISEL-NEXT: scratch_store_b8 v0, v2, off offset:1 dlc
	; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-GISEL-NEXT: scratch_store_b8 v0, v1, off offset:2 dlc			; GFX11-GISEL-NEXT: scratch_store_b8 v0, v1, off offset:2 dlc
	; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-GISEL-NEXT: scratch_store_b8 v0, v3, off offset:4 dlc			; GFX11-GISEL-NEXT: scratch_store_b8 v0, v3, off offset:4 dlc
	▲ Show 20 Lines • Show All 58 Lines • ▼ Show 20 Lines
	; GFX11-SDAG: ; %bb.0: ; %bb			; GFX11-SDAG: ; %bb.0: ; %bb
	; GFX11-SDAG-NEXT: s_load_b32 s0, s[0:1], 0x24			; GFX11-SDAG-NEXT: s_load_b32 s0, s[0:1], 0x24
	; GFX11-SDAG-NEXT: v_lshlrev_b32_e32 v0, 1, v0			; GFX11-SDAG-NEXT: v_lshlrev_b32_e32 v0, 1, v0
	; GFX11-SDAG-NEXT: v_mov_b32_e32 v1, 1			; GFX11-SDAG-NEXT: v_mov_b32_e32 v1, 1
	; GFX11-SDAG-NEXT: v_mov_b32_e32 v2, 2			; GFX11-SDAG-NEXT: v_mov_b32_e32 v2, 2
	; GFX11-SDAG-NEXT: v_mov_b32_e32 v3, 4			; GFX11-SDAG-NEXT: v_mov_b32_e32 v3, 4
	; GFX11-SDAG-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-SDAG-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-SDAG-NEXT: s_lshl_b32 s0, s0, 1			; GFX11-SDAG-NEXT: s_lshl_b32 s0, s0, 1
				; GFX11-SDAG-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-SDAG-NEXT: v_add3_u32 v0, 4, s0, v0			; GFX11-SDAG-NEXT: v_add3_u32 v0, 4, s0, v0
	; GFX11-SDAG-NEXT: scratch_store_b8 v0, v1, off offset:1 dlc			; GFX11-SDAG-NEXT: scratch_store_b8 v0, v1, off offset:1 dlc
	; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-SDAG-NEXT: scratch_store_b8 v0, v2, off offset:2 dlc			; GFX11-SDAG-NEXT: scratch_store_b8 v0, v2, off offset:2 dlc
	; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-SDAG-NEXT: scratch_store_b8 v0, v3, off offset:4 dlc			; GFX11-SDAG-NEXT: scratch_store_b8 v0, v3, off offset:4 dlc
	; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-SDAG-NEXT: s_endpgm			; GFX11-SDAG-NEXT: s_endpgm
	;			;
	; GFX11-GISEL-LABEL: soff2_voff2:			; GFX11-GISEL-LABEL: soff2_voff2:
	; GFX11-GISEL: ; %bb.0: ; %bb			; GFX11-GISEL: ; %bb.0: ; %bb
	; GFX11-GISEL-NEXT: s_load_b32 s0, s[0:1], 0x24			; GFX11-GISEL-NEXT: s_load_b32 s0, s[0:1], 0x24
	; GFX11-GISEL-NEXT: v_lshlrev_b32_e32 v0, 1, v0			; GFX11-GISEL-NEXT: v_lshlrev_b32_e32 v0, 1, v0
	; GFX11-GISEL-NEXT: v_mov_b32_e32 v2, 1			; GFX11-GISEL-NEXT: v_mov_b32_e32 v2, 1
	; GFX11-GISEL-NEXT: v_mov_b32_e32 v3, 4			; GFX11-GISEL-NEXT: v_mov_b32_e32 v3, 4
	; GFX11-GISEL-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-GISEL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-GISEL-NEXT: s_lshl_b32 s0, s0, 1			; GFX11-GISEL-NEXT: s_lshl_b32 s0, s0, 1
				; GFX11-GISEL-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-GISEL-NEXT: v_add_nc_u32_e64 v1, s0, 4			; GFX11-GISEL-NEXT: v_add_nc_u32_e64 v1, s0, 4
	; GFX11-GISEL-NEXT: v_add_nc_u32_e32 v0, v1, v0			; GFX11-GISEL-NEXT: v_add_nc_u32_e32 v0, v1, v0
	; GFX11-GISEL-NEXT: v_mov_b32_e32 v1, 2			; GFX11-GISEL-NEXT: v_mov_b32_e32 v1, 2
	; GFX11-GISEL-NEXT: scratch_store_b8 v0, v2, off offset:1 dlc			; GFX11-GISEL-NEXT: scratch_store_b8 v0, v2, off offset:1 dlc
	; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-GISEL-NEXT: scratch_store_b8 v0, v1, off offset:2 dlc			; GFX11-GISEL-NEXT: scratch_store_b8 v0, v1, off offset:2 dlc
	; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-GISEL-NEXT: scratch_store_b8 v0, v3, off offset:4 dlc			; GFX11-GISEL-NEXT: scratch_store_b8 v0, v3, off offset:4 dlc
	▲ Show 20 Lines • Show All 58 Lines • ▼ Show 20 Lines
	; GFX11-SDAG: ; %bb.0: ; %bb			; GFX11-SDAG: ; %bb.0: ; %bb
	; GFX11-SDAG-NEXT: s_load_b32 s0, s[0:1], 0x24			; GFX11-SDAG-NEXT: s_load_b32 s0, s[0:1], 0x24
	; GFX11-SDAG-NEXT: v_mov_b32_e32 v1, 1			; GFX11-SDAG-NEXT: v_mov_b32_e32 v1, 1
	; GFX11-SDAG-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX11-SDAG-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX11-SDAG-NEXT: v_mov_b32_e32 v2, 2			; GFX11-SDAG-NEXT: v_mov_b32_e32 v2, 2
	; GFX11-SDAG-NEXT: v_mov_b32_e32 v3, 4			; GFX11-SDAG-NEXT: v_mov_b32_e32 v3, 4
	; GFX11-SDAG-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-SDAG-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-SDAG-NEXT: s_lshl_b32 s0, s0, 1			; GFX11-SDAG-NEXT: s_lshl_b32 s0, s0, 1
				; GFX11-SDAG-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-SDAG-NEXT: s_add_i32 s0, s0, 4			; GFX11-SDAG-NEXT: s_add_i32 s0, s0, 4
	; GFX11-SDAG-NEXT: scratch_store_b8 v0, v1, s0 offset:1 dlc			; GFX11-SDAG-NEXT: scratch_store_b8 v0, v1, s0 offset:1 dlc
	; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-SDAG-NEXT: scratch_store_b8 v0, v2, s0 offset:2 dlc			; GFX11-SDAG-NEXT: scratch_store_b8 v0, v2, s0 offset:2 dlc
	; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-SDAG-NEXT: scratch_store_b8 v0, v3, s0 offset:4 dlc			; GFX11-SDAG-NEXT: scratch_store_b8 v0, v3, s0 offset:4 dlc
	; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-SDAG-NEXT: s_endpgm			; GFX11-SDAG-NEXT: s_endpgm
	;			;
	; GFX11-GISEL-LABEL: soff2_voff4:			; GFX11-GISEL-LABEL: soff2_voff4:
	; GFX11-GISEL: ; %bb.0: ; %bb			; GFX11-GISEL: ; %bb.0: ; %bb
	; GFX11-GISEL-NEXT: s_load_b32 s0, s[0:1], 0x24			; GFX11-GISEL-NEXT: s_load_b32 s0, s[0:1], 0x24
	; GFX11-GISEL-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX11-GISEL-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX11-GISEL-NEXT: v_mov_b32_e32 v2, 1			; GFX11-GISEL-NEXT: v_mov_b32_e32 v2, 1
	; GFX11-GISEL-NEXT: v_mov_b32_e32 v3, 4			; GFX11-GISEL-NEXT: v_mov_b32_e32 v3, 4
	; GFX11-GISEL-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-GISEL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-GISEL-NEXT: s_lshl_b32 s0, s0, 1			; GFX11-GISEL-NEXT: s_lshl_b32 s0, s0, 1
				; GFX11-GISEL-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-GISEL-NEXT: v_add_nc_u32_e64 v1, s0, 4			; GFX11-GISEL-NEXT: v_add_nc_u32_e64 v1, s0, 4
	; GFX11-GISEL-NEXT: v_add_nc_u32_e32 v0, v1, v0			; GFX11-GISEL-NEXT: v_add_nc_u32_e32 v0, v1, v0
	; GFX11-GISEL-NEXT: v_mov_b32_e32 v1, 2			; GFX11-GISEL-NEXT: v_mov_b32_e32 v1, 2
	; GFX11-GISEL-NEXT: scratch_store_b8 v0, v2, off offset:1 dlc			; GFX11-GISEL-NEXT: scratch_store_b8 v0, v2, off offset:1 dlc
	; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-GISEL-NEXT: scratch_store_b8 v0, v1, off offset:2 dlc			; GFX11-GISEL-NEXT: scratch_store_b8 v0, v1, off offset:2 dlc
	; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-GISEL-NEXT: scratch_store_b8 v0, v3, off offset:4 dlc			; GFX11-GISEL-NEXT: scratch_store_b8 v0, v3, off offset:4 dlc
	▲ Show 20 Lines • Show All 55 Lines • ▼ Show 20 Lines
	; GFX11-SDAG-LABEL: soff4_voff1:			; GFX11-SDAG-LABEL: soff4_voff1:
	; GFX11-SDAG: ; %bb.0: ; %bb			; GFX11-SDAG: ; %bb.0: ; %bb
	; GFX11-SDAG-NEXT: s_load_b32 s0, s[0:1], 0x24			; GFX11-SDAG-NEXT: s_load_b32 s0, s[0:1], 0x24
	; GFX11-SDAG-NEXT: v_mov_b32_e32 v1, 1			; GFX11-SDAG-NEXT: v_mov_b32_e32 v1, 1
	; GFX11-SDAG-NEXT: v_mov_b32_e32 v3, 2			; GFX11-SDAG-NEXT: v_mov_b32_e32 v3, 2
	; GFX11-SDAG-NEXT: v_mov_b32_e32 v4, 4			; GFX11-SDAG-NEXT: v_mov_b32_e32 v4, 4
	; GFX11-SDAG-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-SDAG-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-SDAG-NEXT: s_lshl_b32 s0, s0, 2			; GFX11-SDAG-NEXT: s_lshl_b32 s0, s0, 2
				; GFX11-SDAG-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-SDAG-NEXT: v_add3_u32 v2, 4, s0, v0			; GFX11-SDAG-NEXT: v_add3_u32 v2, 4, s0, v0
	; GFX11-SDAG-NEXT: s_add_i32 s0, s0, 4			; GFX11-SDAG-NEXT: s_add_i32 s0, s0, 4
	; GFX11-SDAG-NEXT: scratch_store_b8 v2, v1, off offset:1 dlc			; GFX11-SDAG-NEXT: scratch_store_b8 v2, v1, off offset:1 dlc
	; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-SDAG-NEXT: scratch_store_b8 v2, v3, off offset:2 dlc			; GFX11-SDAG-NEXT: scratch_store_b8 v2, v3, off offset:2 dlc
	; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-SDAG-NEXT: scratch_store_b8 v0, v4, s0 offset:4 dlc			; GFX11-SDAG-NEXT: scratch_store_b8 v0, v4, s0 offset:4 dlc
	; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-SDAG-NEXT: s_endpgm			; GFX11-SDAG-NEXT: s_endpgm
	;			;
	; GFX11-GISEL-LABEL: soff4_voff1:			; GFX11-GISEL-LABEL: soff4_voff1:
	; GFX11-GISEL: ; %bb.0: ; %bb			; GFX11-GISEL: ; %bb.0: ; %bb
	; GFX11-GISEL-NEXT: s_load_b32 s0, s[0:1], 0x24			; GFX11-GISEL-NEXT: s_load_b32 s0, s[0:1], 0x24
	; GFX11-GISEL-NEXT: v_mov_b32_e32 v2, 1			; GFX11-GISEL-NEXT: v_mov_b32_e32 v2, 1
	; GFX11-GISEL-NEXT: v_mov_b32_e32 v3, 4			; GFX11-GISEL-NEXT: v_mov_b32_e32 v3, 4
	; GFX11-GISEL-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-GISEL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-GISEL-NEXT: s_lshl_b32 s0, s0, 2			; GFX11-GISEL-NEXT: s_lshl_b32 s0, s0, 2
				; GFX11-GISEL-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-GISEL-NEXT: v_add_nc_u32_e64 v1, s0, 4			; GFX11-GISEL-NEXT: v_add_nc_u32_e64 v1, s0, 4
	; GFX11-GISEL-NEXT: v_add_nc_u32_e32 v0, v1, v0			; GFX11-GISEL-NEXT: v_add_nc_u32_e32 v0, v1, v0
	; GFX11-GISEL-NEXT: v_mov_b32_e32 v1, 2			; GFX11-GISEL-NEXT: v_mov_b32_e32 v1, 2
	; GFX11-GISEL-NEXT: scratch_store_b8 v0, v2, off offset:1 dlc			; GFX11-GISEL-NEXT: scratch_store_b8 v0, v2, off offset:1 dlc
	; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-GISEL-NEXT: scratch_store_b8 v0, v1, off offset:2 dlc			; GFX11-GISEL-NEXT: scratch_store_b8 v0, v1, off offset:2 dlc
	; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-GISEL-NEXT: scratch_store_b8 v0, v3, off offset:4 dlc			; GFX11-GISEL-NEXT: scratch_store_b8 v0, v3, off offset:4 dlc
	▲ Show 20 Lines • Show All 58 Lines • ▼ Show 20 Lines
	; GFX11-SDAG: ; %bb.0: ; %bb			; GFX11-SDAG: ; %bb.0: ; %bb
	; GFX11-SDAG-NEXT: s_load_b32 s0, s[0:1], 0x24			; GFX11-SDAG-NEXT: s_load_b32 s0, s[0:1], 0x24
	; GFX11-SDAG-NEXT: v_lshlrev_b32_e32 v0, 1, v0			; GFX11-SDAG-NEXT: v_lshlrev_b32_e32 v0, 1, v0
	; GFX11-SDAG-NEXT: v_mov_b32_e32 v1, 1			; GFX11-SDAG-NEXT: v_mov_b32_e32 v1, 1
	; GFX11-SDAG-NEXT: v_mov_b32_e32 v2, 2			; GFX11-SDAG-NEXT: v_mov_b32_e32 v2, 2
	; GFX11-SDAG-NEXT: v_mov_b32_e32 v4, 4			; GFX11-SDAG-NEXT: v_mov_b32_e32 v4, 4
	; GFX11-SDAG-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-SDAG-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-SDAG-NEXT: s_lshl_b32 s0, s0, 2			; GFX11-SDAG-NEXT: s_lshl_b32 s0, s0, 2
				; GFX11-SDAG-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-SDAG-NEXT: v_add3_u32 v3, 4, s0, v0			; GFX11-SDAG-NEXT: v_add3_u32 v3, 4, s0, v0
	; GFX11-SDAG-NEXT: s_add_i32 s0, s0, 4			; GFX11-SDAG-NEXT: s_add_i32 s0, s0, 4
	; GFX11-SDAG-NEXT: scratch_store_b8 v0, v1, s0 offset:1 dlc			; GFX11-SDAG-NEXT: scratch_store_b8 v0, v1, s0 offset:1 dlc
	; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-SDAG-NEXT: scratch_store_b8 v3, v2, off offset:2 dlc			; GFX11-SDAG-NEXT: scratch_store_b8 v3, v2, off offset:2 dlc
	; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-SDAG-NEXT: scratch_store_b8 v0, v4, s0 offset:4 dlc			; GFX11-SDAG-NEXT: scratch_store_b8 v0, v4, s0 offset:4 dlc
	; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-SDAG-NEXT: s_endpgm			; GFX11-SDAG-NEXT: s_endpgm
	;			;
	; GFX11-GISEL-LABEL: soff4_voff2:			; GFX11-GISEL-LABEL: soff4_voff2:
	; GFX11-GISEL: ; %bb.0: ; %bb			; GFX11-GISEL: ; %bb.0: ; %bb
	; GFX11-GISEL-NEXT: s_load_b32 s0, s[0:1], 0x24			; GFX11-GISEL-NEXT: s_load_b32 s0, s[0:1], 0x24
	; GFX11-GISEL-NEXT: v_lshlrev_b32_e32 v0, 1, v0			; GFX11-GISEL-NEXT: v_lshlrev_b32_e32 v0, 1, v0
	; GFX11-GISEL-NEXT: v_mov_b32_e32 v2, 1			; GFX11-GISEL-NEXT: v_mov_b32_e32 v2, 1
	; GFX11-GISEL-NEXT: v_mov_b32_e32 v3, 4			; GFX11-GISEL-NEXT: v_mov_b32_e32 v3, 4
	; GFX11-GISEL-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-GISEL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-GISEL-NEXT: s_lshl_b32 s0, s0, 2			; GFX11-GISEL-NEXT: s_lshl_b32 s0, s0, 2
				; GFX11-GISEL-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-GISEL-NEXT: v_add_nc_u32_e64 v1, s0, 4			; GFX11-GISEL-NEXT: v_add_nc_u32_e64 v1, s0, 4
	; GFX11-GISEL-NEXT: v_add_nc_u32_e32 v0, v1, v0			; GFX11-GISEL-NEXT: v_add_nc_u32_e32 v0, v1, v0
	; GFX11-GISEL-NEXT: v_mov_b32_e32 v1, 2			; GFX11-GISEL-NEXT: v_mov_b32_e32 v1, 2
	; GFX11-GISEL-NEXT: scratch_store_b8 v0, v2, off offset:1 dlc			; GFX11-GISEL-NEXT: scratch_store_b8 v0, v2, off offset:1 dlc
	; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-GISEL-NEXT: scratch_store_b8 v0, v1, off offset:2 dlc			; GFX11-GISEL-NEXT: scratch_store_b8 v0, v1, off offset:2 dlc
	; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-GISEL-NEXT: scratch_store_b8 v0, v3, off offset:4 dlc			; GFX11-GISEL-NEXT: scratch_store_b8 v0, v3, off offset:4 dlc
	▲ Show 20 Lines • Show All 58 Lines • ▼ Show 20 Lines
	; GFX11-SDAG: ; %bb.0: ; %bb			; GFX11-SDAG: ; %bb.0: ; %bb
	; GFX11-SDAG-NEXT: s_load_b32 s0, s[0:1], 0x24			; GFX11-SDAG-NEXT: s_load_b32 s0, s[0:1], 0x24
	; GFX11-SDAG-NEXT: v_mov_b32_e32 v1, 1			; GFX11-SDAG-NEXT: v_mov_b32_e32 v1, 1
	; GFX11-SDAG-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX11-SDAG-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX11-SDAG-NEXT: v_mov_b32_e32 v2, 2			; GFX11-SDAG-NEXT: v_mov_b32_e32 v2, 2
	; GFX11-SDAG-NEXT: v_mov_b32_e32 v3, 4			; GFX11-SDAG-NEXT: v_mov_b32_e32 v3, 4
	; GFX11-SDAG-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-SDAG-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-SDAG-NEXT: s_lshl_b32 s0, s0, 2			; GFX11-SDAG-NEXT: s_lshl_b32 s0, s0, 2
				; GFX11-SDAG-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-SDAG-NEXT: s_add_i32 s0, s0, 4			; GFX11-SDAG-NEXT: s_add_i32 s0, s0, 4
	; GFX11-SDAG-NEXT: scratch_store_b8 v0, v1, s0 offset:1 dlc			; GFX11-SDAG-NEXT: scratch_store_b8 v0, v1, s0 offset:1 dlc
	; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-SDAG-NEXT: scratch_store_b8 v0, v2, s0 offset:2 dlc			; GFX11-SDAG-NEXT: scratch_store_b8 v0, v2, s0 offset:2 dlc
	; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-SDAG-NEXT: scratch_store_b8 v0, v3, s0 offset:4 dlc			; GFX11-SDAG-NEXT: scratch_store_b8 v0, v3, s0 offset:4 dlc
	; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-SDAG-NEXT: s_endpgm			; GFX11-SDAG-NEXT: s_endpgm
	;			;
	; GFX11-GISEL-LABEL: soff4_voff4:			; GFX11-GISEL-LABEL: soff4_voff4:
	; GFX11-GISEL: ; %bb.0: ; %bb			; GFX11-GISEL: ; %bb.0: ; %bb
	; GFX11-GISEL-NEXT: s_load_b32 s0, s[0:1], 0x24			; GFX11-GISEL-NEXT: s_load_b32 s0, s[0:1], 0x24
	; GFX11-GISEL-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX11-GISEL-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX11-GISEL-NEXT: v_mov_b32_e32 v2, 1			; GFX11-GISEL-NEXT: v_mov_b32_e32 v2, 1
	; GFX11-GISEL-NEXT: v_mov_b32_e32 v3, 4			; GFX11-GISEL-NEXT: v_mov_b32_e32 v3, 4
	; GFX11-GISEL-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-GISEL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-GISEL-NEXT: s_lshl_b32 s0, s0, 2			; GFX11-GISEL-NEXT: s_lshl_b32 s0, s0, 2
				; GFX11-GISEL-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-GISEL-NEXT: v_add_nc_u32_e64 v1, s0, 4			; GFX11-GISEL-NEXT: v_add_nc_u32_e64 v1, s0, 4
	; GFX11-GISEL-NEXT: v_add_nc_u32_e32 v0, v1, v0			; GFX11-GISEL-NEXT: v_add_nc_u32_e32 v0, v1, v0
	; GFX11-GISEL-NEXT: v_mov_b32_e32 v1, 2			; GFX11-GISEL-NEXT: v_mov_b32_e32 v1, 2
	; GFX11-GISEL-NEXT: scratch_store_b8 v0, v2, off offset:1 dlc			; GFX11-GISEL-NEXT: scratch_store_b8 v0, v2, off offset:1 dlc
	; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-GISEL-NEXT: scratch_store_b8 v0, v1, off offset:2 dlc			; GFX11-GISEL-NEXT: scratch_store_b8 v0, v1, off offset:2 dlc
	; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-GISEL-NEXT: scratch_store_b8 v0, v3, off offset:4 dlc			; GFX11-GISEL-NEXT: scratch_store_b8 v0, v3, off offset:4 dlc
	Show All 17 Lines

llvm/test/CodeGen/AMDGPU/flat-scratch.ll

	Show First 20 Lines • Show All 48 Lines • ▼ Show 20 Lines
	; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:48			; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:48
	; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:32			; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:32
	; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:16			; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:16
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: zero_init_kernel:			; GFX11-LABEL: zero_init_kernel:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_mov_b32 s0, 0			; GFX11-NEXT: s_mov_b32 s0, 0
				; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_mov_b32 s1, s0			; GFX11-NEXT: s_mov_b32 s1, s0
	; GFX11-NEXT: s_mov_b32 s2, s0			; GFX11-NEXT: s_mov_b32 s2, s0
	; GFX11-NEXT: s_mov_b32 s3, s0			; GFX11-NEXT: s_mov_b32 s3, s0
	; GFX11-NEXT: v_mov_b32_e32 v0, s0			; GFX11-NEXT: v_mov_b32_e32 v0, s0
	; GFX11-NEXT: v_mov_b32_e32 v1, s1			; GFX11-NEXT: v_mov_b32_e32 v1, s1
	; GFX11-NEXT: v_mov_b32_e32 v2, s2			; GFX11-NEXT: v_mov_b32_e32 v2, s2
	; GFX11-NEXT: v_mov_b32_e32 v3, s3			; GFX11-NEXT: v_mov_b32_e32 v3, s3
	; GFX11-NEXT: s_clause 0x3			; GFX11-NEXT: s_clause 0x3
	▲ Show 20 Lines • Show All 99 Lines • ▼ Show 20 Lines
	; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:48			; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:48
	; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:32			; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:32
	; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:16			; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:16
	; GFX1030-PAL-NEXT: s_endpgm			; GFX1030-PAL-NEXT: s_endpgm
	;			;
	; GFX11-PAL-LABEL: zero_init_kernel:			; GFX11-PAL-LABEL: zero_init_kernel:
	; GFX11-PAL: ; %bb.0:			; GFX11-PAL: ; %bb.0:
	; GFX11-PAL-NEXT: s_mov_b32 s0, 0			; GFX11-PAL-NEXT: s_mov_b32 s0, 0
				; GFX11-PAL-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-PAL-NEXT: s_mov_b32 s1, s0			; GFX11-PAL-NEXT: s_mov_b32 s1, s0
	; GFX11-PAL-NEXT: s_mov_b32 s2, s0			; GFX11-PAL-NEXT: s_mov_b32 s2, s0
	; GFX11-PAL-NEXT: s_mov_b32 s3, s0			; GFX11-PAL-NEXT: s_mov_b32 s3, s0
	; GFX11-PAL-NEXT: v_mov_b32_e32 v0, s0			; GFX11-PAL-NEXT: v_mov_b32_e32 v0, s0
	; GFX11-PAL-NEXT: v_mov_b32_e32 v1, s1			; GFX11-PAL-NEXT: v_mov_b32_e32 v1, s1
	; GFX11-PAL-NEXT: v_mov_b32_e32 v2, s2			; GFX11-PAL-NEXT: v_mov_b32_e32 v2, s2
	; GFX11-PAL-NEXT: v_mov_b32_e32 v3, s3			; GFX11-PAL-NEXT: v_mov_b32_e32 v3, s3
	; GFX11-PAL-NEXT: s_clause 0x3			; GFX11-PAL-NEXT: s_clause 0x3
	▲ Show 20 Lines • Show All 46 Lines • ▼ Show 20 Lines
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: zero_init_foo:			; GFX11-LABEL: zero_init_foo:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_mov_b32 s0, 0			; GFX11-NEXT: s_mov_b32 s0, 0
				; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_mov_b32 s1, s0			; GFX11-NEXT: s_mov_b32 s1, s0
	; GFX11-NEXT: s_mov_b32 s2, s0			; GFX11-NEXT: s_mov_b32 s2, s0
	; GFX11-NEXT: s_mov_b32 s3, s0			; GFX11-NEXT: s_mov_b32 s3, s0
	; GFX11-NEXT: v_mov_b32_e32 v0, s0			; GFX11-NEXT: v_mov_b32_e32 v0, s0
	; GFX11-NEXT: v_mov_b32_e32 v1, s1			; GFX11-NEXT: v_mov_b32_e32 v1, s1
	; GFX11-NEXT: v_mov_b32_e32 v2, s2			; GFX11-NEXT: v_mov_b32_e32 v2, s2
	; GFX11-NEXT: v_mov_b32_e32 v3, s3			; GFX11-NEXT: v_mov_b32_e32 v3, s3
	; GFX11-NEXT: s_clause 0x3			; GFX11-NEXT: s_clause 0x3
	▲ Show 20 Lines • Show All 57 Lines • ▼ Show 20 Lines
	; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX10-PAL-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-PAL-LABEL: zero_init_foo:			; GFX11-PAL-LABEL: zero_init_foo:
	; GFX11-PAL: ; %bb.0:			; GFX11-PAL: ; %bb.0:
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: s_mov_b32 s0, 0			; GFX11-PAL-NEXT: s_mov_b32 s0, 0
				; GFX11-PAL-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-PAL-NEXT: s_mov_b32 s1, s0			; GFX11-PAL-NEXT: s_mov_b32 s1, s0
	; GFX11-PAL-NEXT: s_mov_b32 s2, s0			; GFX11-PAL-NEXT: s_mov_b32 s2, s0
	; GFX11-PAL-NEXT: s_mov_b32 s3, s0			; GFX11-PAL-NEXT: s_mov_b32 s3, s0
	; GFX11-PAL-NEXT: v_mov_b32_e32 v0, s0			; GFX11-PAL-NEXT: v_mov_b32_e32 v0, s0
	; GFX11-PAL-NEXT: v_mov_b32_e32 v1, s1			; GFX11-PAL-NEXT: v_mov_b32_e32 v1, s1
	; GFX11-PAL-NEXT: v_mov_b32_e32 v2, s2			; GFX11-PAL-NEXT: v_mov_b32_e32 v2, s2
	; GFX11-PAL-NEXT: v_mov_b32_e32 v3, s3			; GFX11-PAL-NEXT: v_mov_b32_e32 v3, s3
	; GFX11-PAL-NEXT: s_clause 0x3			; GFX11-PAL-NEXT: s_clause 0x3
	▲ Show 20 Lines • Show All 361 Lines • ▼ Show 20 Lines
	; GFX10-NEXT: scratch_load_dword v0, v0, off offset:124 glc dlc			; GFX10-NEXT: scratch_load_dword v0, v0, off offset:124 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: store_load_vindex_kernel:			; GFX11-LABEL: store_load_vindex_kernel:
	; GFX11: ; %bb.0: ; %bb			; GFX11: ; %bb.0: ; %bb
	; GFX11-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX11-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX11-NEXT: v_mov_b32_e32 v1, 15			; GFX11-NEXT: v_mov_b32_e32 v1, 15
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2)
	; GFX11-NEXT: v_sub_nc_u32_e32 v2, 4, v0			; GFX11-NEXT: v_sub_nc_u32_e32 v2, 4, v0
	; GFX11-NEXT: scratch_store_b32 v0, v1, off offset:4 dlc			; GFX11-NEXT: scratch_store_b32 v0, v1, off offset:4 dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: scratch_load_b32 v0, v2, off offset:124 glc dlc			; GFX11-NEXT: scratch_load_b32 v0, v2, off offset:124 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	;			;
	; GFX9-PAL-LABEL: store_load_vindex_kernel:			; GFX9-PAL-LABEL: store_load_vindex_kernel:
	▲ Show 20 Lines • Show All 46 Lines • ▼ Show 20 Lines
	; GFX10-PAL-NEXT: scratch_load_dword v0, v0, off offset:124 glc dlc			; GFX10-PAL-NEXT: scratch_load_dword v0, v0, off offset:124 glc dlc
	; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX10-PAL-NEXT: s_endpgm			; GFX10-PAL-NEXT: s_endpgm
	;			;
	; GFX11-PAL-LABEL: store_load_vindex_kernel:			; GFX11-PAL-LABEL: store_load_vindex_kernel:
	; GFX11-PAL: ; %bb.0: ; %bb			; GFX11-PAL: ; %bb.0: ; %bb
	; GFX11-PAL-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX11-PAL-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX11-PAL-NEXT: v_mov_b32_e32 v1, 15			; GFX11-PAL-NEXT: v_mov_b32_e32 v1, 15
				; GFX11-PAL-NEXT: s_delay_alu instid0(VALU_DEP_2)
	; GFX11-PAL-NEXT: v_sub_nc_u32_e32 v2, 4, v0			; GFX11-PAL-NEXT: v_sub_nc_u32_e32 v2, 4, v0
	; GFX11-PAL-NEXT: scratch_store_b32 v0, v1, off offset:4 dlc			; GFX11-PAL-NEXT: scratch_store_b32 v0, v1, off offset:4 dlc
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: scratch_load_b32 v0, v2, off offset:124 glc dlc			; GFX11-PAL-NEXT: scratch_load_b32 v0, v2, off offset:124 glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: s_endpgm			; GFX11-PAL-NEXT: s_endpgm
	; GCN-LABEL: store_load_vindex_kernel:			; GCN-LABEL: store_load_vindex_kernel:
	; GCN: ; %bb.0: ; %bb			; GCN: ; %bb.0: ; %bb
	▲ Show 20 Lines • Show All 51 Lines • ▼ Show 20 Lines
	;			;
	; GFX11-LABEL: store_load_vindex_foo:			; GFX11-LABEL: store_load_vindex_foo:
	; GFX11: ; %bb.0: ; %bb			; GFX11: ; %bb.0: ; %bb
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: v_and_b32_e32 v1, 15, v0			; GFX11-NEXT: v_and_b32_e32 v1, 15, v0
	; GFX11-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX11-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX11-NEXT: v_mov_b32_e32 v2, 15			; GFX11-NEXT: v_mov_b32_e32 v2, 15
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3)
	; GFX11-NEXT: v_lshlrev_b32_e32 v1, 2, v1			; GFX11-NEXT: v_lshlrev_b32_e32 v1, 2, v1
	; GFX11-NEXT: scratch_store_b32 v0, v2, s32 dlc			; GFX11-NEXT: scratch_store_b32 v0, v2, s32 dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: scratch_load_b32 v0, v1, s32 glc dlc			; GFX11-NEXT: scratch_load_b32 v0, v1, s32 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-PAL-LABEL: store_load_vindex_foo:			; GFX9-PAL-LABEL: store_load_vindex_foo:
	Show All 39 Lines
	;			;
	; GFX11-PAL-LABEL: store_load_vindex_foo:			; GFX11-PAL-LABEL: store_load_vindex_foo:
	; GFX11-PAL: ; %bb.0: ; %bb			; GFX11-PAL: ; %bb.0: ; %bb
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: v_and_b32_e32 v1, 15, v0			; GFX11-PAL-NEXT: v_and_b32_e32 v1, 15, v0
	; GFX11-PAL-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX11-PAL-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX11-PAL-NEXT: v_mov_b32_e32 v2, 15			; GFX11-PAL-NEXT: v_mov_b32_e32 v2, 15
				; GFX11-PAL-NEXT: s_delay_alu instid0(VALU_DEP_3)
	; GFX11-PAL-NEXT: v_lshlrev_b32_e32 v1, 2, v1			; GFX11-PAL-NEXT: v_lshlrev_b32_e32 v1, 2, v1
	; GFX11-PAL-NEXT: scratch_store_b32 v0, v2, s32 dlc			; GFX11-PAL-NEXT: scratch_store_b32 v0, v2, s32 dlc
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: scratch_load_b32 v0, v1, s32 glc dlc			; GFX11-PAL-NEXT: scratch_load_b32 v0, v1, s32 glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX11-PAL-NEXT: s_setpc_b64 s[30:31]
	; GCN-LABEL: store_load_vindex_foo:			; GCN-LABEL: store_load_vindex_foo:
	; GCN: ; %bb.0: ; %bb			; GCN: ; %bb.0: ; %bb
	▲ Show 20 Lines • Show All 140 Lines • ▼ Show 20 Lines
	; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:320			; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:320
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: zero_init_small_offset_kernel:			; GFX11-LABEL: zero_init_small_offset_kernel:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: scratch_load_b32 v0, off, off offset:4 glc dlc			; GFX11-NEXT: scratch_load_b32 v0, off, off offset:4 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, 0			; GFX11-NEXT: s_mov_b32 s0, 0
				; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_mov_b32 s1, s0			; GFX11-NEXT: s_mov_b32 s1, s0
	; GFX11-NEXT: s_mov_b32 s2, s0			; GFX11-NEXT: s_mov_b32 s2, s0
	; GFX11-NEXT: s_mov_b32 s3, s0			; GFX11-NEXT: s_mov_b32 s3, s0
	; GFX11-NEXT: v_mov_b32_e32 v0, s0			; GFX11-NEXT: v_mov_b32_e32 v0, s0
	; GFX11-NEXT: v_mov_b32_e32 v1, s1			; GFX11-NEXT: v_mov_b32_e32 v1, s1
	; GFX11-NEXT: v_mov_b32_e32 v2, s2			; GFX11-NEXT: v_mov_b32_e32 v2, s2
	; GFX11-NEXT: v_mov_b32_e32 v3, s3			; GFX11-NEXT: v_mov_b32_e32 v3, s3
	; GFX11-NEXT: s_clause 0x3			; GFX11-NEXT: s_clause 0x3
	▲ Show 20 Lines • Show All 111 Lines • ▼ Show 20 Lines
	; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:320			; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], off offset:320
	; GFX1030-PAL-NEXT: s_endpgm			; GFX1030-PAL-NEXT: s_endpgm
	;			;
	; GFX11-PAL-LABEL: zero_init_small_offset_kernel:			; GFX11-PAL-LABEL: zero_init_small_offset_kernel:
	; GFX11-PAL: ; %bb.0:			; GFX11-PAL: ; %bb.0:
	; GFX11-PAL-NEXT: scratch_load_b32 v0, off, off offset:4 glc dlc			; GFX11-PAL-NEXT: scratch_load_b32 v0, off, off offset:4 glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: s_mov_b32 s0, 0			; GFX11-PAL-NEXT: s_mov_b32 s0, 0
				; GFX11-PAL-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-PAL-NEXT: s_mov_b32 s1, s0			; GFX11-PAL-NEXT: s_mov_b32 s1, s0
	; GFX11-PAL-NEXT: s_mov_b32 s2, s0			; GFX11-PAL-NEXT: s_mov_b32 s2, s0
	; GFX11-PAL-NEXT: s_mov_b32 s3, s0			; GFX11-PAL-NEXT: s_mov_b32 s3, s0
	; GFX11-PAL-NEXT: v_mov_b32_e32 v0, s0			; GFX11-PAL-NEXT: v_mov_b32_e32 v0, s0
	; GFX11-PAL-NEXT: v_mov_b32_e32 v1, s1			; GFX11-PAL-NEXT: v_mov_b32_e32 v1, s1
	; GFX11-PAL-NEXT: v_mov_b32_e32 v2, s2			; GFX11-PAL-NEXT: v_mov_b32_e32 v2, s2
	; GFX11-PAL-NEXT: v_mov_b32_e32 v3, s3			; GFX11-PAL-NEXT: v_mov_b32_e32 v3, s3
	; GFX11-PAL-NEXT: s_clause 0x3			; GFX11-PAL-NEXT: s_clause 0x3
	▲ Show 20 Lines • Show All 55 Lines • ▼ Show 20 Lines
	;			;
	; GFX11-LABEL: zero_init_small_offset_foo:			; GFX11-LABEL: zero_init_small_offset_foo:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: scratch_load_b32 v0, off, s32 glc dlc			; GFX11-NEXT: scratch_load_b32 v0, off, s32 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, 0			; GFX11-NEXT: s_mov_b32 s0, 0
				; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_mov_b32 s1, s0			; GFX11-NEXT: s_mov_b32 s1, s0
	; GFX11-NEXT: s_mov_b32 s2, s0			; GFX11-NEXT: s_mov_b32 s2, s0
	; GFX11-NEXT: s_mov_b32 s3, s0			; GFX11-NEXT: s_mov_b32 s3, s0
	; GFX11-NEXT: v_mov_b32_e32 v0, s0			; GFX11-NEXT: v_mov_b32_e32 v0, s0
	; GFX11-NEXT: v_mov_b32_e32 v1, s1			; GFX11-NEXT: v_mov_b32_e32 v1, s1
	; GFX11-NEXT: v_mov_b32_e32 v2, s2			; GFX11-NEXT: v_mov_b32_e32 v2, s2
	; GFX11-NEXT: v_mov_b32_e32 v3, s3			; GFX11-NEXT: v_mov_b32_e32 v3, s3
	; GFX11-NEXT: s_clause 0x3			; GFX11-NEXT: s_clause 0x3
	▲ Show 20 Lines • Show All 65 Lines • ▼ Show 20 Lines
	;			;
	; GFX11-PAL-LABEL: zero_init_small_offset_foo:			; GFX11-PAL-LABEL: zero_init_small_offset_foo:
	; GFX11-PAL: ; %bb.0:			; GFX11-PAL: ; %bb.0:
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: scratch_load_b32 v0, off, s32 glc dlc			; GFX11-PAL-NEXT: scratch_load_b32 v0, off, s32 glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: s_mov_b32 s0, 0			; GFX11-PAL-NEXT: s_mov_b32 s0, 0
				; GFX11-PAL-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-PAL-NEXT: s_mov_b32 s1, s0			; GFX11-PAL-NEXT: s_mov_b32 s1, s0
	; GFX11-PAL-NEXT: s_mov_b32 s2, s0			; GFX11-PAL-NEXT: s_mov_b32 s2, s0
	; GFX11-PAL-NEXT: s_mov_b32 s3, s0			; GFX11-PAL-NEXT: s_mov_b32 s3, s0
	; GFX11-PAL-NEXT: v_mov_b32_e32 v0, s0			; GFX11-PAL-NEXT: v_mov_b32_e32 v0, s0
	; GFX11-PAL-NEXT: v_mov_b32_e32 v1, s1			; GFX11-PAL-NEXT: v_mov_b32_e32 v1, s1
	; GFX11-PAL-NEXT: v_mov_b32_e32 v2, s2			; GFX11-PAL-NEXT: v_mov_b32_e32 v2, s2
	; GFX11-PAL-NEXT: v_mov_b32_e32 v3, s3			; GFX11-PAL-NEXT: v_mov_b32_e32 v3, s3
	; GFX11-PAL-NEXT: s_clause 0x3			; GFX11-PAL-NEXT: s_clause 0x3
	▲ Show 20 Lines • Show All 2,901 Lines • ▼ Show 20 Lines
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v1			; GFX10-NEXT: ; use v1
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX11-LABEL: large_offset:			; GFX11-LABEL: large_offset:
	; GFX11: ; %bb.0: ; %bb			; GFX11: ; %bb.0: ; %bb
	; GFX11-NEXT: v_mov_b32_e32 v0, 0			; GFX11-NEXT: v_mov_b32_e32 v0, 0
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_mov_b32_e32 v1, v0			; GFX11-NEXT: v_mov_b32_e32 v1, v0
	; GFX11-NEXT: v_mov_b32_e32 v2, v0			; GFX11-NEXT: v_mov_b32_e32 v2, v0
	; GFX11-NEXT: v_mov_b32_e32 v3, v0			; GFX11-NEXT: v_mov_b32_e32 v3, v0
	; GFX11-NEXT: scratch_store_b128 off, v[0:3], off offset:3024 dlc			; GFX11-NEXT: scratch_store_b128 off, v[0:3], off offset:3024 dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: scratch_load_b128 v[0:3], off, off offset:3024 glc dlc			; GFX11-NEXT: scratch_load_b128 v[0:3], off, off offset:3024 glc dlc
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: v_mov_b32_e32 v0, 16			; GFX11-NEXT: v_mov_b32_e32 v0, 16
	▲ Show 20 Lines • Show All 84 Lines • ▼ Show 20 Lines
	; GFX10-PAL-NEXT: ;;#ASMSTART			; GFX10-PAL-NEXT: ;;#ASMSTART
	; GFX10-PAL-NEXT: ; use v1			; GFX10-PAL-NEXT: ; use v1
	; GFX10-PAL-NEXT: ;;#ASMEND			; GFX10-PAL-NEXT: ;;#ASMEND
	; GFX10-PAL-NEXT: s_endpgm			; GFX10-PAL-NEXT: s_endpgm
	;			;
	; GFX11-PAL-LABEL: large_offset:			; GFX11-PAL-LABEL: large_offset:
	; GFX11-PAL: ; %bb.0: ; %bb			; GFX11-PAL: ; %bb.0: ; %bb
	; GFX11-PAL-NEXT: v_mov_b32_e32 v0, 0			; GFX11-PAL-NEXT: v_mov_b32_e32 v0, 0
				; GFX11-PAL-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-PAL-NEXT: v_mov_b32_e32 v1, v0			; GFX11-PAL-NEXT: v_mov_b32_e32 v1, v0
	; GFX11-PAL-NEXT: v_mov_b32_e32 v2, v0			; GFX11-PAL-NEXT: v_mov_b32_e32 v2, v0
	; GFX11-PAL-NEXT: v_mov_b32_e32 v3, v0			; GFX11-PAL-NEXT: v_mov_b32_e32 v3, v0
	; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], off offset:3024 dlc			; GFX11-PAL-NEXT: scratch_store_b128 off, v[0:3], off offset:3024 dlc
	; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-PAL-NEXT: scratch_load_b128 v[0:3], off, off offset:3024 glc dlc			; GFX11-PAL-NEXT: scratch_load_b128 v[0:3], off, off offset:3024 glc dlc
	; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX11-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX11-PAL-NEXT: v_mov_b32_e32 v0, 16			; GFX11-PAL-NEXT: v_mov_b32_e32 v0, 16
	Show All 21 Lines

llvm/test/CodeGen/AMDGPU/insert-delay-alu.mir

This file was added.

				# RUN: llc -march=amdgcn -mcpu=gfx1100 -mattr=-wavefrontsize32,+wavefrontsize64 -verify-machineinstrs -start-before=amdgpu-insert-delay-alu %s -o - | FileCheck %s

				---
				name: valu_dep_1
				body: |
				bb.0:
				; CHECK-LABEL: {{^}}valu_dep_1:
				; CHECK: %bb.0:
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				; CHECK-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				...

				---
				name: valu_dep_2
				body: |
				bb.0:
				; CHECK-LABEL: {{^}}valu_dep_2:
				; CHECK: %bb.0:
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				; CHECK-NEXT: v_add_nc_u32_e32 v1, v1, v1
				; CHECK-NEXT: s_delay_alu instid0(VALU_DEP_2)
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				$vgpr1 = V_ADD_U32_e32 $vgpr1, $vgpr1, implicit $exec
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				...

				---
				name: valu_dep_3
				body: |
				bb.0:
				; CHECK-LABEL: {{^}}valu_dep_3:
				; CHECK: %bb.0:
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				; CHECK-NEXT: v_add_nc_u32_e32 v1, v1, v1
				; CHECK-NEXT: v_add_nc_u32_e32 v2, v2, v2
				; CHECK-NEXT: s_delay_alu instid0(VALU_DEP_3)
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				$vgpr1 = V_ADD_U32_e32 $vgpr1, $vgpr1, implicit $exec
				$vgpr2 = V_ADD_U32_e32 $vgpr2, $vgpr2, implicit $exec
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				...

				---
				name: valu_dep_4
				body: |
				bb.0:
				; CHECK-LABEL: {{^}}valu_dep_4:
				; CHECK: %bb.0:
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				; CHECK-NEXT: v_add_nc_u32_e32 v1, v1, v1
				; CHECK-NEXT: v_add_nc_u32_e32 v2, v2, v2
				; CHECK-NEXT: v_add_nc_u32_e32 v3, v3, v3
				; CHECK-NEXT: s_delay_alu instid0(VALU_DEP_4)
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				$vgpr1 = V_ADD_U32_e32 $vgpr1, $vgpr1, implicit $exec
				$vgpr2 = V_ADD_U32_e32 $vgpr2, $vgpr2, implicit $exec
				$vgpr3 = V_ADD_U32_e32 $vgpr3, $vgpr3, implicit $exec
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				...

				# There's no encoding for VALU_DEP_5. A normal VALU instruction will have
				# completed already.
				---
				name: valu_dep_5
				body: |
				bb.0:
				; CHECK-LABEL: {{^}}valu_dep_5:
				; CHECK: %bb.0:
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				; CHECK-NEXT: v_add_nc_u32_e32 v1, v1, v1
				; CHECK-NEXT: v_add_nc_u32_e32 v2, v2, v2
				; CHECK-NEXT: v_add_nc_u32_e32 v3, v3, v3
				; CHECK-NEXT: v_add_nc_u32_e32 v4, v4, v4
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				$vgpr1 = V_ADD_U32_e32 $vgpr1, $vgpr1, implicit $exec
				$vgpr2 = V_ADD_U32_e32 $vgpr2, $vgpr2, implicit $exec
				$vgpr3 = V_ADD_U32_e32 $vgpr3, $vgpr3, implicit $exec
				$vgpr4 = V_ADD_U32_e32 $vgpr4, $vgpr4, implicit $exec
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				...

				---
				name: trans32_dep_1
				body: |
				bb.0:
				; CHECK-LABEL: {{^}}trans32_dep_1:
				; CHECK: %bb.0:
				; CHECK-NEXT: v_exp_f32_e32 v0, v0
				; CHECK-NEXT: s_delay_alu instid0(TRANS32_DEP_1)
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				$vgpr0 = V_EXP_F32_e32 $vgpr0, implicit $exec, implicit $mode
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				...

				---
				name: trans32_dep_2
				body: |
				bb.0:
				; CHECK-LABEL: {{^}}trans32_dep_2:
				; CHECK: %bb.0:
				; CHECK-NEXT: v_exp_f32_e32 v0, v0
				; CHECK-NEXT: v_exp_f32_e32 v1, v1
				; CHECK-NEXT: s_delay_alu instid0(TRANS32_DEP_2)
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				$vgpr0 = V_EXP_F32_e32 $vgpr0, implicit $exec, implicit $mode
				$vgpr1 = V_EXP_F32_e32 $vgpr1, implicit $exec, implicit $mode
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				...

				---
				name: trans32_dep_3
				body: |
				bb.0:
				; CHECK-LABEL: {{^}}trans32_dep_3:
				; CHECK: %bb.0:
				; CHECK-NEXT: v_exp_f32_e32 v0, v0
				; CHECK-NEXT: v_exp_f32_e32 v1, v1
				; CHECK-NEXT: v_exp_f32_e32 v2, v2
				; CHECK-NEXT: s_delay_alu instid0(TRANS32_DEP_3)
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				$vgpr0 = V_EXP_F32_e32 $vgpr0, implicit $exec, implicit $mode
				$vgpr1 = V_EXP_F32_e32 $vgpr1, implicit $exec, implicit $mode
				$vgpr2 = V_EXP_F32_e32 $vgpr2, implicit $exec, implicit $mode
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				...

				# There's no encoding for TRANS32_DEP_4. A normal TRANS instruction will have
				# completed already.
				---
				name: trans32_dep_4
				body: |
				bb.0:
				; CHECK-LABEL: {{^}}trans32_dep_4:
				; CHECK: %bb.0:
				; CHECK-NEXT: v_exp_f32_e32 v0, v0
				; CHECK-NEXT: v_exp_f32_e32 v1, v1
				; CHECK-NEXT: v_exp_f32_e32 v2, v2
				; CHECK-NEXT: v_exp_f32_e32 v3, v3
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				$vgpr0 = V_EXP_F32_e32 $vgpr0, implicit $exec, implicit $mode
				$vgpr1 = V_EXP_F32_e32 $vgpr1, implicit $exec, implicit $mode
				$vgpr2 = V_EXP_F32_e32 $vgpr2, implicit $exec, implicit $mode
				$vgpr3 = V_EXP_F32_e32 $vgpr3, implicit $exec, implicit $mode
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				...

				---
				name: salu_cycle_1
				body: |
				bb.0:
				; CHECK-LABEL: {{^}}salu_cycle_1:
				; CHECK: %bb.0:
				; CHECK-NEXT: s_mov_b32 s0, 0
				; CHECK-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
				; CHECK-NEXT: v_add_nc_u32_e32 v0, s0, v0
				$sgpr0 = S_MOV_B32 0
				$vgpr0 = V_ADD_U32_e32 $sgpr0, $vgpr0, implicit $exec
				...

				# There's no need for SALU_CYCLE_2 here because the s_mov will have completed
				# already.
				---
				name: salu_cycle_2
				body: |
				bb.0:
				; CHECK-LABEL: {{^}}salu_cycle_2:
				; CHECK: %bb.0:
				; CHECK-NEXT: s_mov_b32 s0, 0
				; CHECK-NEXT: v_add_nc_u32_e32 v1, v1, v1
				; CHECK-NEXT: v_add_nc_u32_e32 v0, s0, v0
				$sgpr0 = S_MOV_B32 0
				$vgpr1 = V_ADD_U32_e32 $vgpr1, $vgpr1, implicit $exec
				$vgpr0 = V_ADD_U32_e32 $sgpr0, $vgpr0, implicit $exec
				...

				---
				name: valu_dep_1_same_trans32_dep_1
				body: |
				bb.0:
				; CHECK-LABEL: {{^}}valu_dep_1_same_trans32_dep_1:
				; CHECK: %bb.0:
				; CHECK-NEXT: v_exp_f32_e32 v0, v0
				; CHECK-NEXT: v_add_nc_u32_e32 v1, v1, v1
				; CHECK-NEXT: s_delay_alu instid0(TRANS32_DEP_1) | instid1(VALU_DEP_1)
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v1
				$vgpr0 = V_EXP_F32_e32 $vgpr0, implicit $exec, implicit $mode
				$vgpr1 = V_ADD_U32_e32 $vgpr1, $vgpr1, implicit $exec
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr1, implicit $exec
				...

				# There's no need to encode the VALU dependency because it will complete before
				# the TRANS.
				---
				name: trans32_dep_1_only
				body: |
				bb.0:
				; CHECK-LABEL: {{^}}trans32_dep_1_only:
				; CHECK: %bb.0:
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				; CHECK-NEXT: v_exp_f32_e32 v1, v1
				; CHECK-NEXT: s_delay_alu instid0(TRANS32_DEP_1)
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v1
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				$vgpr1 = V_EXP_F32_e32 $vgpr1, implicit $exec, implicit $mode
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr1, implicit $exec
				...

				---
				name: valu_dep_1_same_salu_cycle_1
				body: |
				bb.0:
				; CHECK-LABEL: {{^}}valu_dep_1_same_salu_cycle_1:
				; CHECK: %bb.0:
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				; CHECK-NEXT: s_mov_b32 s0, 0
				; CHECK-NEXT: s_delay_alu instid0(VALU_DEP_1) | instid1(SALU_CYCLE_1)
				; CHECK-NEXT: v_add_nc_u32_e32 v0, s0, v0
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				$sgpr0 = S_MOV_B32 0
				$vgpr0 = V_ADD_U32_e32 $sgpr0, $vgpr0, implicit $exec
				...

				---
				name: valu_dep_1_next_valu_dep_1
				body: |
				bb.0:
				; CHECK-LABEL: {{^}}valu_dep_1_next_valu_dep_1:
				; CHECK: %bb.0:
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				; CHECK-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				...

				---
				name: valu_dep_2_next_valu_dep_2
				body: |
				bb.0:
				; CHECK-LABEL: {{^}}valu_dep_2_next_valu_dep_2:
				; CHECK: %bb.0:
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				; CHECK-NEXT: v_add_nc_u32_e32 v1, v1, v1
				; CHECK-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				; CHECK-NEXT: v_add_nc_u32_e32 v1, v1, v1
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				$vgpr1 = V_ADD_U32_e32 $vgpr1, $vgpr1, implicit $exec
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				$vgpr1 = V_ADD_U32_e32 $vgpr1, $vgpr1, implicit $exec
				...

				# There's no need to encode a dependency for the second mul, because the
				# dependency for the first mul has already guaranteed that the add has
				# completed.
				---
				name: valu_dep_1_no_next_1
				body: |
				bb.0:
				; CHECK-LABEL: {{^}}valu_dep_1_no_next_1:
				; CHECK: %bb.0:
				; CHECK-NEXT: v_add_f32_e32 v0, v0, v0
				; CHECK-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; CHECK-NEXT: v_mul_f32_e32 v1, v0, v0
				; CHECK-NEXT: v_mul_f32_e32 v2, v0, v0
				$vgpr0 = V_ADD_F32_e32 $vgpr0, $vgpr0, implicit $exec, implicit $mode
				$vgpr1 = V_MUL_F32_e32 $vgpr0, $vgpr0, implicit $exec, implicit $mode
				$vgpr2 = V_MUL_F32_e32 $vgpr0, $vgpr0, implicit $exec, implicit $mode
				...

				# There's no need to encode a dependency for the second add, because the
				# dependency for the second mul has already guaranteed that a later VALU has
				# completed.
				---
				name: valu_dep_1_no_next_2
				body: |
				bb.0:
				; CHECK-LABEL: {{^}}valu_dep_1_no_next_2:
				; CHECK: %bb.0:
				; CHECK-NEXT: v_add_f32_e32 v0, v0, v0
				; CHECK-NEXT: v_mul_f32_e32 v1, v1, v1
				; CHECK-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; CHECK-NEXT: v_mul_f32_e32 v1, v1, v1
				; CHECK-NEXT: v_add_f32_e32 v0, v0, v0
				$vgpr0 = V_ADD_F32_e32 $vgpr0, $vgpr0, implicit $exec, implicit $mode
				$vgpr1 = V_MUL_F32_e32 $vgpr1, $vgpr1, implicit $exec, implicit $mode
				$vgpr1 = V_MUL_F32_e32 $vgpr1, $vgpr1, implicit $exec, implicit $mode
				$vgpr0 = V_ADD_F32_e32 $vgpr0, $vgpr0, implicit $exec, implicit $mode
				...

				# There are no wait states between an add/sub/cmp generating carry and an
				# add/sub/cndmask that consumes it, so no need to encode a dependency.

				---
				name: implicit_cmp_cndmask
				body: |
				bb.0:
				; CHECK-LABEL: {{^}}implicit_cmp_cndmask:
				; CHECK: %bb.0:
				; CHECK-NEXT: v_cmp_eq_i32_e32 vcc, v0, v1
				; CHECK-NEXT: v_cndmask_b32_e64 v2, v3, v4, vcc
				implicit $vcc = V_CMP_EQ_I32_e32 $vgpr0, $vgpr1, implicit $exec
				$vgpr2 = V_CNDMASK_B32_e64 0, $vgpr3, 0, $vgpr4, $vcc, implicit $exec
				...

				# TODO: There should be no s_delay_alu here.
				---
				name: explicit_cmp_cndmask
				body: |
				bb.0:
				; CHECK-LABEL: {{^}}explicit_cmp_cndmask:
				; CHECK: %bb.0:
				; CHECK-NEXT: v_cmp_eq_i32_e64 s[0:1], v0, v1
				; CHECK-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; CHECK-NEXT: v_cndmask_b32_e64 v2, v3, v4, s[0:1]
				$sgpr0_sgpr1 = V_CMP_EQ_I32_e64 $vgpr0, $vgpr1, implicit $exec
				$vgpr2 = V_CNDMASK_B32_e64 0, $vgpr3, 0, $vgpr4, $sgpr0_sgpr1, implicit $exec
				...

				---
				name: implicit_addc_addc
				body: |
				bb.0:
				; CHECK-LABEL: {{^}}implicit_addc_addc:
				; CHECK: %bb.0:
				; CHECK-NEXT: v_add_co_ci_u32_e32 v0, vcc, v0, v0, vcc
				; CHECK-NEXT: v_add_co_ci_u32_e32 v1, vcc, v1, v1, vcc
				$vgpr0 = V_ADDC_U32_e32 $vgpr0, $vgpr0, implicit-def $vcc, implicit $vcc, implicit $exec
				$vgpr1 = V_ADDC_U32_e32 $vgpr1, $vgpr1, implicit-def $vcc, implicit $vcc, implicit $exec
				...

				---
				name: explicit_addc_addc
				body: |
				bb.0:
				; CHECK-LABEL: {{^}}explicit_addc_addc:
				; CHECK: %bb.0:
				; CHECK-NEXT: v_add_co_u32 v0, vcc, v0, v0
				; CHECK-NEXT: v_add_co_ci_u32_e32 v1, vcc, v1, v1, vcc
				$vgpr0,$vcc = V_ADD_CO_U32_e64 $vgpr0, $vgpr0, 0, implicit $exec
				$vgpr1 = V_ADDC_U32_e32 $vgpr1, $vgpr1, implicit-def $vcc, implicit $vcc, implicit $exec
				...

				---
				name: valu_dep_3_bundle
				body: |
				bb.0:
				; CHECK-LABEL: {{^}}valu_dep_3_bundle:
				; CHECK: %bb.0:
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				; CHECK-NEXT: v_add_nc_u32_e32 v1, v1, v1
				; CHECK-NEXT: v_add_nc_u32_e32 v2, v2, v2
				; CHECK-NEXT: s_delay_alu instid0(VALU_DEP_3)
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				BUNDLE {
				$vgpr1 = V_ADD_U32_e32 $vgpr1, $vgpr1, implicit $exec
				$vgpr2 = V_ADD_U32_e32 $vgpr2, $vgpr2, implicit $exec
				}
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				...

				---
				name: if
				body: |
				bb.0:
				; CHECK-LABEL: {{^}}if:
				; CHECK: %bb.0:
				; CHECK-NEXT: s_cbranch_vccz .LBB23_2
				; CHECK-NEXT: %bb.1:
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				; CHECK-NEXT: .LBB23_2:
				; CHECK-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				S_CBRANCH_VCCZ %bb.2, implicit $vcc
				bb.1:
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				bb.2:
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				...

				---
				name: else
				body: |
				bb.0:
				; CHECK-LABEL: {{^}}else:
				; CHECK: %bb.0:
				; CHECK-NEXT: s_cbranch_vccz .LBB24_2
				; CHECK-NEXT: %bb.1
				; CHECK-NEXT: s_branch .LBB24_3
				; CHECK-NEXT: .LBB24_2:
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				; CHECK-NEXT: .LBB24_3:
				; CHECK-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				S_CBRANCH_VCCZ %bb.2, implicit $vcc
				bb.1:
				S_BRANCH %bb.3
				bb.2:
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				bb.3:
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				...

				---
				name: if_else
				body: |
				bb.0:
				; CHECK-LABEL: {{^}}if_else:
				; CHECK: %bb.0:
				; CHECK-NEXT: s_cbranch_vccz .LBB25_2
				; CHECK-NEXT: %bb.1:
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				; CHECK-NEXT: s_branch .LBB25_3
				; CHECK-NEXT: .LBB25_2:
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v1, v1
				; CHECK-NEXT: .LBB25_3:
				; CHECK-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				S_CBRANCH_VCCZ %bb.2, implicit $vcc
				bb.1:
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				S_BRANCH %bb.3
				bb.2:
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				$vgpr0 = V_ADD_U32_e32 $vgpr1, $vgpr1, implicit $exec
				bb.3:
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				...

				# Dependency from outside the loop.
				---
				name: loop_1
				body: |
				bb.0:
				; CHECK-LABEL: {{^}}loop_1:
				; CHECK: %bb.0:
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				; CHECK-NEXT: .LBB26_1:
				; CHECK-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; CHECK-NEXT: v_add_nc_u32_e32 v1, v0, v0
				; CHECK-NEXT: s_cbranch_vccz .LBB26_1
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				bb.1:
				$vgpr1 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				S_CBRANCH_VCCZ %bb.1, implicit $vcc
				bb.2:
				...

				# Dependency from inside the loop.
				---
				name: loop_2
				body: |
				bb.0:
				; CHECK-LABEL: {{^}}loop_2:
				; CHECK: %bb.0:
				; CHECK-NEXT: .LBB27_1:
				; CHECK-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				; CHECK-NEXT: s_cbranch_vccz .LBB27_1
				bb.1:
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				S_CBRANCH_VCCZ %bb.1, implicit $vcc
				bb.2:
				...

				# No VALU delay across s_sendmsg_rtn because it waits for all outstanding VALU
				# to complete.
				---
				name: sendmsg_rtn
				body: |
				bb.0:
				; CHECK-LABEL: {{^}}sendmsg_rtn:
				; CHECK: %bb.0:
				; CHECK-NEXT: v_mov_b32_e32 v0, 0
				; CHECK-NEXT: s_sendmsg_rtn_b32 s0, sendmsg(MSG_RTN_GET_DOORBELL)
				; CHECK-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
				; CHECK-NEXT: s_add_u32 s0, s0, s0
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				$vgpr0 = V_MOV_B32_e32 0, implicit $exec
				$sgpr0 = S_SENDMSG_RTN_B32 128
				$sgpr0 = S_ADD_U32 $sgpr0, $sgpr0, implicit-def $scc
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				...

				# No VALU delay before or across FLAT because it waits for all outstanding VALU
				# to complete.
				---
				name: flat_load
				body: |
				bb.0:
				; CHECK-LABEL: {{^}}flat_load:
				; CHECK: %bb.0:
				; CHECK-NEXT: v_mov_b32_e32 v0, 0
				; CHECK-NEXT: v_mov_b32_e32 v1, 0
				; CHECK-NEXT: v_mov_b32_e32 v2, 0
				; CHECK-NEXT: flat_load_b32 v0, v[0:1]
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v2, v2
				$vgpr0 = V_MOV_B32_e32 0, implicit $exec
				$vgpr1 = V_MOV_B32_e32 0, implicit $exec
				$vgpr2 = V_MOV_B32_e32 0, implicit $exec
				$vgpr0 = FLAT_LOAD_DWORD $vgpr0_vgpr1, 0, 0, implicit $exec, implicit $flat_scr
				$vgpr0 = V_ADD_U32_e32 $vgpr2, $vgpr2, implicit $exec
				...

				# No VALU delay across an s_waitcnt_depctr that waits for all outstanding VALU
				# to complete.
				---
				name: waitcnt_depctr
				body: |
				bb.0:
				; CHECK-LABEL: {{^}}waitcnt_depctr:
				; CHECK: %bb.0:
				; CHECK-NEXT: v_mov_b32_e32 v0, 0
				; CHECK-NEXT: s_waitcnt_depctr 0xfff
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				$vgpr0 = V_MOV_B32_e32 0, implicit $exec
				S_WAITCNT_DEPCTR 4095
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				...

				# Check that no delays are emitted for writelane instructions.
				---
				name: writelane1
				body: |
				bb.0:
				; CHECK-LABEL: {{^}}writelane1:
				; CHECK: %bb.0:
				; CHECK-NEXT: v_writelane_b32 v0, s0, 0
				; CHECK-NEXT: v_writelane_b32 v0, s0, 1
				; CHECK-NEXT: v_writelane_b32 v0, s0, 2
				; CHECK-NEXT: v_writelane_b32 v0, s0, 3
				$vgpr0 = V_WRITELANE_B32 $sgpr0, 0, $vgpr0
				$vgpr0 = V_WRITELANE_B32 $sgpr0, 1, $vgpr0
				$vgpr0 = V_WRITELANE_B32 $sgpr0, 2, $vgpr0
				$vgpr0 = V_WRITELANE_B32 $sgpr0, 3, $vgpr0
				...

				# Check if a VALU delay is added after writelane.
				---
				name: writelane2
				body: |
				bb.0:
				; CHECK-LABEL: {{^}}writelane2:
				; CHECK: %bb.0:
				; CHECK-NEXT: v_writelane_b32 v0, s0, 3
				; CHECK-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; CHECK-NEXT: v_add_nc_u32_e32 v0, v0, v0
				$vgpr0 = V_WRITELANE_B32 $sgpr0, 3, $vgpr0
				$vgpr0 = V_ADD_U32_e32 $vgpr0, $vgpr0, implicit $exec
				...

llvm/test/CodeGen/AMDGPU/llc-pipeline.ll

	[382 lines not shown]
	; GCN-O1-NEXT: Machine Natural Loop Construction			; GCN-O1-NEXT: Machine Natural Loop Construction
	; GCN-O1-NEXT: MachinePostDominator Tree Construction			; GCN-O1-NEXT: MachinePostDominator Tree Construction
	; GCN-O1-NEXT: SI insert wait instructions			; GCN-O1-NEXT: SI insert wait instructions
	; GCN-O1-NEXT: Insert required mode register values			; GCN-O1-NEXT: Insert required mode register values
	; GCN-O1-NEXT: SI Insert Hard Clauses			; GCN-O1-NEXT: SI Insert Hard Clauses
	; GCN-O1-NEXT: SI Final Branch Preparation			; GCN-O1-NEXT: SI Final Branch Preparation
	; GCN-O1-NEXT: SI peephole optimizations			; GCN-O1-NEXT: SI peephole optimizations
	; GCN-O1-NEXT: Post RA hazard recognizer			; GCN-O1-NEXT: Post RA hazard recognizer
				; GCN-O1-NEXT: AMDGPU Insert Delay ALU
	; GCN-O1-NEXT: Branch relaxation pass			; GCN-O1-NEXT: Branch relaxation pass
	; GCN-O1-NEXT: Register Usage Information Collector Pass			; GCN-O1-NEXT: Register Usage Information Collector Pass
	; GCN-O1-NEXT: Live DEBUG_VALUE analysis			; GCN-O1-NEXT: Live DEBUG_VALUE analysis
	; GCN-O1-NEXT: Function register usage analysis			; GCN-O1-NEXT: Function register usage analysis
	; GCN-O1-NEXT: FunctionPass Manager			; GCN-O1-NEXT: FunctionPass Manager
	; GCN-O1-NEXT: Lazy Machine Block Frequency Analysis			; GCN-O1-NEXT: Lazy Machine Block Frequency Analysis
	; GCN-O1-NEXT: Machine Optimization Remark Emitter			; GCN-O1-NEXT: Machine Optimization Remark Emitter
	; GCN-O1-NEXT: AMDGPU Assembly Printer			; GCN-O1-NEXT: AMDGPU Assembly Printer
	[272 lines not shown]
	; GCN-O1-OPTS-NEXT: Machine Natural Loop Construction			; GCN-O1-OPTS-NEXT: Machine Natural Loop Construction
	; GCN-O1-OPTS-NEXT: MachinePostDominator Tree Construction			; GCN-O1-OPTS-NEXT: MachinePostDominator Tree Construction
	; GCN-O1-OPTS-NEXT: SI insert wait instructions			; GCN-O1-OPTS-NEXT: SI insert wait instructions
	; GCN-O1-OPTS-NEXT: Insert required mode register values			; GCN-O1-OPTS-NEXT: Insert required mode register values
	; GCN-O1-OPTS-NEXT: SI Insert Hard Clauses			; GCN-O1-OPTS-NEXT: SI Insert Hard Clauses
	; GCN-O1-OPTS-NEXT: SI Final Branch Preparation			; GCN-O1-OPTS-NEXT: SI Final Branch Preparation
	; GCN-O1-OPTS-NEXT: SI peephole optimizations			; GCN-O1-OPTS-NEXT: SI peephole optimizations
	; GCN-O1-OPTS-NEXT: Post RA hazard recognizer			; GCN-O1-OPTS-NEXT: Post RA hazard recognizer
				; GCN-O1-OPTS-NEXT: AMDGPU Insert Delay ALU
	; GCN-O1-OPTS-NEXT: Branch relaxation pass			; GCN-O1-OPTS-NEXT: Branch relaxation pass
	; GCN-O1-OPTS-NEXT: Register Usage Information Collector Pass			; GCN-O1-OPTS-NEXT: Register Usage Information Collector Pass
	; GCN-O1-OPTS-NEXT: Live DEBUG_VALUE analysis			; GCN-O1-OPTS-NEXT: Live DEBUG_VALUE analysis
	; GCN-O1-OPTS-NEXT: Function register usage analysis			; GCN-O1-OPTS-NEXT: Function register usage analysis
	; GCN-O1-OPTS-NEXT: FunctionPass Manager			; GCN-O1-OPTS-NEXT: FunctionPass Manager
	; GCN-O1-OPTS-NEXT: Lazy Machine Block Frequency Analysis			; GCN-O1-OPTS-NEXT: Lazy Machine Block Frequency Analysis
	; GCN-O1-OPTS-NEXT: Machine Optimization Remark Emitter			; GCN-O1-OPTS-NEXT: Machine Optimization Remark Emitter
	; GCN-O1-OPTS-NEXT: AMDGPU Assembly Printer			; GCN-O1-OPTS-NEXT: AMDGPU Assembly Printer
	[274 lines not shown]
	; GCN-O2-NEXT: Machine Natural Loop Construction			; GCN-O2-NEXT: Machine Natural Loop Construction
	; GCN-O2-NEXT: MachinePostDominator Tree Construction			; GCN-O2-NEXT: MachinePostDominator Tree Construction
	; GCN-O2-NEXT: SI insert wait instructions			; GCN-O2-NEXT: SI insert wait instructions
	; GCN-O2-NEXT: Insert required mode register values			; GCN-O2-NEXT: Insert required mode register values
	; GCN-O2-NEXT: SI Insert Hard Clauses			; GCN-O2-NEXT: SI Insert Hard Clauses
	; GCN-O2-NEXT: SI Final Branch Preparation			; GCN-O2-NEXT: SI Final Branch Preparation
	; GCN-O2-NEXT: SI peephole optimizations			; GCN-O2-NEXT: SI peephole optimizations
	; GCN-O2-NEXT: Post RA hazard recognizer			; GCN-O2-NEXT: Post RA hazard recognizer
				; GCN-O2-NEXT: AMDGPU Insert Delay ALU
	; GCN-O2-NEXT: Branch relaxation pass			; GCN-O2-NEXT: Branch relaxation pass
	; GCN-O2-NEXT: Register Usage Information Collector Pass			; GCN-O2-NEXT: Register Usage Information Collector Pass
	; GCN-O2-NEXT: Live DEBUG_VALUE analysis			; GCN-O2-NEXT: Live DEBUG_VALUE analysis
	; GCN-O2-NEXT: Function register usage analysis			; GCN-O2-NEXT: Function register usage analysis
	; GCN-O2-NEXT: FunctionPass Manager			; GCN-O2-NEXT: FunctionPass Manager
	; GCN-O2-NEXT: Lazy Machine Block Frequency Analysis			; GCN-O2-NEXT: Lazy Machine Block Frequency Analysis
	; GCN-O2-NEXT: Machine Optimization Remark Emitter			; GCN-O2-NEXT: Machine Optimization Remark Emitter
	; GCN-O2-NEXT: AMDGPU Assembly Printer			; GCN-O2-NEXT: AMDGPU Assembly Printer
	[286 lines not shown]
	; GCN-O3-NEXT: Machine Natural Loop Construction			; GCN-O3-NEXT: Machine Natural Loop Construction
	; GCN-O3-NEXT: MachinePostDominator Tree Construction			; GCN-O3-NEXT: MachinePostDominator Tree Construction
	; GCN-O3-NEXT: SI insert wait instructions			; GCN-O3-NEXT: SI insert wait instructions
	; GCN-O3-NEXT: Insert required mode register values			; GCN-O3-NEXT: Insert required mode register values
	; GCN-O3-NEXT: SI Insert Hard Clauses			; GCN-O3-NEXT: SI Insert Hard Clauses
	; GCN-O3-NEXT: SI Final Branch Preparation			; GCN-O3-NEXT: SI Final Branch Preparation
	; GCN-O3-NEXT: SI peephole optimizations			; GCN-O3-NEXT: SI peephole optimizations
	; GCN-O3-NEXT: Post RA hazard recognizer			; GCN-O3-NEXT: Post RA hazard recognizer
				; GCN-O3-NEXT: AMDGPU Insert Delay ALU
	; GCN-O3-NEXT: Branch relaxation pass			; GCN-O3-NEXT: Branch relaxation pass
	; GCN-O3-NEXT: Register Usage Information Collector Pass			; GCN-O3-NEXT: Register Usage Information Collector Pass
	; GCN-O3-NEXT: Live DEBUG_VALUE analysis			; GCN-O3-NEXT: Live DEBUG_VALUE analysis
	; GCN-O3-NEXT: Function register usage analysis			; GCN-O3-NEXT: Function register usage analysis
	; GCN-O3-NEXT: FunctionPass Manager			; GCN-O3-NEXT: FunctionPass Manager
	; GCN-O3-NEXT: Lazy Machine Block Frequency Analysis			; GCN-O3-NEXT: Lazy Machine Block Frequency Analysis
	; GCN-O3-NEXT: Machine Optimization Remark Emitter			; GCN-O3-NEXT: Machine Optimization Remark Emitter
	; GCN-O3-NEXT: AMDGPU Assembly Printer			; GCN-O3-NEXT: AMDGPU Assembly Printer
	; GCN-O3-NEXT: Free MachineFunction			; GCN-O3-NEXT: Free MachineFunction
	; GCN-O3-NEXT:Pass Arguments: -domtree			; GCN-O3-NEXT:Pass Arguments: -domtree
	; GCN-O3-NEXT: FunctionPass Manager			; GCN-O3-NEXT: FunctionPass Manager
	; GCN-O3-NEXT: Dominator Tree Construction			; GCN-O3-NEXT: Dominator Tree Construction

	define void @empty() {			define void @empty() {
	ret void			ret void
	}			}

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.exp.row.ll

	[80 lines not shown]
	}			}

	; Divergent row number just causes a readfirstlane for now.			; Divergent row number just causes a readfirstlane for now.
	define amdgpu_kernel void @id_row_i32() #0 {			define amdgpu_kernel void @id_row_i32() #0 {
	; GFX11-SDAG-LABEL: id_row_i32:			; GFX11-SDAG-LABEL: id_row_i32:
	; GFX11-SDAG: ; %bb.0:			; GFX11-SDAG: ; %bb.0:
	; GFX11-SDAG-NEXT: v_readfirstlane_b32 s0, v0			; GFX11-SDAG-NEXT: v_readfirstlane_b32 s0, v0
	; GFX11-SDAG-NEXT: v_mov_b32_e32 v0, 0x63			; GFX11-SDAG-NEXT: v_mov_b32_e32 v0, 0x63
				; GFX11-SDAG-NEXT: s_delay_alu instid0(VALU_DEP_2)
	; GFX11-SDAG-NEXT: s_mov_b32 m0, s0			; GFX11-SDAG-NEXT: s_mov_b32 m0, s0
	; GFX11-SDAG-NEXT: exp pos0 v0, off, off, off done row_en			; GFX11-SDAG-NEXT: exp pos0 v0, off, off, off done row_en
	; GFX11-SDAG-NEXT: s_endpgm			; GFX11-SDAG-NEXT: s_endpgm
	;			;
	; GFX11-GISEL-LABEL: id_row_i32:			; GFX11-GISEL-LABEL: id_row_i32:
	; GFX11-GISEL: ; %bb.0:			; GFX11-GISEL: ; %bb.0:
	; GFX11-GISEL-NEXT: v_mov_b32_e32 v1, 0x63			; GFX11-GISEL-NEXT: v_mov_b32_e32 v1, 0x63
	; GFX11-GISEL-NEXT: v_readfirstlane_b32 m0, v0			; GFX11-GISEL-NEXT: v_readfirstlane_b32 m0, v0
	; GFX11-GISEL-NEXT: exp pos0 v1, off, off, off done row_en			; GFX11-GISEL-NEXT: exp pos0 v1, off, off, off done row_en
	; GFX11-GISEL-NEXT: s_endpgm			; GFX11-GISEL-NEXT: s_endpgm
	%id = call i32 @llvm.amdgcn.workitem.id.x()			%id = call i32 @llvm.amdgcn.workitem.id.x()
	call void @llvm.amdgcn.exp.row.i32(i32 12, i32 1, i32 99, i32 undef, i32 undef, i32 undef, i1 true, i32 %id)			call void @llvm.amdgcn.exp.row.i32(i32 12, i32 1, i32 99, i32 undef, i32 undef, i32 undef, i1 true, i32 %id)
	ret void			ret void
	}			}

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.f32.bf16.ll

	[9 lines not shown]
	; GFX11-NEXT: s_load_b256 s[0:7], s[0:1], 0x24			; GFX11-NEXT: s_load_b256 s[0:7], s[0:1], 0x24
	; GFX11-NEXT: v_mov_b32_e32 v1, 0			; GFX11-NEXT: v_mov_b32_e32 v1, 0
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_load_b32 s6, s[6:7], 0x0			; GFX11-NEXT: s_load_b32 s6, s[6:7], 0x0
	; GFX11-NEXT: s_load_b32 s2, s[2:3], 0x0			; GFX11-NEXT: s_load_b32 s2, s[2:3], 0x0
	; GFX11-NEXT: s_load_b32 s3, s[4:5], 0x0			; GFX11-NEXT: s_load_b32 s3, s[4:5], 0x0
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: v_mov_b32_e32 v0, s6			; GFX11-NEXT: v_mov_b32_e32 v0, s6
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_dot2_f32_bf16 v0, s2, s3, v0 clamp			; GFX11-NEXT: v_dot2_f32_bf16 v0, s2, s3, v0 clamp
	; GFX11-NEXT: global_store_b32 v1, v0, s[0:1]			; GFX11-NEXT: global_store_b32 v1, v0, s[0:1]
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	float addrspace(1)* %r,			float addrspace(1)* %r,
	<2 x i16> addrspace(1)* %a,			<2 x i16> addrspace(1)* %a,
	<2 x i16> addrspace(1)* %b,			<2 x i16> addrspace(1)* %b,
	float addrspace(1)* %c) {			float addrspace(1)* %c) {
	entry:			entry:
	[12 lines not shown]
	; GFX11-NEXT: s_load_b256 s[0:7], s[0:1], 0x24			; GFX11-NEXT: s_load_b256 s[0:7], s[0:1], 0x24
	; GFX11-NEXT: v_mov_b32_e32 v1, 0			; GFX11-NEXT: v_mov_b32_e32 v1, 0
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_load_b32 s6, s[6:7], 0x0			; GFX11-NEXT: s_load_b32 s6, s[6:7], 0x0
	; GFX11-NEXT: s_load_b32 s2, s[2:3], 0x0			; GFX11-NEXT: s_load_b32 s2, s[2:3], 0x0
	; GFX11-NEXT: s_load_b32 s3, s[4:5], 0x0			; GFX11-NEXT: s_load_b32 s3, s[4:5], 0x0
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: v_mov_b32_e32 v0, s6			; GFX11-NEXT: v_mov_b32_e32 v0, s6
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_dot2_f32_bf16 v0, s2, s3, v0			; GFX11-NEXT: v_dot2_f32_bf16 v0, s2, s3, v0
	; GFX11-NEXT: global_store_b32 v1, v0, s[0:1]			; GFX11-NEXT: global_store_b32 v1, v0, s[0:1]
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	float addrspace(1)* %r,			float addrspace(1)* %r,
	<2 x i16> addrspace(1)* %a,			<2 x i16> addrspace(1)* %a,
	<2 x i16> addrspace(1)* %b,			<2 x i16> addrspace(1)* %b,
	float addrspace(1)* %c) {			float addrspace(1)* %c) {
	entry:			entry:
	%a.val = load <2 x i16>, <2 x i16> addrspace(1)* %a			%a.val = load <2 x i16>, <2 x i16> addrspace(1)* %a
	%b.val = load <2 x i16>, <2 x i16> addrspace(1)* %b			%b.val = load <2 x i16>, <2 x i16> addrspace(1)* %b
	%c.val = load float, float addrspace(1)* %c			%c.val = load float, float addrspace(1)* %c
	%r.val = call float @llvm.amdgcn.fdot2.f32.bf16(<2 x i16> %a.val, <2 x i16> %b.val, float %c.val, i1 0)			%r.val = call float @llvm.amdgcn.fdot2.f32.bf16(<2 x i16> %a.val, <2 x i16> %b.val, float %c.val, i1 0)
	store float %r.val, float addrspace(1)* %r			store float %r.val, float addrspace(1)* %r
	ret void			ret void
	}			}
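
A note on reading the VALU_DEP_N operands in the checks above. The sketch below is a deliberately simplified model, not the algorithm used by AMDGPUInsertDelayAlu: it assumes VALU_DEP_N refers to the Nth most recent VALU instruction, looks only at register names in a straight-line list of VALU instructions, and ignores latencies, SALU and TRANS dependencies, control flow, and special cases such as v_writelane. The helper name valu_dep_distances and the tuple representation are hypothetical and exist only for illustration.

```python
def valu_dep_distances(insts, max_dist=4):
    """Return a VALU_DEP_N label (or None) for each instruction.

    insts is a list of (dst, srcs) tuples describing consecutive VALU
    instructions only. The label names the nearest earlier instruction,
    at most max_dist back, whose destination is read as a source.
    """
    labels = []
    for i, (_dst, srcs) in enumerate(insts):
        label = None
        for back in range(1, max_dist + 1):
            j = i - back
            if j < 0:
                break  # ran off the start of the sequence
            if insts[j][0] in srcs:
                label = f"VALU_DEP_{back}"
                break  # nearest producer found
        labels.append(label)
    return labels


# VALU instructions from the fdot2 checks above (non-VALU lines left out):
#   v_mov_b32_e32 v1, 0
#   v_mov_b32_e32 v0, s6
#   v_dot2_f32_bf16 v0, s2, s3, v0
fdot2 = [
    ("v1", []),
    ("v0", ["s6"]),
    ("v0", ["s2", "s3", "v0"]),
]
print(valu_dep_distances(fdot2))
```

Under these assumptions only the v_dot2_f32_bf16 gets a label, VALU_DEP_1, which agrees with the s_delay_alu placed immediately before it in the regenerated checks.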

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.dim.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -march=amdgcn -mcpu=verde -verify-machineinstrs < %s | FileCheck -check-prefixes=VERDE %s			; RUN: llc -march=amdgcn -mcpu=verde -verify-machineinstrs < %s | FileCheck -check-prefixes=VERDE %s
	; RUN: llc -march=amdgcn -mcpu=fiji -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefixes=FIJI %s			; RUN: llc -march=amdgcn -mcpu=fiji -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefixes=FIJI %s
	; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX6789 %s			; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX6789 %s
	; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=-enable-prt-strict-null -verify-machineinstrs < %s | FileCheck -check-prefixes=NOPRT %s			; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=-enable-prt-strict-null -verify-machineinstrs < %s | FileCheck -check-prefixes=NOPRT %s
	; RUN: llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10PLUS,GFX10 %s			; RUN: llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10PLUS,GFX10 %s
	; RUN: llc -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10PLUS,GFX11 %s			; RUN: llc -march=amdgcn -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10PLUS,GFX11 %s

	define amdgpu_ps <4 x float> @load_1d(<8 x i32> inreg %rsrc, i32 %s) {			define amdgpu_ps <4 x float> @load_1d(<8 x i32> inreg %rsrc, i32 %s) {
	; VERDE-LABEL: load_1d:			; VERDE-LABEL: load_1d:
	; VERDE: ; %bb.0: ; %main_body			; VERDE: ; %bb.0: ; %main_body
	; VERDE-NEXT: image_load v[0:3], v0, s[0:7] dmask:0xf unorm			; VERDE-NEXT: image_load v[0:3], v0, s[0:7] dmask:0xf unorm
	; VERDE-NEXT: s_waitcnt vmcnt(0)			; VERDE-NEXT: s_waitcnt vmcnt(0)
	; VERDE-NEXT: ; return to shader part epilog			; VERDE-NEXT: ; return to shader part epilog
	;			;
	[3,831 lines not shown]

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.a16.dim.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX9 %s			; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX9 %s
	; RUN: llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10 %s			; RUN: llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10 %s
	; RUN: llc -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10 %s			; RUN: llc -march=amdgcn -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10 %s

	define amdgpu_ps <4 x float> @gather4_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t) {			define amdgpu_ps <4 x float> @gather4_2d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t) {
	; GFX9-LABEL: gather4_2d:			; GFX9-LABEL: gather4_2d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0			; GFX9-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; GFX9-NEXT: v_lshl_or_b32 v0, v1, 16, v0			; GFX9-NEXT: v_lshl_or_b32 v0, v1, 16, v0
	[378 lines not shown]

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.a16.dim.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX9 %s			; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX9 %s
	; RUN: llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10 %s			; RUN: llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10 %s
	; RUN: llc -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10 %s			; RUN: llc -march=amdgcn -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10 %s

	define amdgpu_ps <4 x float> @sample_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s) {			define amdgpu_ps <4 x float> @sample_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s) {
	; GFX9-LABEL: sample_1d:			; GFX9-LABEL: sample_1d:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_mov_b64 s[12:13], exec			; GFX9-NEXT: s_mov_b64 s[12:13], exec
	; GFX9-NEXT: s_wqm_b64 exec, exec			; GFX9-NEXT: s_wqm_b64 exec, exec
	; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]			; GFX9-NEXT: s_and_b64 exec, exec, s[12:13]
	; GFX9-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf a16			; GFX9-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf a16
	[1,037 lines not shown]

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.d16.dim.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck -check-prefixes=TONGA %s			; RUN: llc < %s -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck -check-prefixes=TONGA %s
	; RUN: llc < %s -march=amdgcn -mcpu=gfx810 -verify-machineinstrs | FileCheck -check-prefixes=GFX81 %s			; RUN: llc < %s -march=amdgcn -mcpu=gfx810 -verify-machineinstrs | FileCheck -check-prefixes=GFX81 %s
	; RUN: llc < %s -march=amdgcn -mcpu=gfx900 -verify-machineinstrs | FileCheck -check-prefixes=GFX9 %s			; RUN: llc < %s -march=amdgcn -mcpu=gfx900 -verify-machineinstrs | FileCheck -check-prefixes=GFX9 %s
	; RUN: llc < %s -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs | FileCheck -check-prefixes=GFX10PLUS,GFX10 %s			; RUN: llc < %s -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs | FileCheck -check-prefixes=GFX10PLUS,GFX10 %s
	; RUN: llc < %s -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs | FileCheck -check-prefixes=GFX10PLUS,GFX11 %s			; RUN: llc < %s -march=amdgcn -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 -verify-machineinstrs | FileCheck -check-prefixes=GFX10PLUS,GFX11 %s

	define amdgpu_ps half @image_sample_2d_f16(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t) {			define amdgpu_ps half @image_sample_2d_f16(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t) {
	; TONGA-LABEL: image_sample_2d_f16:			; TONGA-LABEL: image_sample_2d_f16:
	; TONGA: ; %bb.0: ; %main_body			; TONGA: ; %bb.0: ; %main_body
	; TONGA-NEXT: s_mov_b64 s[12:13], exec			; TONGA-NEXT: s_mov_b64 s[12:13], exec
	; TONGA-NEXT: s_wqm_b64 exec, exec			; TONGA-NEXT: s_wqm_b64 exec, exec
	; TONGA-NEXT: s_and_b64 exec, exec, s[12:13]			; TONGA-NEXT: s_and_b64 exec, exec, s[12:13]
	; TONGA-NEXT: image_sample v0, v[0:1], s[0:7], s[8:11] dmask:0x1 d16			; TONGA-NEXT: image_sample v0, v[0:1], s[0:7], s[8:11] dmask:0x1 d16
	[469 lines not shown]

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.dim.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -march=amdgcn -mcpu=verde -verify-machineinstrs < %s | FileCheck -check-prefixes=VERDE %s			; RUN: llc -march=amdgcn -mcpu=verde -verify-machineinstrs < %s | FileCheck -check-prefixes=VERDE %s
	; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX6789 %s			; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX6789 %s
	; RUN: llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10PLUS,GFX10 %s			; RUN: llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10PLUS,GFX10 %s
	; RUN: llc -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10PLUS,GFX11 %s			; RUN: llc -march=amdgcn -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10PLUS,GFX11 %s

	define amdgpu_ps <4 x float> @sample_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s) {			define amdgpu_ps <4 x float> @sample_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s) {
	; VERDE-LABEL: sample_1d:			; VERDE-LABEL: sample_1d:
	; VERDE: ; %bb.0: ; %main_body			; VERDE: ; %bb.0: ; %main_body
	; VERDE-NEXT: s_mov_b64 s[12:13], exec			; VERDE-NEXT: s_mov_b64 s[12:13], exec
	; VERDE-NEXT: s_wqm_b64 exec, exec			; VERDE-NEXT: s_wqm_b64 exec, exec
	; VERDE-NEXT: s_and_b64 exec, exec, s[12:13]			; VERDE-NEXT: s_and_b64 exec, exec, s[12:13]
	; VERDE-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf			; VERDE-NEXT: image_sample v[0:3], v0, s[0:7], s[8:11] dmask:0xf
	[2,184 lines not shown]

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.g16.encode.ll

	[28 lines not shown]
	; GFX10-NEXT: image_sample_d_g16 v[0:3], [v0, v2, v4, v5], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D ; encoding: [0x0b,0x0f,0x88,0xf0,0x00,0x00,0x40,0x00,0x02,0x04,0x05,0x00]			; GFX10-NEXT: image_sample_d_g16 v[0:3], [v0, v2, v4, v5], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D ; encoding: [0x0b,0x0f,0x88,0xf0,0x00,0x00,0x40,0x00,0x02,0x04,0x05,0x00]
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	;			;
	; GFX11-LABEL: sample_d_2d:			; GFX11-LABEL: sample_d_2d:
	; GFX11: ; %bb.0: ; %main_body			; GFX11: ; %bb.0: ; %main_body
	; GFX11-NEXT: v_and_b32_e32 v2, 0xffff, v2 ; encoding: [0xff,0x04,0x04,0x36,0xff,0xff,0x00,0x00]			; GFX11-NEXT: v_and_b32_e32 v2, 0xffff, v2 ; encoding: [0xff,0x04,0x04,0x36,0xff,0xff,0x00,0x00]
	; GFX11-NEXT: v_and_b32_e32 v0, 0xffff, v0 ; encoding: [0xff,0x00,0x00,0x36,0xff,0xff,0x00,0x00]			; GFX11-NEXT: v_and_b32_e32 v0, 0xffff, v0 ; encoding: [0xff,0x00,0x00,0x36,0xff,0xff,0x00,0x00]
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2) ; encoding: [0x12,0x01,0x87,0xbf]
	; GFX11-NEXT: v_lshl_or_b32 v2, v3, 16, v2 ; encoding: [0x02,0x00,0x56,0xd6,0x03,0x21,0x09,0x04]			; GFX11-NEXT: v_lshl_or_b32 v2, v3, 16, v2 ; encoding: [0x02,0x00,0x56,0xd6,0x03,0x21,0x09,0x04]
	; GFX11-NEXT: v_lshl_or_b32 v0, v1, 16, v0 ; encoding: [0x00,0x00,0x56,0xd6,0x01,0x21,0x01,0x04]			; GFX11-NEXT: v_lshl_or_b32 v0, v1, 16, v0 ; encoding: [0x00,0x00,0x56,0xd6,0x01,0x21,0x01,0x04]
	; GFX11-NEXT: image_sample_d_g16 v[0:3], [v0, v2, v4, v5], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D ; encoding: [0x05,0x0f,0xe4,0xf0,0x00,0x00,0x00,0x08,0x02,0x04,0x05,0x00]			; GFX11-NEXT: image_sample_d_g16 v[0:3], [v0, v2, v4, v5], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D ; encoding: [0x05,0x0f,0xe4,0xf0,0x00,0x00,0x00,0x08,0x02,0x04,0x05,0x00]
	; GFX11-NEXT: s_waitcnt vmcnt(0) ; encoding: [0xf7,0x03,0x89,0xbf]			; GFX11-NEXT: s_waitcnt vmcnt(0) ; encoding: [0xf7,0x03,0x89,0xbf]
	; GFX11-NEXT: ; return to shader part epilog			; GFX11-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.d.2d.v4f32.f16.f32(i32 15, half %dsdh, half %dtdh, half %dsdv, half %dtdv, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.d.2d.v4f32.f16.f32(i32 15, half %dsdh, half %dtdh, half %dsdv, half %dtdv, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	[12 lines not shown]
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	;			;
	; GFX11-LABEL: sample_d_3d:			; GFX11-LABEL: sample_d_3d:
	; GFX11: ; %bb.0: ; %main_body			; GFX11: ; %bb.0: ; %main_body
	; GFX11-NEXT: v_mov_b32_e32 v9, v3 ; encoding: [0x03,0x03,0x12,0x7e]			; GFX11-NEXT: v_mov_b32_e32 v9, v3 ; encoding: [0x03,0x03,0x12,0x7e]
	; GFX11-NEXT: v_mov_b32_e32 v3, v2 ; encoding: [0x02,0x03,0x06,0x7e]			; GFX11-NEXT: v_mov_b32_e32 v3, v2 ; encoding: [0x02,0x03,0x06,0x7e]
	; GFX11-NEXT: v_and_b32_e32 v0, 0xffff, v0 ; encoding: [0xff,0x00,0x00,0x36,0xff,0xff,0x00,0x00]			; GFX11-NEXT: v_and_b32_e32 v0, 0xffff, v0 ; encoding: [0xff,0x00,0x00,0x36,0xff,0xff,0x00,0x00]
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_1) ; encoding: [0x93,0x00,0x87,0xbf]
	; GFX11-NEXT: v_and_b32_e32 v2, 0xffff, v9 ; encoding: [0xff,0x12,0x04,0x36,0xff,0xff,0x00,0x00]			; GFX11-NEXT: v_and_b32_e32 v2, 0xffff, v9 ; encoding: [0xff,0x12,0x04,0x36,0xff,0xff,0x00,0x00]
	; GFX11-NEXT: v_lshl_or_b32 v4, v4, 16, v2 ; encoding: [0x04,0x00,0x56,0xd6,0x04,0x21,0x09,0x04]			; GFX11-NEXT: v_lshl_or_b32 v4, v4, 16, v2 ; encoding: [0x04,0x00,0x56,0xd6,0x04,0x21,0x09,0x04]
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) ; encoding: [0x03,0x00,0x87,0xbf]
	; GFX11-NEXT: v_lshl_or_b32 v2, v1, 16, v0 ; encoding: [0x02,0x00,0x56,0xd6,0x01,0x21,0x01,0x04]			; GFX11-NEXT: v_lshl_or_b32 v2, v1, 16, v0 ; encoding: [0x02,0x00,0x56,0xd6,0x01,0x21,0x01,0x04]
	; GFX11-NEXT: image_sample_d_g16 v[0:3], v[2:8], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_3D ; encoding: [0x08,0x0f,0xe4,0xf0,0x02,0x00,0x00,0x08]			; GFX11-NEXT: image_sample_d_g16 v[0:3], v[2:8], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_3D ; encoding: [0x08,0x0f,0xe4,0xf0,0x02,0x00,0x00,0x08]
	; GFX11-NEXT: s_waitcnt vmcnt(0) ; encoding: [0xf7,0x03,0x89,0xbf]			; GFX11-NEXT: s_waitcnt vmcnt(0) ; encoding: [0xf7,0x03,0x89,0xbf]
	; GFX11-NEXT: ; return to shader part epilog			; GFX11-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.d.3d.v4f32.f16.f32(i32 15, half %dsdh, half %dtdh, half %drdh, half %dsdv, half %dtdv, half %drdv, float %s, float %t, float %r, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.d.3d.v4f32.f16.f32(i32 15, half %dsdh, half %dtdh, half %drdh, half %dsdv, half %dtdv, half %drdv, float %s, float %t, float %r, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	}			}
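
A note on the s_delay_alu encodings in the checks above. The sketch below decodes the four encoding bytes into the operand syntax shown in the assembly. It assumes the field layout I understand from the RDNA3 ISA description (instid0 in bits 3:0 of the 16-bit immediate, instskip in bits 6:4, instid1 in bits 10:7) and the value names listed in the tables; treat both as assumptions to verify against the ISA guide. The function name decode_delay_alu is hypothetical and is not part of LLVM.

```python
# Assumed value names for the instid0/instid1 fields.
INSTID_NAMES = {
    0: "NO_DEP",
    1: "VALU_DEP_1", 2: "VALU_DEP_2", 3: "VALU_DEP_3", 4: "VALU_DEP_4",
    5: "TRANS32_DEP_1", 6: "TRANS32_DEP_2", 7: "TRANS32_DEP_3",
    8: "FMA_ACCUM_CYCLE_1",
    9: "SALU_CYCLE_1", 10: "SALU_CYCLE_2", 11: "SALU_CYCLE_3",
}
# Assumed value names for the instskip field.
INSTSKIP_NAMES = {
    0: "SAME", 1: "NEXT", 2: "SKIP_1", 3: "SKIP_2", 4: "SKIP_3", 5: "SKIP_4",
}


def decode_delay_alu(encoding):
    """Decode the bytes printed after 'encoding:' (little-endian dword)."""
    word = int.from_bytes(bytes(encoding), "little")
    imm = word & 0xFFFF            # only the low 16 bits are decoded here
    instid0 = imm & 0xF            # bits 3:0
    instskip = (imm >> 4) & 0x7    # bits 6:4
    instid1 = (imm >> 7) & 0xF     # bits 10:7
    parts = [f"instid0({INSTID_NAMES.get(instid0, instid0)})"]
    if instid1:
        # The second dependency and its skip count are shown only when set.
        parts.append(f"instskip({INSTSKIP_NAMES.get(instskip, instskip)})")
        parts.append(f"instid1({INSTID_NAMES.get(instid1, instid1)})")
    return " | ".join(parts)


print(decode_delay_alu([0x12, 0x01, 0x87, 0xBF]))
print(decode_delay_alu([0x93, 0x00, 0x87, 0xBF]))
print(decode_delay_alu([0x03, 0x00, 0x87, 0xBF]))
```

With this layout the three encodings from sample_d_2d and sample_d_3d decode to the same operands that appear in the check lines, which is a quick way to sanity-check a regenerated test by hand.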
	[25 lines not shown]
	; GFX10-NEXT: image_sample_c_d_g16 v[0:3], [v0, v1, v3, v5, v6], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D ; encoding: [0x0b,0x0f,0xa8,0xf0,0x00,0x00,0x40,0x00,0x01,0x03,0x05,0x06]			; GFX10-NEXT: image_sample_c_d_g16 v[0:3], [v0, v1, v3, v5, v6], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D ; encoding: [0x0b,0x0f,0xa8,0xf0,0x00,0x00,0x40,0x00,0x01,0x03,0x05,0x06]
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	;			;
	; GFX11-LABEL: sample_c_d_2d:			; GFX11-LABEL: sample_c_d_2d:
	; GFX11: ; %bb.0: ; %main_body			; GFX11: ; %bb.0: ; %main_body
	; GFX11-NEXT: v_and_b32_e32 v3, 0xffff, v3 ; encoding: [0xff,0x06,0x06,0x36,0xff,0xff,0x00,0x00]			; GFX11-NEXT: v_and_b32_e32 v3, 0xffff, v3 ; encoding: [0xff,0x06,0x06,0x36,0xff,0xff,0x00,0x00]
	; GFX11-NEXT: v_and_b32_e32 v1, 0xffff, v1 ; encoding: [0xff,0x02,0x02,0x36,0xff,0xff,0x00,0x00]			; GFX11-NEXT: v_and_b32_e32 v1, 0xffff, v1 ; encoding: [0xff,0x02,0x02,0x36,0xff,0xff,0x00,0x00]
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2) ; encoding: [0x12,0x01,0x87,0xbf]
	; GFX11-NEXT: v_lshl_or_b32 v3, v4, 16, v3 ; encoding: [0x03,0x00,0x56,0xd6,0x04,0x21,0x0d,0x04]			; GFX11-NEXT: v_lshl_or_b32 v3, v4, 16, v3 ; encoding: [0x03,0x00,0x56,0xd6,0x04,0x21,0x0d,0x04]
	; GFX11-NEXT: v_lshl_or_b32 v1, v2, 16, v1 ; encoding: [0x01,0x00,0x56,0xd6,0x02,0x21,0x05,0x04]			; GFX11-NEXT: v_lshl_or_b32 v1, v2, 16, v1 ; encoding: [0x01,0x00,0x56,0xd6,0x02,0x21,0x05,0x04]
	; GFX11-NEXT: image_sample_c_d_g16 v[0:3], [v0, v1, v3, v5, v6], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D ; encoding: [0x05,0x0f,0xe8,0xf0,0x00,0x00,0x00,0x08,0x01,0x03,0x05,0x06]			; GFX11-NEXT: image_sample_c_d_g16 v[0:3], [v0, v1, v3, v5, v6], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D ; encoding: [0x05,0x0f,0xe8,0xf0,0x00,0x00,0x00,0x08,0x01,0x03,0x05,0x06]
	; GFX11-NEXT: s_waitcnt vmcnt(0) ; encoding: [0xf7,0x03,0x89,0xbf]			; GFX11-NEXT: s_waitcnt vmcnt(0) ; encoding: [0xf7,0x03,0x89,0xbf]
	; GFX11-NEXT: ; return to shader part epilog			; GFX11-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.d.2d.v4f32.f16.f32(i32 15, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.d.2d.v4f32.f16.f32(i32 15, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	Show All 26 Lines
	; GFX10-NEXT: image_sample_d_cl_g16 v[0:3], [v0, v2, v4, v5, v6], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D ; encoding: [0x0b,0x0f,0x8c,0xf0,0x00,0x00,0x40,0x00,0x02,0x04,0x05,0x06]			; GFX10-NEXT: image_sample_d_cl_g16 v[0:3], [v0, v2, v4, v5, v6], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D ; encoding: [0x0b,0x0f,0x8c,0xf0,0x00,0x00,0x40,0x00,0x02,0x04,0x05,0x06]
	; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]			; GFX10-NEXT: s_waitcnt vmcnt(0) ; encoding: [0x70,0x3f,0x8c,0xbf]
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	;			;
	; GFX11-LABEL: sample_d_cl_2d:			; GFX11-LABEL: sample_d_cl_2d:
	; GFX11: ; %bb.0: ; %main_body			; GFX11: ; %bb.0: ; %main_body
	; GFX11-NEXT: v_and_b32_e32 v2, 0xffff, v2 ; encoding: [0xff,0x04,0x04,0x36,0xff,0xff,0x00,0x00]			; GFX11-NEXT: v_and_b32_e32 v2, 0xffff, v2 ; encoding: [0xff,0x04,0x04,0x36,0xff,0xff,0x00,0x00]
	; GFX11-NEXT: v_and_b32_e32 v0, 0xffff, v0 ; encoding: [0xff,0x00,0x00,0x36,0xff,0xff,0x00,0x00]			; GFX11-NEXT: v_and_b32_e32 v0, 0xffff, v0 ; encoding: [0xff,0x00,0x00,0x36,0xff,0xff,0x00,0x00]
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2) ; encoding: [0x12,0x01,0x87,0xbf]
	; GFX11-NEXT: v_lshl_or_b32 v2, v3, 16, v2 ; encoding: [0x02,0x00,0x56,0xd6,0x03,0x21,0x09,0x04]			; GFX11-NEXT: v_lshl_or_b32 v2, v3, 16, v2 ; encoding: [0x02,0x00,0x56,0xd6,0x03,0x21,0x09,0x04]
	; GFX11-NEXT: v_lshl_or_b32 v0, v1, 16, v0 ; encoding: [0x00,0x00,0x56,0xd6,0x01,0x21,0x01,0x04]			; GFX11-NEXT: v_lshl_or_b32 v0, v1, 16, v0 ; encoding: [0x00,0x00,0x56,0xd6,0x01,0x21,0x01,0x04]
	; GFX11-NEXT: image_sample_d_cl_g16 v[0:3], [v0, v2, v4, v5, v6], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D ; encoding: [0x05,0x0f,0x7c,0xf1,0x00,0x00,0x00,0x08,0x02,0x04,0x05,0x06]			; GFX11-NEXT: image_sample_d_cl_g16 v[0:3], [v0, v2, v4, v5, v6], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D ; encoding: [0x05,0x0f,0x7c,0xf1,0x00,0x00,0x00,0x08,0x02,0x04,0x05,0x06]
	; GFX11-NEXT: s_waitcnt vmcnt(0) ; encoding: [0xf7,0x03,0x89,0xbf]			; GFX11-NEXT: s_waitcnt vmcnt(0) ; encoding: [0xf7,0x03,0x89,0xbf]
	; GFX11-NEXT: ; return to shader part epilog			; GFX11-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.d.cl.2d.v4f32.f16.f32(i32 15, half %dsdh, half %dtdh, half %dsdv, half %dtdv, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.d.cl.2d.v4f32.f16.f32(i32 15, half %dsdh, half %dtdh, half %dsdv, half %dtdv, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	Show All 30 Lines
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	;			;
	; GFX11-LABEL: sample_c_d_cl_2d:			; GFX11-LABEL: sample_c_d_cl_2d:
	; GFX11: ; %bb.0: ; %main_body			; GFX11: ; %bb.0: ; %main_body
	; GFX11-NEXT: v_mov_b32_e32 v8, v2 ; encoding: [0x02,0x03,0x10,0x7e]			; GFX11-NEXT: v_mov_b32_e32 v8, v2 ; encoding: [0x02,0x03,0x10,0x7e]
	; GFX11-NEXT: v_mov_b32_e32 v2, v0 ; encoding: [0x00,0x03,0x04,0x7e]			; GFX11-NEXT: v_mov_b32_e32 v2, v0 ; encoding: [0x00,0x03,0x04,0x7e]
	; GFX11-NEXT: v_and_b32_e32 v0, 0xffff, v3 ; encoding: [0xff,0x06,0x00,0x36,0xff,0xff,0x00,0x00]			; GFX11-NEXT: v_and_b32_e32 v0, 0xffff, v3 ; encoding: [0xff,0x06,0x00,0x36,0xff,0xff,0x00,0x00]
	; GFX11-NEXT: v_and_b32_e32 v1, 0xffff, v1 ; encoding: [0xff,0x02,0x02,0x36,0xff,0xff,0x00,0x00]			; GFX11-NEXT: v_and_b32_e32 v1, 0xffff, v1 ; encoding: [0xff,0x02,0x02,0x36,0xff,0xff,0x00,0x00]
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2) ; encoding: [0x12,0x01,0x87,0xbf]
	; GFX11-NEXT: v_lshl_or_b32 v4, v4, 16, v0 ; encoding: [0x04,0x00,0x56,0xd6,0x04,0x21,0x01,0x04]			; GFX11-NEXT: v_lshl_or_b32 v4, v4, 16, v0 ; encoding: [0x04,0x00,0x56,0xd6,0x04,0x21,0x01,0x04]
	; GFX11-NEXT: v_lshl_or_b32 v3, v8, 16, v1 ; encoding: [0x03,0x00,0x56,0xd6,0x08,0x21,0x05,0x04]			; GFX11-NEXT: v_lshl_or_b32 v3, v8, 16, v1 ; encoding: [0x03,0x00,0x56,0xd6,0x08,0x21,0x05,0x04]
	; GFX11-NEXT: image_sample_c_d_cl_g16 v[0:3], v[2:7], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D ; encoding: [0x04,0x0f,0x50,0xf1,0x02,0x00,0x00,0x08]			; GFX11-NEXT: image_sample_c_d_cl_g16 v[0:3], v[2:7], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_2D ; encoding: [0x04,0x0f,0x50,0xf1,0x02,0x00,0x00,0x08]
	; GFX11-NEXT: s_waitcnt vmcnt(0) ; encoding: [0xf7,0x03,0x89,0xbf]			; GFX11-NEXT: s_waitcnt vmcnt(0) ; encoding: [0xf7,0x03,0x89,0xbf]
	; GFX11-NEXT: ; return to shader part epilog			; GFX11-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.d.cl.2d.v4f32.f16.f32(i32 15, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.d.cl.2d.v4f32.f16.f32(i32 15, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <4 x float> %v			ret <4 x float> %v
	Show All 17 Lines
	; GFX11-LABEL: sample_c_d_o_2darray_V1:			; GFX11-LABEL: sample_c_d_o_2darray_V1:
	; GFX11: ; %bb.0: ; %main_body			; GFX11: ; %bb.0: ; %main_body
	; GFX11-NEXT: v_mov_b32_e32 v9, v2 ; encoding: [0x02,0x03,0x12,0x7e]			; GFX11-NEXT: v_mov_b32_e32 v9, v2 ; encoding: [0x02,0x03,0x12,0x7e]
	; GFX11-NEXT: v_mov_b32_e32 v10, v3 ; encoding: [0x03,0x03,0x14,0x7e]			; GFX11-NEXT: v_mov_b32_e32 v10, v3 ; encoding: [0x03,0x03,0x14,0x7e]
	; GFX11-NEXT: v_mov_b32_e32 v3, v1 ; encoding: [0x01,0x03,0x06,0x7e]			; GFX11-NEXT: v_mov_b32_e32 v3, v1 ; encoding: [0x01,0x03,0x06,0x7e]
	; GFX11-NEXT: v_mov_b32_e32 v2, v0 ; encoding: [0x00,0x03,0x04,0x7e]			; GFX11-NEXT: v_mov_b32_e32 v2, v0 ; encoding: [0x00,0x03,0x04,0x7e]
	; GFX11-NEXT: v_and_b32_e32 v0, 0xffff, v4 ; encoding: [0xff,0x08,0x00,0x36,0xff,0xff,0x00,0x00]			; GFX11-NEXT: v_and_b32_e32 v0, 0xffff, v4 ; encoding: [0xff,0x08,0x00,0x36,0xff,0xff,0x00,0x00]
	; GFX11-NEXT: v_and_b32_e32 v1, 0xffff, v9 ; encoding: [0xff,0x12,0x02,0x36,0xff,0xff,0x00,0x00]			; GFX11-NEXT: v_and_b32_e32 v1, 0xffff, v9 ; encoding: [0xff,0x12,0x02,0x36,0xff,0xff,0x00,0x00]
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2) ; encoding: [0x12,0x01,0x87,0xbf]
	; GFX11-NEXT: v_lshl_or_b32 v5, v5, 16, v0 ; encoding: [0x05,0x00,0x56,0xd6,0x05,0x21,0x01,0x04]			; GFX11-NEXT: v_lshl_or_b32 v5, v5, 16, v0 ; encoding: [0x05,0x00,0x56,0xd6,0x05,0x21,0x01,0x04]
	; GFX11-NEXT: v_lshl_or_b32 v4, v10, 16, v1 ; encoding: [0x04,0x00,0x56,0xd6,0x0a,0x21,0x05,0x04]			; GFX11-NEXT: v_lshl_or_b32 v4, v10, 16, v1 ; encoding: [0x04,0x00,0x56,0xd6,0x0a,0x21,0x05,0x04]
	; GFX11-NEXT: image_sample_c_d_o_g16 v0, v[2:8], s[0:7], s[8:11] dmask:0x4 dim:SQ_RSRC_IMG_2D_ARRAY ; encoding: [0x14,0x04,0xf0,0xf0,0x02,0x00,0x00,0x08]			; GFX11-NEXT: image_sample_c_d_o_g16 v0, v[2:8], s[0:7], s[8:11] dmask:0x4 dim:SQ_RSRC_IMG_2D_ARRAY ; encoding: [0x14,0x04,0xf0,0xf0,0x02,0x00,0x00,0x08]
	; GFX11-NEXT: s_waitcnt vmcnt(0) ; encoding: [0xf7,0x03,0x89,0xbf]			; GFX11-NEXT: s_waitcnt vmcnt(0) ; encoding: [0xf7,0x03,0x89,0xbf]
	; GFX11-NEXT: ; return to shader part epilog			; GFX11-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call float @llvm.amdgcn.image.sample.c.d.o.2darray.f16.f32.f32(i32 4, i32 %offset, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, float %s, float %t, float %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call float @llvm.amdgcn.image.sample.c.d.o.2darray.f16.f32.f32(i32 4, i32 %offset, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, float %s, float %t, float %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret float %v			ret float %v
	Show All 17 Lines
	; GFX11-LABEL: sample_c_d_o_2darray_V2:			; GFX11-LABEL: sample_c_d_o_2darray_V2:
	; GFX11: ; %bb.0: ; %main_body			; GFX11: ; %bb.0: ; %main_body
	; GFX11-NEXT: v_mov_b32_e32 v9, v2 ; encoding: [0x02,0x03,0x12,0x7e]			; GFX11-NEXT: v_mov_b32_e32 v9, v2 ; encoding: [0x02,0x03,0x12,0x7e]
	; GFX11-NEXT: v_mov_b32_e32 v10, v3 ; encoding: [0x03,0x03,0x14,0x7e]			; GFX11-NEXT: v_mov_b32_e32 v10, v3 ; encoding: [0x03,0x03,0x14,0x7e]
	; GFX11-NEXT: v_mov_b32_e32 v3, v1 ; encoding: [0x01,0x03,0x06,0x7e]			; GFX11-NEXT: v_mov_b32_e32 v3, v1 ; encoding: [0x01,0x03,0x06,0x7e]
	; GFX11-NEXT: v_mov_b32_e32 v2, v0 ; encoding: [0x00,0x03,0x04,0x7e]			; GFX11-NEXT: v_mov_b32_e32 v2, v0 ; encoding: [0x00,0x03,0x04,0x7e]
	; GFX11-NEXT: v_and_b32_e32 v0, 0xffff, v4 ; encoding: [0xff,0x08,0x00,0x36,0xff,0xff,0x00,0x00]			; GFX11-NEXT: v_and_b32_e32 v0, 0xffff, v4 ; encoding: [0xff,0x08,0x00,0x36,0xff,0xff,0x00,0x00]
	; GFX11-NEXT: v_and_b32_e32 v1, 0xffff, v9 ; encoding: [0xff,0x12,0x02,0x36,0xff,0xff,0x00,0x00]			; GFX11-NEXT: v_and_b32_e32 v1, 0xffff, v9 ; encoding: [0xff,0x12,0x02,0x36,0xff,0xff,0x00,0x00]
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2) ; encoding: [0x12,0x01,0x87,0xbf]
	; GFX11-NEXT: v_lshl_or_b32 v5, v5, 16, v0 ; encoding: [0x05,0x00,0x56,0xd6,0x05,0x21,0x01,0x04]			; GFX11-NEXT: v_lshl_or_b32 v5, v5, 16, v0 ; encoding: [0x05,0x00,0x56,0xd6,0x05,0x21,0x01,0x04]
	; GFX11-NEXT: v_lshl_or_b32 v4, v10, 16, v1 ; encoding: [0x04,0x00,0x56,0xd6,0x0a,0x21,0x05,0x04]			; GFX11-NEXT: v_lshl_or_b32 v4, v10, 16, v1 ; encoding: [0x04,0x00,0x56,0xd6,0x0a,0x21,0x05,0x04]
	; GFX11-NEXT: image_sample_c_d_o_g16 v[0:1], v[2:8], s[0:7], s[8:11] dmask:0x6 dim:SQ_RSRC_IMG_2D_ARRAY ; encoding: [0x14,0x06,0xf0,0xf0,0x02,0x00,0x00,0x08]			; GFX11-NEXT: image_sample_c_d_o_g16 v[0:1], v[2:8], s[0:7], s[8:11] dmask:0x6 dim:SQ_RSRC_IMG_2D_ARRAY ; encoding: [0x14,0x06,0xf0,0xf0,0x02,0x00,0x00,0x08]
	; GFX11-NEXT: s_waitcnt vmcnt(0) ; encoding: [0xf7,0x03,0x89,0xbf]			; GFX11-NEXT: s_waitcnt vmcnt(0) ; encoding: [0xf7,0x03,0x89,0xbf]
	; GFX11-NEXT: ; return to shader part epilog			; GFX11-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%v = call <2 x float> @llvm.amdgcn.image.sample.c.d.o.2darray.v2f32.f16.f32(i32 6, i32 %offset, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, float %s, float %t, float %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <2 x float> @llvm.amdgcn.image.sample.c.d.o.2darray.v2f32.f16.f32(i32 6, i32 %offset, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, float %s, float %t, float %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	ret <2 x float> %v			ret <2 x float> %v
	Show All 18 Lines

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.g16.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10 %s			; RUN: llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10 %s
	; RUN: llc -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10 %s			; RUN: llc -march=amdgcn -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10 %s

	define amdgpu_ps <4 x float> @sample_d_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dsdv, float %s) {			define amdgpu_ps <4 x float> @sample_d_1d(<8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dsdv, float %s) {
	; GFX10-LABEL: sample_d_1d:			; GFX10-LABEL: sample_d_1d:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: image_sample_d_g16 v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D			; GFX10-NEXT: image_sample_d_g16 v[0:3], v[0:2], s[0:7], s[8:11] dmask:0xf dim:SQ_RSRC_IMG_1D
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: ; return to shader part epilog			; GFX10-NEXT: ; return to shader part epilog
	main_body:			main_body:
	▲ Show 20 Lines • Show All 170 Lines • Show Last 20 Lines

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.interp.inreg.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

	define amdgpu_ps void @v_interp_f32(float inreg %i, float inreg %j, i32 inreg %m0) #0 {			define amdgpu_ps void @v_interp_f32(float inreg %i, float inreg %j, i32 inreg %m0) #0 {
	; GCN-LABEL: v_interp_f32:			; GCN-LABEL: v_interp_f32:
	; GCN: ; %bb.0: ; %main_body			; GCN: ; %bb.0: ; %main_body
	; GCN-NEXT: s_mov_b32 s3, exec_lo			; GCN-NEXT: s_mov_b32 s3, exec_lo
	; GCN-NEXT: s_wqm_b32 exec_lo, exec_lo			; GCN-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GCN-NEXT: s_mov_b32 m0, s2			; GCN-NEXT: s_mov_b32 m0, s2
	; GCN-NEXT: lds_param_load v0, attr0.y wait_vdst:15			; GCN-NEXT: lds_param_load v0, attr0.y wait_vdst:15
	; GCN-NEXT: lds_param_load v1, attr1.x wait_vdst:15			; GCN-NEXT: lds_param_load v1, attr1.x wait_vdst:15
	; GCN-NEXT: s_mov_b32 exec_lo, s3			; GCN-NEXT: s_mov_b32 exec_lo, s3
	; GCN-NEXT: v_mov_b32_e32 v2, s0			; GCN-NEXT: v_mov_b32_e32 v2, s0
	; GCN-NEXT: v_mov_b32_e32 v4, s1			; GCN-NEXT: v_mov_b32_e32 v4, s1
				; GCN-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_2)
	; GCN-NEXT: v_interp_p10_f32 v3, v0, v2, v0 wait_exp:1			; GCN-NEXT: v_interp_p10_f32 v3, v0, v2, v0 wait_exp:1
	; GCN-NEXT: v_interp_p10_f32 v2, v1, v2, v1			; GCN-NEXT: v_interp_p10_f32 v2, v1, v2, v1
	; GCN-NEXT: v_interp_p2_f32 v5, v0, v4, v3 wait_exp:7			; GCN-NEXT: v_interp_p2_f32 v5, v0, v4, v3 wait_exp:7
				; GCN-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GCN-NEXT: v_interp_p2_f32 v4, v1, v4, v5 wait_exp:7			; GCN-NEXT: v_interp_p2_f32 v4, v1, v4, v5 wait_exp:7
	; GCN-NEXT: exp mrt0 v3, v2, v5, v4 done			; GCN-NEXT: exp mrt0 v3, v2, v5, v4 done
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	main_body:			main_body:
	%p0 = call float @llvm.amdgcn.lds.param.load(i32 1, i32 0, i32 %m0)			%p0 = call float @llvm.amdgcn.lds.param.load(i32 1, i32 0, i32 %m0)
	%p1 = call float @llvm.amdgcn.lds.param.load(i32 0, i32 1, i32 %m0)			%p1 = call float @llvm.amdgcn.lds.param.load(i32 0, i32 1, i32 %m0)
	%p0_0 = call float @llvm.amdgcn.interp.inreg.p10(float %p0, float %i, float %p0)			%p0_0 = call float @llvm.amdgcn.interp.inreg.p10(float %p0, float %i, float %p0)
	%p1_0 = call float @llvm.amdgcn.interp.inreg.p2(float %p0, float %j, float %p0_0)			%p1_0 = call float @llvm.amdgcn.interp.inreg.p2(float %p0, float %j, float %p0_0)
	Show All 11 Lines
	; GCN-NEXT: s_mov_b32 m0, s2			; GCN-NEXT: s_mov_b32 m0, s2
	; GCN-NEXT: lds_param_load v0, attr0.x wait_vdst:15			; GCN-NEXT: lds_param_load v0, attr0.x wait_vdst:15
	; GCN-NEXT: lds_param_load v1, attr1.x wait_vdst:15			; GCN-NEXT: lds_param_load v1, attr1.x wait_vdst:15
	; GCN-NEXT: lds_param_load v2, attr2.x wait_vdst:15			; GCN-NEXT: lds_param_load v2, attr2.x wait_vdst:15
	; GCN-NEXT: lds_param_load v3, attr3.x wait_vdst:15			; GCN-NEXT: lds_param_load v3, attr3.x wait_vdst:15
	; GCN-NEXT: s_mov_b32 exec_lo, s3			; GCN-NEXT: s_mov_b32 exec_lo, s3
	; GCN-NEXT: v_mov_b32_e32 v4, s0			; GCN-NEXT: v_mov_b32_e32 v4, s0
	; GCN-NEXT: v_mov_b32_e32 v5, s1			; GCN-NEXT: v_mov_b32_e32 v5, s1
				; GCN-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_3) | instid1(VALU_DEP_4)
	; GCN-NEXT: v_interp_p10_f32 v6, v0, v4, v0 wait_exp:3			; GCN-NEXT: v_interp_p10_f32 v6, v0, v4, v0 wait_exp:3
	; GCN-NEXT: v_interp_p10_f32 v7, v1, v4, v1 wait_exp:2			; GCN-NEXT: v_interp_p10_f32 v7, v1, v4, v1 wait_exp:2
	; GCN-NEXT: v_interp_p10_f32 v8, v2, v4, v2 wait_exp:1			; GCN-NEXT: v_interp_p10_f32 v8, v2, v4, v2 wait_exp:1
	; GCN-NEXT: v_interp_p10_f32 v4, v3, v4, v3			; GCN-NEXT: v_interp_p10_f32 v4, v3, v4, v3
	; GCN-NEXT: v_interp_p2_f32 v6, v0, v5, v6 wait_exp:7			; GCN-NEXT: v_interp_p2_f32 v6, v0, v5, v6 wait_exp:7
				; GCN-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
	; GCN-NEXT: v_interp_p2_f32 v7, v1, v5, v7 wait_exp:7			; GCN-NEXT: v_interp_p2_f32 v7, v1, v5, v7 wait_exp:7
	; GCN-NEXT: v_interp_p2_f32 v8, v2, v5, v8 wait_exp:7			; GCN-NEXT: v_interp_p2_f32 v8, v2, v5, v8 wait_exp:7
				; GCN-NEXT: s_delay_alu instid0(VALU_DEP_4)
	; GCN-NEXT: v_interp_p2_f32 v4, v3, v5, v4 wait_exp:7			; GCN-NEXT: v_interp_p2_f32 v4, v3, v5, v4 wait_exp:7
	; GCN-NEXT: exp mrt0 v6, v7, v8, v4 done			; GCN-NEXT: exp mrt0 v6, v7, v8, v4 done
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	main_body:			main_body:
	%p0 = call float @llvm.amdgcn.lds.param.load(i32 0, i32 0, i32 %m0)			%p0 = call float @llvm.amdgcn.lds.param.load(i32 0, i32 0, i32 %m0)
	%p1 = call float @llvm.amdgcn.lds.param.load(i32 0, i32 1, i32 %m0)			%p1 = call float @llvm.amdgcn.lds.param.load(i32 0, i32 1, i32 %m0)
	%p2 = call float @llvm.amdgcn.lds.param.load(i32 0, i32 2, i32 %m0)			%p2 = call float @llvm.amdgcn.lds.param.load(i32 0, i32 2, i32 %m0)
	%p3 = call float @llvm.amdgcn.lds.param.load(i32 0, i32 3, i32 %m0)			%p3 = call float @llvm.amdgcn.lds.param.load(i32 0, i32 3, i32 %m0)
	Show All 21 Lines
	; GCN-NEXT: lds_param_load v4, attr2.x wait_vdst:15			; GCN-NEXT: lds_param_load v4, attr2.x wait_vdst:15
	; GCN-NEXT: lds_param_load v5, attr3.x wait_vdst:15			; GCN-NEXT: lds_param_load v5, attr3.x wait_vdst:15
	; GCN-NEXT: s_mov_b32 exec_lo, s0			; GCN-NEXT: s_mov_b32 exec_lo, s0
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_interp_p10_f32 v6, v2, v0, v2 wait_exp:3			; GCN-NEXT: v_interp_p10_f32 v6, v2, v0, v2 wait_exp:3
	; GCN-NEXT: v_interp_p10_f32 v7, v3, v0, v3 wait_exp:2			; GCN-NEXT: v_interp_p10_f32 v7, v3, v0, v3 wait_exp:2
	; GCN-NEXT: v_interp_p10_f32 v8, v4, v0, v4 wait_exp:1			; GCN-NEXT: v_interp_p10_f32 v8, v4, v0, v4 wait_exp:1
	; GCN-NEXT: v_interp_p10_f32 v0, v5, v0, v5			; GCN-NEXT: v_interp_p10_f32 v0, v5, v0, v5
				; GCN-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
	; GCN-NEXT: v_interp_p2_f32 v6, v2, v1, v6 wait_exp:7			; GCN-NEXT: v_interp_p2_f32 v6, v2, v1, v6 wait_exp:7
	; GCN-NEXT: v_interp_p2_f32 v7, v3, v1, v7 wait_exp:7			; GCN-NEXT: v_interp_p2_f32 v7, v3, v1, v7 wait_exp:7
				; GCN-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
	; GCN-NEXT: v_interp_p2_f32 v8, v4, v1, v8 wait_exp:7			; GCN-NEXT: v_interp_p2_f32 v8, v4, v1, v8 wait_exp:7
	; GCN-NEXT: v_interp_p2_f32 v0, v5, v1, v0 wait_exp:7			; GCN-NEXT: v_interp_p2_f32 v0, v5, v1, v0 wait_exp:7
	; GCN-NEXT: exp mrt0 v6, v7, v8, v0 done			; GCN-NEXT: exp mrt0 v6, v7, v8, v0 done
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	main_body:			main_body:
	%i.ptr = getelementptr float, float addrspace(1)* %ptr, i32 1			%i.ptr = getelementptr float, float addrspace(1)* %ptr, i32 1
	%i = load float, float addrspace(1)* %i.ptr, align 4			%i = load float, float addrspace(1)* %i.ptr, align 4
	%j.ptr = getelementptr float, float addrspace(1)* %ptr, i32 2			%j.ptr = getelementptr float, float addrspace(1)* %ptr, i32 2
	Show All 19 Lines
	; GCN: ; %bb.0: ; %main_body			; GCN: ; %bb.0: ; %main_body
	; GCN-NEXT: s_mov_b32 s3, exec_lo			; GCN-NEXT: s_mov_b32 s3, exec_lo
	; GCN-NEXT: s_wqm_b32 exec_lo, exec_lo			; GCN-NEXT: s_wqm_b32 exec_lo, exec_lo
	; GCN-NEXT: s_mov_b32 m0, s2			; GCN-NEXT: s_mov_b32 m0, s2
	; GCN-NEXT: lds_param_load v1, attr0.x wait_vdst:15			; GCN-NEXT: lds_param_load v1, attr0.x wait_vdst:15
	; GCN-NEXT: s_mov_b32 exec_lo, s3			; GCN-NEXT: s_mov_b32 exec_lo, s3
	; GCN-NEXT: v_mov_b32_e32 v0, s0			; GCN-NEXT: v_mov_b32_e32 v0, s0
	; GCN-NEXT: v_mov_b32_e32 v2, s1			; GCN-NEXT: v_mov_b32_e32 v2, s1
				; GCN-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_2)
	; GCN-NEXT: v_interp_p10_f16_f32 v3, v1, v0, v1			; GCN-NEXT: v_interp_p10_f16_f32 v3, v1, v0, v1
	; GCN-NEXT: v_interp_p10_f16_f32 v0, v1, v0, v1 op_sel:[1,0,1,0] wait_exp:7			; GCN-NEXT: v_interp_p10_f16_f32 v0, v1, v0, v1 op_sel:[1,0,1,0] wait_exp:7
	; GCN-NEXT: v_interp_p2_f16_f32 v3, v1, v2, v3 wait_exp:7			; GCN-NEXT: v_interp_p2_f16_f32 v3, v1, v2, v3 wait_exp:7
				; GCN-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GCN-NEXT: v_interp_p2_f16_f32 v0, v1, v2, v0 op_sel:[1,0,0,0] wait_exp:7			; GCN-NEXT: v_interp_p2_f16_f32 v0, v1, v2, v0 op_sel:[1,0,0,0] wait_exp:7
	; GCN-NEXT: v_add_f16_e32 v0, v3, v0			; GCN-NEXT: v_add_f16_e32 v0, v3, v0
	; GCN-NEXT: ; return to shader part epilog			; GCN-NEXT: ; return to shader part epilog
	main_body:			main_body:
	%p0 = call float @llvm.amdgcn.lds.param.load(i32 0, i32 0, i32 %m0)			%p0 = call float @llvm.amdgcn.lds.param.load(i32 0, i32 0, i32 %m0)
	%l_p0 = call float @llvm.amdgcn.interp.inreg.p10.f16(float %p0, float %i, float %p0, i1 0)			%l_p0 = call float @llvm.amdgcn.interp.inreg.p10.f16(float %p0, float %i, float %p0, i1 0)
	%l_p1 = call half @llvm.amdgcn.interp.inreg.p2.f16(float %p0, float %j, float %l_p0, i1 0)			%l_p1 = call half @llvm.amdgcn.interp.inreg.p2.f16(float %p0, float %j, float %l_p0, i1 0)
	%h_p0 = call float @llvm.amdgcn.interp.inreg.p10.f16(float %p0, float %i, float %p0, i1 1)			%h_p0 = call float @llvm.amdgcn.interp.inreg.p10.f16(float %p0, float %i, float %p0, i1 1)
	Show All 15 Lines

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.intersect_ray.ll

	Show First 20 Lines • Show All 227 Lines • ▼ Show 20 Lines
	; GFX11-NEXT: s_load_b128 s[0:3], s[0:1], 0x34			; GFX11-NEXT: s_load_b128 s[0:3], s[0:1], 0x34
	; GFX11-NEXT: v_mov_b32_e32 v4, 4.0			; GFX11-NEXT: v_mov_b32_e32 v4, 4.0
	; GFX11-NEXT: v_mov_b32_e32 v5, 0x40a00000			; GFX11-NEXT: v_mov_b32_e32 v5, 0x40a00000
	; GFX11-NEXT: v_mov_b32_e32 v6, 0			; GFX11-NEXT: v_mov_b32_e32 v6, 0
	; GFX11-NEXT: v_mov_b32_e32 v7, 1.0			; GFX11-NEXT: v_mov_b32_e32 v7, 1.0
	; GFX11-NEXT: v_mov_b32_e32 v8, 2.0			; GFX11-NEXT: v_mov_b32_e32 v8, 2.0
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: v_add_co_u32 v0, s4, s4, v2			; GFX11-NEXT: v_add_co_u32 v0, s4, s4, v2
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_add_co_ci_u32_e64 v1, null, s5, 0, s4			; GFX11-NEXT: v_add_co_ci_u32_e64 v1, null, s5, 0, s4
	; GFX11-NEXT: v_add_co_u32 v2, s4, s6, v2			; GFX11-NEXT: v_add_co_u32 v2, s4, s6, v2
	; GFX11-NEXT: v_add_co_ci_u32_e64 v3, null, s7, 0, s4			; GFX11-NEXT: v_add_co_ci_u32_e64 v3, null, s7, 0, s4
	; GFX11-NEXT: flat_load_b32 v9, v[0:1]			; GFX11-NEXT: flat_load_b32 v9, v[0:1]
	; GFX11-NEXT: flat_load_b32 v10, v[2:3]			; GFX11-NEXT: flat_load_b32 v10, v[2:3]
	; GFX11-NEXT: v_mov_b32_e32 v0, 0x40c00000			; GFX11-NEXT: v_mov_b32_e32 v0, 0x40c00000
	; GFX11-NEXT: v_mov_b32_e32 v1, 0x40e00000			; GFX11-NEXT: v_mov_b32_e32 v1, 0x40e00000
	; GFX11-NEXT: v_mov_b32_e32 v2, 0x41000000			; GFX11-NEXT: v_mov_b32_e32 v2, 0x41000000
	▲ Show 20 Lines • Show All 76 Lines • ▼ Show 20 Lines
	; GFX11: ; %bb.0: ; %main_body			; GFX11: ; %bb.0: ; %main_body
	; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x24			; GFX11-NEXT: s_load_b128 s[4:7], s[0:1], 0x24
	; GFX11-NEXT: v_lshlrev_b32_e32 v2, 2, v0			; GFX11-NEXT: v_lshlrev_b32_e32 v2, 2, v0
	; GFX11-NEXT: s_load_b128 s[0:3], s[0:1], 0x34			; GFX11-NEXT: s_load_b128 s[0:3], s[0:1], 0x34
	; GFX11-NEXT: v_mov_b32_e32 v4, 1.0			; GFX11-NEXT: v_mov_b32_e32 v4, 1.0
	; GFX11-NEXT: v_mov_b32_e32 v5, 2.0			; GFX11-NEXT: v_mov_b32_e32 v5, 2.0
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: v_add_co_u32 v0, s4, s4, v2			; GFX11-NEXT: v_add_co_u32 v0, s4, s4, v2
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_add_co_ci_u32_e64 v1, null, s5, 0, s4			; GFX11-NEXT: v_add_co_ci_u32_e64 v1, null, s5, 0, s4
	; GFX11-NEXT: v_add_co_u32 v2, s4, s6, v2			; GFX11-NEXT: v_add_co_u32 v2, s4, s6, v2
	; GFX11-NEXT: v_add_co_ci_u32_e64 v3, null, s7, 0, s4			; GFX11-NEXT: v_add_co_ci_u32_e64 v3, null, s7, 0, s4
	; GFX11-NEXT: flat_load_b32 v6, v[0:1]			; GFX11-NEXT: flat_load_b32 v6, v[0:1]
	; GFX11-NEXT: flat_load_b32 v7, v[2:3]			; GFX11-NEXT: flat_load_b32 v7, v[2:3]
	; GFX11-NEXT: v_mov_b32_e32 v0, 0x46004200			; GFX11-NEXT: v_mov_b32_e32 v0, 0x46004200
	; GFX11-NEXT: v_mov_b32_e32 v1, 0x47004400			; GFX11-NEXT: v_mov_b32_e32 v1, 0x47004400
	; GFX11-NEXT: v_mov_b32_e32 v2, 0x48004500			; GFX11-NEXT: v_mov_b32_e32 v2, 0x48004500
	▲ Show 20 Lines • Show All 87 Lines • ▼ Show 20 Lines
	; GFX11-NEXT: v_mov_b32_e32 v5, 0x40a00000			; GFX11-NEXT: v_mov_b32_e32 v5, 0x40a00000
	; GFX11-NEXT: v_mov_b32_e32 v6, 0			; GFX11-NEXT: v_mov_b32_e32 v6, 0
	; GFX11-NEXT: v_mov_b32_e32 v7, 1.0			; GFX11-NEXT: v_mov_b32_e32 v7, 1.0
	; GFX11-NEXT: v_mov_b32_e32 v8, 2.0			; GFX11-NEXT: v_mov_b32_e32 v8, 2.0
	; GFX11-NEXT: v_mov_b32_e32 v9, 0xb36211c7			; GFX11-NEXT: v_mov_b32_e32 v9, 0xb36211c7
	; GFX11-NEXT: v_mov_b32_e32 v10, 0x102			; GFX11-NEXT: v_mov_b32_e32 v10, 0x102
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: v_add_co_u32 v0, s4, s4, v0			; GFX11-NEXT: v_add_co_u32 v0, s4, s4, v0
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_add_co_ci_u32_e64 v1, null, s5, 0, s4			; GFX11-NEXT: v_add_co_ci_u32_e64 v1, null, s5, 0, s4
	; GFX11-NEXT: flat_load_b32 v11, v[0:1]			; GFX11-NEXT: flat_load_b32 v11, v[0:1]
	; GFX11-NEXT: v_mov_b32_e32 v0, 0x40c00000			; GFX11-NEXT: v_mov_b32_e32 v0, 0x40c00000
	; GFX11-NEXT: v_mov_b32_e32 v1, 0x40e00000			; GFX11-NEXT: v_mov_b32_e32 v1, 0x40e00000
	; GFX11-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX11-NEXT: image_bvh64_intersect_ray v[0:3], [v[9:10], v11, v[6:8], v[3:5], v[0:2]], s[0:3]			; GFX11-NEXT: image_bvh64_intersect_ray v[0:3], [v[9:10], v11, v[6:8], v[3:5], v[0:2]], s[0:3]
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: flat_store_b128 v[0:1], v[0:3]			; GFX11-NEXT: flat_store_b128 v[0:1], v[0:3]
	▲ Show 20 Lines • Show All 71 Lines • ▼ Show 20 Lines
	; GFX11-NEXT: v_mov_b32_e32 v2, 0x48004500			; GFX11-NEXT: v_mov_b32_e32 v2, 0x48004500
	; GFX11-NEXT: v_mov_b32_e32 v3, 0			; GFX11-NEXT: v_mov_b32_e32 v3, 0
	; GFX11-NEXT: v_mov_b32_e32 v4, 1.0			; GFX11-NEXT: v_mov_b32_e32 v4, 1.0
	; GFX11-NEXT: v_mov_b32_e32 v5, 2.0			; GFX11-NEXT: v_mov_b32_e32 v5, 2.0
	; GFX11-NEXT: v_mov_b32_e32 v6, 0xb36211c6			; GFX11-NEXT: v_mov_b32_e32 v6, 0xb36211c6
	; GFX11-NEXT: v_mov_b32_e32 v7, 0x102			; GFX11-NEXT: v_mov_b32_e32 v7, 0x102
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: v_add_co_u32 v0, s4, s4, v0			; GFX11-NEXT: v_add_co_u32 v0, s4, s4, v0
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_add_co_ci_u32_e64 v1, null, s5, 0, s4			; GFX11-NEXT: v_add_co_ci_u32_e64 v1, null, s5, 0, s4
	; GFX11-NEXT: flat_load_b32 v8, v[0:1]			; GFX11-NEXT: flat_load_b32 v8, v[0:1]
	; GFX11-NEXT: v_mov_b32_e32 v0, 0x46004200			; GFX11-NEXT: v_mov_b32_e32 v0, 0x46004200
	; GFX11-NEXT: v_mov_b32_e32 v1, 0x47004400			; GFX11-NEXT: v_mov_b32_e32 v1, 0x47004400
	; GFX11-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX11-NEXT: image_bvh64_intersect_ray v[0:3], [v[6:7], v8, v[3:5], v[0:2]], s[0:3] a16			; GFX11-NEXT: image_bvh64_intersect_ray v[0:3], [v[6:7], v8, v[3:5], v[0:2]], s[0:3] a16
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: flat_store_b128 v[0:1], v[0:3]			; GFX11-NEXT: flat_store_b128 v[0:1], v[0:3]
	Show All 20 Lines

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane64.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX11,GFX11-SDAG %s			; RUN: llc -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX11,GFX11-SDAG %s
	; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX11,GFX11-GISEL %s			; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX11,GFX11-GISEL %s

	declare i32 @llvm.amdgcn.permlane64(i32)			declare i32 @llvm.amdgcn.permlane64(i32)
	declare i32 @llvm.amdgcn.workitem.id.x()			declare i32 @llvm.amdgcn.workitem.id.x()

	define amdgpu_kernel void @test_s(i32 addrspace(1)* %out, i32 %src0) {			define amdgpu_kernel void @test_s(i32 addrspace(1)* %out, i32 %src0) {
	; GFX11-LABEL: test_s:			; GFX11-LABEL: test_s:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_clause 0x1			; GFX11-NEXT: s_clause 0x1
	; GFX11-NEXT: s_load_b32 s2, s[0:1], 0x2c			; GFX11-NEXT: s_load_b32 s2, s[0:1], 0x2c
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX11-NEXT: v_mov_b32_e32 v1, 0			; GFX11-NEXT: v_mov_b32_e32 v1, 0
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: v_mov_b32_e32 v0, s2			; GFX11-NEXT: v_mov_b32_e32 v0, s2
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_permlane64_b32 v0, v0			; GFX11-NEXT: v_permlane64_b32 v0, v0
	; GFX11-NEXT: global_store_b32 v1, v0, s[0:1]			; GFX11-NEXT: global_store_b32 v1, v0, s[0:1]
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	%v = call i32 @llvm.amdgcn.permlane64(i32 %src0)			%v = call i32 @llvm.amdgcn.permlane64(i32 %src0)
	store i32 %v, i32 addrspace(1)* %out			store i32 %v, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @test_i(i32 addrspace(1)* %out) {			define amdgpu_kernel void @test_i(i32 addrspace(1)* %out) {
	; GFX11-LABEL: test_i:			; GFX11-LABEL: test_i:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x24			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x24
	; GFX11-NEXT: v_mov_b32_e32 v0, 0x63			; GFX11-NEXT: v_mov_b32_e32 v0, 0x63
	; GFX11-NEXT: v_mov_b32_e32 v1, 0			; GFX11-NEXT: v_mov_b32_e32 v1, 0
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2)
	; GFX11-NEXT: v_permlane64_b32 v0, v0			; GFX11-NEXT: v_permlane64_b32 v0, v0
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: global_store_b32 v1, v0, s[0:1]			; GFX11-NEXT: global_store_b32 v1, v0, s[0:1]
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	%v = call i32 @llvm.amdgcn.permlane64(i32 99)			%v = call i32 @llvm.amdgcn.permlane64(i32 99)
	store i32 %v, i32 addrspace(1)* %out			store i32 %v, i32 addrspace(1)* %out
	ret void			ret void
	}			}

llvm/test/CodeGen/AMDGPU/llvm.mulo.ll

	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: umulo_i64_v_v:			; GFX11-LABEL: umulo_i64_v_v:
	; GFX11: ; %bb.0: ; %bb			; GFX11: ; %bb.0: ; %bb
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: v_mov_b32_e32 v4, v0			; GFX11-NEXT: v_mov_b32_e32 v4, v0
	; GFX11-NEXT: v_mov_b32_e32 v5, v1			; GFX11-NEXT: v_mov_b32_e32 v5, v1
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_3)
	; GFX11-NEXT: v_mad_u64_u32 v[0:1], null, v4, v2, 0			; GFX11-NEXT: v_mad_u64_u32 v[0:1], null, v4, v2, 0
	; GFX11-NEXT: v_mad_u64_u32 v[6:7], null, v4, v3, 0			; GFX11-NEXT: v_mad_u64_u32 v[6:7], null, v4, v3, 0
	; GFX11-NEXT: v_mad_u64_u32 v[9:10], null, v5, v2, 0			; GFX11-NEXT: v_mad_u64_u32 v[9:10], null, v5, v2, 0
	; GFX11-NEXT: v_mad_u64_u32 v[11:12], null, v5, v3, 0			; GFX11-NEXT: v_mad_u64_u32 v[11:12], null, v5, v3, 0
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(SKIP_2) | instid1(VALU_DEP_3)
	; GFX11-NEXT: v_mov_b32_e32 v8, v1			; GFX11-NEXT: v_mov_b32_e32 v8, v1
	; GFX11-NEXT: v_mul_lo_u32 v5, v5, v2			; GFX11-NEXT: v_mul_lo_u32 v5, v5, v2
	; GFX11-NEXT: v_mul_lo_u32 v4, v4, v3			; GFX11-NEXT: v_mul_lo_u32 v4, v4, v3
	; GFX11-NEXT: v_add_co_u32 v6, vcc_lo, v8, v6			; GFX11-NEXT: v_add_co_u32 v6, vcc_lo, v8, v6
	; GFX11-NEXT: v_add_co_ci_u32_e32 v7, vcc_lo, 0, v7, vcc_lo			; GFX11-NEXT: v_add_co_ci_u32_e32 v7, vcc_lo, 0, v7, vcc_lo
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_3)
	; GFX11-NEXT: v_add3_u32 v1, v1, v4, v5			; GFX11-NEXT: v_add3_u32 v1, v1, v4, v5
	; GFX11-NEXT: v_add_co_u32 v2, vcc_lo, v6, v9			; GFX11-NEXT: v_add_co_u32 v2, vcc_lo, v6, v9
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_2)
	; GFX11-NEXT: v_add_co_ci_u32_e32 v2, vcc_lo, v7, v10, vcc_lo			; GFX11-NEXT: v_add_co_ci_u32_e32 v2, vcc_lo, v7, v10, vcc_lo
	; GFX11-NEXT: v_add_co_ci_u32_e32 v6, vcc_lo, 0, v12, vcc_lo			; GFX11-NEXT: v_add_co_ci_u32_e32 v6, vcc_lo, 0, v12, vcc_lo
	; GFX11-NEXT: v_add_co_u32 v2, vcc_lo, v2, v11			; GFX11-NEXT: v_add_co_u32 v2, vcc_lo, v2, v11
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v6, vcc_lo			; GFX11-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, 0, v6, vcc_lo
	; GFX11-NEXT: v_cmp_ne_u64_e32 vcc_lo, 0, v[2:3]			; GFX11-NEXT: v_cmp_ne_u64_e32 vcc_lo, 0, v[2:3]
	; GFX11-NEXT: v_cndmask_b32_e64 v2, 0, 1, vcc_lo			; GFX11-NEXT: v_cndmask_b32_e64 v2, 0, 1, vcc_lo
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	bb:			bb:
	%umulo = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %x, i64 %y)			%umulo = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %x, i64 %y)
	ret { i64, i1 } %umulo			ret { i64, i1 } %umulo
	}			}
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: smulo_i64_v_v:			; GFX11-LABEL: smulo_i64_v_v:
	; GFX11: ; %bb.0: ; %bb			; GFX11: ; %bb.0: ; %bb
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: v_mov_b32_e32 v4, v0			; GFX11-NEXT: v_mov_b32_e32 v4, v0
	; GFX11-NEXT: v_mov_b32_e32 v5, v1			; GFX11-NEXT: v_mov_b32_e32 v5, v1
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(SKIP_1) | instid1(VALU_DEP_3)
	; GFX11-NEXT: v_mad_u64_u32 v[0:1], null, v4, v2, 0			; GFX11-NEXT: v_mad_u64_u32 v[0:1], null, v4, v2, 0
	; GFX11-NEXT: v_mad_u64_u32 v[6:7], null, v4, v3, 0			; GFX11-NEXT: v_mad_u64_u32 v[6:7], null, v4, v3, 0
	; GFX11-NEXT: v_mad_u64_u32 v[9:10], null, v5, v2, 0			; GFX11-NEXT: v_mad_u64_u32 v[9:10], null, v5, v2, 0
	; GFX11-NEXT: v_mad_i64_i32 v[11:12], null, v5, v3, 0			; GFX11-NEXT: v_mad_i64_i32 v[11:12], null, v5, v3, 0
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_mov_b32_e32 v8, v1			; GFX11-NEXT: v_mov_b32_e32 v8, v1
	; GFX11-NEXT: v_add_co_u32 v6, vcc_lo, v8, v6			; GFX11-NEXT: v_add_co_u32 v6, vcc_lo, v8, v6
	; GFX11-NEXT: v_add_co_ci_u32_e32 v7, vcc_lo, 0, v7, vcc_lo			; GFX11-NEXT: v_add_co_ci_u32_e32 v7, vcc_lo, 0, v7, vcc_lo
	; GFX11-NEXT: v_mul_lo_u32 v8, v5, v2			; GFX11-NEXT: v_mul_lo_u32 v8, v5, v2
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_3)
	; GFX11-NEXT: v_add_co_u32 v6, vcc_lo, v6, v9			; GFX11-NEXT: v_add_co_u32 v6, vcc_lo, v6, v9
	; GFX11-NEXT: v_add_co_ci_u32_e32 v6, vcc_lo, v7, v10, vcc_lo			; GFX11-NEXT: v_add_co_ci_u32_e32 v6, vcc_lo, v7, v10, vcc_lo
	; GFX11-NEXT: v_add_co_ci_u32_e32 v7, vcc_lo, 0, v12, vcc_lo			; GFX11-NEXT: v_add_co_ci_u32_e32 v7, vcc_lo, 0, v12, vcc_lo
	; GFX11-NEXT: v_mul_lo_u32 v9, v4, v3			; GFX11-NEXT: v_mul_lo_u32 v9, v4, v3
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_3)
	; GFX11-NEXT: v_add_co_u32 v6, vcc_lo, v6, v11			; GFX11-NEXT: v_add_co_u32 v6, vcc_lo, v6, v11
	; GFX11-NEXT: v_add_co_ci_u32_e32 v7, vcc_lo, 0, v7, vcc_lo			; GFX11-NEXT: v_add_co_ci_u32_e32 v7, vcc_lo, 0, v7, vcc_lo
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
	; GFX11-NEXT: v_sub_co_u32 v2, vcc_lo, v6, v2			; GFX11-NEXT: v_sub_co_u32 v2, vcc_lo, v6, v2
	; GFX11-NEXT: v_subrev_co_ci_u32_e32 v10, vcc_lo, 0, v7, vcc_lo			; GFX11-NEXT: v_subrev_co_ci_u32_e32 v10, vcc_lo, 0, v7, vcc_lo
	; GFX11-NEXT: v_cmp_gt_i32_e32 vcc_lo, 0, v5			; GFX11-NEXT: v_cmp_gt_i32_e32 vcc_lo, 0, v5
	; GFX11-NEXT: v_add3_u32 v1, v1, v9, v8			; GFX11-NEXT: v_add3_u32 v1, v1, v9, v8
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
	; GFX11-NEXT: v_cndmask_b32_e32 v6, v6, v2, vcc_lo			; GFX11-NEXT: v_cndmask_b32_e32 v6, v6, v2, vcc_lo
	; GFX11-NEXT: v_cndmask_b32_e32 v5, v7, v10, vcc_lo			; GFX11-NEXT: v_cndmask_b32_e32 v5, v7, v10, vcc_lo
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_3)
	; GFX11-NEXT: v_ashrrev_i32_e32 v2, 31, v1			; GFX11-NEXT: v_ashrrev_i32_e32 v2, 31, v1
	; GFX11-NEXT: v_sub_co_u32 v4, vcc_lo, v6, v4			; GFX11-NEXT: v_sub_co_u32 v4, vcc_lo, v6, v4
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_4)
	; GFX11-NEXT: v_subrev_co_ci_u32_e32 v7, vcc_lo, 0, v5, vcc_lo			; GFX11-NEXT: v_subrev_co_ci_u32_e32 v7, vcc_lo, 0, v5, vcc_lo
	; GFX11-NEXT: v_cmp_gt_i32_e32 vcc_lo, 0, v3			; GFX11-NEXT: v_cmp_gt_i32_e32 vcc_lo, 0, v3
	; GFX11-NEXT: v_mov_b32_e32 v3, v2			; GFX11-NEXT: v_mov_b32_e32 v3, v2
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(SKIP_1) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_cndmask_b32_e32 v5, v5, v7, vcc_lo			; GFX11-NEXT: v_cndmask_b32_e32 v5, v5, v7, vcc_lo
	; GFX11-NEXT: v_cndmask_b32_e32 v4, v6, v4, vcc_lo			; GFX11-NEXT: v_cndmask_b32_e32 v4, v6, v4, vcc_lo
	; GFX11-NEXT: v_cmp_ne_u64_e32 vcc_lo, v[4:5], v[2:3]			; GFX11-NEXT: v_cmp_ne_u64_e32 vcc_lo, v[4:5], v[2:3]
	; GFX11-NEXT: v_cndmask_b32_e64 v2, 0, 1, vcc_lo			; GFX11-NEXT: v_cndmask_b32_e64 v2, 0, 1, vcc_lo
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	bb:			bb:
	%smulo = tail call { i64, i1 } @llvm.smul.with.overflow.i64(i64 %x, i64 %y)			%smulo = tail call { i64, i1 } @llvm.smul.with.overflow.i64(i64 %x, i64 %y)
	ret { i64, i1 } %smulo			ret { i64, i1 } %smulo
	; GFX11-NEXT: s_addc_u32 s5, s9, 0			; GFX11-NEXT: s_addc_u32 s5, s9, 0
	; GFX11-NEXT: s_add_u32 s4, s3, s1			; GFX11-NEXT: s_add_u32 s4, s3, s1
	; GFX11-NEXT: s_addc_u32 s5, 0, s5			; GFX11-NEXT: s_addc_u32 s5, 0, s5
	; GFX11-NEXT: s_add_i32 s1, s8, s7			; GFX11-NEXT: s_add_i32 s1, s8, s7
	; GFX11-NEXT: s_mul_i32 s0, s0, s2			; GFX11-NEXT: s_mul_i32 s0, s0, s2
	; GFX11-NEXT: s_add_i32 s1, s1, s6			; GFX11-NEXT: s_add_i32 s1, s1, s6
	; GFX11-NEXT: s_cmp_lg_u64 s[4:5], 0			; GFX11-NEXT: s_cmp_lg_u64 s[4:5], 0
	; GFX11-NEXT: s_cselect_b32 s2, -1, 0			; GFX11-NEXT: s_cselect_b32 s2, -1, 0
				; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: v_cndmask_b32_e64 v1, s1, 0, s2			; GFX11-NEXT: v_cndmask_b32_e64 v1, s1, 0, s2
	; GFX11-NEXT: v_cndmask_b32_e64 v0, s0, 0, s2			; GFX11-NEXT: v_cndmask_b32_e64 v0, s0, 0, s2
	; GFX11-NEXT: global_store_b64 v[0:1], v[0:1], off			; GFX11-NEXT: global_store_b64 v[0:1], v[0:1], off
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	bb:			bb:
	%umulo = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %x, i64 %y)			%umulo = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %x, i64 %y)
	%mul = extractvalue { i64, i1 } %umulo, 0			%mul = extractvalue { i64, i1 } %umulo, 0
	%overflow = extractvalue { i64, i1 } %umulo, 1			%overflow = extractvalue { i64, i1 } %umulo, 1
	; GFX11-NEXT: s_sub_u32 s9, s4, s2			; GFX11-NEXT: s_sub_u32 s9, s4, s2
	; GFX11-NEXT: s_subb_u32 s10, s6, 0			; GFX11-NEXT: s_subb_u32 s10, s6, 0
	; GFX11-NEXT: v_mov_b32_e32 v1, s9			; GFX11-NEXT: v_mov_b32_e32 v1, s9
	; GFX11-NEXT: s_cmp_lt_i32 s1, 0			; GFX11-NEXT: s_cmp_lt_i32 s1, 0
	; GFX11-NEXT: v_mov_b32_e32 v0, s10			; GFX11-NEXT: v_mov_b32_e32 v0, s10
	; GFX11-NEXT: s_cselect_b32 vcc_lo, -1, 0			; GFX11-NEXT: s_cselect_b32 vcc_lo, -1, 0
	; GFX11-NEXT: s_cmp_lt_i32 s3, 0			; GFX11-NEXT: s_cmp_lt_i32 s3, 0
	; GFX11-NEXT: v_cndmask_b32_e32 v2, s4, v1, vcc_lo			; GFX11-NEXT: v_cndmask_b32_e32 v2, s4, v1, vcc_lo
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
	; GFX11-NEXT: v_cndmask_b32_e32 v0, s6, v0, vcc_lo			; GFX11-NEXT: v_cndmask_b32_e32 v0, s6, v0, vcc_lo
	; GFX11-NEXT: v_sub_co_u32 v3, vcc_lo, v2, s0			; GFX11-NEXT: v_sub_co_u32 v3, vcc_lo, v2, s0
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2)
	; GFX11-NEXT: v_subrev_co_ci_u32_e32 v1, vcc_lo, 0, v0, vcc_lo			; GFX11-NEXT: v_subrev_co_ci_u32_e32 v1, vcc_lo, 0, v0, vcc_lo
	; GFX11-NEXT: s_cselect_b32 vcc_lo, -1, 0			; GFX11-NEXT: s_cselect_b32 vcc_lo, -1, 0
	; GFX11-NEXT: s_add_i32 s1, s8, s7			; GFX11-NEXT: s_add_i32 s1, s8, s7
	; GFX11-NEXT: s_mul_i32 s0, s0, s2			; GFX11-NEXT: s_mul_i32 s0, s0, s2
	; GFX11-NEXT: s_add_i32 s1, s1, s5			; GFX11-NEXT: s_add_i32 s1, s1, s5
	; GFX11-NEXT: v_cndmask_b32_e32 v1, v0, v1, vcc_lo			; GFX11-NEXT: v_cndmask_b32_e32 v1, v0, v1, vcc_lo
	; GFX11-NEXT: v_cndmask_b32_e32 v0, v2, v3, vcc_lo			; GFX11-NEXT: v_cndmask_b32_e32 v0, v2, v3, vcc_lo
	; GFX11-NEXT: s_ashr_i32 s4, s1, 31			; GFX11-NEXT: s_ashr_i32 s4, s1, 31
				; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_mov_b32 s5, s4			; GFX11-NEXT: s_mov_b32 s5, s4
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instid1(SALU_CYCLE_1)
	; GFX11-NEXT: v_cmp_ne_u64_e32 vcc_lo, s[4:5], v[0:1]			; GFX11-NEXT: v_cmp_ne_u64_e32 vcc_lo, s[4:5], v[0:1]
	; GFX11-NEXT: v_cndmask_b32_e64 v1, s1, 0, vcc_lo			; GFX11-NEXT: v_cndmask_b32_e64 v1, s1, 0, vcc_lo
	; GFX11-NEXT: v_cndmask_b32_e64 v0, s0, 0, vcc_lo			; GFX11-NEXT: v_cndmask_b32_e64 v0, s0, 0, vcc_lo
	; GFX11-NEXT: global_store_b64 v[0:1], v[0:1], off			; GFX11-NEXT: global_store_b64 v[0:1], v[0:1], off
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	bb:			bb:
	%umulo = tail call { i64, i1 } @llvm.smul.with.overflow.i64(i64 %x, i64 %y)			%umulo = tail call { i64, i1 } @llvm.smul.with.overflow.i64(i64 %x, i64 %y)
	%mul = extractvalue { i64, i1 } %umulo, 0			%mul = extractvalue { i64, i1 } %umulo, 0
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: smulo_i64_v_4:			; GFX11-LABEL: smulo_i64_v_4:
	; GFX11: ; %bb.0: ; %bb			; GFX11: ; %bb.0: ; %bb
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: v_lshlrev_b64 v[4:5], 2, v[0:1]			; GFX11-NEXT: v_lshlrev_b64 v[4:5], 2, v[0:1]
	; GFX11-NEXT: v_alignbit_b32 v3, v1, v0, 30			; GFX11-NEXT: v_alignbit_b32 v3, v1, v0, 30
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_ashrrev_i64 v[5:6], 2, v[4:5]			; GFX11-NEXT: v_ashrrev_i64 v[5:6], 2, v[4:5]
	; GFX11-NEXT: v_cmp_ne_u64_e32 vcc_lo, v[5:6], v[0:1]			; GFX11-NEXT: v_cmp_ne_u64_e32 vcc_lo, v[5:6], v[0:1]
	; GFX11-NEXT: v_mov_b32_e32 v0, v4			; GFX11-NEXT: v_mov_b32_e32 v0, v4
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4)
	; GFX11-NEXT: v_mov_b32_e32 v1, v3			; GFX11-NEXT: v_mov_b32_e32 v1, v3
	; GFX11-NEXT: v_cndmask_b32_e64 v2, 0, 1, vcc_lo			; GFX11-NEXT: v_cndmask_b32_e64 v2, 0, 1, vcc_lo
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	bb:			bb:
	%umulo = tail call { i64, i1 } @llvm.smul.with.overflow.i64(i64 %i, i64 4)			%umulo = tail call { i64, i1 } @llvm.smul.with.overflow.i64(i64 %i, i64 4)
	ret { i64, i1 } %umulo			ret { i64, i1 } %umulo
	}			}

	; GFX11-LABEL: umulo_i64_v_4:			; GFX11-LABEL: umulo_i64_v_4:
	; GFX11: ; %bb.0: ; %bb			; GFX11: ; %bb.0: ; %bb
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: v_and_b32_e32 v7, 0x3fffffff, v1			; GFX11-NEXT: v_and_b32_e32 v7, 0x3fffffff, v1
	; GFX11-NEXT: v_mov_b32_e32 v6, v0			; GFX11-NEXT: v_mov_b32_e32 v6, v0
	; GFX11-NEXT: v_lshlrev_b64 v[4:5], 2, v[0:1]			; GFX11-NEXT: v_lshlrev_b64 v[4:5], 2, v[0:1]
	; GFX11-NEXT: v_alignbit_b32 v3, v1, v0, 30			; GFX11-NEXT: v_alignbit_b32 v3, v1, v0, 30
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_3)
	; GFX11-NEXT: v_cmp_ne_u64_e32 vcc_lo, v[6:7], v[0:1]			; GFX11-NEXT: v_cmp_ne_u64_e32 vcc_lo, v[6:7], v[0:1]
	; GFX11-NEXT: v_mov_b32_e32 v0, v4			; GFX11-NEXT: v_mov_b32_e32 v0, v4
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_3)
	; GFX11-NEXT: v_mov_b32_e32 v1, v3			; GFX11-NEXT: v_mov_b32_e32 v1, v3
	; GFX11-NEXT: v_cndmask_b32_e64 v2, 0, 1, vcc_lo			; GFX11-NEXT: v_cndmask_b32_e64 v2, 0, 1, vcc_lo
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	bb:			bb:
	%umulo = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %i, i64 4)			%umulo = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %i, i64 4)
	ret { i64, i1 } %umulo			ret { i64, i1 } %umulo
	}			}

	declare { i64, i1 } @llvm.umul.with.overflow.i64(i64, i64)			declare { i64, i1 } @llvm.umul.with.overflow.i64(i64, i64)
	declare { i64, i1 } @llvm.smul.with.overflow.i64(i64, i64)			declare { i64, i1 } @llvm.smul.with.overflow.i64(i64, i64)

llvm/test/CodeGen/AMDGPU/mad_64_32.ll

	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: mad_i64_i32_sextops:			; GFX11-LABEL: mad_i64_i32_sextops:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: v_mov_b32_e32 v4, v1			; GFX11-NEXT: v_mov_b32_e32 v4, v1
	; GFX11-NEXT: v_mov_b32_e32 v5, v0			; GFX11-NEXT: v_mov_b32_e32 v5, v0
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_mad_i64_i32 v[0:1], null, v5, v4, v[2:3]			; GFX11-NEXT: v_mad_i64_i32 v[0:1], null, v5, v4, v[2:3]
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%sext0 = sext i32 %arg0 to i64			%sext0 = sext i32 %arg0 to i64
	%sext1 = sext i32 %arg1 to i64			%sext1 = sext i32 %arg1 to i64
	%mul = mul i64 %sext0, %sext1			%mul = mul i64 %sext0, %sext1
	%mad = add i64 %mul, %arg2			%mad = add i64 %mul, %arg2
	ret i64 %mad			ret i64 %mad
	}			}
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: mad_i64_i32_sextops_commute:			; GFX11-LABEL: mad_i64_i32_sextops_commute:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: v_mov_b32_e32 v4, v1			; GFX11-NEXT: v_mov_b32_e32 v4, v1
	; GFX11-NEXT: v_mov_b32_e32 v5, v0			; GFX11-NEXT: v_mov_b32_e32 v5, v0
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_mad_i64_i32 v[0:1], null, v5, v4, v[2:3]			; GFX11-NEXT: v_mad_i64_i32 v[0:1], null, v5, v4, v[2:3]
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%sext0 = sext i32 %arg0 to i64			%sext0 = sext i32 %arg0 to i64
	%sext1 = sext i32 %arg1 to i64			%sext1 = sext i32 %arg1 to i64
	%mul = mul i64 %sext0, %sext1			%mul = mul i64 %sext0, %sext1
	%mad = add i64 %arg2, %mul			%mad = add i64 %arg2, %mul
	ret i64 %mad			ret i64 %mad
	}			}
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: mad_u64_u32_zextops:			; GFX11-LABEL: mad_u64_u32_zextops:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: v_mov_b32_e32 v4, v1			; GFX11-NEXT: v_mov_b32_e32 v4, v1
	; GFX11-NEXT: v_mov_b32_e32 v5, v0			; GFX11-NEXT: v_mov_b32_e32 v5, v0
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_mad_u64_u32 v[0:1], null, v5, v4, v[2:3]			; GFX11-NEXT: v_mad_u64_u32 v[0:1], null, v5, v4, v[2:3]
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%sext0 = zext i32 %arg0 to i64			%sext0 = zext i32 %arg0 to i64
	%sext1 = zext i32 %arg1 to i64			%sext1 = zext i32 %arg1 to i64
	%mul = mul i64 %sext0, %sext1			%mul = mul i64 %sext0, %sext1
	%mad = add i64 %mul, %arg2			%mad = add i64 %mul, %arg2
	ret i64 %mad			ret i64 %mad
	}			}
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: mad_u64_u32_zextops_commute:			; GFX11-LABEL: mad_u64_u32_zextops_commute:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: v_mov_b32_e32 v4, v1			; GFX11-NEXT: v_mov_b32_e32 v4, v1
	; GFX11-NEXT: v_mov_b32_e32 v5, v0			; GFX11-NEXT: v_mov_b32_e32 v5, v0
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_mad_u64_u32 v[0:1], null, v5, v4, v[2:3]			; GFX11-NEXT: v_mad_u64_u32 v[0:1], null, v5, v4, v[2:3]
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%sext0 = zext i32 %arg0 to i64			%sext0 = zext i32 %arg0 to i64
	%sext1 = zext i32 %arg1 to i64			%sext1 = zext i32 %arg1 to i64
	%mul = mul i64 %sext0, %sext1			%mul = mul i64 %sext0, %sext1
	%mad = add i64 %arg2, %mul			%mad = add i64 %arg2, %mul
	ret i64 %mad			ret i64 %mad
	}			}
	; GFX11-LABEL: mad_i64_i32_sextops_i32_i128:			; GFX11-LABEL: mad_i64_i32_sextops_i32_i128:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: v_mad_u64_u32 v[6:7], null, v0, v1, 0			; GFX11-NEXT: v_mad_u64_u32 v[6:7], null, v0, v1, 0
	; GFX11-NEXT: v_mov_b32_e32 v8, 0			; GFX11-NEXT: v_mov_b32_e32 v8, 0
	; GFX11-NEXT: v_ashrrev_i32_e32 v14, 31, v0			; GFX11-NEXT: v_ashrrev_i32_e32 v14, 31, v0
	; GFX11-NEXT: v_ashrrev_i32_e32 v15, 31, v1			; GFX11-NEXT: v_ashrrev_i32_e32 v15, 31, v1
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_mad_u64_u32 v[9:10], null, v14, v1, v[7:8]			; GFX11-NEXT: v_mad_u64_u32 v[9:10], null, v14, v1, v[7:8]
	; GFX11-NEXT: v_mov_b32_e32 v7, v10			; GFX11-NEXT: v_mov_b32_e32 v7, v10
	; GFX11-NEXT: v_mov_b32_e32 v10, v8			; GFX11-NEXT: v_mov_b32_e32 v10, v8
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
	; GFX11-NEXT: v_mad_u64_u32 v[11:12], null, v0, v15, v[9:10]			; GFX11-NEXT: v_mad_u64_u32 v[11:12], null, v0, v15, v[9:10]
	; GFX11-NEXT: v_mad_i64_i32 v[9:10], null, v1, v14, 0			; GFX11-NEXT: v_mad_i64_i32 v[9:10], null, v1, v14, 0
	; GFX11-NEXT: v_mov_b32_e32 v8, v12			; GFX11-NEXT: v_mov_b32_e32 v8, v12
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
	; GFX11-NEXT: v_mad_i64_i32 v[12:13], null, v15, v0, v[9:10]			; GFX11-NEXT: v_mad_i64_i32 v[12:13], null, v15, v0, v[9:10]
	; GFX11-NEXT: v_add_co_u32 v7, s0, v7, v8			; GFX11-NEXT: v_add_co_u32 v7, s0, v7, v8
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_add_co_ci_u32_e64 v8, null, 0, 0, s0			; GFX11-NEXT: v_add_co_ci_u32_e64 v8, null, 0, 0, s0
	; GFX11-NEXT: v_mad_u64_u32 v[0:1], null, v14, v15, v[7:8]			; GFX11-NEXT: v_mad_u64_u32 v[0:1], null, v14, v15, v[7:8]
	; GFX11-NEXT: v_mov_b32_e32 v7, v11			; GFX11-NEXT: v_mov_b32_e32 v7, v11
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_3)
	; GFX11-NEXT: v_add_co_u32 v8, vcc_lo, v0, v12			; GFX11-NEXT: v_add_co_u32 v8, vcc_lo, v0, v12
	; GFX11-NEXT: v_add_co_ci_u32_e32 v9, vcc_lo, v1, v13, vcc_lo			; GFX11-NEXT: v_add_co_ci_u32_e32 v9, vcc_lo, v1, v13, vcc_lo
	; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v6, v2			; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v6, v2
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
	; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, v7, v3, vcc_lo			; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, v7, v3, vcc_lo
	; GFX11-NEXT: v_add_co_ci_u32_e32 v2, vcc_lo, v8, v4, vcc_lo			; GFX11-NEXT: v_add_co_ci_u32_e32 v2, vcc_lo, v8, v4, vcc_lo
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4)
	; GFX11-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, v9, v5, vcc_lo			; GFX11-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, v9, v5, vcc_lo
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%sext0 = sext i32 %arg0 to i128			%sext0 = sext i32 %arg0 to i128
	%sext1 = sext i32 %arg1 to i128			%sext1 = sext i32 %arg1 to i128
	%mul = mul i128 %sext0, %sext1			%mul = mul i128 %sext0, %sext1
	%mad = add i128 %mul, %arg2			%mad = add i128 %mul, %arg2
	ret i128 %mad			ret i128 %mad
	}			}
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: mad_i64_i32_sextops_i32_i63:			; GFX11-LABEL: mad_i64_i32_sextops_i32_i63:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: v_mov_b32_e32 v4, v1			; GFX11-NEXT: v_mov_b32_e32 v4, v1
	; GFX11-NEXT: v_mov_b32_e32 v5, v0			; GFX11-NEXT: v_mov_b32_e32 v5, v0
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_mad_i64_i32 v[0:1], null, v5, v4, v[2:3]			; GFX11-NEXT: v_mad_i64_i32 v[0:1], null, v5, v4, v[2:3]
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%sext0 = sext i32 %arg0 to i63			%sext0 = sext i32 %arg0 to i63
	%sext1 = sext i32 %arg1 to i63			%sext1 = sext i32 %arg1 to i63
	%mul = mul i63 %sext0, %sext1			%mul = mul i63 %sext0, %sext1
	%mad = add i63 %mul, %arg2			%mad = add i63 %mul, %arg2
	ret i63 %mad			ret i63 %mad
	}			}
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: mad_i64_i32_sextops_i31_i63:			; GFX11-LABEL: mad_i64_i32_sextops_i31_i63:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: v_bfe_i32 v4, v1, 0, 31			; GFX11-NEXT: v_bfe_i32 v4, v1, 0, 31
	; GFX11-NEXT: v_bfe_i32 v5, v0, 0, 31			; GFX11-NEXT: v_bfe_i32 v5, v0, 0, 31
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_mad_i64_i32 v[0:1], null, v5, v4, v[2:3]			; GFX11-NEXT: v_mad_i64_i32 v[0:1], null, v5, v4, v[2:3]
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%sext0 = sext i31 %arg0 to i63			%sext0 = sext i31 %arg0 to i63
	%sext1 = sext i31 %arg1 to i63			%sext1 = sext i31 %arg1 to i63
	%mul = mul i63 %sext0, %sext1			%mul = mul i63 %sext0, %sext1
	%mad = add i63 %mul, %arg2			%mad = add i63 %mul, %arg2
	ret i63 %mad			ret i63 %mad
	}			}
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: mad_i64_i32_extops_i32_i64:			; GFX11-LABEL: mad_i64_i32_extops_i32_i64:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: v_mov_b32_e32 v4, v1			; GFX11-NEXT: v_mov_b32_e32 v4, v1
	; GFX11-NEXT: v_mov_b32_e32 v5, v0			; GFX11-NEXT: v_mov_b32_e32 v5, v0
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
	; GFX11-NEXT: v_mad_u64_u32 v[0:1], null, v5, v4, v[2:3]			; GFX11-NEXT: v_mad_u64_u32 v[0:1], null, v5, v4, v[2:3]
	; GFX11-NEXT: v_ashrrev_i32_e32 v5, 31, v5			; GFX11-NEXT: v_ashrrev_i32_e32 v5, 31, v5
	; GFX11-NEXT: v_mov_b32_e32 v3, v1			; GFX11-NEXT: v_mov_b32_e32 v3, v1
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_mad_u64_u32 v[1:2], null, v5, v4, v[3:4]			; GFX11-NEXT: v_mad_u64_u32 v[1:2], null, v5, v4, v[3:4]
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%ext0 = sext i32 %arg0 to i64			%ext0 = sext i32 %arg0 to i64
	%ext1 = zext i32 %arg1 to i64			%ext1 = zext i32 %arg1 to i64
	%mul = mul i64 %ext0, %ext1			%mul = mul i64 %ext0, %ext1
	%mad = add i64 %mul, %arg2			%mad = add i64 %mul, %arg2
	ret i64 %mad			ret i64 %mad
	}			}
	; GFX9-NEXT: v_mad_u64_u32 v[0:1], s[4:5], v0, v2, v[4:5]			; GFX9-NEXT: v_mad_u64_u32 v[0:1], s[4:5], v0, v2, v[4:5]
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: mad_u64_u32_bitops:			; GFX11-LABEL: mad_u64_u32_bitops:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: v_mov_b32_e32 v3, v0			; GFX11-NEXT: v_mov_b32_e32 v3, v0
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_mad_u64_u32 v[0:1], null, v3, v2, v[4:5]			; GFX11-NEXT: v_mad_u64_u32 v[0:1], null, v3, v2, v[4:5]
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%trunc.lhs = and i64 %arg0, 4294967295			%trunc.lhs = and i64 %arg0, 4294967295
	%trunc.rhs = and i64 %arg1, 4294967295			%trunc.rhs = and i64 %arg1, 4294967295
	%mul = mul i64 %trunc.lhs, %trunc.rhs			%mul = mul i64 %trunc.lhs, %trunc.rhs
	%add = add i64 %mul, %arg2			%add = add i64 %mul, %arg2
	ret i64 %add			ret i64 %add
	}			}
	;			;
	; GFX11-LABEL: mad_u64_u32_bitops_lhs_mask_small:			; GFX11-LABEL: mad_u64_u32_bitops_lhs_mask_small:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: v_mov_b32_e32 v3, v2			; GFX11-NEXT: v_mov_b32_e32 v3, v2
	; GFX11-NEXT: v_mov_b32_e32 v2, v0			; GFX11-NEXT: v_mov_b32_e32 v2, v0
	; GFX11-NEXT: v_mov_b32_e32 v6, v1			; GFX11-NEXT: v_mov_b32_e32 v6, v1
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
	; GFX11-NEXT: v_mad_u64_u32 v[0:1], null, v2, v3, v[4:5]			; GFX11-NEXT: v_mad_u64_u32 v[0:1], null, v2, v3, v[4:5]
	; GFX11-NEXT: v_and_b32_e32 v5, 1, v6			; GFX11-NEXT: v_and_b32_e32 v5, 1, v6
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-NEXT: v_mov_b32_e32 v4, v1			; GFX11-NEXT: v_mov_b32_e32 v4, v1
	; GFX11-NEXT: v_mad_u64_u32 v[1:2], null, v5, v3, v[4:5]			; GFX11-NEXT: v_mad_u64_u32 v[1:2], null, v5, v3, v[4:5]
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%trunc.lhs = and i64 %arg0, 8589934591			%trunc.lhs = and i64 %arg0, 8589934591
	%trunc.rhs = and i64 %arg1, 4294967295			%trunc.rhs = and i64 %arg1, 4294967295
	%mul = mul i64 %trunc.lhs, %trunc.rhs			%mul = mul i64 %trunc.lhs, %trunc.rhs
	%add = add i64 %mul, %arg2			%add = add i64 %mul, %arg2
	ret i64 %add			ret i64 %add
	; GFX9-NEXT: v_mov_b32_e32 v1, v2			; GFX9-NEXT: v_mov_b32_e32 v1, v2
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: mad_u64_u32_bitops_rhs_mask_small:			; GFX11-LABEL: mad_u64_u32_bitops_rhs_mask_small:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: v_mov_b32_e32 v6, v0			; GFX11-NEXT: v_mov_b32_e32 v6, v0
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_2)
	; GFX11-NEXT: v_mad_u64_u32 v[0:1], null, v6, v2, v[4:5]			; GFX11-NEXT: v_mad_u64_u32 v[0:1], null, v6, v2, v[4:5]
	; GFX11-NEXT: v_and_b32_e32 v4, 1, v3			; GFX11-NEXT: v_and_b32_e32 v4, 1, v3
	; GFX11-NEXT: v_mov_b32_e32 v3, v1			; GFX11-NEXT: v_mov_b32_e32 v3, v1
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_mad_u64_u32 v[1:2], null, v6, v4, v[3:4]			; GFX11-NEXT: v_mad_u64_u32 v[1:2], null, v6, v4, v[3:4]
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%trunc.lhs = and i64 %arg0, 4294967295			%trunc.lhs = and i64 %arg0, 4294967295
	%trunc.rhs = and i64 %arg1, 8589934591			%trunc.rhs = and i64 %arg1, 8589934591
	%mul = mul i64 %trunc.lhs, %trunc.rhs			%mul = mul i64 %trunc.lhs, %trunc.rhs
	%add = add i64 %mul, %arg2			%add = add i64 %mul, %arg2
	ret i64 %add			ret i64 %add
	}			}
	; GFX9-NEXT: v_mad_i64_i32 v[0:1], s[4:5], v0, v2, v[4:5]			; GFX9-NEXT: v_mad_i64_i32 v[0:1], s[4:5], v0, v2, v[4:5]
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: mad_i64_i32_bitops:			; GFX11-LABEL: mad_i64_i32_bitops:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: v_mov_b32_e32 v3, v0			; GFX11-NEXT: v_mov_b32_e32 v3, v0
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_mad_i64_i32 v[0:1], null, v3, v2, v[4:5]			; GFX11-NEXT: v_mad_i64_i32 v[0:1], null, v3, v2, v[4:5]
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%shl.lhs = shl i64 %arg0, 32			%shl.lhs = shl i64 %arg0, 32
	%trunc.lhs = ashr i64 %shl.lhs, 32			%trunc.lhs = ashr i64 %shl.lhs, 32
	%shl.rhs = shl i64 %arg1, 32			%shl.rhs = shl i64 %arg1, 32
	%trunc.rhs = ashr i64 %shl.rhs, 32			%trunc.rhs = ashr i64 %shl.rhs, 32
	%mul = mul i64 %trunc.lhs, %trunc.rhs			%mul = mul i64 %trunc.lhs, %trunc.rhs
	%add = add i64 %mul, %arg2			%add = add i64 %mul, %arg2
	Show All 23 Lines
	; GFX9-NEXT: v_mad_u64_u32 v[0:1], s[4:5], v1, v0, v[0:1]			; GFX9-NEXT: v_mad_u64_u32 v[0:1], s[4:5], v1, v0, v[0:1]
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: mad_i64_i32_unpack_i64ops:			; GFX11-LABEL: mad_i64_i32_unpack_i64ops:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: v_mad_u64_u32 v[2:3], null, v1, v0, v[0:1]			; GFX11-NEXT: v_mad_u64_u32 v[2:3], null, v1, v0, v[0:1]
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_2)
	; GFX11-NEXT: v_mov_b32_e32 v0, v2			; GFX11-NEXT: v_mov_b32_e32 v0, v2
	; GFX11-NEXT: v_mov_b32_e32 v1, v3			; GFX11-NEXT: v_mov_b32_e32 v1, v3
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%tmp4 = lshr i64 %arg0, 32			%tmp4 = lshr i64 %arg0, 32
	%tmp5 = and i64 %arg0, 4294967295			%tmp5 = and i64 %arg0, 4294967295
	%mul = mul nuw i64 %tmp4, %tmp5			%mul = mul nuw i64 %tmp4, %tmp5
	%mad = add i64 %mul, %arg0			%mad = add i64 %mul, %arg0
	ret i64 %mad			ret i64 %mad
	▲ Show 20 Lines • Show All 105 Lines • ▼ Show 20 Lines
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: mad_i64_i32_twice:			; GFX11-LABEL: mad_i64_i32_twice:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: v_mad_i64_i32 v[6:7], null, v0, v1, v[2:3]			; GFX11-NEXT: v_mad_i64_i32 v[6:7], null, v0, v1, v[2:3]
	; GFX11-NEXT: v_mad_i64_i32 v[2:3], null, v0, v1, v[4:5]			; GFX11-NEXT: v_mad_i64_i32 v[2:3], null, v0, v1, v[4:5]
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_2)
	; GFX11-NEXT: v_xor_b32_e32 v0, v6, v2			; GFX11-NEXT: v_xor_b32_e32 v0, v6, v2
	; GFX11-NEXT: v_xor_b32_e32 v1, v7, v3			; GFX11-NEXT: v_xor_b32_e32 v1, v7, v3
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%sext0 = sext i32 %arg0 to i64			%sext0 = sext i32 %arg0 to i64
	%sext1 = sext i32 %arg1 to i64			%sext1 = sext i32 %arg1 to i64
	%mul = mul i64 %sext0, %sext1			%mul = mul i64 %sext0, %sext1
	%mad1 = add i64 %mul, %arg2			%mad1 = add i64 %mul, %arg2
	%mad2 = add i64 %mul, %arg3			%mad2 = add i64 %mul, %arg3
	▲ Show 20 Lines • Show All 47 Lines • ▼ Show 20 Lines
	; GFX9-NEXT: v_xor_b32_e32 v0, v2, v0			; GFX9-NEXT: v_xor_b32_e32 v0, v2, v0
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: mad_i64_i32_thrice:			; GFX11-LABEL: mad_i64_i32_thrice:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: v_mad_i64_i32 v[8:9], null, v0, v1, 0			; GFX11-NEXT: v_mad_i64_i32 v[8:9], null, v0, v1, 0
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_2)
	; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v8, v2			; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v8, v2
	; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, v9, v3, vcc_lo			; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, v9, v3, vcc_lo
	; GFX11-NEXT: v_add_co_u32 v2, vcc_lo, v8, v4			; GFX11-NEXT: v_add_co_u32 v2, vcc_lo, v8, v4
	; GFX11-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, v9, v5, vcc_lo			; GFX11-NEXT: v_add_co_ci_u32_e32 v3, vcc_lo, v9, v5, vcc_lo
	; GFX11-NEXT: v_add_co_u32 v4, vcc_lo, v8, v6			; GFX11-NEXT: v_add_co_u32 v4, vcc_lo, v8, v6
	; GFX11-NEXT: v_add_co_ci_u32_e32 v5, vcc_lo, v9, v7, vcc_lo			; GFX11-NEXT: v_add_co_ci_u32_e32 v5, vcc_lo, v9, v7, vcc_lo
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
	; GFX11-NEXT: v_xor_b32_e32 v0, v0, v2			; GFX11-NEXT: v_xor_b32_e32 v0, v0, v2
	; GFX11-NEXT: v_xor_b32_e32 v1, v1, v3			; GFX11-NEXT: v_xor_b32_e32 v1, v1, v3
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
	; GFX11-NEXT: v_xor_b32_e32 v0, v0, v4			; GFX11-NEXT: v_xor_b32_e32 v0, v0, v4
	; GFX11-NEXT: v_xor_b32_e32 v1, v1, v5			; GFX11-NEXT: v_xor_b32_e32 v1, v1, v5
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%sext0 = sext i32 %arg0 to i64			%sext0 = sext i32 %arg0 to i64
	%sext1 = sext i32 %arg1 to i64			%sext1 = sext i32 %arg1 to i64
	%mul = mul i64 %sext0, %sext1			%mul = mul i64 %sext0, %sext1
	%mad1 = add i64 %mul, %arg2			%mad1 = add i64 %mul, %arg2
	%mad2 = add i64 %mul, %arg3			%mad2 = add i64 %mul, %arg3
	Show All 34 Lines
	; GFX9-NEXT: v_xor_b32_e32 v0, v0, v4			; GFX9-NEXT: v_xor_b32_e32 v0, v0, v4
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: mad_i64_i32_secondary_use:			; GFX11-LABEL: mad_i64_i32_secondary_use:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: v_mad_i64_i32 v[4:5], null, v0, v1, 0			; GFX11-NEXT: v_mad_i64_i32 v[4:5], null, v0, v1, 0
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_2)
	; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v4, v2			; GFX11-NEXT: v_add_co_u32 v0, vcc_lo, v4, v2
	; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, v5, v3, vcc_lo			; GFX11-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, v5, v3, vcc_lo
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
	; GFX11-NEXT: v_xor_b32_e32 v0, v0, v4			; GFX11-NEXT: v_xor_b32_e32 v0, v0, v4
	; GFX11-NEXT: v_xor_b32_e32 v1, v1, v5			; GFX11-NEXT: v_xor_b32_e32 v1, v1, v5
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%sext0 = sext i32 %arg0 to i64			%sext0 = sext i32 %arg0 to i64
	%sext1 = sext i32 %arg1 to i64			%sext1 = sext i32 %arg1 to i64
	%mul = mul i64 %sext0, %sext1			%mul = mul i64 %sext0, %sext1
	%mad = add i64 %mul, %arg2			%mad = add i64 %mul, %arg2
	%out = xor i64 %mad, %mul			%out = xor i64 %mad, %mul
	Show All 38 Lines
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: mad_i48_i48:			; GFX11-LABEL: mad_i48_i48:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: v_mov_b32_e32 v6, v1			; GFX11-NEXT: v_mov_b32_e32 v6, v1
	; GFX11-NEXT: v_mov_b32_e32 v7, v0			; GFX11-NEXT: v_mov_b32_e32 v7, v0
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(VALU_DEP_4)
	; GFX11-NEXT: v_mad_u64_u32 v[0:1], null, v7, v2, v[4:5]			; GFX11-NEXT: v_mad_u64_u32 v[0:1], null, v7, v2, v[4:5]
	; GFX11-NEXT: v_mul_lo_u32 v3, v7, v3			; GFX11-NEXT: v_mul_lo_u32 v3, v7, v3
	; GFX11-NEXT: v_mul_lo_u32 v2, v6, v2			; GFX11-NEXT: v_mul_lo_u32 v2, v6, v2
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_add3_u32 v1, v2, v1, v3			; GFX11-NEXT: v_add3_u32 v1, v2, v1, v3
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%m = mul i48 %arg0, %arg1			%m = mul i48 %arg0, %arg1
	%a = add i48 %m, %arg2			%a = add i48 %m, %arg2
	ret i48 %a			ret i48 %a
	}			}

	attributes #0 = { nounwind }			attributes #0 = { nounwind }
	attributes #1 = { nounwind readnone speculatable }			attributes #1 = { nounwind readnone speculatable }

llvm/test/CodeGen/AMDGPU/mad_u64_u32.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -march=amdgcn -mcpu=gfx900 --verify-machineinstrs < %s | FileCheck --check-prefixes=GCN,GFX9 %s			; RUN: llc -march=amdgcn -mcpu=gfx900 --verify-machineinstrs < %s | FileCheck --check-prefixes=GCN,GFX9 %s
	; RUN: llc -march=amdgcn -mcpu=gfx1030 --verify-machineinstrs < %s | FileCheck --check-prefixes=GCN,GFX10 %s			; RUN: llc -march=amdgcn -mcpu=gfx1030 --verify-machineinstrs < %s | FileCheck --check-prefixes=GCN,GFX10 %s
	; RUN: llc -march=amdgcn -mcpu=gfx1100 --verify-machineinstrs < %s | FileCheck --check-prefixes=GCN,GFX11 %s			; RUN: llc -march=amdgcn -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 --verify-machineinstrs < %s | FileCheck --check-prefixes=GCN,GFX11 %s
	; RUN: llc -march=amdgcn -mcpu=gfx1030 -mattr=+wavefrontsize64 --verify-machineinstrs < %s | FileCheck --check-prefixes=GCN,GFX10 %s			; RUN: llc -march=amdgcn -mcpu=gfx1030 -mattr=+wavefrontsize64 --verify-machineinstrs < %s | FileCheck --check-prefixes=GCN,GFX10 %s
	; RUN: llc -march=amdgcn -mcpu=gfx1100 -mattr=+wavefrontsize64 --verify-machineinstrs < %s | FileCheck --check-prefixes=GCN,GFX11 %s			; RUN: llc -march=amdgcn -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 -mattr=+wavefrontsize64 --verify-machineinstrs < %s | FileCheck --check-prefixes=GCN,GFX11 %s

	define amdgpu_ps float @mad_i32_vvv(i32 %a, i32 %b, i32 %c) {			define amdgpu_ps float @mad_i32_vvv(i32 %a, i32 %b, i32 %c) {
	; GFX9-LABEL: mad_i32_vvv:			; GFX9-LABEL: mad_i32_vvv:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: v_mad_u64_u32 v[0:1], s[0:1], v0, v1, v[2:3]			; GFX9-NEXT: v_mad_u64_u32 v[0:1], s[0:1], v0, v1, v[2:3]
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: mad_i32_vvv:			; GFX10-LABEL: mad_i32_vvv:
	▲ Show 20 Lines • Show All 318 Lines • Show Last 20 Lines

llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-nontemporal.ll

	Show First 20 Lines • Show All 263 Lines • ▼ Show 20 Lines
	; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2			; GFX940-TGSPLIT-NEXT: flat_store_dword v[0:1], v2
	; GFX940-TGSPLIT-NEXT: s_endpgm			; GFX940-TGSPLIT-NEXT: s_endpgm
	;			;
	; GFX11-WGP-LABEL: flat_nontemporal_load_1:			; GFX11-WGP-LABEL: flat_nontemporal_load_1:
	; GFX11-WGP: ; %bb.0: ; %entry			; GFX11-WGP: ; %bb.0: ; %entry
	; GFX11-WGP-NEXT: s_load_b128 s[0:3], s[0:1], 0x0			; GFX11-WGP-NEXT: s_load_b128 s[0:3], s[0:1], 0x0
	; GFX11-WGP-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX11-WGP-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
				; GFX11-WGP-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-WGP-NEXT: v_add_co_u32 v0, s0, s0, v0			; GFX11-WGP-NEXT: v_add_co_u32 v0, s0, s0, v0
	; GFX11-WGP-NEXT: v_add_co_ci_u32_e64 v1, null, s1, 0, s0			; GFX11-WGP-NEXT: v_add_co_ci_u32_e64 v1, null, s1, 0, s0
	; GFX11-WGP-NEXT: flat_load_b32 v2, v[0:1] slc dlc			; GFX11-WGP-NEXT: flat_load_b32 v2, v[0:1] slc dlc
	; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2			; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
	; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3			; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
	; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX11-WGP-NEXT: flat_store_b32 v[0:1], v2			; GFX11-WGP-NEXT: flat_store_b32 v[0:1], v2
	; GFX11-WGP-NEXT: s_endpgm			; GFX11-WGP-NEXT: s_endpgm
	;			;
	; GFX11-CU-LABEL: flat_nontemporal_load_1:			; GFX11-CU-LABEL: flat_nontemporal_load_1:
	; GFX11-CU: ; %bb.0: ; %entry			; GFX11-CU: ; %bb.0: ; %entry
	; GFX11-CU-NEXT: s_load_b128 s[0:3], s[0:1], 0x0			; GFX11-CU-NEXT: s_load_b128 s[0:3], s[0:1], 0x0
	; GFX11-CU-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX11-CU-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
				; GFX11-CU-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-CU-NEXT: v_add_co_u32 v0, s0, s0, v0			; GFX11-CU-NEXT: v_add_co_u32 v0, s0, s0, v0
	; GFX11-CU-NEXT: v_add_co_ci_u32_e64 v1, null, s1, 0, s0			; GFX11-CU-NEXT: v_add_co_ci_u32_e64 v1, null, s1, 0, s0
	; GFX11-CU-NEXT: flat_load_b32 v2, v[0:1] slc dlc			; GFX11-CU-NEXT: flat_load_b32 v2, v[0:1] slc dlc
	; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2			; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
	; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3			; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
	; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX11-CU-NEXT: flat_store_b32 v[0:1], v2			; GFX11-CU-NEXT: flat_store_b32 v[0:1], v2
	; GFX11-CU-NEXT: s_endpgm			; GFX11-CU-NEXT: s_endpgm
	▲ Show 20 Lines • Show All 262 Lines • ▼ Show 20 Lines
	;			;
	; GFX11-WGP-LABEL: flat_nontemporal_store_1:			; GFX11-WGP-LABEL: flat_nontemporal_store_1:
	; GFX11-WGP: ; %bb.0: ; %entry			; GFX11-WGP: ; %bb.0: ; %entry
	; GFX11-WGP-NEXT: s_load_b128 s[0:3], s[0:1], 0x0			; GFX11-WGP-NEXT: s_load_b128 s[0:3], s[0:1], 0x0
	; GFX11-WGP-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX11-WGP-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0			; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
	; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1			; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
				; GFX11-WGP-NEXT: s_delay_alu instid0(VALU_DEP_3)
	; GFX11-WGP-NEXT: v_add_co_u32 v0, s0, s2, v0			; GFX11-WGP-NEXT: v_add_co_u32 v0, s0, s2, v0
	; GFX11-WGP-NEXT: flat_load_b32 v2, v[1:2]			; GFX11-WGP-NEXT: flat_load_b32 v2, v[1:2]
	; GFX11-WGP-NEXT: v_add_co_ci_u32_e64 v1, null, s3, 0, s0			; GFX11-WGP-NEXT: v_add_co_ci_u32_e64 v1, null, s3, 0, s0
	; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX11-WGP-NEXT: flat_store_b32 v[0:1], v2 glc slc dlc			; GFX11-WGP-NEXT: flat_store_b32 v[0:1], v2 glc slc dlc
	; GFX11-WGP-NEXT: s_endpgm			; GFX11-WGP-NEXT: s_endpgm
	;			;
	; GFX11-CU-LABEL: flat_nontemporal_store_1:			; GFX11-CU-LABEL: flat_nontemporal_store_1:
	; GFX11-CU: ; %bb.0: ; %entry			; GFX11-CU: ; %bb.0: ; %entry
	; GFX11-CU-NEXT: s_load_b128 s[0:3], s[0:1], 0x0			; GFX11-CU-NEXT: s_load_b128 s[0:3], s[0:1], 0x0
	; GFX11-CU-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX11-CU-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0			; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
	; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1			; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
				; GFX11-CU-NEXT: s_delay_alu instid0(VALU_DEP_3)
	; GFX11-CU-NEXT: v_add_co_u32 v0, s0, s2, v0			; GFX11-CU-NEXT: v_add_co_u32 v0, s0, s2, v0
	; GFX11-CU-NEXT: flat_load_b32 v2, v[1:2]			; GFX11-CU-NEXT: flat_load_b32 v2, v[1:2]
	; GFX11-CU-NEXT: v_add_co_ci_u32_e64 v1, null, s3, 0, s0			; GFX11-CU-NEXT: v_add_co_ci_u32_e64 v1, null, s3, 0, s0
	; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX11-CU-NEXT: flat_store_b32 v[0:1], v2 glc slc dlc			; GFX11-CU-NEXT: flat_store_b32 v[0:1], v2 glc slc dlc
	; GFX11-CU-NEXT: s_endpgm			; GFX11-CU-NEXT: s_endpgm
	i32* %in, i32* %out) {			i32* %in, i32* %out) {
	entry:			entry:
	Show All 9 Lines

llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-volatile.ll

	Show First 20 Lines • Show All 159 Lines • ▼ Show 20 Lines
	; SKIP-CACHE-INV-NEXT: flat_store_dword v[0:1], v2			; SKIP-CACHE-INV-NEXT: flat_store_dword v[0:1], v2
	; SKIP-CACHE-INV-NEXT: s_endpgm			; SKIP-CACHE-INV-NEXT: s_endpgm
	;			;
	; GFX11-WGP-LABEL: flat_nontemporal_load_1:			; GFX11-WGP-LABEL: flat_nontemporal_load_1:
	; GFX11-WGP: ; %bb.0: ; %entry			; GFX11-WGP: ; %bb.0: ; %entry
	; GFX11-WGP-NEXT: s_load_b128 s[0:3], s[0:1], 0x0			; GFX11-WGP-NEXT: s_load_b128 s[0:3], s[0:1], 0x0
	; GFX11-WGP-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX11-WGP-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
				; GFX11-WGP-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-WGP-NEXT: v_add_co_u32 v0, s0, s0, v0			; GFX11-WGP-NEXT: v_add_co_u32 v0, s0, s0, v0
	; GFX11-WGP-NEXT: v_add_co_ci_u32_e64 v1, null, s1, 0, s0			; GFX11-WGP-NEXT: v_add_co_ci_u32_e64 v1, null, s1, 0, s0
	; GFX11-WGP-NEXT: flat_load_b32 v2, v[0:1] glc dlc			; GFX11-WGP-NEXT: flat_load_b32 v2, v[0:1] glc dlc
	; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)			; GFX11-WGP-NEXT: s_waitcnt vmcnt(0)
	; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2			; GFX11-WGP-NEXT: v_mov_b32_e32 v0, s2
	; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3			; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s3
	; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-WGP-NEXT: flat_store_b32 v[0:1], v2			; GFX11-WGP-NEXT: flat_store_b32 v[0:1], v2
	; GFX11-WGP-NEXT: s_endpgm			; GFX11-WGP-NEXT: s_endpgm
	;			;
	; GFX11-CU-LABEL: flat_nontemporal_load_1:			; GFX11-CU-LABEL: flat_nontemporal_load_1:
	; GFX11-CU: ; %bb.0: ; %entry			; GFX11-CU: ; %bb.0: ; %entry
	; GFX11-CU-NEXT: s_load_b128 s[0:3], s[0:1], 0x0			; GFX11-CU-NEXT: s_load_b128 s[0:3], s[0:1], 0x0
	; GFX11-CU-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX11-CU-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
				; GFX11-CU-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	; GFX11-CU-NEXT: v_add_co_u32 v0, s0, s0, v0			; GFX11-CU-NEXT: v_add_co_u32 v0, s0, s0, v0
	; GFX11-CU-NEXT: v_add_co_ci_u32_e64 v1, null, s1, 0, s0			; GFX11-CU-NEXT: v_add_co_ci_u32_e64 v1, null, s1, 0, s0
	; GFX11-CU-NEXT: flat_load_b32 v2, v[0:1] glc dlc			; GFX11-CU-NEXT: flat_load_b32 v2, v[0:1] glc dlc
	; GFX11-CU-NEXT: s_waitcnt vmcnt(0)			; GFX11-CU-NEXT: s_waitcnt vmcnt(0)
	; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2			; GFX11-CU-NEXT: v_mov_b32_e32 v0, s2
	; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3			; GFX11-CU-NEXT: v_mov_b32_e32 v1, s3
	; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-CU-NEXT: flat_store_b32 v[0:1], v2			; GFX11-CU-NEXT: flat_store_b32 v[0:1], v2
	▲ Show 20 Lines • Show All 163 Lines • ▼ Show 20 Lines
	;			;
	; GFX11-WGP-LABEL: flat_nontemporal_store_1:			; GFX11-WGP-LABEL: flat_nontemporal_store_1:
	; GFX11-WGP: ; %bb.0: ; %entry			; GFX11-WGP: ; %bb.0: ; %entry
	; GFX11-WGP-NEXT: s_load_b128 s[0:3], s[0:1], 0x0			; GFX11-WGP-NEXT: s_load_b128 s[0:3], s[0:1], 0x0
	; GFX11-WGP-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX11-WGP-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-WGP-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0			; GFX11-WGP-NEXT: v_mov_b32_e32 v1, s0
	; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1			; GFX11-WGP-NEXT: v_mov_b32_e32 v2, s1
				; GFX11-WGP-NEXT: s_delay_alu instid0(VALU_DEP_3)
	; GFX11-WGP-NEXT: v_add_co_u32 v0, s0, s2, v0			; GFX11-WGP-NEXT: v_add_co_u32 v0, s0, s2, v0
	; GFX11-WGP-NEXT: flat_load_b32 v2, v[1:2]			; GFX11-WGP-NEXT: flat_load_b32 v2, v[1:2]
	; GFX11-WGP-NEXT: v_add_co_ci_u32_e64 v1, null, s3, 0, s0			; GFX11-WGP-NEXT: v_add_co_ci_u32_e64 v1, null, s3, 0, s0
	; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX11-WGP-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX11-WGP-NEXT: flat_store_b32 v[0:1], v2 dlc			; GFX11-WGP-NEXT: flat_store_b32 v[0:1], v2 dlc
	; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-WGP-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-WGP-NEXT: s_endpgm			; GFX11-WGP-NEXT: s_endpgm
	;			;
	; GFX11-CU-LABEL: flat_nontemporal_store_1:			; GFX11-CU-LABEL: flat_nontemporal_store_1:
	; GFX11-CU: ; %bb.0: ; %entry			; GFX11-CU: ; %bb.0: ; %entry
	; GFX11-CU-NEXT: s_load_b128 s[0:3], s[0:1], 0x0			; GFX11-CU-NEXT: s_load_b128 s[0:3], s[0:1], 0x0
	; GFX11-CU-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX11-CU-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-CU-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0			; GFX11-CU-NEXT: v_mov_b32_e32 v1, s0
	; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1			; GFX11-CU-NEXT: v_mov_b32_e32 v2, s1
				; GFX11-CU-NEXT: s_delay_alu instid0(VALU_DEP_3)
	; GFX11-CU-NEXT: v_add_co_u32 v0, s0, s2, v0			; GFX11-CU-NEXT: v_add_co_u32 v0, s0, s2, v0
	; GFX11-CU-NEXT: flat_load_b32 v2, v[1:2]			; GFX11-CU-NEXT: flat_load_b32 v2, v[1:2]
	; GFX11-CU-NEXT: v_add_co_ci_u32_e64 v1, null, s3, 0, s0			; GFX11-CU-NEXT: v_add_co_ci_u32_e64 v1, null, s3, 0, s0
	; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX11-CU-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX11-CU-NEXT: flat_store_b32 v[0:1], v2 dlc			; GFX11-CU-NEXT: flat_store_b32 v[0:1], v2 dlc
	; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-CU-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-CU-NEXT: s_endpgm			; GFX11-CU-NEXT: s_endpgm
	i32* %in, i32* %out) {			i32* %in, i32* %out) {
	▲ Show 20 Lines • Show All 184 Lines • Show Last 20 Lines

Revision Contents

Diff 441061

llvm/lib/Target/AMDGPU/AMDGPU.h

llvm/lib/Target/AMDGPU/AMDGPUInsertDelayAlu.cpp

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

llvm/lib/Target/AMDGPU/CMakeLists.txt

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.fdot2.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.gather4.dim.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.load.1d.d16.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.sample.g16.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.interp.inreg.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.intersect_ray.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/saddsat.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/uaddsat.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/usubsat.ll

llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll

llvm/test/CodeGen/AMDGPU/cluster_stores.ll

llvm/test/CodeGen/AMDGPU/dual-source-blend-export.ll

llvm/test/CodeGen/AMDGPU/flat-scratch-svs.ll

llvm/test/CodeGen/AMDGPU/flat-scratch.ll

llvm/test/CodeGen/AMDGPU/insert-delay-alu.mir

llvm/test/CodeGen/AMDGPU/llc-pipeline.ll

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.exp.row.ll

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.f32.bf16.ll

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.dim.ll

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.gather4.a16.dim.ll

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.a16.dim.ll

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.d16.dim.ll

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.dim.ll

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.g16.encode.ll

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.g16.ll

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.interp.inreg.ll

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.intersect_ray.ll

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane64.ll

llvm/test/CodeGen/AMDGPU/llvm.mulo.ll

llvm/test/CodeGen/AMDGPU/mad_64_32.ll

llvm/test/CodeGen/AMDGPU/mad_u64_u32.ll

llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-nontemporal.ll

llvm/test/CodeGen/AMDGPU/memory-legalizer-flat-volatile.ll
