This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Dynamically clear renamable to avoid constant bus errors
Abandoned · Public

Authored by critson on Sep 13 2020, 12:27 AM.

Details

Reviewers

rampitec
foad
arsenm

Summary

Replace static disabling of renaming of instruction arguments
(to avoid overloading the constant bus) with dynamic clearing of
renamable flags. This allows Machine Copy Propagation to remove
more copies, in particular on GFX10 where the constant bus is
wider.
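
For illustration, the per-instruction decision that the dynamic approach makes can be modelled outside LLVM. The sketch below is not the patch itself: `VALUOperandSummary` and `mustClearRenamable` are hypothetical names, and only the condition is taken from SIFixRenamableFlags.cpp in this diff.

```cpp
#include <cassert>

// Hypothetical stand-alone model; none of these names exist in LLVM.
struct VALUOperandSummary {
  unsigned DistinctVectorRegs; // distinct VGPR/AGPR source registers
  unsigned ConstantBusUses;    // constant bus slots the instruction already uses
  unsigned ConstantBusLimit;   // per-opcode limit reported by the subtarget
};

// Returns true when the renamable flag has to be cleared on the vector
// register uses, i.e. when rewriting every VGPR source to an SGPR could
// exceed the constant bus limit, or when register banking is a concern.
bool mustClearRenamable(const VALUOperandSummary &S, bool HasRegisterBanking) {
  assert(S.ConstantBusUses <= S.ConstantBusLimit &&
         "instruction is already over the constant bus limit");
  const unsigned FreeBusCapacity = S.ConstantBusLimit - S.ConstantBusUses;
  const bool FitsOnBus = S.DistinctVectorRegs <= FreeBusCapacity;
  const bool BankSafe = !HasRegisterBanking || S.DistinctVectorRegs < 2;
  return !(FitsOnBus && BankSafe);
}
```

With a limit of 1 and no bus operands in use, a single VGPR source can stay renamable, whereas two distinct VGPR sources cannot. The pass in this diff takes the real numbers from `constantBusUseCount` and `ST.getConstantBusLimit`.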

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

critson created this revision. Sep 13 2020, 12:27 AM

Herald added a project: Restricted Project. Sep 13 2020, 12:27 AM

Herald added subscribers: llvm-commits, kerbowa, hiraditya and 9 others.

critson requested review of this revision. Sep 13 2020, 12:27 AM

Herald added a subscriber: wdng. Sep 13 2020, 12:27 AM

Harbormaster completed remote builds in B71499: Diff 291445. Sep 13 2020, 1:01 AM

Do you have any stats on how many more copies get propagated, and how many instructions that actually saves in the final ISA? I have a feeling that SIFoldOperands currently catches a lot of the cases that MachineCopyProp misses.

Where/why is renameable set in the first place? Can we just avoid setting it to begin with?

In D87585#2271185, @arsenm wrote:

Where/why is renameable set in the first place? Can we just avoid setting it to begin with?

Looks like it was originally done to avoid passes like MachineCopyPropagation from introducing constant bus violations: rG1d531013876c02b18df678a5f67d6a7d94e392b9

In D87585#2271188, @foad wrote:

In D87585#2271185, @arsenm wrote:

Where/why is renameable set in the first place? Can we just avoid setting it to begin with?

Looks like it was originally done to avoid passes like MachineCopyPropagation from introducing constant bus violations: rG1d531013876c02b18df678a5f67d6a7d94e392b9

The actual flags are set by the Virtual Register Rewriter, which sets them to true for most operands.
The patch Jay referenced causes the renamable flags to be statically disabled for specific opcodes (i.e. for all of their operands).

rampitec added inline comments. Sep 14 2020, 10:29 AM

llvm/lib/Target/AMDGPU/SIFixRenamableFlags.cpp
96	Same as RI.isVectorRegister().
98	Looks like you can get the same result with a Set/SetVector.
130	RI.isVectorRegister().
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
3599	This does not seem to be correct? I.e. s[0:1] and s1 overlap, but count as two constant bus operands. I think even s[0:1] and s0 are separate operands. I see that it is copied from below, but seems to be an error anyway?
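
The question raised on SIInstrInfo.cpp line 3599 is whether overlapping SGPR operands share one constant bus slot. A minimal sketch of the two counting policies follows, assuming a toy model in which an operand is a range of 32-bit SGPR indices. `SGPRRange`, `countBusOperands` and both predicates are hypothetical, and which policy matches the hardware is not settled in this thread.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: an SGPR operand covers 32-bit registers Lo..Hi
// inclusive, so s1 is {1, 1} and s[0:1] is {0, 1}.
struct SGPRRange {
  unsigned Lo, Hi;
};

// Policy in the diff: two operands are "the same" if they overlap at all.
bool overlaps(const SGPRRange &A, const SGPRRange &B) {
  assert(A.Lo <= A.Hi && B.Lo <= B.Hi && "malformed register range");
  return A.Lo <= B.Hi && B.Lo <= A.Hi;
}

// Alternative suggested in review: only an identical register is shared.
bool identical(const SGPRRange &A, const SGPRRange &B) {
  return A.Lo == B.Lo && A.Hi == B.Hi;
}

using SamePredicate = bool (*)(const SGPRRange &, const SGPRRange &);

// Counts operands, skipping any operand that Same() matches against one
// that has already been counted.
unsigned countBusOperands(const std::vector<SGPRRange> &Ops,
                          SamePredicate Same) {
  std::vector<SGPRRange> Counted;
  for (const SGPRRange &Op : Ops) {
    bool Seen = false;
    for (const SGPRRange &C : Counted)
      Seen = Seen || Same(Op, C);
    if (!Seen)
      Counted.push_back(Op);
  }
  return static_cast<unsigned>(Counted.size());
}
```

For the operands s[0:1] and s1, the overlap policy yields one bus use and the exact-match policy yields two, which is the discrepancy the inline comment points at.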

The numbers for this change are not vastly compelling.
I looked at 11598 game shaders and compiled these for GFX7, GFX9 and GFX10.
On GFX7, 1 shader lost 1 instruction.
On GFX9, 1 shader lost 1 instruction, but 64 shaders gained 1 instruction.
On GFX10, 1 shader lost 1 instruction, but 2 shaders gained 1 instruction.

I started this change as I am looking at moving WQM after MI scheduling, and this potentially leaves some additional copies around.
But I could address these with very limited special case copy elimination in the WQM pass itself.

Does anyone have an opinion on whether I should continue pushing this?

In D87585#2274280, @critson wrote:

The numbers for this change are not vastly compelling.
I looked at 11598 game shaders and compiled these for GFX7, GFX9 and GFX10.
On GFX7, 1 shader lost 1 instruction.
On GFX9, 1 shader lost 1 instruction, but 64 shaders gained 1 instruction.
On GFX10, 1 shader lost 1 instruction, but 2 shaders gained 1 instruction.

I started this change as I am looking at moving WQM after MI scheduling, and this potentially leaves some additional copies around.
But I could address these with very limited special case copy elimination in the WQM pass itself.

Does anyone have an opinion on whether I should continue pushing this?

Honestly it does not sound like a good justification for a new pass.

Follow up verifier fix for issue noted by Stas in D87748.

In D87585#2274305, @rampitec wrote:

In D87585#2274280, @critson wrote:

The numbers for this change are not vastly compelling.
I looked at 11598 game shaders and compiled these for GFX7, GFX9 and GFX10.
On GFX7, 1 shader lost 1 instruction.
On GFX9, 1 shader lost 1 instruction, but 64 shaders gained 1 instruction.
On GFX10, 1 shader lost 1 instruction, but 2 shaders gained 1 instruction.

Are there any other significant changes? I'm thinking of things like this that would help with dependency stalls on gfx10:

v_mov v0, 0
v_mov v1, v0
->
v_mov v0, 0
v_mov v1, 0

I started this change as I am looking at moving WQM after MI scheduling, and this potentially leaves some additional copies around.
But I could address these with very limited special case copy elimination in the WQM pass itself.

Does anyone have an opinion on whether I should continue pushing this?

Honestly it does not sound like a good justification for a new pass.

Agreed, but maybe it's worth revisiting after the WQM pass has been moved.

In D87585#2276097, @foad wrote:
Are there any other significant changes? I'm thinking of things like this that would help with dependency stalls on gfx10:
v_mov v0, 0
v_mov v1, v0
->
v_mov v0, 0
v_mov v1, 0
I started this change as I am looking at moving WQM after MI scheduling, and this potentially leaves some additional copies around.
But I could address these with very limited special case copy elimination in the WQM pass itself.

Does anyone have an opinion on whether I should continue pushing this?

Honestly it does not sound like a good justification for a new pass.

Agreed, but maybe it's worth revisiting after the WQM pass has been moved.

I will review shader changes; however, I do not think Machine Copy Propagation propagates constants.

In D87585#2276099, @critson wrote:
In D87585#2276097, @foad wrote:
Are there any other significant changes? I'm thinking of things like this that would help with dependency stalls on gfx10:
v_mov v0, 0
v_mov v1, v0
->
v_mov v0, 0
v_mov v1, 0

I will review shader changes; however, I do not think Machine Copy Propagation propagates constants.

True. But this can also help with stalls:

v_mov v0, s0
v_mov v1, v0
->
v_mov v0, s0
v_mov v1, s0
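
The rewrite above is ordinary forward copy propagation. A toy sketch on a block of register-to-register moves is given below. The `Mov` type and `propagateCopies` are hypothetical, immediates are not modelled, and the sketch deliberately ignores renamable flags and constant bus limits, which the real Machine Copy Propagation pass has to respect.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy IR: every instruction is "mov Dst, Src" on named registers.
struct Mov {
  std::string Dst, Src;
};

// Forward copy propagation within one block: if the source of a move is
// itself a still-valid copy, read from the original register instead.
void propagateCopies(std::vector<Mov> &Block) {
  std::map<std::string, std::string> CopyOf; // Dst -> register it was copied from
  for (Mov &MI : Block) {
    assert(!MI.Dst.empty() && !MI.Src.empty() && "operands must be named");
    auto It = CopyOf.find(MI.Src);
    if (It != CopyOf.end())
      MI.Src = It->second;
    // Writing Dst invalidates every recorded copy that defines or reads Dst.
    for (auto I = CopyOf.begin(); I != CopyOf.end();) {
      if (I->first == MI.Dst || I->second == MI.Dst)
        I = CopyOf.erase(I);
      else
        ++I;
    }
    if (MI.Dst != MI.Src)
      CopyOf[MI.Dst] = MI.Src;
  }
}
```

Applied to the two moves in the example, the second instruction ends up reading s0 directly, so it no longer depends on the result of the first.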

In D87585#2276142, @foad wrote:
In D87585#2276099, @critson wrote:

I will review shader changes; however, I do not think Machine Copy Propagation propagates constants.

True. But this can also help with stalls:
v_mov v0, s0
v_mov v1, v0
->
v_mov v0, s0
v_mov v1, s0

I looked over all the GFX10 diffs and could not find anything useful happening except a handful of rewrites of v_cndmask to use an additional SGPR argument; however, in practice none of these would affect stalls. This is not too surprising: v_cndmask was the main target of this when I wrote it, but for the specific code generation use case (moving WQM) I will take care of these at code generation time instead.

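Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/AMDGPU.h	4 lines
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp	7 lines
llvm/lib/Target/AMDGPU/CMakeLists.txt	1 line
llvm/lib/Target/AMDGPU/SIFixRenamableFlags.cpp	140 lines
llvm/lib/Target/AMDGPU/SIInstrFormats.td	3 lines
llvm/lib/Target/AMDGPU/SIInstrInfo.h	5 lines
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp	92 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll	2 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement-stack-lower.ll	8 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.ll	36 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.store.2d.d16.ll	6 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/saddsat.ll	14 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll	14 lines
llvm/test/CodeGen/AMDGPU/atomicrmw-nand.ll	6 lines
llvm/test/CodeGen/AMDGPU/ds-combine-large-stride.ll	20 lines
llvm/test/CodeGen/AMDGPU/indirect-addressing-term.ll	6 lines
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ds.ordered.swap.ll	2 lines
llvm/test/CodeGen/AMDGPU/machine-cp-cndmask.mir	49 lines
llvm/test/CodeGen/AMDGPU/multilevel-break.ll	2 lines
llvm/test/CodeGen/AMDGPU/regbank-reassign-wave64.mir	6 lines
llvm/test/CodeGen/AMDGPU/regbank-reassign.mir	6 lines
llvm/test/CodeGen/AMDGPU/ret.ll	4 lines
llvm/test/CodeGen/AMDGPU/sgpr-spill-wrong-stack-id.mir	4 lines
llvm/test/CodeGen/AMDGPU/stack-slot-color-sgpr-vgpr-spills.mir	2 lines
llvm/test/CodeGen/AMDGPU/transform-block-with-return-to-epilog.ll	6 lines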

Diff 291445

llvm/lib/Target/AMDGPU/AMDGPU.h

[... 57 lines not shown ...]
 FunctionPass *createSIWholeQuadModePass();
 FunctionPass *createSIFixControlFlowLiveIntervalsPass();
 FunctionPass *createSIOptimizeExecMaskingPreRAPass();
 FunctionPass *createSIFixSGPRCopiesPass();
 FunctionPass *createSIMemoryLegalizerPass();
 FunctionPass *createSIInsertWaitcntsPass();
 FunctionPass *createSIPreAllocateWWMRegsPass();
 FunctionPass *createSIFormMemoryClausesPass();
+FunctionPass *createSIFixRenamableFlagsPass();

 FunctionPass *createSIPostRABundlerPass();
 FunctionPass *createAMDGPUSimplifyLibCallsPass(const TargetMachine *);
 FunctionPass *createAMDGPUUseNativeCallsPass();
 FunctionPass *createAMDGPUCodeGenPreparePass();
 FunctionPass *createAMDGPUMachineCFGStructurizerPass();
 FunctionPass *createAMDGPUPropagateAttributesEarlyPass(const TargetMachine *);
 ModulePass *createAMDGPUPropagateAttributesLatePass(const TargetMachine *);
[... 98 lines not shown ...]
 extern char &SIInsertSkipsPassID;

 void initializeSIOptimizeExecMaskingPass(PassRegistry &);
 extern char &SIOptimizeExecMaskingID;

 void initializeSIPreAllocateWWMRegsPass(PassRegistry &);
 extern char &SIPreAllocateWWMRegsID;

+void initializeSIFixRenamableFlagsPass(PassRegistry &);
+extern char &SIFixRenamableFlagsID;
+
 void initializeAMDGPUSimplifyLibCallsPass(PassRegistry &);
 extern char &AMDGPUSimplifyLibCallsID;

 void initializeAMDGPUUseNativeCallsPass(PassRegistry &);
 extern char &AMDGPUUseNativeCallsID;

 void initializeSIAddIMGInitPass(PassRegistry &);
 extern char &SIAddIMGInitID;
[... 162 lines not shown ...]

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

 //===-- AMDGPUTargetMachine.cpp - TargetMachine for hw codegen targets-----===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//
 //
 /// \file
[... 238 lines not shown ...] extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAMDGPUTarget() {
   initializeSILowerControlFlowPass(*PR);
   initializeSIRemoveShortExecBranchesPass(*PR);
   initializeSIPreEmitPeepholePass(*PR);
   initializeSIInsertSkipsPass(*PR);
   initializeSIMemoryLegalizerPass(*PR);
   initializeSIOptimizeExecMaskingPass(*PR);
   initializeSIPreAllocateWWMRegsPass(*PR);
   initializeSIFormMemoryClausesPass(*PR);
+  initializeSIFixRenamableFlagsPass(*PR);
   initializeSIPostRABundlerPass(*PR);
   initializeAMDGPUUnifyDivergentExitNodesPass(*PR);
   initializeAMDGPUAAWrapperPassPass(*PR);
   initializeAMDGPUExternalAAWrapperPass(*PR);
   initializeAMDGPUUseNativeCallsPass(*PR);
   initializeAMDGPUSimplifyLibCallsPass(*PR);
   initializeAMDGPUInlinerPass(*PR);
   initializeAMDGPUPrintfRuntimeBindingPass(*PR);
[... 402 lines not shown ...] public:
   bool addLegalizeMachineIR() override;
   void addPreRegBankSelect() override;
   bool addRegBankSelect() override;
   bool addGlobalInstructionSelect() override;
   void addFastRegAlloc() override;
   void addOptimizedRegAlloc() override;
   void addPreRegAlloc() override;
   bool addPreRewrite() override;
+  void addPostRewrite() override;
   void addPostRegAlloc() override;
   void addPreSched2() override;
   void addPreEmitPass() override;
 };

 } // end anonymous namespace

 void AMDGPUPassConfig::addEarlyCSEOrGVNPass() {
[... 338 lines not shown ...]
 bool GCNPassConfig::addPreRewrite() {
   if (EnableRegReassign) {
     addPass(&GCNNSAReassignID);
     addPass(&GCNRegBankReassignID);
   }
   return true;
 }

+void GCNPassConfig::addPostRewrite() {
+  addPass(&SIFixRenamableFlagsID);
+}
+
 void GCNPassConfig::addPostRegAlloc() {
   addPass(&SIFixVGPRCopiesID);
   if (getOptLevel() > CodeGenOpt::None)
     addPass(&SIOptimizeExecMaskingID);

   TargetPassConfig::addPostRegAlloc();

   // Equivalent of PEI for SGPRs.
   addPass(&SILowerSGPRSpillsID);
 }

 void GCNPassConfig::addPreSched2() {
   addPass(&SIPostRABundlerID);
[... 187 lines not shown ...]

llvm/lib/Target/AMDGPU/CMakeLists.txt

[... 102 lines not shown ...] add_llvm_target(AMDGPUCodeGen
   R600MachineFunctionInfo.cpp
   R600MachineScheduler.cpp
   R600OpenCLImageTypeLoweringPass.cpp
   R600OptimizeVectorRegisters.cpp
   R600Packetizer.cpp
   R600RegisterInfo.cpp
   SIAddIMGInit.cpp
   SIAnnotateControlFlow.cpp
+  SIFixRenamableFlags.cpp
   SIFixSGPRCopies.cpp
   SIFixVGPRCopies.cpp
   SIPreAllocateWWMRegs.cpp
   SIFoldOperands.cpp
   SIFormMemoryClauses.cpp
   SIFrameLowering.cpp
   SIInsertHardClauses.cpp
   SIInsertSkips.cpp
[... 31 lines not shown ...]

llvm/lib/Target/AMDGPU/SIFixRenamableFlags.cpp

This file was added.

//===- SIFixRenamableFlags.cpp - Fix Renamable Flags Post-RA ------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
/// \file
/// Pass to remove renamable flags which could cause Machine Copy Progation
/// to generate constant bus violations or bank conflicts.
//
//===----------------------------------------------------------------------===//

#include "AMDGPU.h"
#include "AMDGPUSubtarget.h"
#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
#include "SIInstrInfo.h"
#include "SIMachineFunctionInfo.h"
#include "SIRegisterInfo.h"
#include "llvm/CodeGen/LiveInterval.h"
#include "llvm/CodeGen/LiveIntervals.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/InitializePasses.h"

using namespace llvm;

#define DEBUG_TYPE "si-fix-renamable-flags"

namespace {

class SIFixRenamableFlags : public MachineFunctionPass {
public:
  static char ID;

  SIFixRenamableFlags() : MachineFunctionPass(ID) {
    initializeSIFixRenamableFlagsPass(*PassRegistry::getPassRegistry());
  }

  bool runOnMachineFunction(MachineFunction &MF) override;

  void getAnalysisUsage(AnalysisUsage &AU) const override {
    AU.addPreserved<LiveIntervals>();
    AU.addPreserved<SlotIndexes>();
    AU.setPreservesCFG();
    MachineFunctionPass::getAnalysisUsage(AU);
  }
};

} // End anonymous namespace.

INITIALIZE_PASS_BEGIN(SIFixRenamableFlags, DEBUG_TYPE,
                      "SI Fix Renamable Flags", false, false)
INITIALIZE_PASS_END(SIFixRenamableFlags, DEBUG_TYPE,
                    "SI Fix Renamable Flags", false, false)

char SIFixRenamableFlags::ID = 0;

char &llvm::SIFixRenamableFlagsID = SIFixRenamableFlags::ID;

FunctionPass *llvm::createSIFixRenamableFlagsPass() {
  return new SIFixRenamableFlags();
}

bool SIFixRenamableFlags::runOnMachineFunction(MachineFunction &MF) {
  if (skipFunction(MF.getFunction()))
    return false;

  LLVM_DEBUG(dbgs() << "SIFixRenamableFlags: function " << MF.getName()
                    << "\n");

  const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
  const bool HasRegisterBanking = ST.hasRegisterBanking();

  const SIInstrInfo *TII = ST.getInstrInfo();
  MachineRegisterInfo *MRI = &MF.getRegInfo();
  const SIRegisterInfo &RI = TII->getRegisterInfo();

  bool Changed = false;

  for (auto &MBB : MF) {
    for (auto &MI : MBB) {
      // Only check VALUs
      if (!(TII->isVOP1(MI) || TII->isVOP2(MI) || TII->isVOP3(MI) ||
            TII->isVOPC(MI) || TII->isSDWA(MI) || TII->isVALU(MI)))
        continue;

      std::pair<unsigned, unsigned> Usage = TII->constantBusUseCount(*MRI, MI);
      unsigned ConstantBusUses = Usage.first;

      // Count VGPR usage (treating AGPRs as VGPRs)
      SmallVector<Register, 4> VGPRs;
      for (auto &Use : MI.uses()) {
        if (Use.isReg()) {
          Register Reg = Use.getReg();
          if (RI.isVGPR(MRI, Reg) || RI.isAGPR(MRI, Reg)) {
            if (llvm::all_of(VGPRs, [Reg](unsigned VGPR) {
                  return VGPR != Reg;
                })) {
              VGPRs.push_back(Reg);
            }
          }
        }
      }

      // Machine Copy Propagation can change a VGPR to SGPR and increase
      // constant bus usage.
      // If there is not enough constant bus capacity to support this
      // then we need to disable renaming.
      unsigned ConstantBusLimit = ST.getConstantBusLimit(MI.getOpcode());
      unsigned FreeBusCapacity = ConstantBusLimit - ConstantBusUses;

      // Check for free constant bus capacity to handle renaming.
      // If register banking then remove renaming for multiple VGPRs to avoid
      // conflicts.
      if ((VGPRs.size() <= FreeBusCapacity) &&
          (!HasRegisterBanking || VGPRs.size() < 2))
        continue;

      LLVM_DEBUG(dbgs() << "Disable renaming for "
                        << "(" << FreeBusCapacity << ", " << VGPRs.size()
                        << "): " << MI);

      // Insufficient bus capacity to handle VGPR->SGPR renaming,
      // disable renaming for VGPRs in this instruction.

      for (auto &Use : MI.uses()) {
        if (Use.isReg() && Use.isRenamable()) {
          Register Reg = Use.getReg();
          if (RI.isVGPR(MRI, Reg) || RI.isAGPR(MRI, Reg)) {
            Use.setIsRenamable(false);
            Changed = true;
          }
        }
      }
    }
  }

  return Changed;
}

llvm/lib/Target/AMDGPU/SIInstrFormats.td

[... 199 lines not shown ...] class InstSI <dag outs, dag ins, string asm = "",
   let SchedRW = [Write32Bit];

   field bits<1> DisableSIDecoder = 0;
   field bits<1> DisableVIDecoder = 0;
   field bits<1> DisableDecoder = 0;

   let isAsmParserOnly = !if(!eq(DisableDecoder{0}, {0}), 0, 1);
   let AsmVariantName = AMDGPUAsmVariants.Default;
-
-  // Avoid changing source registers in a way that violates constant bus read limitations.
-  let hasExtraSrcRegAllocReq = !if(VOP1,1,!if(VOP2,1,!if(VOP3,1,!if(VOPC,1,!if(SDWA,1, !if(VALU,1,0))))));
 }

 class PseudoInstSI<dag outs, dag ins, list<dag> pattern = [], string asm = "">
   : InstSI<outs, ins, asm, pattern> {
   let isPseudo = 1;
   let isCodeGenOnly = 1;
 }

[... 158 lines not shown ...]

llvm/lib/Target/AMDGPU/SIInstrInfo.h

[... 780 lines not shown ...] public:
   /// This function will return false if you pass it a 32-bit instruction.
   bool hasVALU32BitEncoding(unsigned Opcode) const;

   /// Returns true if this operand uses the constant bus.
   bool usesConstantBus(const MachineRegisterInfo &MRI,
                        const MachineOperand &MO,
                        const MCOperandInfo &OpInfo) const;

+  /// Return pair of bus use count and literal count for machine instruction.
+  std::pair<unsigned, unsigned>
+  constantBusUseCount(const MachineRegisterInfo &MRI,
+                      const MachineInstr &MI) const;
+
   /// Return true if this instruction has any modifiers.
   /// e.g. src[012]_mod, omod, clamp.
   bool hasModifiers(unsigned Opcode) const;

   bool hasModifiersSet(const MachineInstr &MI,
                        unsigned OpName) const;
   bool hasAnyModifiersSet(const MachineInstr &MI) const;

[... 386 lines not shown ...]

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp

[... 3,557 lines not shown ...] static bool isSubRegOf(const SIRegisterInfo &TRI,
                        const MachineOperand &SubReg) {
   if (SubReg.getReg().isPhysical())
     return TRI.isSubRegister(SuperVec.getReg(), SubReg.getReg());

   return SubReg.getSubReg() != AMDGPU::NoSubRegister &&
          SubReg.getReg() == SuperVec.getReg();
 }

+std::pair<unsigned, unsigned>
+SIInstrInfo::constantBusUseCount(const MachineRegisterInfo &MRI,
+                                 const MachineInstr &MI) const {
+  // Only look at the true operands. Only a real operand can use the constant
+  // bus, and we don't want to check pseudo-operands like the source modifier
+  // flags.
+  const uint16_t Opcode = MI.getOpcode();
+  const int Src0Idx = AMDGPU::getNamedOperandIdx(Opcode, AMDGPU::OpName::src0);
+  const int Src1Idx = AMDGPU::getNamedOperandIdx(Opcode, AMDGPU::OpName::src1);
+  const int Src2Idx = AMDGPU::getNamedOperandIdx(Opcode, AMDGPU::OpName::src2);
+  const int OpIndices[] = {Src0Idx, Src1Idx, Src2Idx};
+
+  unsigned ConstantBusCount = 0;
+  unsigned LiteralCount = 0;
+
+  if (AMDGPU::getNamedOperandIdx(Opcode, AMDGPU::OpName::imm) != -1)
+    ++ConstantBusCount;
+
+  SmallVector<Register, 2> SGPRsUsed;
+  Register SGPRUsed = findImplicitSGPRRead(MI);
+  if (SGPRUsed != AMDGPU::NoRegister) {
+    ++ConstantBusCount;
+    SGPRsUsed.push_back(SGPRUsed);
+  }
+
+  for (int OpIdx : OpIndices) {
+    if (OpIdx == -1)
+      break;
+    const MachineOperand &MO = MI.getOperand(OpIdx);
+    if (usesConstantBus(MRI, MO, MI.getDesc().OpInfo[OpIdx])) {
+      if (MO.isReg()) {
+        SGPRUsed = MO.getReg();
+        if (llvm::all_of(SGPRsUsed, [this, SGPRUsed](unsigned SGPR) {
+              return !RI.regsOverlap(SGPRUsed, SGPR);
+            })) {
+          ++ConstantBusCount;
+          SGPRsUsed.push_back(SGPRUsed);
+        }
+      } else {
+        ++ConstantBusCount;
+        ++LiteralCount;
+      }
+    }
+  }
+
+  return std::pair<unsigned, unsigned>(ConstantBusCount, LiteralCount);
+}
+
 bool SIInstrInfo::verifyInstruction(const MachineInstr &MI,
                                     StringRef &ErrInfo) const {
   uint16_t Opcode = MI.getOpcode();
   if (SIInstrInfo::isGenericOpcode(MI.getOpcode()))
     return true;

   const MachineFunction *MF = MI.getParent()->getParent();
   const MachineRegisterInfo &MRI = MF->getRegInfo();
[... 229 lines not shown ...] if (DMask) {
         }
       }
     }
   }

   // Verify VOP*. Ignore multiple sgpr operands on writelane.
   if (Desc.getOpcode() != AMDGPU::V_WRITELANE_B32
       && (isVOP1(MI) || isVOP2(MI) || isVOP3(MI) || isVOPC(MI) || isSDWA(MI))) {
-    // Only look at the true operands. Only a real operand can use the constant
-    // bus, and we don't want to check pseudo-operands like the source modifier
-    // flags.
-    const int OpIndices[] = { Src0Idx, Src1Idx, Src2Idx };
-
-    unsigned ConstantBusCount = 0;
-    unsigned LiteralCount = 0;
-
-    if (AMDGPU::getNamedOperandIdx(Opcode, AMDGPU::OpName::imm) != -1)
-      ++ConstantBusCount;
-
-    SmallVector<Register, 2> SGPRsUsed;
-    Register SGPRUsed = findImplicitSGPRRead(MI);
-    if (SGPRUsed != AMDGPU::NoRegister) {
-      ++ConstantBusCount;
-      SGPRsUsed.push_back(SGPRUsed);
-    }
-
-    for (int OpIdx : OpIndices) {
-      if (OpIdx == -1)
-        break;
-      const MachineOperand &MO = MI.getOperand(OpIdx);
-      if (usesConstantBus(MRI, MO, MI.getDesc().OpInfo[OpIdx])) {
-        if (MO.isReg()) {
-          SGPRUsed = MO.getReg();
-          if (llvm::all_of(SGPRsUsed, [this, SGPRUsed](unsigned SGPR) {
-                return !RI.regsOverlap(SGPRUsed, SGPR);
-              })) {
-            ++ConstantBusCount;
-            SGPRsUsed.push_back(SGPRUsed);
-          }
-        } else {
-          ++ConstantBusCount;
-          ++LiteralCount;
-        }
-      }
-    }
+    const std::pair<unsigned, unsigned> Usage = constantBusUseCount(MRI, MI);
+    const unsigned ConstantBusCount = Usage.first;
+    const unsigned LiteralCount = Usage.second;
+    const GCNSubtarget &ST = MF->getSubtarget<GCNSubtarget>();
     // v_writelane_b32 is an exception from constant bus restriction:
     // vsrc0 can be sgpr, const or m0 and lane select sgpr, m0 or inline-const
-    if (ConstantBusCount > ST.getConstantBusLimit(Opcode) &&
-        Opcode != AMDGPU::V_WRITELANE_B32) {
+    if (ConstantBusCount > ST.getConstantBusLimit(Opcode)) {
       ErrInfo = "VOP* instruction violates constant bus restriction";
       return false;
     }

     if (isVOP3(MI) && LiteralCount) {
       if (!ST.hasVOP3Literal()) {
         ErrInfo = "VOP3 instruction uses literal";
         return false;
[... 3,417 lines not shown ...]

llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll

Show First 20 Lines • Show All 1,039 Lines • ▼ Show 20 Lines	; GCN-NEXT: s_setpc_b64 s[30:31]
ret i64 %result		ret i64 %result
}		}

define i64 @v_ashr_i64_32(i64 %value) {		define i64 @v_ashr_i64_32(i64 %value) {
; GCN-LABEL: v_ashr_i64_32:		; GCN-LABEL: v_ashr_i64_32:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: v_mov_b32_e32 v0, v1		; GCN-NEXT: v_mov_b32_e32 v0, v1
; GCN-NEXT: v_ashrrev_i32_e32 v1, 31, v0		; GCN-NEXT: v_ashrrev_i32_e32 v1, 31, v1
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
%result = ashr i64 %value, 32		%result = ashr i64 %value, 32
ret i64 %result		ret i64 %result
}		}

define i64 @v_ashr_i64_31(i64 %value) {		define i64 @v_ashr_i64_31(i64 %value) {
; GFX6-LABEL: v_ashr_i64_31:		; GFX6-LABEL: v_ashr_i64_31:
; GFX6: ; %bb.0:		; GFX6: ; %bb.0:

llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement-stack-lower.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx900 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GCN %s		; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx900 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GCN %s

; Check lowering of some large extractelement that use the stack		; Check lowering of some large extractelement that use the stack
; instead of register indexing.		; instead of register indexing.

define i32 @v_extract_v64i32_varidx(<64 x i32> addrspace(1)* %ptr, i32 %idx) {		define i32 @v_extract_v64i32_varidx(<64 x i32> addrspace(1)* %ptr, i32 %idx) {
; GCN-LABEL: v_extract_v64i32_varidx:		; GCN-LABEL: v_extract_v64i32_varidx:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: v_mov_b32_e32 v15, v0
; GCN-NEXT: s_add_u32 s4, s32, 0x3fc0		; GCN-NEXT: s_add_u32 s4, s32, 0x3fc0
; GCN-NEXT: s_mov_b32 s5, 0		; GCN-NEXT: s_mov_b32 s5, 0
; GCN-NEXT: s_mov_b32 s6, s33		; GCN-NEXT: s_mov_b32 s6, s33
; GCN-NEXT: s_and_b32 s33, s4, 0xffffc000		; GCN-NEXT: s_and_b32 s33, s4, 0xffffc000
; GCN-NEXT: s_movk_i32 s4, 0x80		; GCN-NEXT: s_movk_i32 s4, 0x80
; GCN-NEXT: v_mov_b32_e32 v12, s5		; GCN-NEXT: v_mov_b32_e32 v12, s5
; GCN-NEXT: v_mov_b32_e32 v16, v1		; GCN-NEXT: v_mov_b32_e32 v16, v1
; GCN-NEXT: v_add_co_u32_e32 v31, vcc, 64, v15		; GCN-NEXT: v_add_co_u32_e32 v31, vcc, 64, v0
		; GCN-NEXT: v_mov_b32_e32 v15, v0
; GCN-NEXT: v_mov_b32_e32 v11, s4		; GCN-NEXT: v_mov_b32_e32 v11, s4
; GCN-NEXT: v_addc_co_u32_e32 v32, vcc, 0, v16, vcc		; GCN-NEXT: v_addc_co_u32_e32 v32, vcc, 0, v16, vcc
; GCN-NEXT: v_add_co_u32_e32 v48, vcc, v15, v11		; GCN-NEXT: v_add_co_u32_e32 v48, vcc, v15, v11
; GCN-NEXT: v_addc_co_u32_e32 v49, vcc, v16, v12, vcc		; GCN-NEXT: v_addc_co_u32_e32 v49, vcc, v16, v12, vcc
; GCN-NEXT: s_movk_i32 s4, 0xc0		; GCN-NEXT: s_movk_i32 s4, 0xc0
; GCN-NEXT: v_mov_b32_e32 v12, s5		; GCN-NEXT: v_mov_b32_e32 v12, s5
; GCN-NEXT: v_mov_b32_e32 v11, s4		; GCN-NEXT: v_mov_b32_e32 v11, s4
; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:56 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:56 ; 4-byte Folded Spill
...
; GCN-NEXT: s_setpc_b64 s[30:31]
%elt = extractelement <64 x i32> %vec, i32 %idx		%elt = extractelement <64 x i32> %vec, i32 %idx
ret i32 %elt		ret i32 %elt
}		}

define i16 @v_extract_v128i16_varidx(<128 x i16> addrspace(1)* %ptr, i32 %idx) {		define i16 @v_extract_v128i16_varidx(<128 x i16> addrspace(1)* %ptr, i32 %idx) {
; GCN-LABEL: v_extract_v128i16_varidx:		; GCN-LABEL: v_extract_v128i16_varidx:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: v_mov_b32_e32 v15, v0
; GCN-NEXT: s_add_u32 s4, s32, 0x3fc0		; GCN-NEXT: s_add_u32 s4, s32, 0x3fc0
; GCN-NEXT: s_mov_b32 s5, 0		; GCN-NEXT: s_mov_b32 s5, 0
; GCN-NEXT: s_mov_b32 s6, s33		; GCN-NEXT: s_mov_b32 s6, s33
; GCN-NEXT: s_and_b32 s33, s4, 0xffffc000		; GCN-NEXT: s_and_b32 s33, s4, 0xffffc000
; GCN-NEXT: s_movk_i32 s4, 0x80		; GCN-NEXT: s_movk_i32 s4, 0x80
; GCN-NEXT: v_mov_b32_e32 v12, s5		; GCN-NEXT: v_mov_b32_e32 v12, s5
; GCN-NEXT: v_mov_b32_e32 v16, v1		; GCN-NEXT: v_mov_b32_e32 v16, v1
; GCN-NEXT: v_add_co_u32_e32 v31, vcc, 64, v15		; GCN-NEXT: v_add_co_u32_e32 v31, vcc, 64, v0
		; GCN-NEXT: v_mov_b32_e32 v15, v0
; GCN-NEXT: v_mov_b32_e32 v11, s4		; GCN-NEXT: v_mov_b32_e32 v11, s4
; GCN-NEXT: v_addc_co_u32_e32 v32, vcc, 0, v16, vcc		; GCN-NEXT: v_addc_co_u32_e32 v32, vcc, 0, v16, vcc
; GCN-NEXT: v_add_co_u32_e32 v48, vcc, v15, v11		; GCN-NEXT: v_add_co_u32_e32 v48, vcc, v15, v11
; GCN-NEXT: v_addc_co_u32_e32 v49, vcc, v16, v12, vcc		; GCN-NEXT: v_addc_co_u32_e32 v49, vcc, v16, v12, vcc
; GCN-NEXT: s_movk_i32 s4, 0xc0		; GCN-NEXT: s_movk_i32 s4, 0xc0
; GCN-NEXT: v_mov_b32_e32 v12, s5		; GCN-NEXT: v_mov_b32_e32 v12, s5
; GCN-NEXT: v_mov_b32_e32 v11, s4		; GCN-NEXT: v_mov_b32_e32 v11, s4
; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:56 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:56 ; 4-byte Folded Spill

llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.ll

	; MOVREL-NEXT: v_mov_b32_e32 v6, s5			; MOVREL-NEXT: v_mov_b32_e32 v6, s5
	; MOVREL-NEXT: v_mov_b32_e32 v5, s4			; MOVREL-NEXT: v_mov_b32_e32 v5, s4
	; MOVREL-NEXT: v_mov_b32_e32 v4, s3			; MOVREL-NEXT: v_mov_b32_e32 v4, s3
	; MOVREL-NEXT: v_mov_b32_e32 v3, s2			; MOVREL-NEXT: v_mov_b32_e32 v3, s2
	; MOVREL-NEXT: v_mov_b32_e32 v2, s1			; MOVREL-NEXT: v_mov_b32_e32 v2, s1
	; MOVREL-NEXT: v_mov_b32_e32 v1, s0			; MOVREL-NEXT: v_mov_b32_e32 v1, s0
	; MOVREL-NEXT: v_cmp_eq_u32_e64 s0, 1, v0			; MOVREL-NEXT: v_cmp_eq_u32_e64 s0, 1, v0
	; MOVREL-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0			; MOVREL-NEXT: v_cmp_eq_u32_e32 vcc_lo, 0, v0
	; MOVREL-NEXT: s_mov_b32 s30, s18
	; MOVREL-NEXT: s_mov_b32 s31, s19
	; MOVREL-NEXT: v_cmp_eq_u32_e64 s2, 5, v0			; MOVREL-NEXT: v_cmp_eq_u32_e64 s2, 5, v0
	; MOVREL-NEXT: v_cndmask_b32_e64 v3, v3, s30, s0
	; MOVREL-NEXT: v_cndmask_b32_e64 v4, v4, s31, s0
	; MOVREL-NEXT: v_cmp_eq_u32_e64 s0, 4, v0
	; MOVREL-NEXT: v_cmp_eq_u32_e64 s1, 2, v0			; MOVREL-NEXT: v_cmp_eq_u32_e64 s1, 2, v0
	; MOVREL-NEXT: v_cmp_eq_u32_e64 s3, 6, v0			; MOVREL-NEXT: v_cmp_eq_u32_e64 s3, 6, v0
				; MOVREL-NEXT: v_cndmask_b32_e64 v3, v3, s18, s0
				; MOVREL-NEXT: v_cndmask_b32_e64 v4, v4, s19, s0
				; MOVREL-NEXT: v_cmp_eq_u32_e64 s0, 4, v0
	; MOVREL-NEXT: v_cmp_eq_u32_e64 s4, 7, v0			; MOVREL-NEXT: v_cmp_eq_u32_e64 s4, 7, v0
	; MOVREL-NEXT: v_cndmask_b32_e64 v1, v1, s30, vcc_lo			; MOVREL-NEXT: v_cndmask_b32_e64 v1, v1, s18, vcc_lo
	; MOVREL-NEXT: v_cndmask_b32_e64 v2, v2, s31, vcc_lo			; MOVREL-NEXT: v_cndmask_b32_e64 v2, v2, s19, vcc_lo
	; MOVREL-NEXT: v_cmp_eq_u32_e32 vcc_lo, 3, v0			; MOVREL-NEXT: v_cmp_eq_u32_e32 vcc_lo, 3, v0
	; MOVREL-NEXT: v_cndmask_b32_e64 v9, v9, s30, s0			; MOVREL-NEXT: v_cndmask_b32_e64 v9, v9, s18, s0
	; MOVREL-NEXT: v_cndmask_b32_e64 v10, v10, s31, s0			; MOVREL-NEXT: v_cndmask_b32_e64 v10, v10, s19, s0
	; MOVREL-NEXT: v_cndmask_b32_e64 v11, v11, s30, s2			; MOVREL-NEXT: v_cndmask_b32_e64 v11, v11, s18, s2
	; MOVREL-NEXT: v_cndmask_b32_e64 v12, v12, s31, s2			; MOVREL-NEXT: v_cndmask_b32_e64 v12, v12, s19, s2
	; MOVREL-NEXT: v_cndmask_b32_e64 v5, v5, s30, s1			; MOVREL-NEXT: v_cndmask_b32_e64 v5, v5, s18, s1
	; MOVREL-NEXT: v_cndmask_b32_e64 v6, v6, s31, s1			; MOVREL-NEXT: v_cndmask_b32_e64 v6, v6, s19, s1
	; MOVREL-NEXT: v_cndmask_b32_e64 v7, v7, s30, vcc_lo			; MOVREL-NEXT: v_cndmask_b32_e64 v7, v7, s18, vcc_lo
	; MOVREL-NEXT: v_cndmask_b32_e64 v8, v8, s31, vcc_lo			; MOVREL-NEXT: v_cndmask_b32_e64 v8, v8, s19, vcc_lo
	; MOVREL-NEXT: v_cndmask_b32_e64 v13, v13, s30, s3			; MOVREL-NEXT: v_cndmask_b32_e64 v13, v13, s18, s3
	; MOVREL-NEXT: v_cndmask_b32_e64 v14, v14, s31, s3			; MOVREL-NEXT: v_cndmask_b32_e64 v14, v14, s19, s3
	; MOVREL-NEXT: v_cndmask_b32_e64 v15, v15, s30, s4			; MOVREL-NEXT: v_cndmask_b32_e64 v15, v15, s18, s4
	; MOVREL-NEXT: v_cndmask_b32_e64 v16, v16, s31, s4			; MOVREL-NEXT: v_cndmask_b32_e64 v16, v16, s19, s4
	; MOVREL-NEXT: global_store_dwordx4 v[0:1], v[1:4], off			; MOVREL-NEXT: global_store_dwordx4 v[0:1], v[1:4], off
	; MOVREL-NEXT: global_store_dwordx4 v[0:1], v[5:8], off			; MOVREL-NEXT: global_store_dwordx4 v[0:1], v[5:8], off
	; MOVREL-NEXT: ; implicit-def: $vcc_hi			; MOVREL-NEXT: ; implicit-def: $vcc_hi
	; MOVREL-NEXT: global_store_dwordx4 v[0:1], v[9:12], off			; MOVREL-NEXT: global_store_dwordx4 v[0:1], v[9:12], off
	; MOVREL-NEXT: global_store_dwordx4 v[0:1], v[13:16], off			; MOVREL-NEXT: global_store_dwordx4 v[0:1], v[13:16], off
	; MOVREL-NEXT: s_endpgm			; MOVREL-NEXT: s_endpgm
	entry:			entry:
	%insert = insertelement <8 x double> %vec, double %val, i32 %idx			%insert = insertelement <8 x double> %vec, double %val, i32 %idx

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.store.2d.d16.ll

	; define amdgpu_ps void @image_store_v3f16(<8 x i32> inreg %rsrc, i32 %s, i32 %t, <3 x half> %in) {			; define amdgpu_ps void @image_store_v3f16(<8 x i32> inreg %rsrc, i32 %s, i32 %t, <3 x half> %in) {
	; call void @llvm.amdgcn.image.store.2d.v3f16.i32(<3 x half> %in, i32 7, i32 %s, i32 %t, <8 x i32> %rsrc, i32 0, i32 0)			; call void @llvm.amdgcn.image.store.2d.v3f16.i32(<3 x half> %in, i32 7, i32 %s, i32 %t, <8 x i32> %rsrc, i32 0, i32 0)
	; ret void			; ret void
	; }			; }

	define amdgpu_ps void @image_store_v4f16(<8 x i32> inreg %rsrc, i32 %s, i32 %t, <4 x half> %in) {			define amdgpu_ps void @image_store_v4f16(<8 x i32> inreg %rsrc, i32 %s, i32 %t, <4 x half> %in) {
	; UNPACKED-LABEL: image_store_v4f16:			; UNPACKED-LABEL: image_store_v4f16:
	; UNPACKED: ; %bb.0:			; UNPACKED: ; %bb.0:
	; UNPACKED-NEXT: v_mov_b32_e32 v6, v1
	; UNPACKED-NEXT: v_mov_b32_e32 v1, v2
	; UNPACKED-NEXT: s_mov_b32 s0, s2			; UNPACKED-NEXT: s_mov_b32 s0, s2
	; UNPACKED-NEXT: s_mov_b32 s1, s3			; UNPACKED-NEXT: s_mov_b32 s1, s3
	; UNPACKED-NEXT: s_mov_b32 s2, s4			; UNPACKED-NEXT: s_mov_b32 s2, s4
	; UNPACKED-NEXT: s_mov_b32 s3, s5			; UNPACKED-NEXT: s_mov_b32 s3, s5
	; UNPACKED-NEXT: s_mov_b32 s4, s6			; UNPACKED-NEXT: s_mov_b32 s4, s6
	; UNPACKED-NEXT: s_mov_b32 s5, s7			; UNPACKED-NEXT: s_mov_b32 s5, s7
				; UNPACKED-NEXT: v_mov_b32_e32 v6, v1
				; UNPACKED-NEXT: v_mov_b32_e32 v1, v2
	; UNPACKED-NEXT: s_mov_b32 s6, s8			; UNPACKED-NEXT: s_mov_b32 s6, s8
	; UNPACKED-NEXT: s_mov_b32 s7, s9			; UNPACKED-NEXT: s_mov_b32 s7, s9
	; UNPACKED-NEXT: v_mov_b32_e32 v5, v0			; UNPACKED-NEXT: v_mov_b32_e32 v5, v0
	; UNPACKED-NEXT: v_lshrrev_b32_e32 v2, 16, v1			; UNPACKED-NEXT: v_lshrrev_b32_e32 v2, 16, v2
	; UNPACKED-NEXT: v_lshrrev_b32_e32 v4, 16, v3			; UNPACKED-NEXT: v_lshrrev_b32_e32 v4, 16, v3
	; UNPACKED-NEXT: image_store v[1:4], v[5:6], s[0:7] dmask:0xf unorm			; UNPACKED-NEXT: image_store v[1:4], v[5:6], s[0:7] dmask:0xf unorm
	; UNPACKED-NEXT: s_endpgm			; UNPACKED-NEXT: s_endpgm
	;			;
	; PACKED-LABEL: image_store_v4f16:			; PACKED-LABEL: image_store_v4f16:
	; PACKED: ; %bb.0:			; PACKED: ; %bb.0:
	; PACKED-NEXT: s_mov_b32 s0, s2			; PACKED-NEXT: s_mov_b32 s0, s2
	; PACKED-NEXT: s_mov_b32 s1, s3			; PACKED-NEXT: s_mov_b32 s1, s3

llvm/test/CodeGen/AMDGPU/GlobalISel/saddsat.ll

	; GFX9-NEXT: v_cndmask_b32_e32 v0, v4, v0, vcc			; GFX9-NEXT: v_cndmask_b32_e32 v0, v4, v0, vcc
	; GFX9-NEXT: v_cndmask_b32_e32 v1, v5, v1, vcc			; GFX9-NEXT: v_cndmask_b32_e32 v1, v5, v1, vcc
	; GFX9-NEXT: v_cndmask_b32_e32 v2, v6, v2, vcc			; GFX9-NEXT: v_cndmask_b32_e32 v2, v6, v2, vcc
	; GFX9-NEXT: v_cndmask_b32_e32 v3, v7, v3, vcc			; GFX9-NEXT: v_cndmask_b32_e32 v3, v7, v3, vcc
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: saddsat_i128_vs:			; GFX10-LABEL: saddsat_i128_vs:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: v_mov_b32_e32 v5, v0
	; GFX10-NEXT: v_mov_b32_e32 v6, v1			; GFX10-NEXT: v_mov_b32_e32 v6, v1
	; GFX10-NEXT: v_mov_b32_e32 v9, v2			; GFX10-NEXT: v_mov_b32_e32 v9, v2
				; GFX10-NEXT: v_add_co_u32_e64 v15, vcc_lo, v0, s0
	; GFX10-NEXT: v_mov_b32_e32 v10, v3			; GFX10-NEXT: v_mov_b32_e32 v10, v3
	; GFX10-NEXT: s_cmp_eq_u64 s[2:3], 0			; GFX10-NEXT: v_mov_b32_e32 v5, v0
	; GFX10-NEXT: v_add_co_u32_e64 v15, vcc_lo, v5, s0
	; GFX10-NEXT: v_cmp_lt_u64_e64 s0, s[0:1], 0
	; GFX10-NEXT: v_add_co_ci_u32_e32 v16, vcc_lo, s1, v6, vcc_lo			; GFX10-NEXT: v_add_co_ci_u32_e32 v16, vcc_lo, s1, v6, vcc_lo
	; GFX10-NEXT: s_cselect_b32 s4, 1, 0			; GFX10-NEXT: v_cmp_lt_u64_e64 s0, s[0:1], 0
	; GFX10-NEXT: v_add_co_ci_u32_e32 v19, vcc_lo, s2, v9, vcc_lo			; GFX10-NEXT: v_add_co_ci_u32_e32 v19, vcc_lo, s2, v9, vcc_lo
	; GFX10-NEXT: s_and_b32 s1, 1, s4			; GFX10-NEXT: s_cmp_eq_u64 s[2:3], 0
	; GFX10-NEXT: v_add_co_ci_u32_e32 v20, vcc_lo, s3, v10, vcc_lo			; GFX10-NEXT: v_add_co_ci_u32_e32 v20, vcc_lo, s3, v10, vcc_lo
	; GFX10-NEXT: v_cmp_lt_u64_e32 vcc_lo, v[15:16], v[5:6]			; GFX10-NEXT: v_cmp_lt_u64_e32 vcc_lo, v[15:16], v[5:6]
				; GFX10-NEXT: s_cselect_b32 s4, 1, 0
	; GFX10-NEXT: v_cndmask_b32_e64 v8, 0, 1, s0			; GFX10-NEXT: v_cndmask_b32_e64 v8, 0, 1, s0
	; GFX10-NEXT: v_cmp_lt_i64_e64 s0, s[2:3], 0			; GFX10-NEXT: v_cmp_lt_i64_e64 s0, s[2:3], 0
	; GFX10-NEXT: ; implicit-def: $vcc_hi			; GFX10-NEXT: s_and_b32 s1, 1, s4
	; GFX10-NEXT: v_ashrrev_i32_e32 v7, 31, v20			; GFX10-NEXT: v_ashrrev_i32_e32 v7, 31, v20
	; GFX10-NEXT: v_cndmask_b32_e64 v0, 0, 1, vcc_lo			; GFX10-NEXT: v_cndmask_b32_e64 v0, 0, 1, vcc_lo
	; GFX10-NEXT: v_cmp_lt_i64_e32 vcc_lo, v[19:20], v[9:10]			; GFX10-NEXT: v_cmp_lt_i64_e32 vcc_lo, v[19:20], v[9:10]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: v_cndmask_b32_e64 v1, 0, 1, vcc_lo			; GFX10-NEXT: v_cndmask_b32_e64 v1, 0, 1, vcc_lo
	; GFX10-NEXT: v_cmp_eq_u64_e32 vcc_lo, v[19:20], v[9:10]			; GFX10-NEXT: v_cmp_eq_u64_e32 vcc_lo, v[19:20], v[9:10]
	; GFX10-NEXT: v_cndmask_b32_e64 v9, 0, 1, s0			; GFX10-NEXT: v_cndmask_b32_e64 v9, 0, 1, s0
	; GFX10-NEXT: s_movk_i32 s0, 0x7f			; GFX10-NEXT: s_movk_i32 s0, 0x7f
	; GFX10-NEXT: s_sub_i32 s2, 64, s0			; GFX10-NEXT: s_sub_i32 s2, 64, s0
	; GFX10-NEXT: v_cndmask_b32_e32 v10, v1, v0, vcc_lo			; GFX10-NEXT: v_cndmask_b32_e32 v10, v1, v0, vcc_lo
	; GFX10-NEXT: v_cmp_ne_u32_e64 vcc_lo, 0, s1			; GFX10-NEXT: v_cmp_ne_u32_e64 vcc_lo, 0, s1
	; GFX10-NEXT: v_lshrrev_b64 v[0:1], s0, v[15:16]			; GFX10-NEXT: v_lshrrev_b64 v[0:1], s0, v[15:16]

llvm/test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll

	; GFX9-NEXT: v_cndmask_b32_e32 v0, v4, v0, vcc			; GFX9-NEXT: v_cndmask_b32_e32 v0, v4, v0, vcc
	; GFX9-NEXT: v_cndmask_b32_e32 v1, v5, v1, vcc			; GFX9-NEXT: v_cndmask_b32_e32 v1, v5, v1, vcc
	; GFX9-NEXT: v_cndmask_b32_e32 v2, v6, v2, vcc			; GFX9-NEXT: v_cndmask_b32_e32 v2, v6, v2, vcc
	; GFX9-NEXT: v_cndmask_b32_e32 v3, v7, v3, vcc			; GFX9-NEXT: v_cndmask_b32_e32 v3, v7, v3, vcc
	; GFX9-NEXT: ; return to shader part epilog			; GFX9-NEXT: ; return to shader part epilog
	;			;
	; GFX10-LABEL: ssubsat_i128_vs:			; GFX10-LABEL: ssubsat_i128_vs:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: v_mov_b32_e32 v5, v0
	; GFX10-NEXT: v_mov_b32_e32 v6, v1			; GFX10-NEXT: v_mov_b32_e32 v6, v1
	; GFX10-NEXT: v_mov_b32_e32 v9, v2			; GFX10-NEXT: v_mov_b32_e32 v9, v2
				; GFX10-NEXT: v_sub_co_u32_e64 v15, vcc_lo, v0, s0
	; GFX10-NEXT: v_mov_b32_e32 v10, v3			; GFX10-NEXT: v_mov_b32_e32 v10, v3
	; GFX10-NEXT: s_cmp_eq_u64 s[2:3], 0			; GFX10-NEXT: v_mov_b32_e32 v5, v0
	; GFX10-NEXT: v_sub_co_u32_e64 v15, vcc_lo, v5, s0
	; GFX10-NEXT: v_cmp_gt_u64_e64 s0, s[0:1], 0
	; GFX10-NEXT: v_subrev_co_ci_u32_e32 v16, vcc_lo, s1, v6, vcc_lo			; GFX10-NEXT: v_subrev_co_ci_u32_e32 v16, vcc_lo, s1, v6, vcc_lo
	; GFX10-NEXT: s_cselect_b32 s4, 1, 0			; GFX10-NEXT: v_cmp_gt_u64_e64 s0, s[0:1], 0
	; GFX10-NEXT: v_subrev_co_ci_u32_e32 v19, vcc_lo, s2, v9, vcc_lo			; GFX10-NEXT: v_subrev_co_ci_u32_e32 v19, vcc_lo, s2, v9, vcc_lo
	; GFX10-NEXT: s_and_b32 s1, 1, s4			; GFX10-NEXT: s_cmp_eq_u64 s[2:3], 0
	; GFX10-NEXT: v_subrev_co_ci_u32_e32 v20, vcc_lo, s3, v10, vcc_lo			; GFX10-NEXT: v_subrev_co_ci_u32_e32 v20, vcc_lo, s3, v10, vcc_lo
	; GFX10-NEXT: v_cmp_lt_u64_e32 vcc_lo, v[15:16], v[5:6]			; GFX10-NEXT: v_cmp_lt_u64_e32 vcc_lo, v[15:16], v[5:6]
				; GFX10-NEXT: s_cselect_b32 s4, 1, 0
	; GFX10-NEXT: v_cndmask_b32_e64 v8, 0, 1, s0			; GFX10-NEXT: v_cndmask_b32_e64 v8, 0, 1, s0
	; GFX10-NEXT: v_cmp_gt_i64_e64 s0, s[2:3], 0			; GFX10-NEXT: v_cmp_gt_i64_e64 s0, s[2:3], 0
	; GFX10-NEXT: ; implicit-def: $vcc_hi			; GFX10-NEXT: s_and_b32 s1, 1, s4
	; GFX10-NEXT: v_ashrrev_i32_e32 v7, 31, v20			; GFX10-NEXT: v_ashrrev_i32_e32 v7, 31, v20
	; GFX10-NEXT: v_cndmask_b32_e64 v0, 0, 1, vcc_lo			; GFX10-NEXT: v_cndmask_b32_e64 v0, 0, 1, vcc_lo
	; GFX10-NEXT: v_cmp_lt_i64_e32 vcc_lo, v[19:20], v[9:10]			; GFX10-NEXT: v_cmp_lt_i64_e32 vcc_lo, v[19:20], v[9:10]
				; GFX10-NEXT: ; implicit-def: $vcc_hi
	; GFX10-NEXT: v_cndmask_b32_e64 v1, 0, 1, vcc_lo			; GFX10-NEXT: v_cndmask_b32_e64 v1, 0, 1, vcc_lo
	; GFX10-NEXT: v_cmp_eq_u64_e32 vcc_lo, v[19:20], v[9:10]			; GFX10-NEXT: v_cmp_eq_u64_e32 vcc_lo, v[19:20], v[9:10]
	; GFX10-NEXT: v_cndmask_b32_e64 v9, 0, 1, s0			; GFX10-NEXT: v_cndmask_b32_e64 v9, 0, 1, s0
	; GFX10-NEXT: s_movk_i32 s0, 0x7f			; GFX10-NEXT: s_movk_i32 s0, 0x7f
	; GFX10-NEXT: s_sub_i32 s2, 64, s0			; GFX10-NEXT: s_sub_i32 s2, 64, s0
	; GFX10-NEXT: v_cndmask_b32_e32 v10, v1, v0, vcc_lo			; GFX10-NEXT: v_cndmask_b32_e32 v10, v1, v0, vcc_lo
	; GFX10-NEXT: v_cmp_ne_u32_e64 vcc_lo, 0, s1			; GFX10-NEXT: v_cmp_ne_u32_e64 vcc_lo, 0, s1
	; GFX10-NEXT: v_lshrrev_b64 v[0:1], s0, v[15:16]			; GFX10-NEXT: v_lshrrev_b64 v[0:1], s0, v[15:16]

llvm/test/CodeGen/AMDGPU/atomicrmw-nand.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs < %s \| FileCheck -check-prefix=GCN %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs < %s \| FileCheck -check-prefix=GCN %s

	define i32 @atomic_nand_i32_lds(i32 addrspace(3)* %ptr) nounwind {			define i32 @atomic_nand_i32_lds(i32 addrspace(3)* %ptr) nounwind {
	; GCN-LABEL: atomic_nand_i32_lds:			; GCN-LABEL: atomic_nand_i32_lds:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: ds_read_b32 v1, v0			; GCN-NEXT: ds_read_b32 v1, v0
	; GCN-NEXT: s_mov_b64 s[4:5], 0			; GCN-NEXT: s_mov_b64 s[4:5], 0
	; GCN-NEXT: BB0_1: ; %atomicrmw.start			; GCN-NEXT: BB0_1: ; %atomicrmw.start
	; GCN-NEXT: ; =>This Inner Loop Header: Depth=1			; GCN-NEXT: ; =>This Inner Loop Header: Depth=1
	; GCN-NEXT: s_waitcnt lgkmcnt(0)			; GCN-NEXT: s_waitcnt lgkmcnt(0)
	; GCN-NEXT: v_mov_b32_e32 v2, v1			; GCN-NEXT: v_mov_b32_e32 v2, v1
	; GCN-NEXT: v_not_b32_e32 v1, v2			; GCN-NEXT: v_not_b32_e32 v1, v1
	; GCN-NEXT: v_or_b32_e32 v1, -5, v1			; GCN-NEXT: v_or_b32_e32 v1, -5, v1
	; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GCN-NEXT: ds_cmpst_rtn_b32 v1, v0, v2, v1			; GCN-NEXT: ds_cmpst_rtn_b32 v1, v0, v2, v1
	; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GCN-NEXT: buffer_wbinvl1_vol			; GCN-NEXT: buffer_wbinvl1_vol
	; GCN-NEXT: v_cmp_eq_u32_e32 vcc, v1, v2			; GCN-NEXT: v_cmp_eq_u32_e32 vcc, v1, v2
	; GCN-NEXT: s_or_b64 s[4:5], vcc, s[4:5]			; GCN-NEXT: s_or_b64 s[4:5], vcc, s[4:5]
	; GCN-NEXT: s_andn2_b64 exec, exec, s[4:5]			; GCN-NEXT: s_andn2_b64 exec, exec, s[4:5]
...
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: global_load_dword v2, v[0:1], off			; GCN-NEXT: global_load_dword v2, v[0:1], off
	; GCN-NEXT: s_mov_b64 s[4:5], 0			; GCN-NEXT: s_mov_b64 s[4:5], 0
	; GCN-NEXT: BB1_1: ; %atomicrmw.start			; GCN-NEXT: BB1_1: ; %atomicrmw.start
	; GCN-NEXT: ; =>This Inner Loop Header: Depth=1			; GCN-NEXT: ; =>This Inner Loop Header: Depth=1
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_mov_b32_e32 v3, v2			; GCN-NEXT: v_mov_b32_e32 v3, v2
	; GCN-NEXT: v_not_b32_e32 v2, v3			; GCN-NEXT: v_not_b32_e32 v2, v2
	; GCN-NEXT: v_or_b32_e32 v2, -5, v2			; GCN-NEXT: v_or_b32_e32 v2, -5, v2
	; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GCN-NEXT: global_atomic_cmpswap v2, v[0:1], v[2:3], off glc			; GCN-NEXT: global_atomic_cmpswap v2, v[0:1], v[2:3], off glc
	; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GCN-NEXT: buffer_wbinvl1_vol			; GCN-NEXT: buffer_wbinvl1_vol
	; GCN-NEXT: v_cmp_eq_u32_e32 vcc, v2, v3			; GCN-NEXT: v_cmp_eq_u32_e32 vcc, v2, v3
	; GCN-NEXT: s_or_b64 s[4:5], vcc, s[4:5]			; GCN-NEXT: s_or_b64 s[4:5], vcc, s[4:5]
	; GCN-NEXT: s_andn2_b64 exec, exec, s[4:5]			; GCN-NEXT: s_andn2_b64 exec, exec, s[4:5]
...
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: flat_load_dword v2, v[0:1]			; GCN-NEXT: flat_load_dword v2, v[0:1]
	; GCN-NEXT: s_mov_b64 s[4:5], 0			; GCN-NEXT: s_mov_b64 s[4:5], 0
	; GCN-NEXT: BB2_1: ; %atomicrmw.start			; GCN-NEXT: BB2_1: ; %atomicrmw.start
	; GCN-NEXT: ; =>This Inner Loop Header: Depth=1			; GCN-NEXT: ; =>This Inner Loop Header: Depth=1
	; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GCN-NEXT: v_mov_b32_e32 v3, v2			; GCN-NEXT: v_mov_b32_e32 v3, v2
	; GCN-NEXT: v_not_b32_e32 v2, v3			; GCN-NEXT: v_not_b32_e32 v2, v2
	; GCN-NEXT: v_or_b32_e32 v2, -5, v2			; GCN-NEXT: v_or_b32_e32 v2, -5, v2
	; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GCN-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc			; GCN-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
	; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GCN-NEXT: buffer_wbinvl1_vol			; GCN-NEXT: buffer_wbinvl1_vol
	; GCN-NEXT: s_waitcnt lgkmcnt(0)			; GCN-NEXT: s_waitcnt lgkmcnt(0)
	; GCN-NEXT: v_cmp_eq_u32_e32 vcc, v2, v3			; GCN-NEXT: v_cmp_eq_u32_e32 vcc, v2, v3
	; GCN-NEXT: s_or_b64 s[4:5], vcc, s[4:5]			; GCN-NEXT: s_or_b64 s[4:5], vcc, s[4:5]

llvm/test/CodeGen/AMDGPU/ds-combine-large-stride.ll

bb:
%tmp22 = load float, float addrspace(3)* %tmp21, align 4		%tmp22 = load float, float addrspace(3)* %tmp21, align 4
%tmp23 = fadd float %tmp20, %tmp22		%tmp23 = fadd float %tmp20, %tmp22
store float %tmp23, float *%arg1, align 4		store float %tmp23, float *%arg1, align 4
ret void		ret void
}		}

; GCN-LABEL: ds_read32_combine_stride_8192_shifted:		; GCN-LABEL: ds_read32_combine_stride_8192_shifted:
; GCN: s_load_dword [[ARG:s[0-9]+]], s[4:5], 0x0		; GCN: s_load_dword [[ARG:s[0-9]+]], s[4:5], 0x0
; GCN: v_mov_b32_e32 [[BASE:v[0-9]+]], [[ARG]]

; VI-DAG: v_add_u32_e32 [[B1:v[0-9]+]], vcc, 8, [[BASE]]		; VI-DAG: v_add_u32_e64 [[B1:v[0-9]+]], vcc, [[ARG]], 8
		; VI-DAG: v_mov_b32_e32 [[BASE:v[0-9]+]], [[ARG]]
; VI-DAG: v_add_u32_e32 [[B2:v[0-9]+]], vcc, {{s[0-9]+}}, [[BASE]]		; VI-DAG: v_add_u32_e32 [[B2:v[0-9]+]], vcc, {{s[0-9]+}}, [[BASE]]
; VI-DAG: v_add_u32_e32 [[B3:v[0-9]+]], vcc, {{s[0-9]+}}, [[BASE]]		; VI-DAG: v_add_u32_e32 [[B3:v[0-9]+]], vcc, {{s[0-9]+}}, [[BASE]]

		; GFX9: v_mov_b32_e32 [[BASE:v[0-9]+]], [[ARG]]
; GFX9-DAG: v_add_u32_e32 [[B1:v[0-9]+]], 8, [[BASE]]		; GFX9-DAG: v_add_u32_e32 [[B1:v[0-9]+]], 8, [[BASE]]
; GFX9-DAG: v_add_u32_e32 [[B2:v[0-9]+]], 0x4008, [[BASE]]		; GFX9-DAG: v_add_u32_e32 [[B2:v[0-9]+]], 0x4008, [[BASE]]
; GFX9-DAG: v_add_u32_e32 [[B3:v[0-9]+]], 0x8008, [[BASE]]		; GFX9-DAG: v_add_u32_e32 [[B3:v[0-9]+]], 0x8008, [[BASE]]

; GCN-DAG: ds_read2st64_b32 v[{{[0-9]+:[0-9]+}}], [[B1]] offset1:32		; GCN-DAG: ds_read2st64_b32 v[{{[0-9]+:[0-9]+}}], [[B1]] offset1:32
; GCN-DAG: ds_read2st64_b32 v[{{[0-9]+:[0-9]+}}], [[B2]] offset1:32		; GCN-DAG: ds_read2st64_b32 v[{{[0-9]+:[0-9]+}}], [[B2]] offset1:32
; GCN-DAG: ds_read2st64_b32 v[{{[0-9]+:[0-9]+}}], [[B3]] offset1:32		; GCN-DAG: ds_read2st64_b32 v[{{[0-9]+:[0-9]+}}], [[B3]] offset1:32
define amdgpu_kernel void @ds_read32_combine_stride_8192_shifted(float addrspace(3)* nocapture readonly %arg, float *nocapture %arg1) {		define amdgpu_kernel void @ds_read32_combine_stride_8192_shifted(float addrspace(3)* nocapture readonly %arg, float *nocapture %arg1) {
...
bb:
%tmp22 = load double, double addrspace(3)* %tmp21, align 8		%tmp22 = load double, double addrspace(3)* %tmp21, align 8
%tmp23 = fadd double %tmp20, %tmp22		%tmp23 = fadd double %tmp20, %tmp22
store double %tmp23, double *%arg1, align 8		store double %tmp23, double *%arg1, align 8
ret void		ret void
}		}

; GCN-LABEL: ds_read64_combine_stride_8192_shifted:		; GCN-LABEL: ds_read64_combine_stride_8192_shifted:
; GCN: s_load_dword [[ARG:s[0-9]+]], s[4:5], 0x0		; GCN: s_load_dword [[ARG:s[0-9]+]], s[4:5], 0x0
; GCN: v_mov_b32_e32 [[BASE:v[0-9]+]], [[ARG]]

; VI-DAG: v_add_u32_e32 [[B1:v[0-9]+]], vcc, 8, [[BASE]]		; VI-DAG: v_add_u32_e64 [[B1:v[0-9]+]], vcc, [[ARG]], 8
		; VI-DAG: v_mov_b32_e32 [[BASE:v[0-9]+]], [[ARG]]
; VI-DAG: v_add_u32_e32 [[B2:v[0-9]+]], vcc, {{s[0-9]+}}, [[BASE]]		; VI-DAG: v_add_u32_e32 [[B2:v[0-9]+]], vcc, {{s[0-9]+}}, [[BASE]]
; VI-DAG: v_add_u32_e32 [[B3:v[0-9]+]], vcc, {{s[0-9]+}}, [[BASE]]		; VI-DAG: v_add_u32_e32 [[B3:v[0-9]+]], vcc, {{s[0-9]+}}, [[BASE]]

		; GFX9: v_mov_b32_e32 [[BASE:v[0-9]+]], [[ARG]]
; GFX9-DAG: v_add_u32_e32 [[B1:v[0-9]+]], 8, [[BASE]]		; GFX9-DAG: v_add_u32_e32 [[B1:v[0-9]+]], 8, [[BASE]]
; GFX9-DAG: v_add_u32_e32 [[B2:v[0-9]+]], 0x4008, [[BASE]]		; GFX9-DAG: v_add_u32_e32 [[B2:v[0-9]+]], 0x4008, [[BASE]]
; GFX9-DAG: v_add_u32_e32 [[B3:v[0-9]+]], 0x8008, [[BASE]]		; GFX9-DAG: v_add_u32_e32 [[B3:v[0-9]+]], 0x8008, [[BASE]]

; GCN-DAG: ds_read2st64_b64 v[{{[0-9]+:[0-9]+}}], [[B1]] offset1:16		; GCN-DAG: ds_read2st64_b64 v[{{[0-9]+:[0-9]+}}], [[B1]] offset1:16
; GCN-DAG: ds_read2st64_b64 v[{{[0-9]+:[0-9]+}}], [[B2]] offset1:16		; GCN-DAG: ds_read2st64_b64 v[{{[0-9]+:[0-9]+}}], [[B2]] offset1:16
; GCN-DAG: ds_read2st64_b64 v[{{[0-9]+:[0-9]+}}], [[B3]] offset1:16		; GCN-DAG: ds_read2st64_b64 v[{{[0-9]+:[0-9]+}}], [[B3]] offset1:16
define amdgpu_kernel void @ds_read64_combine_stride_8192_shifted(double addrspace(3)* nocapture readonly %arg, double *nocapture %arg1) {		define amdgpu_kernel void @ds_read64_combine_stride_8192_shifted(double addrspace(3)* nocapture readonly %arg, double *nocapture %arg1) {
...
bb:
store float 1.000000e+00, float addrspace(3)* %tmp5, align 4		store float 1.000000e+00, float addrspace(3)* %tmp5, align 4
%tmp6 = getelementptr inbounds float, float addrspace(3)* %arg, i32 14336		%tmp6 = getelementptr inbounds float, float addrspace(3)* %arg, i32 14336
store float 1.000000e+00, float addrspace(3)* %tmp6, align 4		store float 1.000000e+00, float addrspace(3)* %tmp6, align 4
ret void		ret void
}		}

; GCN-LABEL: ds_write32_combine_stride_8192_shifted:		; GCN-LABEL: ds_write32_combine_stride_8192_shifted:
; GCN: s_load_dword [[ARG:s[0-9]+]], s[4:5], 0x0		; GCN: s_load_dword [[ARG:s[0-9]+]], s[4:5], 0x0
; GCN: v_mov_b32_e32 [[BASE:v[0-9]+]], [[ARG]]

; VI-DAG: v_add_u32_e32 [[B1:v[0-9]+]], vcc, 4, [[BASE]]		; VI-DAG: v_add_u32_e64 [[B1:v[0-9]+]], vcc, [[ARG]], 4
		; VI-DAG: v_mov_b32_e32 [[BASE:v[0-9]+]], [[ARG]]
; VI-DAG: v_add_u32_e32 [[B2:v[0-9]+]], vcc, {{s[0-9]+}}, [[BASE]]		; VI-DAG: v_add_u32_e32 [[B2:v[0-9]+]], vcc, {{s[0-9]+}}, [[BASE]]
; VI-DAG: v_add_u32_e32 [[B3:v[0-9]+]], vcc, {{s[0-9]+}}, [[BASE]]		; VI-DAG: v_add_u32_e32 [[B3:v[0-9]+]], vcc, {{s[0-9]+}}, [[BASE]]

		; GFX9: v_mov_b32_e32 [[BASE:v[0-9]+]], [[ARG]]
; GFX9-DAG: v_add_u32_e32 [[B1:v[0-9]+]], 4, [[BASE]]		; GFX9-DAG: v_add_u32_e32 [[B1:v[0-9]+]], 4, [[BASE]]
; GFX9-DAG: v_add_u32_e32 [[B2:v[0-9]+]], 0x4004, [[BASE]]		; GFX9-DAG: v_add_u32_e32 [[B2:v[0-9]+]], 0x4004, [[BASE]]
; GFX9-DAG: v_add_u32_e32 [[B3:v[0-9]+]], 0x8004, [[BASE]]		; GFX9-DAG: v_add_u32_e32 [[B3:v[0-9]+]], 0x8004, [[BASE]]

; GCN-DAG: ds_write2st64_b32 [[B1]], v{{[0-9]+}}, v{{[0-9]+}} offset1:32		; GCN-DAG: ds_write2st64_b32 [[B1]], v{{[0-9]+}}, v{{[0-9]+}} offset1:32
; GCN-DAG: ds_write2st64_b32 [[B2]], v{{[0-9]+}}, v{{[0-9]+}} offset1:32		; GCN-DAG: ds_write2st64_b32 [[B2]], v{{[0-9]+}}, v{{[0-9]+}} offset1:32
; GCN-DAG: ds_write2st64_b32 [[B3]], v{{[0-9]+}}, v{{[0-9]+}} offset1:32		; GCN-DAG: ds_write2st64_b32 [[B3]], v{{[0-9]+}}, v{{[0-9]+}} offset1:32
define amdgpu_kernel void @ds_write32_combine_stride_8192_shifted(float addrspace(3)* nocapture %arg) {		define amdgpu_kernel void @ds_write32_combine_stride_8192_shifted(float addrspace(3)* nocapture %arg) {
...
bb:
store double 1.000000e+00, double addrspace(3)* %tmp5, align 8		store double 1.000000e+00, double addrspace(3)* %tmp5, align 8
%tmp6 = getelementptr inbounds double, double addrspace(3)* %arg, i32 350		%tmp6 = getelementptr inbounds double, double addrspace(3)* %arg, i32 350
store double 1.000000e+00, double addrspace(3)* %tmp6, align 8		store double 1.000000e+00, double addrspace(3)* %tmp6, align 8
ret void		ret void
}		}

; GCN-LABEL: ds_write64_combine_stride_8192_shifted:		; GCN-LABEL: ds_write64_combine_stride_8192_shifted:
; GCN: s_load_dword [[ARG:s[0-9]+]], s[4:5], 0x0		; GCN: s_load_dword [[ARG:s[0-9]+]], s[4:5], 0x0
; GCN: v_mov_b32_e32 [[BASE:v[0-9]+]], [[ARG]]

; VI-DAG: v_add_u32_e32 [[B1:v[0-9]+]], vcc, 8, [[BASE]]		; VI-DAG: v_add_u32_e64 [[B1:v[0-9]+]], vcc, [[ARG]], 8
		; VI-DAG: v_mov_b32_e32 [[BASE:v[0-9]+]], [[ARG]]
; VI-DAG: v_add_u32_e32 [[B2:v[0-9]+]], vcc, {{s[0-9]+}}, [[BASE]]		; VI-DAG: v_add_u32_e32 [[B2:v[0-9]+]], vcc, {{s[0-9]+}}, [[BASE]]
; VI-DAG: v_add_u32_e32 [[B3:v[0-9]+]], vcc, {{s[0-9]+}}, [[BASE]]		; VI-DAG: v_add_u32_e32 [[B3:v[0-9]+]], vcc, {{s[0-9]+}}, [[BASE]]

		; GFX9: v_mov_b32_e32 [[BASE:v[0-9]+]], [[ARG]]
; GFX9-DAG: v_add_u32_e32 [[B1:v[0-9]+]], 8, [[BASE]]		; GFX9-DAG: v_add_u32_e32 [[B1:v[0-9]+]], 8, [[BASE]]
; GFX9-DAG: v_add_u32_e32 [[B2:v[0-9]+]], 0x4008, [[BASE]]		; GFX9-DAG: v_add_u32_e32 [[B2:v[0-9]+]], 0x4008, [[BASE]]
; GFX9-DAG: v_add_u32_e32 [[B3:v[0-9]+]], 0x8008, [[BASE]]		; GFX9-DAG: v_add_u32_e32 [[B3:v[0-9]+]], 0x8008, [[BASE]]

; GCN-DAG: ds_write2st64_b64 [[B1]], v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}] offset1:16		; GCN-DAG: ds_write2st64_b64 [[B1]], v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}] offset1:16
; GCN-DAG: ds_write2st64_b64 [[B2]], v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}] offset1:16		; GCN-DAG: ds_write2st64_b64 [[B2]], v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}] offset1:16
; GCN-DAG: ds_write2st64_b64 [[B3]], v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}] offset1:16		; GCN-DAG: ds_write2st64_b64 [[B3]], v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}] offset1:16
define amdgpu_kernel void @ds_write64_combine_stride_8192_shifted(double addrspace(3)* nocapture %arg) {		define amdgpu_kernel void @ds_write64_combine_stride_8192_shifted(double addrspace(3)* nocapture %arg) {

llvm/test/CodeGen/AMDGPU/indirect-addressing-term.ll

define amdgpu_kernel void @extract_w_offset_vgpr(i32 addrspace(1)* %out) {
; GCN: SI_SPILL_S64_SAVE killed $sgpr0_sgpr1, %stack.3, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32 :: (store 8 into %stack.3, align 4, addrspace 5)		; GCN: SI_SPILL_S64_SAVE killed $sgpr0_sgpr1, %stack.3, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32 :: (store 8 into %stack.3, align 4, addrspace 5)
; GCN: SI_SPILL_V32_SAVE killed $vgpr1, %stack.4, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr32, 0, implicit $exec :: (store 4 into %stack.4, addrspace 5)		; GCN: SI_SPILL_V32_SAVE killed $vgpr1, %stack.4, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr32, 0, implicit $exec :: (store 4 into %stack.4, addrspace 5)
; GCN: SI_SPILL_S64_SAVE killed $sgpr2_sgpr3, %stack.5, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32 :: (store 8 into %stack.5, align 4, addrspace 5)		; GCN: SI_SPILL_S64_SAVE killed $sgpr2_sgpr3, %stack.5, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32 :: (store 8 into %stack.5, align 4, addrspace 5)
; GCN: bb.1:		; GCN: bb.1:
; GCN: successors: %bb.1(0x40000000), %bb.3(0x40000000)		; GCN: successors: %bb.1(0x40000000), %bb.3(0x40000000)
; GCN: $sgpr0_sgpr1 = SI_SPILL_S64_RESTORE %stack.5, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32 :: (load 8 from %stack.5, align 4, addrspace 5)		; GCN: $sgpr0_sgpr1 = SI_SPILL_S64_RESTORE %stack.5, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32 :: (load 8 from %stack.5, align 4, addrspace 5)
; GCN: $vgpr0 = SI_SPILL_V32_RESTORE %stack.4, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr32, 0, implicit $exec :: (load 4 from %stack.4, addrspace 5)		; GCN: $vgpr0 = SI_SPILL_V32_RESTORE %stack.4, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr32, 0, implicit $exec :: (load 4 from %stack.4, addrspace 5)
; GCN: $vgpr1 = SI_SPILL_V32_RESTORE %stack.0, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr32, 0, implicit $exec :: (load 4 from %stack.0, addrspace 5)		; GCN: $vgpr1 = SI_SPILL_V32_RESTORE %stack.0, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr32, 0, implicit $exec :: (load 4 from %stack.0, addrspace 5)
; GCN: renamable $sgpr2 = V_READFIRSTLANE_B32 $vgpr1, implicit $exec		; GCN: renamable $sgpr2 = V_READFIRSTLANE_B32 renamable $vgpr1, implicit $exec
; GCN: renamable $sgpr4_sgpr5 = V_CMP_EQ_U32_e64 $sgpr2, killed $vgpr1, implicit $exec		; GCN: renamable $sgpr4_sgpr5 = V_CMP_EQ_U32_e64 renamable $sgpr2, killed renamable $vgpr1, implicit $exec
; GCN: renamable $sgpr4_sgpr5 = S_AND_SAVEEXEC_B64 killed renamable $sgpr4_sgpr5, implicit-def $exec, implicit-def $scc, implicit $exec		; GCN: renamable $sgpr4_sgpr5 = S_AND_SAVEEXEC_B64 killed renamable $sgpr4_sgpr5, implicit-def $exec, implicit-def $scc, implicit $exec
; GCN: S_SET_GPR_IDX_ON killed renamable $sgpr2, 1, implicit-def $m0, implicit-def undef $mode, implicit $m0, implicit $mode		; GCN: S_SET_GPR_IDX_ON killed renamable $sgpr2, 1, implicit-def $m0, implicit-def undef $mode, implicit $m0, implicit $mode
; GCN: $vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17 = SI_SPILL_V512_RESTORE %stack.2, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr32, 0, implicit $exec :: (load 64 from %stack.2, align 4, addrspace 5)		; GCN: $vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17 = SI_SPILL_V512_RESTORE %stack.2, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr32, 0, implicit $exec :: (load 64 from %stack.2, align 4, addrspace 5)
; GCN: renamable $vgpr18 = V_MOV_B32_e32 $vgpr3, implicit $exec, implicit killed $vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17, implicit $m0		; GCN: renamable $vgpr18 = V_MOV_B32_e32 renamable $vgpr3, implicit $exec, implicit killed renamable $vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17, implicit $m0
; GCN: S_SET_GPR_IDX_OFF implicit-def $mode, implicit $mode		; GCN: S_SET_GPR_IDX_OFF implicit-def $mode, implicit $mode
; GCN: renamable $vgpr19 = COPY renamable $vgpr18		; GCN: renamable $vgpr19 = COPY renamable $vgpr18
; GCN: renamable $sgpr2_sgpr3 = COPY renamable $sgpr4_sgpr5		; GCN: renamable $sgpr2_sgpr3 = COPY renamable $sgpr4_sgpr5
; GCN: SI_SPILL_S64_SAVE killed $sgpr2_sgpr3, %stack.5, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32 :: (store 8 into %stack.5, align 4, addrspace 5)		; GCN: SI_SPILL_S64_SAVE killed $sgpr2_sgpr3, %stack.5, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32 :: (store 8 into %stack.5, align 4, addrspace 5)
; GCN: SI_SPILL_S64_SAVE killed $sgpr0_sgpr1, %stack.6, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32 :: (store 8 into %stack.6, align 4, addrspace 5)		; GCN: SI_SPILL_S64_SAVE killed $sgpr0_sgpr1, %stack.6, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32 :: (store 8 into %stack.6, align 4, addrspace 5)
; GCN: SI_SPILL_V32_SAVE killed $vgpr19, %stack.4, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr32, 0, implicit $exec :: (store 4 into %stack.4, addrspace 5)		; GCN: SI_SPILL_V32_SAVE killed $vgpr19, %stack.4, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr32, 0, implicit $exec :: (store 4 into %stack.4, addrspace 5)
; GCN: SI_SPILL_V32_SAVE killed $vgpr0, %stack.7, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr32, 0, implicit $exec :: (store 4 into %stack.7, addrspace 5)		; GCN: SI_SPILL_V32_SAVE killed $vgpr0, %stack.7, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr32, 0, implicit $exec :: (store 4 into %stack.7, addrspace 5)
; GCN: SI_SPILL_V32_SAVE killed $vgpr18, %stack.8, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr32, 0, implicit $exec :: (store 4 into %stack.8, addrspace 5)		; GCN: SI_SPILL_V32_SAVE killed $vgpr18, %stack.8, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr32, 0, implicit $exec :: (store 4 into %stack.8, addrspace 5)

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ds.ordered.swap.ll

entry:
%c = icmp ne i32 %value, 0		%c = icmp ne i32 %value, 0
br i1 %c, label %if-true, label %endif		br i1 %c, label %if-true, label %endif

if-true:		if-true:
%val = call i32@llvm.amdgcn.ds.ordered.swap(i32 addrspace(2)* %gds, i32 %value, i32 0, i32 0, i1 false, i32 1, i1 true, i1 true)		%val = call i32@llvm.amdgcn.ds.ordered.swap(i32 addrspace(2)* %gds, i32 %value, i32 0, i32 0, i1 false, i32 1, i1 true, i1 true)
br label %endif		br label %endif

endif:		endif:
%v = phi i32 [ %val, %if-true ], [ undef, %entry ]		%v = phi i32 [ %val, %if-true ], [ %value, %entry ]
%r = bitcast i32 %v to float		%r = bitcast i32 %v to float
ret float %r		ret float %r
}		}

declare i32 @llvm.amdgcn.ds.ordered.swap(i32 addrspace(2)* nocapture, i32, i32, i32, i1, i32, i1, i1)		declare i32 @llvm.amdgcn.ds.ordered.swap(i32 addrspace(2)* nocapture, i32, i32, i32, i1, i32, i1, i1)

llvm/test/CodeGen/AMDGPU/machine-cp-cndmask.mir

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				# RUN: llc -march=amdgcn -start-before=greedy -stop-after=machine-cp -verify-machineinstrs -o - %s | FileCheck %s

				---
				name: remove_copy_cndmask
				tracksRegLiveness: true
				registers:
				- { id: 0, class: sreg_64 }
				- { id: 1, class: sreg_64_xexec }
				- { id: 2, class: vgpr_32 }
				- { id: 3, class: vgpr_32 }
				- { id: 4, class: sgpr_256 }
				- { id: 6, class: sgpr_128 }
				- { id: 7, class: vgpr_32 }
				- { id: 8, class: sreg_64 }
				body: |
				bb.0.entry:
				; CHECK-LABEL: name: remove_copy_cndmask
				; CHECK: renamable $sgpr2_sgpr3 = COPY $exec
				; CHECK: renamable $sgpr0 = S_MOV_B32 0
				; CHECK: $exec = S_WQM_B64 $exec, implicit-def $scc
				; CHECK: renamable $vgpr0 = V_CNDMASK_B32_e64 0, 0, 0, 1, $sgpr2_sgpr3, implicit $exec
				; CHECK: $exec = S_AND_B64 $exec, killed renamable $sgpr2_sgpr3, implicit-def $scc
				; CHECK: renamable $sgpr1 = COPY renamable $sgpr0
				; CHECK: renamable $sgpr2 = COPY renamable $sgpr0
				; CHECK: renamable $sgpr3 = COPY renamable $sgpr0
				; CHECK: renamable $sgpr4 = COPY renamable $sgpr0
				; CHECK: renamable $sgpr5 = COPY renamable $sgpr0
				; CHECK: renamable $sgpr6 = COPY renamable $sgpr0
				; CHECK: renamable $sgpr7 = COPY renamable $sgpr0
				; CHECK: renamable $vgpr0 = IMAGE_SAMPLE_V1_V1 killed renamable $vgpr0, renamable $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7, undef renamable $sgpr0_sgpr1_sgpr2_sgpr3, 1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 16)
				; CHECK: SI_RETURN_TO_EPILOG $vgpr0
				%8:sreg_64 = COPY $exec
				undef %4.sub0:sgpr_256 = S_MOV_B32 0
				%1:sreg_64_xexec = COPY %8:sreg_64
				$exec = S_WQM_B64 $exec, implicit-def $scc
				%2:vgpr_32 = V_CNDMASK_B32_e64 0, 0, 0, 1, %1:sreg_64_xexec, implicit $exec
				$exec = S_AND_B64 $exec, %8:sreg_64, implicit-def $scc
				%4.sub1:sgpr_256 = COPY %4.sub0:sgpr_256
				%4.sub2:sgpr_256 = COPY %4.sub0:sgpr_256
				%4.sub3:sgpr_256 = COPY %4.sub0:sgpr_256
				%4.sub4:sgpr_256 = COPY %4.sub0:sgpr_256
				%4.sub5:sgpr_256 = COPY %4.sub0:sgpr_256
				%4.sub6:sgpr_256 = COPY %4.sub0:sgpr_256
				%4.sub7:sgpr_256 = COPY %4.sub0:sgpr_256
				%7:vgpr_32 = IMAGE_SAMPLE_V1_V1 %2:vgpr_32, %4:sgpr_256, undef %6:sgpr_128, 1, 0, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 16)
				$vgpr0 = COPY %7:vgpr_32
				SI_RETURN_TO_EPILOG killed $vgpr0
				...

llvm/test/CodeGen/AMDGPU/multilevel-break.ll

	; GCN-NEXT: s_and_b64 s[8:9], exec, s[6:7]			; GCN-NEXT: s_and_b64 s[8:9], exec, s[6:7]
	; GCN-NEXT: s_or_b64 s[4:5], s[8:9], s[4:5]			; GCN-NEXT: s_or_b64 s[4:5], s[8:9], s[4:5]
	; GCN-NEXT: s_andn2_b64 exec, exec, s[4:5]			; GCN-NEXT: s_andn2_b64 exec, exec, s[4:5]
	; GCN-NEXT: s_cbranch_execz BB0_1			; GCN-NEXT: s_cbranch_execz BB0_1
	; GCN-NEXT: BB0_4: ; %LOOP			; GCN-NEXT: BB0_4: ; %LOOP
	; GCN-NEXT: ; Parent Loop BB0_2 Depth=1			; GCN-NEXT: ; Parent Loop BB0_2 Depth=1
	; GCN-NEXT: ; => This Inner Loop Header: Depth=2			; GCN-NEXT: ; => This Inner Loop Header: Depth=2
	; GCN-NEXT: v_mov_b32_e32 v1, v0			; GCN-NEXT: v_mov_b32_e32 v1, v0
	; GCN-NEXT: v_add_i32_e32 v0, vcc, 1, v1			; GCN-NEXT: v_add_i32_e32 v0, vcc, 1, v0
	; GCN-NEXT: v_cmp_lt_i32_e32 vcc, v1, v4			; GCN-NEXT: v_cmp_lt_i32_e32 vcc, v1, v4
	; GCN-NEXT: s_or_b64 s[2:3], s[2:3], exec			; GCN-NEXT: s_or_b64 s[2:3], s[2:3], exec
	; GCN-NEXT: s_or_b64 s[6:7], s[6:7], exec			; GCN-NEXT: s_or_b64 s[6:7], s[6:7], exec
	; GCN-NEXT: s_and_saveexec_b64 s[8:9], vcc			; GCN-NEXT: s_and_saveexec_b64 s[8:9], vcc
	; GCN-NEXT: s_cbranch_execz BB0_3			; GCN-NEXT: s_cbranch_execz BB0_3
	; GCN-NEXT: ; %bb.5: ; %ENDIF			; GCN-NEXT: ; %bb.5: ; %ENDIF
	; GCN-NEXT: ; in Loop: Header=BB0_4 Depth=2			; GCN-NEXT: ; in Loop: Header=BB0_4 Depth=2
	; GCN-NEXT: v_cmp_ne_u32_e32 vcc, v5, v0			; GCN-NEXT: v_cmp_ne_u32_e32 vcc, v5, v0

llvm/test/CodeGen/AMDGPU/regbank-reassign-wave64.mir

	# RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=-WavefrontSize32,+WavefrontSize64 -verify-machineinstrs -run-pass greedy,amdgpu-regbanks-reassign,virtregrewriter -o - %s | FileCheck -check-prefix=GCN %s			# RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=-WavefrontSize32,+WavefrontSize64 -verify-machineinstrs -run-pass greedy,amdgpu-regbanks-reassign,virtregrewriter,si-fix-renamable-flags -o - %s | FileCheck -check-prefix=GCN %s


	# Test that subreg reassignments are correctly handled when whole register also			# Test that subreg reassignments are correctly handled when whole register also
	# conflicts. If this is mishandled stall counts will be incorrect and cause an			# conflicts. If this is mishandled stall counts will be incorrect and cause an
	# infinite loop.			# infinite loop.
	# GCN-LABEL: vgpr64_mixed_use{{$}}			# GCN-LABEL: vgpr64_mixed_use{{$}}
	# GCN: $vgpr0_vgpr1 = IMPLICIT_DEF			# GCN: $vgpr0_vgpr1 = IMPLICIT_DEF
	# GCN: $vgpr4_vgpr5 = IMPLICIT_DEF			# GCN: $vgpr4_vgpr5 = IMPLICIT_DEF
	# GCN: $vcc = IMPLICIT_DEF			# GCN: $vcc = IMPLICIT_DEF
	# GCN: $vgpr2_vgpr3 = IMPLICIT_DEF			# GCN: $vgpr2_vgpr3 = IMPLICIT_DEF
	# GCN: $vgpr6_vgpr7 = IMPLICIT_DEF			# GCN: $vgpr6_vgpr7 = IMPLICIT_DEF
	# GCN: $vgpr8_vgpr9_vgpr10_vgpr11 = IMPLICIT_DEF			# GCN: $vgpr8_vgpr9_vgpr10_vgpr11 = IMPLICIT_DEF
	# GCN: $vgpr12_vgpr13_vgpr14_vgpr15 = IMPLICIT_DEF			# GCN: $vgpr12_vgpr13_vgpr14_vgpr15 = IMPLICIT_DEF
	# GCN: $vgpr16_vgpr17_vgpr18_vgpr19 = IMPLICIT_DEF			# GCN: $vgpr16_vgpr17_vgpr18_vgpr19 = IMPLICIT_DEF
	# GCN: $vgpr20_vgpr21_vgpr22_vgpr23 = IMPLICIT_DEF			# GCN: $vgpr20_vgpr21_vgpr22_vgpr23 = IMPLICIT_DEF
	# GCN: $vgpr24_vgpr25_vgpr26_vgpr27 = IMPLICIT_DEF			# GCN: $vgpr24_vgpr25_vgpr26_vgpr27 = IMPLICIT_DEF
	# GCN: $vgpr28_vgpr29_vgpr30_vgpr31 = IMPLICIT_DEF			# GCN: $vgpr28_vgpr29_vgpr30_vgpr31 = IMPLICIT_DEF
	# GCN: $vgpr32_vgpr33_vgpr34_vgpr35 = IMPLICIT_DEF			# GCN: $vgpr32_vgpr33_vgpr34_vgpr35 = IMPLICIT_DEF
	# GCN: $vgpr36_vgpr37_vgpr38_vgpr39 = IMPLICIT_DEF			# GCN: $vgpr36_vgpr37_vgpr38_vgpr39 = IMPLICIT_DEF
	# GCN: $vgpr40_vgpr41_vgpr42_vgpr43 = IMPLICIT_DEF			# GCN: $vgpr40_vgpr41_vgpr42_vgpr43 = IMPLICIT_DEF
	# GCN: $vgpr44_vgpr45_vgpr46_vgpr47 = IMPLICIT_DEF			# GCN: $vgpr44_vgpr45_vgpr46_vgpr47 = IMPLICIT_DEF
	# GCN: $vgpr2 = V_CNDMASK_B32_e64 0, $vgpr1, 0, $vgpr5, $vcc, implicit $exec			# GCN: $vgpr2 = V_CNDMASK_B32_e64 0, $vgpr1, 0, $vgpr5, renamable $vcc, implicit $exec
	# GCN: $vgpr2 = V_CNDMASK_B32_e64 0, $vgpr0, 0, $vgpr4, killed $vcc, implicit $exec			# GCN: $vgpr2 = V_CNDMASK_B32_e64 0, $vgpr0, 0, $vgpr4, killed renamable $vcc, implicit $exec
	# GCN: $sgpr0_sgpr1 = V_CMP_LT_U64_e64 $vgpr4_vgpr5, $vgpr0_vgpr1, implicit $exec			# GCN: $sgpr0_sgpr1 = V_CMP_LT_U64_e64 $vgpr4_vgpr5, $vgpr0_vgpr1, implicit $exec
	---			---
	name: vgpr64_mixed_use			name: vgpr64_mixed_use
	tracksRegLiveness: true			tracksRegLiveness: true
	registers:			registers:
	- { id: 0, class: vreg_64, preferred-register: '$vgpr0_vgpr1' }			- { id: 0, class: vreg_64, preferred-register: '$vgpr0_vgpr1' }
	- { id: 1, class: vreg_64, preferred-register: '$vgpr4_vgpr5' }			- { id: 1, class: vreg_64, preferred-register: '$vgpr4_vgpr5' }
	- { id: 2, class: sreg_64_xexec, preferred-register: '$vcc' }			- { id: 2, class: sreg_64_xexec, preferred-register: '$vcc' }

llvm/test/CodeGen/AMDGPU/regbank-reassign.mir

# RUN: llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs -run-pass greedy,amdgpu-regbanks-reassign,virtregrewriter -o - %s | FileCheck -check-prefix=GCN %s		# RUN: llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs -run-pass greedy,amdgpu-regbanks-reassign,virtregrewriter,si-fix-renamable-flags -o - %s | FileCheck -check-prefix=GCN %s

# GCN-LABEL: v1_vs_v5{{$}}		# GCN-LABEL: v1_vs_v5{{$}}
# GCN: V_AND_B32_e32 killed $vgpr3, killed $vgpr1,		# GCN: V_AND_B32_e32 killed $vgpr3, killed $vgpr1,
---		---
name: v1_vs_v5		name: v1_vs_v5
tracksRegLiveness: true		tracksRegLiveness: true
registers:		registers:
- { id: 0, class: vgpr_32, preferred-register: '$vgpr1' }		- { id: 0, class: vgpr_32, preferred-register: '$vgpr1' }
body: |
bb.0:		bb.0:
%0 = IMPLICIT_DEF		%0 = IMPLICIT_DEF
%1 = IMPLICIT_DEF		%1 = IMPLICIT_DEF
GLOBAL_STORE_DWORDX2 %1, %0, 0, 0, 0, 0, implicit $exec		GLOBAL_STORE_DWORDX2 %1, %0, 0, 0, 0, 0, implicit $exec
S_ENDPGM 0		S_ENDPGM 0
...		...

# GCN-LABEL: s11_vs_vcc{{$}}		# GCN-LABEL: s11_vs_vcc{{$}}
# GCN: $vgpr0, $vcc_lo = V_ADDC_U32_e64 killed $sgpr14, killed $vgpr0, killed $vcc_lo, 0		# GCN: $vgpr0, $vcc_lo = V_ADDC_U32_e64 killed renamable $sgpr14, killed $vgpr0, killed $vcc_lo, 0
---		---
name: s11_vs_vcc		name: s11_vs_vcc
tracksRegLiveness: true		tracksRegLiveness: true
registers:		registers:
- { id: 0, class: sgpr_32, preferred-register: '$sgpr11' }		- { id: 0, class: sgpr_32, preferred-register: '$sgpr11' }
- { id: 1, class: vgpr_32 }		- { id: 1, class: vgpr_32 }
- { id: 2, class: vgpr_32 }		- { id: 2, class: vgpr_32 }
body: |		body: |
bb.0:
%1 = IMPLICIT_DEF		%1 = IMPLICIT_DEF
%2 = V_AND_B32_e32 %1, %0, implicit $exec		%2 = V_AND_B32_e32 %1, %0, implicit $exec
$vgpr0 = COPY %0		$vgpr0 = COPY %0
$vgpr4 = COPY %1		$vgpr4 = COPY %1
S_ENDPGM 0		S_ENDPGM 0
...		...

# GCN-LABEL: implicit{{$}}		# GCN-LABEL: implicit{{$}}
# GCN: V_MOV_B32_indirect undef $vgpr4, undef $vgpr0, implicit $exec, implicit-def dead renamable $vgpr0_vgpr1_vgpr2_vgpr3, implicit killed $vgpr4_vgpr5_vgpr6_vgpr7, implicit $m0		# GCN: V_MOV_B32_indirect undef $vgpr4, undef $vgpr0, implicit $exec, implicit-def dead $vgpr0_vgpr1_vgpr2_vgpr3, implicit killed $vgpr4_vgpr5_vgpr6_vgpr7, implicit $m0
---		---
name: implicit		name: implicit
tracksRegLiveness: true		tracksRegLiveness: true
registers:		registers:
- { id: 0, class: vreg_128 }		- { id: 0, class: vreg_128 }
- { id: 1, class: vreg_128, preferred-register: '$vgpr4_vgpr5_vgpr6_vgpr7' }		- { id: 1, class: vreg_128, preferred-register: '$vgpr4_vgpr5_vgpr6_vgpr7' }
body: |		body: |
bb.0:		bb.0:

llvm/test/CodeGen/AMDGPU/ret.ll

; RUN: llc -march=amdgcn -mcpu=tahiti -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s		; RUN: llc -march=amdgcn -mcpu=tahiti -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s
; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s		; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

; GCN-LABEL: {{^}}vgpr:		; GCN-LABEL: {{^}}vgpr:
; GCN-DAG: v_mov_b32_e32 v1, v0		; GCN-DAG: v_mov_b32_e32 v1, v0
; GCN-DAG: exp mrt0 v0, v0, v0, v0 done vm		; GCN-DAG: exp mrt0 v0, v0, v0, v0 done vm
; GCN: s_waitcnt expcnt(0)		; GCN: s_waitcnt expcnt(0)
; GCN: v_add_f32_e32 v0, 1.0, v1		; GCN: v_add_f32_e32 v0, 1.0, v{{[0-1]}}
; GCN-NOT: s_endpgm		; GCN-NOT: s_endpgm
define amdgpu_vs { float, float } @vgpr([9 x <16 x i8>] addrspace(4)* inreg %arg, i32 inreg %arg1, i32 inreg %arg2, float %arg3) #0 {		define amdgpu_vs { float, float } @vgpr([9 x <16 x i8>] addrspace(4)* inreg %arg, i32 inreg %arg1, i32 inreg %arg2, float %arg3) #0 {
bb:		bb:
call void @llvm.amdgcn.exp.f32(i32 0, i32 15, float %arg3, float %arg3, float %arg3, float %arg3, i1 true, i1 true) #0		call void @llvm.amdgcn.exp.f32(i32 0, i32 15, float %arg3, float %arg3, float %arg3, float %arg3, i1 true, i1 true) #0
%x = fadd float %arg3, 1.000000e+00		%x = fadd float %arg3, 1.000000e+00
%a = insertvalue { float, float } undef, float %x, 0		%a = insertvalue { float, float } undef, float %x, 0
%b = insertvalue { float, float } %a, float %arg3, 1		%b = insertvalue { float, float } %a, float %arg3, 1
ret { float, float } %b		ret { float, float } %b
bb:
ret { i32, i32, i32, i32 } { i32 5, i32 6, i32 7, i32 8 }		ret { i32, i32, i32, i32 } { i32 5, i32 6, i32 7, i32 8 }
}		}

; GCN-LABEL: {{^}}both:		; GCN-LABEL: {{^}}both:
; GCN-DAG: exp mrt0 v0, v0, v0, v0 done vm		; GCN-DAG: exp mrt0 v0, v0, v0, v0 done vm
; GCN-DAG: v_mov_b32_e32 v1, v0		; GCN-DAG: v_mov_b32_e32 v1, v0
; GCN-DAG: s_mov_b32 s1, s2		; GCN-DAG: s_mov_b32 s1, s2
; GCN-DAG: s_waitcnt expcnt(0)		; GCN-DAG: s_waitcnt expcnt(0)
; GCN-DAG: v_add_f32_e32 v0, 1.0, v1		; GCN-DAG: v_add_f32_e32 v0, 1.0, v{{[0-1]}}
; GCN-DAG: s_add_{{i|u}}32 s0, s3, 2		; GCN-DAG: s_add_{{i|u}}32 s0, s3, 2
; GCN-DAG: s_mov_b32 s2, s3		; GCN-DAG: s_mov_b32 s2, s3
; GCN-NOT: s_endpgm		; GCN-NOT: s_endpgm
define amdgpu_vs { float, i32, float, i32, i32 } @both([9 x <16 x i8>] addrspace(4)* inreg %arg, i32 inreg %arg1, i32 inreg %arg2, float %arg3) #0 {		define amdgpu_vs { float, i32, float, i32, i32 } @both([9 x <16 x i8>] addrspace(4)* inreg %arg, i32 inreg %arg1, i32 inreg %arg2, float %arg3) #0 {
bb:		bb:
call void @llvm.amdgcn.exp.f32(i32 0, i32 15, float %arg3, float %arg3, float %arg3, float %arg3, i1 true, i1 true) #0		call void @llvm.amdgcn.exp.f32(i32 0, i32 15, float %arg3, float %arg3, float %arg3, float %arg3, i1 true, i1 true) #0
%v = fadd float %arg3, 1.000000e+00		%v = fadd float %arg3, 1.000000e+00
%s = add i32 %arg2, 2		%s = add i32 %arg2, 2

llvm/test/CodeGen/AMDGPU/sgpr-spill-wrong-stack-id.mir

	# SHARE: - { id: 1, name: '', type: spill-slot, offset: 0, size: 8, alignment: 4,			# SHARE: - { id: 1, name: '', type: spill-slot, offset: 0, size: 8, alignment: 4,
	# SHARE: stack-id: sgpr-spill, callee-saved-register: '', callee-saved-restored: true,			# SHARE: stack-id: sgpr-spill, callee-saved-register: '', callee-saved-restored: true,
	# SHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }			# SHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
	# SHARE: - { id: 2, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,			# SHARE: - { id: 2, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,
	# SHARE: stack-id: sgpr-spill, callee-saved-register: '', callee-saved-restored: true,			# SHARE: stack-id: sgpr-spill, callee-saved-register: '', callee-saved-restored: true,
	# SHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }			# SHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }

	# SHARE: SI_SPILL_S32_SAVE $sgpr32, %stack.2, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 4 into %stack.2, addrspace 5)			# SHARE: SI_SPILL_S32_SAVE $sgpr32, %stack.2, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 4 into %stack.2, addrspace 5)
	# SHARE: SI_SPILL_V32_SAVE killed $vgpr0, %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (store 4 into %stack.0, addrspace 5)			# SHARE: SI_SPILL_V32_SAVE killed renamable $vgpr0, %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (store 4 into %stack.0, addrspace 5)
	# SHARE: SI_SPILL_S64_SAVE killed renamable $sgpr4_sgpr5, %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 8 into %stack.1, align 4, addrspace 5)			# SHARE: SI_SPILL_S64_SAVE killed renamable $sgpr4_sgpr5, %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 8 into %stack.1, align 4, addrspace 5)
	# SHARE: renamable $sgpr4_sgpr5 = SI_SPILL_S64_RESTORE %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 8 from %stack.1, align 4, addrspace 5)			# SHARE: renamable $sgpr4_sgpr5 = SI_SPILL_S64_RESTORE %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 8 from %stack.1, align 4, addrspace 5)
	# SHARE: dead $sgpr30_sgpr31 = SI_CALL killed renamable $sgpr4_sgpr5, @func, csr_amdgpu_highregs, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit undef $vgpr0			# SHARE: dead $sgpr30_sgpr31 = SI_CALL killed renamable $sgpr4_sgpr5, @func, csr_amdgpu_highregs, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit undef $vgpr0
	# SHARE: $sgpr32 = SI_SPILL_S32_RESTORE %stack.2, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 4 from %stack.2, addrspace 5)			# SHARE: $sgpr32 = SI_SPILL_S32_RESTORE %stack.2, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 4 from %stack.2, addrspace 5)
	# SHARE: $vgpr0 = SI_SPILL_V32_RESTORE %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (load 4 from %stack.0, addrspace 5)			# SHARE: $vgpr0 = SI_SPILL_V32_RESTORE %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (load 4 from %stack.0, addrspace 5)
	# SHARE: renamable $sgpr4_sgpr5 = SI_SPILL_S64_RESTORE %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 8 from %stack.1, align 4, addrspace 5)			# SHARE: renamable $sgpr4_sgpr5 = SI_SPILL_S64_RESTORE %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 8 from %stack.1, align 4, addrspace 5)
	# SHARE: dead $sgpr30_sgpr31 = SI_CALL killed renamable $sgpr4_sgpr5, @func, csr_amdgpu_highregs, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $vgpr0			# SHARE: dead $sgpr30_sgpr31 = SI_CALL killed renamable $sgpr4_sgpr5, @func, csr_amdgpu_highregs, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $vgpr0
	# SHARE: $sgpr32 = SI_SPILL_S32_RESTORE %stack.2, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 4 from %stack.2, addrspace 5)			# SHARE: $sgpr32 = SI_SPILL_S32_RESTORE %stack.2, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 4 from %stack.2, addrspace 5)

	# NOSHARE: stack:			# NOSHARE: stack:
	# NOSHARE: - { id: 0, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,			# NOSHARE: - { id: 0, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,
	# NOSHARE: stack-id: default, callee-saved-register: '', callee-saved-restored: true,			# NOSHARE: stack-id: default, callee-saved-register: '', callee-saved-restored: true,
	# NOSHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }			# NOSHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
	# NOSHARE: - { id: 1, name: '', type: spill-slot, offset: 0, size: 8, alignment: 4,			# NOSHARE: - { id: 1, name: '', type: spill-slot, offset: 0, size: 8, alignment: 4,
	# NOSHARE: stack-id: sgpr-spill, callee-saved-register: '', callee-saved-restored: true,			# NOSHARE: stack-id: sgpr-spill, callee-saved-register: '', callee-saved-restored: true,
	# NOSHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }			# NOSHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
	# NOSHARE: - { id: 2, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,			# NOSHARE: - { id: 2, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,
	# NOSHARE: stack-id: sgpr-spill, callee-saved-register: '', callee-saved-restored: true,			# NOSHARE: stack-id: sgpr-spill, callee-saved-register: '', callee-saved-restored: true,
	# NOSHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }			# NOSHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
	# NOSHARE: - { id: 3, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,			# NOSHARE: - { id: 3, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,
	# NOSHARE: stack-id: sgpr-spill, callee-saved-register: '', callee-saved-restored: true,			# NOSHARE: stack-id: sgpr-spill, callee-saved-register: '', callee-saved-restored: true,
	# NOSHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }			# NOSHARE: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }

	# NOSHARE: SI_SPILL_S32_SAVE $sgpr32, %stack.2, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 4 into %stack.2, addrspace 5)			# NOSHARE: SI_SPILL_S32_SAVE $sgpr32, %stack.2, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 4 into %stack.2, addrspace 5)
	# NOSHARE: SI_SPILL_V32_SAVE killed $vgpr0, %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (store 4 into %stack.0, addrspace 5)			# NOSHARE: SI_SPILL_V32_SAVE killed renamable $vgpr0, %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (store 4 into %stack.0, addrspace 5)
	# NOSHARE: SI_SPILL_S64_SAVE killed renamable $sgpr4_sgpr5, %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 8 into %stack.1, align 4, addrspace 5)			# NOSHARE: SI_SPILL_S64_SAVE killed renamable $sgpr4_sgpr5, %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 8 into %stack.1, align 4, addrspace 5)
	# NOSHARE: renamable $sgpr4_sgpr5 = SI_SPILL_S64_RESTORE %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 8 from %stack.1, align 4, addrspace 5)			# NOSHARE: renamable $sgpr4_sgpr5 = SI_SPILL_S64_RESTORE %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 8 from %stack.1, align 4, addrspace 5)
	# NOSHARE: dead $sgpr30_sgpr31 = SI_CALL killed renamable $sgpr4_sgpr5, @func, csr_amdgpu_highregs, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit undef $vgpr0			# NOSHARE: dead $sgpr30_sgpr31 = SI_CALL killed renamable $sgpr4_sgpr5, @func, csr_amdgpu_highregs, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit undef $vgpr0
	# NOSHARE: $sgpr32 = SI_SPILL_S32_RESTORE %stack.2, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 4 from %stack.2, addrspace 5)			# NOSHARE: $sgpr32 = SI_SPILL_S32_RESTORE %stack.2, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 4 from %stack.2, addrspace 5)
	# NOSHARE: SI_SPILL_S32_SAVE $sgpr32, %stack.3, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 4 into %stack.3, addrspace 5)			# NOSHARE: SI_SPILL_S32_SAVE $sgpr32, %stack.3, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 4 into %stack.3, addrspace 5)
	# NOSHARE: $vgpr0 = SI_SPILL_V32_RESTORE %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (load 4 from %stack.0, addrspace 5)			# NOSHARE: $vgpr0 = SI_SPILL_V32_RESTORE %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (load 4 from %stack.0, addrspace 5)
	# NOSHARE: renamable $sgpr4_sgpr5 = SI_SPILL_S64_RESTORE %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 8 from %stack.1, align 4, addrspace 5)			# NOSHARE: renamable $sgpr4_sgpr5 = SI_SPILL_S64_RESTORE %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 8 from %stack.1, align 4, addrspace 5)
	# NOSHARE: dead $sgpr30_sgpr31 = SI_CALL killed renamable $sgpr4_sgpr5, @func, csr_amdgpu_highregs, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $vgpr0			# NOSHARE: dead $sgpr30_sgpr31 = SI_CALL killed renamable $sgpr4_sgpr5, @func, csr_amdgpu_highregs, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $vgpr0

llvm/test/CodeGen/AMDGPU/stack-slot-color-sgpr-vgpr-spills.mir

	# RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs -stress-regalloc=1 -start-before=greedy -stop-after=stack-slot-coloring -o - %s | FileCheck %s			# RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs -stress-regalloc=1 -start-before=greedy -stop-after=stack-slot-coloring -o - %s | FileCheck %s
	---			---

	# CHECK-LABEL: name: no_merge_sgpr_vgpr_spill_slot{{$}}			# CHECK-LABEL: name: no_merge_sgpr_vgpr_spill_slot{{$}}
	# CHECK: stack:			# CHECK: stack:
	# CHECK: - { id: 0, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,			# CHECK: - { id: 0, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,
	# CHECK-NEXT: stack-id: default,			# CHECK-NEXT: stack-id: default,

	# CHECK: - { id: 1, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,			# CHECK: - { id: 1, name: '', type: spill-slot, offset: 0, size: 4, alignment: 4,
	# CHECK-NEXT: stack-id: sgpr-spill,			# CHECK-NEXT: stack-id: sgpr-spill,

	# CHECK: SI_SPILL_V32_SAVE killed $vgpr0, %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (store 4 into %stack.0, addrspace 5)			# CHECK: SI_SPILL_V32_SAVE killed renamable $vgpr0, %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (store 4 into %stack.0, addrspace 5)
	# CHECK: $vgpr0 = SI_SPILL_V32_RESTORE %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (load 4 from %stack.0, addrspace 5)			# CHECK: $vgpr0 = SI_SPILL_V32_RESTORE %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, implicit $exec :: (load 4 from %stack.0, addrspace 5)

	# CHECK: SI_SPILL_S32_SAVE killed renamable $sgpr5, %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 4 into %stack.1, addrspace 5)			# CHECK: SI_SPILL_S32_SAVE killed renamable $sgpr5, %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (store 4 into %stack.1, addrspace 5)
	# CHECK: $sgpr5 = SI_SPILL_S32_RESTORE %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 4 from %stack.1, addrspace 5)			# CHECK: $sgpr5 = SI_SPILL_S32_RESTORE %stack.1, implicit $exec, implicit $sgpr0_sgpr1_sgpr2_sgpr3, implicit $sgpr32 :: (load 4 from %stack.1, addrspace 5)

	name: no_merge_sgpr_vgpr_spill_slot			name: no_merge_sgpr_vgpr_spill_slot
	tracksRegLiveness: true			tracksRegLiveness: true
	machineFunctionInfo:			machineFunctionInfo:

llvm/test/CodeGen/AMDGPU/transform-block-with-return-to-epilog.ll

else: ; preds = %else.if.cond
unreachable		unreachable
}		}

define amdgpu_ps { <4 x float> } @test_return_to_epilog_with_optimized_kill(float %val) #0 {		define amdgpu_ps { <4 x float> } @test_return_to_epilog_with_optimized_kill(float %val) #0 {
; GCN-LABEL: name: test_return_to_epilog_with_optimized_kill		; GCN-LABEL: name: test_return_to_epilog_with_optimized_kill
; GCN: bb.0.entry:		; GCN: bb.0.entry:
; GCN: successors: %bb.1(0x40000000), %bb.4(0x40000000)		; GCN: successors: %bb.1(0x40000000), %bb.4(0x40000000)
; GCN: liveins: $vgpr0		; GCN: liveins: $vgpr0
; GCN: renamable $vgpr1 = nofpexcept V_RCP_F32_e32 $vgpr0, implicit $mode, implicit $exec		; GCN: renamable $vgpr1 = nofpexcept V_RCP_F32_e32 renamable $vgpr0, implicit $mode, implicit $exec
; GCN: nofpexcept V_CMP_NGT_F32_e32 0, killed $vgpr1, implicit-def $vcc, implicit $mode, implicit $exec		; GCN: nofpexcept V_CMP_NGT_F32_e32 0, killed renamable $vgpr1, implicit-def $vcc, implicit $mode, implicit $exec
; GCN: $sgpr0_sgpr1 = S_AND_SAVEEXEC_B64 killed $vcc, implicit-def $exec, implicit-def $scc, implicit $exec		; GCN: $sgpr0_sgpr1 = S_AND_SAVEEXEC_B64 killed $vcc, implicit-def $exec, implicit-def $scc, implicit $exec
; GCN: renamable $sgpr0_sgpr1 = S_XOR_B64 $exec, killed renamable $sgpr0_sgpr1, implicit-def dead $scc		; GCN: renamable $sgpr0_sgpr1 = S_XOR_B64 $exec, killed renamable $sgpr0_sgpr1, implicit-def dead $scc
; GCN: S_CBRANCH_EXECZ %bb.4, implicit $exec		; GCN: S_CBRANCH_EXECZ %bb.4, implicit $exec
; GCN: bb.1.flow.preheader:		; GCN: bb.1.flow.preheader:
; GCN: successors: %bb.2(0x80000000)		; GCN: successors: %bb.2(0x80000000)
; GCN: liveins: $vgpr0, $sgpr0_sgpr1		; GCN: liveins: $vgpr0, $sgpr0_sgpr1
; GCN: nofpexcept V_CMP_NGT_F32_e32 0, killed $vgpr0, implicit-def $vcc, implicit $mode, implicit $exec		; GCN: nofpexcept V_CMP_NGT_F32_e32 0, killed renamable $vgpr0, implicit-def $vcc, implicit $mode, implicit $exec
; GCN: renamable $sgpr2_sgpr3 = S_MOV_B64 0		; GCN: renamable $sgpr2_sgpr3 = S_MOV_B64 0
; GCN: bb.2.flow:		; GCN: bb.2.flow:
; GCN: successors: %bb.3(0x04000000), %bb.2(0x7c000000)		; GCN: successors: %bb.3(0x04000000), %bb.2(0x7c000000)
; GCN: liveins: $vcc, $sgpr0_sgpr1, $sgpr2_sgpr3		; GCN: liveins: $vcc, $sgpr0_sgpr1, $sgpr2_sgpr3
; GCN: renamable $sgpr4_sgpr5 = S_AND_B64 $exec, renamable $vcc, implicit-def $scc		; GCN: renamable $sgpr4_sgpr5 = S_AND_B64 $exec, renamable $vcc, implicit-def $scc
; GCN: renamable $sgpr2_sgpr3 = S_OR_B64 killed renamable $sgpr4_sgpr5, killed renamable $sgpr2_sgpr3, implicit-def $scc		; GCN: renamable $sgpr2_sgpr3 = S_OR_B64 killed renamable $sgpr4_sgpr5, killed renamable $sgpr2_sgpr3, implicit-def $scc
; GCN: $exec = S_ANDN2_B64 $exec, renamable $sgpr2_sgpr3, implicit-def $scc		; GCN: $exec = S_ANDN2_B64 $exec, renamable $sgpr2_sgpr3, implicit-def $scc
; GCN: S_CBRANCH_EXECNZ %bb.2, implicit $exec		; GCN: S_CBRANCH_EXECNZ %bb.2, implicit $exec

Revision Contents

Diff 291445

llvm/lib/Target/AMDGPU/AMDGPU.h

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

llvm/lib/Target/AMDGPU/CMakeLists.txt

llvm/lib/Target/AMDGPU/SIFixRenamableFlags.cpp

llvm/lib/Target/AMDGPU/SIInstrFormats.td

llvm/lib/Target/AMDGPU/SIInstrInfo.h

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp

llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement-stack-lower.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.image.store.2d.d16.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/saddsat.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll

llvm/test/CodeGen/AMDGPU/atomicrmw-nand.ll

llvm/test/CodeGen/AMDGPU/ds-combine-large-stride.ll

llvm/test/CodeGen/AMDGPU/indirect-addressing-term.ll

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.ds.ordered.swap.ll

llvm/test/CodeGen/AMDGPU/machine-cp-cndmask.mir

llvm/test/CodeGen/AMDGPU/multilevel-break.ll

llvm/test/CodeGen/AMDGPU/regbank-reassign-wave64.mir

llvm/test/CodeGen/AMDGPU/regbank-reassign.mir

llvm/test/CodeGen/AMDGPU/ret.ll

llvm/test/CodeGen/AMDGPU/sgpr-spill-wrong-stack-id.mir

llvm/test/CodeGen/AMDGPU/stack-slot-color-sgpr-vgpr-spills.mir

llvm/test/CodeGen/AMDGPU/transform-block-with-return-to-epilog.ll
